About
I'm an AI safety researcher working to create a vibrant AI safety ecosystem in Paris.
I'm a research enthusiast, developer, and graphic designer for fun. I'm always ready to wonder at the
surprising complexity of reality.
My interests include AI governance, interpretability, biology, self-organized systems, and bio-inspired AI.
Resume
Experience
AI safety researcher
December 2023 - Present
Paris, France
I am working with EffiSciences to start an AI safety research hub in Paris.
Master's thesis at Conjecture
February 2023 - August 2023
Conjecture, London, UK
I worked on scalable mechanistic interpretability, searching for macroscopic universal motifs inside LLMs.
Internship at Redwood Research
August 2022 - February 2023
Redwood Research, Berkeley, US
Research on mechanistic interpretability of language models. I also worked as a research manager, leading a team of 10 residents during the REMIX residency program.
Research internship at the Living Technology Lab
April-July 2021
OsloMet, Oslo, Norway
Self-organizing systems engineering. The goal of the project was to design a neural cellular automaton to solve a control task.
Research internship in the Mnemosyne Team
May-July 2020
IMN Bordeaux, Bordeaux, France
Comparison and visualisation of recurrent neural networks solving a language processing task.
Education
Master of Computer Science (second year)
2021 - 2022
EPFL, Lausanne, Switzerland
Double degree program between ENS de Lyon and EPFL. Courses focused on machine learning and data science.
Master of Computer Science (first year)
2020 - 2021
ENS de Lyon, Lyon, France
Elective courses in biology (neurology and evolution) and physics (dynamical systems).
Bachelor of Computer Science
2019 - 2020
ENS de Lyon, Lyon, France
Courses focused on theoretical computer science. Graduated with a grade of 18.13/20.
Preparatory classes MPSI/MP*
2017 - 2019
Lycée Champollion, Grenoble, France
Two years of intensive courses in mathematics, physics, and computer science to prepare for the competitive entrance exams of French engineering schools.
Publications
BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
Diego Dorn, Alexandre Variengien, Charbel-Raphaël Segerie, Vincent Corruble
2024
This paper proposes a framework to evaluate how well LLM input-output safeguards, such as Llama Guard, generalize to detecting unknown failure modes. We presented this work as an oral at the NextGen AI Safety Workshop at ICML 2024.
Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien, Eric Winsor
2023
In the search for "units of interpretability", I decided to zoom out instead of zooming in, looking for universal macroscopic motifs in LLMs. In other words, are there such things as "organs" inside LLMs? This work suggests that the answer is yes! Preprint available on ArXiv. This work was part of my Master's thesis, available here. It received a spotlight at the Mechanistic Interpretability Workshop at ICML 2024.
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna, Ollie Liu, Alexandre Variengien
2023
Paper accepted at NeurIPS 2023. I supervised this research project during the REMIX residency program.
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
2022
Accepted as a poster at the ICLR 2023 conference. I recommend reading the ArXiv version for the most up-to-date version of this work.
Towards self-organized control: Using neural cellular automata to robustly control a cart-pole agent
Alexandre Variengien, Stefano Nichele, Tom Glover, Sidney Pontes-Filho
2021
This paper was published in the IMI journal. Inspired by the Distill thread on differentiable self-organizing systems, I also developed an interactive article.
Recurrent Neural Networks Models for Developmental Language Acquisition: Reservoirs Outperform LSTMs
Xavier Hinaut, Alexandre Variengien
2020
Poster accepted to the 12th Annual Meeting of the Society for the Neurobiology of Language.
A Journey in ESN and LSTM Visualisations on a Language Task
Alexandre Variengien, Xavier Hinaut
2020
Paper available as an arXiv preprint. We compared two architectures of recurrent neural networks on a language task. We also presented a new tool to visually grasp the models' inner representations of the sentences they learned.
Projects
AI safety distillation contest
May 2022
As part of EA UC Berkeley's contest, I wrote a distillation of the ELK report to make its core ideas easier to understand. My submission was awarded a prize.
SACCHA
2020-2021
Interdisciplinary project in collaboration with VetAgro Sup. The goal was to create an educational simulator for dog auscultation for use by veterinary students. I was involved in hardware and UI design.
Hackathon Hack COVID19
April 2020
Creation of an online interactive epidemiological model to evaluate the impact of contact tracing applications on the propagation of SARS-CoV-2.