About
I'm an AI safety researcher working to create a vibrant AI safety ecosystem in Paris.
I'm a research enthusiast, developer, and graphic designer for fun. I'm always ready to wonder at the
surprising complexity of reality.
My interests include AI governance, interpretability, biology, self-organized systems, and bio-inspired AI.
Resume
Experience
AI safety researcher
December 2023 - Present
Paris, France
I am working with EffiSciences to start an AI safety research hub in Paris.
Master's thesis at Conjecture
February 2023 - August 2023
Conjecture, London, UK
I worked on scalable mechanistic interpretability, searching for macroscopic universal motifs inside LLMs.
Internship at Redwood Research
August 2022 - February 2023
Redwood Research, Berkeley, US
Research on mechanistic interpretability of language models. I also worked as a research manager, leading a team of 10 residents during the REMIX residency program.
Research internship at the Living Technology Lab
April-July 2021
OsloMet, Oslo, Norway
Self-organizing systems engineering. The goal of the project was to design a neural cellular automaton to solve a control task.
Research internship in the Mnemosyne Team
May-July 2020
IMN Bordeaux, Bordeaux, France
Comparison and visualisation of recurrent neural networks solving a language processing task.
Education
Master of Computer Science (second year)
2021 - 2022
EPFL, Lausanne, Switzerland
Double degree program between ENS de Lyon and EPFL. Courses focused on machine learning and data science.
Master of Computer Science (first year)
2020 - 2021
ENS de Lyon, Lyon, France
Elective courses in biology (neurology and evolution) and physics (dynamical systems).
Bachelor of Computer Science
2019 - 2020
ENS de Lyon, Lyon, France
Courses focused on theoretical computer science. Graduated with a grade of 18.13/20.
Preparatory classes MPSI/MP*
2017 - 2019
Lycée Champollion, Grenoble, France
Two years of intensive courses in mathematics, physics, and computer science to prepare for the competitive entrance exams of French engineering schools.
Publications
BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
Diego Dorn, Alexandre Variengien, Charbel-Raphaël Segerie, Vincent Corruble
2024
This paper proposes a framework to evaluate how well LLM input-output safeguards, such as Llama Guard, generalize to detecting unknown failure modes. We presented this work as an oral at the NextGen AI Safety Workshop at ICML 2024.
Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien, Eric Winsor
2023
In the search for "units of interpretability", I decided to zoom out instead of zooming in, looking for universal macroscopic motifs in LLMs. In other words, are there such things as "organs" inside LLMs? This work suggests that the answer is yes! Preprint available on ArXiv. This work was part of my Master's thesis, available here. It received a spotlight at the Mechanistic Interpretability Workshop at ICML 2024.
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna, Ollie Liu, Alexandre Variengien
2023
Paper accepted at NeurIPS 2023. I supervised this research project during the REMIX residency program.
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
2022
Accepted as a poster at the ICLR 2023 conference. I recommend reading the ArXiv version for the most up-to-date version of this work.
Towards self-organized control: Using neural cellular automata to robustly control a cart-pole agent
Alexandre Variengien, Stefano Nichele, Tom Glover, Sidney Pontes-Filho
2021
This paper was published in the IMI journal. Inspired by the Distill thread on differentiable self-organizing systems, I also developed an interactive article.
Recurrent Neural Networks Models for Developmental Language Acquisition: Reservoirs Outperform LSTMs
Xavier Hinaut, Alexandre Variengien
2020
Poster accepted to the 12th Annual Meeting of the Society for the Neurobiology of Language.
A Journey in ESN and LSTM Visualisations on a Language Task
Alexandre Variengien, Xavier Hinaut
2020
Paper available as an arXiv preprint. We compared two architectures of recurrent neural networks on a language task. We also presented a new tool to visually grasp the models' inner representations of the sentences they learned.
Projects
AI safety distillation contest
May 2022
As part of EA UC Berkeley's contest, I wrote a distillation of the ELK report to make its core ideas easier to understand. My submission was awarded a prize.
SACCHA
2020-2021
Interdisciplinary project in collaboration with VetAgro Sup. The goal was to create an educational simulator for dog auscultation for use by veterinary students. I was involved in hardware and UI design.
Hackathon Hack COVID19
April 2020
Creation of an online interactive epidemiological model to evaluate the impact of contact tracing applications on the propagation of SARS-CoV-2.