I joined the Learning And Signal Processing (LASP) group within the Communication and Information Systems group at UCL as a PhD student in September 2017. Prior to joining UCL, I attended the Swiss Federal Institute of Technology in Lausanne (EPFL), where I earned my BSc and MSc degrees in Communication Systems. From September 2016 to August 2017 I was a visiting student in the Computer Science department at ETH Zurich, Switzerland, and between June and September 2016 I completed an internship with the Machine Learning group at Data61 CSIRO in Canberra, Australia.
I have always had a strong fascination for learning systems (both biological and artificial), and I am very grateful to be working on cool projects in these areas with the LASP group!
The focus of my current research is reinforcement learning. More specifically, I am interested in advancing data-efficient reinforcement learning by learning rich and compact representations of states or Markov Decision Processes.
Graph-based Reinforcement Learning
In large environments, learning the optimal behaviour is a data- and compute-expensive process, often intractable. As a consequence, we often resort to function approximation. A major open problem in function approximation for reinforcement learning is the choice of state representation. In this project, we learn state features that embed the geometry of the graph of states. These features lead to improved performance over state-of-the-art linear value function approximation. We note that these improvements come from the fact that the features capture the structural equivalence of the states while preserving the local properties of the graph. A toy sketch of the idea follows the reference below.
Representation Learning on Graphs: A Reinforcement Learning Application, S. Madjiheurem and L. Toni, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). PMLR: Volume 89
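To make the graph-based representation idea concrete, here is a minimal, self-contained sketch. It is not the method from the paper above: it uses eigenvectors of a normalized graph Laplacian (proto-value-function-style features) as a stand-in for learned embeddings, on an invented 20-state chain MDP with made-up hyperparameters, and runs linear TD(0) on top of those features.

```python
# Illustrative sketch only: spectral features of a state graph used as basis
# functions for linear value function approximation. Environment, feature
# dimension, and learning rate are all invented for this example.
import numpy as np

n_states = 20                      # toy chain MDP: states 0..19
gamma = 0.95

# Adjacency matrix of the state graph (neighbouring states are connected).
A = np.zeros((n_states, n_states))
for s in range(n_states - 1):
    A[s, s + 1] = A[s + 1, s] = 1.0

# Normalized graph Laplacian and its smoothest eigenvectors as features.
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(n_states) - D_inv_sqrt @ A @ D_inv_sqrt
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
k = 5                                  # number of basis functions (assumed)
Phi = eigvecs[:, :k]                   # one feature row per state

# Linear TD(0) under a uniform random-walk policy; reward 1 for reaching
# the right end of the chain.
w = np.zeros(k)
alpha = 0.1
rng = np.random.default_rng(0)
for episode in range(500):
    s = rng.integers(n_states)
    for _ in range(100):
        s_next = min(s + 1, n_states - 1) if rng.random() < 0.5 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        td_error = r + gamma * Phi[s_next] @ w - Phi[s] @ w
        w += alpha * td_error * Phi[s]
        s = s_next

print("Approximate state values:", np.round(Phi @ w, 2))
```

The point of the sketch is simply that features reflecting the graph's geometry can serve directly as a basis for linear value estimation; the paper studies learned embeddings rather than the fixed spectral features used here.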
Learning general representations for meta reinforcement learning
A major challenge in reinforcement learning (RL) is the design of agents that are able to generalize across tasks sharing common dynamics. A viable solution is to learn state representations that encode prior information from a set of tasks, and use them to generalize the value function approximation. This has been proposed in the literature as successor representation approximators. While promising, these methods do not generalize well across optimal policies, leading to sample inefficiency when learning a new task. In this paper, we propose state2vec, an efficient and low-complexity framework for learning successor features which (i) generalize across policies, and (ii) ensure sample efficiency during meta-testing. We extend the well-known node2vec framework to learn state embeddings that account for the discounted future state transitions in RL. The proposed off-policy state2vec captures the geometry of the underlying state space, providing good basis functions for linear value function approximation. A simplified sketch of the underlying successor representation follows the reference below.
[State2vec: Off-policy successor features approximators](https://arxiv.org/abs/1910.10277), S. Madjiheurem and L. Toni, arXiv preprint arXiv:1910.10277, 2019.
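As a rough illustration of the successor-feature idea that state2vec builds on (not the paper's embedding-learning algorithm), the sketch below computes the tabular successor representation M = (I - γP)⁻¹ for a fixed random-walk policy on an invented 5-state chain, then recovers the value function linearly as V = M r. All states, policies, and rewards here are made up for illustration.

```python
# Simplified, tabular stand-in for the successor-feature idea: the successor
# representation encodes discounted future state occupancies, and the value
# function is linear in it. Everything below is a toy example.
import numpy as np

gamma = 0.9
n_states = 5

# Transition matrix P under a fixed (uniform random-walk) policy.
P = np.zeros((n_states, n_states))
for s in range(n_states):
    left, right = max(s - 1, 0), min(s + 1, n_states - 1)
    P[s, left] += 0.5
    P[s, right] += 0.5

# Successor representation: expected discounted future occupancy of each state.
M = np.linalg.inv(np.eye(n_states) - gamma * P)

# Reward only in the last state; value function obtained linearly from M.
r = np.zeros(n_states)
r[-1] = 1.0
V = M @ r

print("Successor representation row for state 0:", np.round(M[0], 3))
print("Values under the random-walk policy:     ", np.round(V, 3))
```

Loosely speaking, state2vec learns compact embeddings that play the role of this discounted-occupancy structure while generalizing across policies; the exact training objective is described in the paper.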
My other favourite ways of keeping busy are indoor and outdoor sports, dancing, cooking (and eating), and podcasting!