open positions | Andrea Mauri

I am always looking for motivated Master's students, interns, and PhD candidates to work at the intersection of Human-Computer Interaction and Data Management. All internship positions last approximately 6 months and are paid.

For any of the topics below, send me an email with your CV and a short description of your interest.

Open Thesis / Internship Topics

Human-guided Labeling for High-Quality Data Collection

High-quality labeled datasets are a prerequisite for reliable machine learning models. This project develops human-guided labeling methodologies that integrate domain expert knowledge with automated suggestions, improving both efficiency and data quality.

Keywords: crowdsourcing, active learning, data quality, human-in-the-loop

Brain Local Field Potential Time Series Clustering and Sub-Sequence Matching for Behavior Characterization

In collaboration with the Stem Cell & Brain Research Institute (SBRI) at Lyon 1, this project applies time series clustering and sub-sequence matching techniques to local field potential (LFP) recordings of neural activity, with the goal of characterizing behavioral patterns.

Keywords: time series analysis, neuroscience, clustering, signal processing

Generative AI for Graph Data Repair

Graph databases frequently contain inconsistencies that violate domain constraints. This project explores how generative AI models (LLMs, diffusion models) can assist in proposing, explaining, and applying repairs to property graphs.

Keywords: graph databases, LLMs, data repair, generative AI

Property Graph Encoding for Large Language Models

LLMs operate on text, but graph data has rich structure. This project investigates how to encode property graphs (nodes, edges, properties, and constraints) into representations that LLMs can reason over effectively, and evaluates the quality of the resulting graph-aware reasoning.

Keywords: graph databases, LLMs, representation learning, natural language processing

Semi-Automatic Generation of Dataset Descriptors from Scientific Publications

Research datasets are often poorly documented. This project develops semi-automatic methods to extract and generate structured dataset descriptors (schema, provenance, content summary) from the scientific papers that describe them.

Keywords: information extraction, NLP, metadata, scientific literature

PhD Opportunities

If you are interested in a PhD on topics related to human-centric data management, graph databases, or empathy-centric design, contact me directly. Funded positions may be available through institutional calls (ANR, Horizon Europe); check this page for updates or get in touch.