Neurosymbolic Reasoning and Constraint Satisfaction
This group highlights work at the intersection of artificial intelligence and logic.
Master's Thesis: Distilling Neurosymbolic Reasoning for Linear Algebra in Small Language Models
The main drawback of using generative AI models for advanced mathematics is that they are not deterministic logical reasoning engines. Because neural models are probabilistic, they can produce outputs that look convincing while still containing subtle errors that are hard to detect without careful algorithmic verification. Symbolic methods can largely address this issue by performing exact, deterministic calculations through code execution. However, solving a full problem typically requires an explicit plan: the correct sequence of tool invocations and the dependencies between intermediate results must be specified, and this often requires human intervention. Our premise is that this planning burden can be reduced by using neural models to orchestrate tool use, while symbolic solvers provide the exact computation.
In this thesis, we demonstrate an end-to-end workflow that combines neural models with symbolic solvers to solve linear algebra problems through tool-use interactions, in a controlled, verifiable setting with a small, audited tool library. Our results show that, starting from a small pre-trained base model (Qwen2.5-3B), it is possible to achieve 90% test-set accuracy (verifier-checked on a fixed held-out evaluation set) on problem traces requiring up to three tool interactions.
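The split between neural orchestration and symbolic execution can be sketched as follows. The model side is stubbed: a planned trace of tool calls stands in for what the fine-tuned model would emit, and a small tool library performs the exact computation. Tool names (`matmul`, `transpose`) and the trace format are illustrative, not the thesis's actual API.

```python
# Minimal sketch: a (stubbed) planner proposes tool calls; a small audited
# tool library performs exact, deterministic linear-algebra computation.

def matmul(a, b):
    """Exact matrix product over nested lists (symbolic step)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

TOOLS = {"matmul": matmul, "transpose": transpose}  # audited tool library

def run_trace(calls, env):
    """Execute a sequence of tool calls; each result is named and can feed
    later calls, mirroring a multi-turn tool-use trace."""
    for out_name, tool, arg_names in calls:
        env[out_name] = TOOLS[tool](*(env[n] for n in arg_names))
    return env

# A two-step trace: compute A^T A for a small matrix.
env = run_trace(
    [("At", "transpose", ["A"]), ("AtA", "matmul", ["At", "A"])],
    {"A": [[1, 2], [3, 4]]},
)
print(env["AtA"])  # exact, verifier-checkable result
```

Because every step is deterministic, a verifier can replay the trace and check the final value exactly, which is what makes the 90% accuracy figure checkable rather than judged.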
The pipeline includes synthetic dataset generation, distillation, supervised fine-tuning (SFT), and reinforcement learning via Group Sequence Policy Optimization (GSPO). Using parameter-efficient fine-tuning (LoRA) and on-demand cloud GPUs, the full pipeline is reproducible within a $75 budget. This provides a concrete recipe for practitioners to train self-hostable tool-using models, and a pedagogical blueprint for students learning to build tool-calling agents beyond prompt engineering.
Bachelor's Thesis: Solving Sudoku Using Propositional Logic
This project replicates algorithms and techniques used to visualize the structure of difficult Sudoku puzzles. The approach involves implementing an efficient conflict-driven clause learning (CDCL) SAT solver, incorporating well-known procedures such as backjumping and clause learning, alongside two decision heuristics: Variable State Independent Decaying Sum (VSIDS) and Largest Individual Sum (LIS).
Additionally, we encode the Sudoku puzzle into SAT format and implement dynamic visualization of the resolution process. This provides fine-grained insights into the solver’s state, the sequence of logical decisions, and the propagation and conflict resolution mechanisms as they occur.
Finally, we perform an ablation study to assess the impact of the two decision heuristics (VSIDS vs. LIS) on solver performance. We also profile the application’s performance using flame graphs, comparing two separate SAT encodings: minimal and extended.
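To make the encoding comparison concrete, here is a sketch of a "minimal"-style Sudoku-to-CNF encoding: one Boolean variable per (row, column, value) triple, at-least-one clauses per cell, and pairwise at-most-one clauses per row, column, and box. The exact clause sets of the thesis's minimal and extended encodings may differ; this is one common variant.

```python
# One variable per (row, col, value); DIMACS ids are 1-based.
def var(r, c, v):
    """Variable id for 'cell (r, c) holds value v' (all indices 0-based)."""
    return 81 * r + 9 * c + v + 1

def all_units():
    """Rows, columns, and 3x3 boxes as lists of (row, col) cells."""
    rows = [[(r, c) for c in range(9)] for r in range(9)]
    cols = [[(r, c) for r in range(9)] for c in range(9)]
    boxes = [[(3 * br + i, 3 * bc + j) for i in range(3) for j in range(3)]
             for br in range(3) for bc in range(3)]
    return rows + cols + boxes

def minimal_encoding():
    clauses = []
    # Each cell holds at least one value.
    for r in range(9):
        for c in range(9):
            clauses.append([var(r, c, v) for v in range(9)])
    # Each value appears at most once per row, column, and box (pairwise).
    for unit in all_units():
        for v in range(9):
            cells = [var(r, c, v) for r, c in unit]
            for i in range(len(cells)):
                for j in range(i + 1, len(cells)):
                    clauses.append([-cells[i], -cells[j]])
    return clauses

clauses = minimal_encoding()
print(len(clauses))  # 81 + 27 units * 9 values * 36 pairs = 8829 clauses
```

The extended encoding typically adds redundant clauses (e.g. at-most-one per cell and at-least-one per unit), trading formula size for stronger unit propagation, which is exactly the trade-off the flame-graph profiling examines.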
Deep learning and training pipelines
This group focuses on deep learning, including training pipelines and low-level GPU programming.
Distributed and Parallel Techniques for Deep Neural Networks
This paper presents a systematic literature review of distributed and parallel techniques for running deep neural networks on multiple machines and GPUs. It is composed of three parts: 1) a review of the available libraries that enable distributed training across GPU clusters, 2) a review of the most popular frameworks that facilitate parallelizing the training process on GPUs, and 3) a practical section that demonstrates a proof-of-concept implementation of the training process using common frameworks, namely PyTorch DDP and cuDNN.
The review synthesizes research from the past decade, examining various approaches to distributed training, their effectiveness, and implementation challenges. The work is aimed at students and practitioners, with the goal to provide an introduction to the topic and help frame a general idea of the most common libraries in each domain.
The distributed experiments use data parallelism to accelerate the training process, while the GPU experiments use cuDNN, cuBLAS, and manual kernel implementations to train a small network. The effectiveness of each approach is demonstrated, and Docker environments are provided to aid reproduction and experimentation. These make it possible to simulate a multi-GPU setup on a single Nvidia GPU, promoting ease of use by not relying on cloud services.
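The core of data parallelism can be illustrated without any framework: each worker computes gradients on its own data shard, then gradients are averaged across workers (the all-reduce step that PyTorch DDP performs under the hood), so every replica applies an identical update. This toy version uses a scalar least-squares model; the model and numbers are illustrative.

```python
# Data parallelism in miniature: per-shard gradients + all-reduce mean.

def grad_on_shard(w, shard):
    """Gradient of mean squared error for y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    """Average per-worker gradients (stand-in for ring all-reduce)."""
    return sum(grads) / len(grads)

# Two workers, two equal shards of the dataset y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):                       # synchronous SGD steps
    grads = [grad_on_shard(w, s) for s in shards]
    w -= 0.01 * all_reduce_mean(grads)     # identical update on every worker
print(round(w, 3))                         # converges to ~3.0
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, which is why synchronous data parallelism reproduces single-machine training while splitting the data-loading and forward/backward cost across devices.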
Visual QA: Using Generative Models on Classification Tasks
Much attention has been given in recent years to the development of complex architectures designed to integrate multimodal capabilities. Yet as these systems grow with ever more intricate modules, some of the more common use-cases are not immediately supported. While popular libraries like Transformers offer rich APIs, they often lack direct support for common classification tasks when using multimodal models like BLIP-2. To address this limitation, I adapt the BLIP-2 architecture, originally designed for question answering, to perform classification tasks within the Transformers library.
The algorithm is evaluated using two datasets: Easy-VQA and Daquar. Easy-VQA contains simple questions about geometric shapes, while Daquar is more challenging, requiring answers to questions about objects in indoor scenes. The model achieves 91% accuracy on Easy-VQA and 78% accuracy on Daquar, outperforming generative baselines on both benchmarks. UMAP visualization of the learned features confirms that the model is effectively capturing semantic distinctions, particularly for the simpler geometric cases.
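One common way to turn a generative model into a classifier is to score every candidate label with the model and take the argmax, rather than decoding free-form text. The sketch below stubs the model with a lookup table of log-likelihoods; in the real system this would be a forward pass scoring the label tokens given the image and question. All names and scores are illustrative.

```python
import math

def label_log_prob(question, label, scores):
    """Stub: look up a precomputed log-likelihood for (question, label).
    A real implementation would sum token log-probs from the model."""
    return scores[(question, label)]

def classify(question, candidate_labels, scores):
    """Pick the label the generative model considers most likely."""
    return max(candidate_labels,
               key=lambda lab: label_log_prob(question, lab, scores))

# Toy scores a model might assign to each answer candidate.
scores = {
    ("what shape is it?", "circle"):   math.log(0.7),
    ("what shape is it?", "square"):   math.log(0.2),
    ("what shape is it?", "triangle"): math.log(0.1),
}
pred = classify("what shape is it?", ["circle", "square", "triangle"], scores)
print(pred)  # circle
```

Constraining predictions to a fixed label set is what makes accuracy directly measurable on datasets like Easy-VQA and Daquar, where the answer vocabulary is closed.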
Agents, vector stores and tool use
This category covers tool-using pipelines in which agents interact with external environments.
Git Inspector: Querying GitHub Repositories with Local LLMs
Navigating large, unfamiliar codebases can be a significant challenge for developers, often leading to a steep learning curve and inefficient debugging processes. However, recent advances in machine learning offer promising ways of addressing this problem. Building on these advances, this work presents a Retrieval Augmented Generation (RAG) pipeline designed to facilitate code retrieval and understanding within GitHub repositories. The approach empowers LLMs by combining non-parametric memory (retrieved code snippets) with parametric memory (pre-trained LLM weights) to generate insightful, context-aware answers.
The project emphasizes the engineering process, adhering to the agile methodology and documenting development throughout. Notably, the system utilizes open-source technologies such as Ollama and Qdrant, enabling the use of various open LLMs through local indexing and retrieval of code snippets, without reliance on proprietary services.
The work aims to reduce the steep learning curve associated with understanding large codebases, and provide insightful explanations for complex coding concepts. The library is built in Scala, on top of the Langchain4j framework, and facilitates integration with the LLM through interfaces built with Gradio and Scala.js. A usability study validated the interface, achieving an average SUS score of 85%.
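The retrieval half of such a RAG pipeline reduces to ranking snippet embeddings by similarity to the query embedding and passing the top-k hits to the LLM as context. The sketch below uses toy hand-written vectors and plain cosine similarity; the real system would use an embedding model and a vector store such as Qdrant, and is written in Scala rather than Python.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, index, k=2):
    """index: list of (snippet, embedding). Returns the k closest snippets."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [snippet for snippet, _ in ranked[:k]]

# Toy embeddings: dimension 0 loosely means "config loading".
index = [
    ("def parse_config(path): ...", [0.9, 0.1, 0.0]),
    ("class HttpClient: ...",       [0.1, 0.9, 0.1]),
    ("def load_yaml(path): ...",    [0.8, 0.2, 0.1]),
]
context = top_k([1.0, 0.0, 0.0], index)   # query about config loading
print(context)
```

The retrieved snippets become the non-parametric memory mentioned above: they are concatenated into the prompt so the LLM's answer is grounded in the actual repository contents.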
Librarian Assistant
Large Language Models (LLMs) have gained significant popularity in recent years due to their remarkable question answering capabilities. However, when tackling a large corpus of text, the quality of the answers varies, largely due to the model's inability to focus on contextualized information. This may lead to less accurate answers, poor handling of long-tail questions, and exposure bias toward the data the model was pre-trained on. I present a creative approach to tackle these challenges by employing data-agents powered through LLMs.
These agents employ complex workflows to intelligently perform operations over the knowledge base. These operations can be characterized as follows: 1) decompose the task into a series of function calls (thoughts), 2) employ multiple fetch operations over the knowledge base to retrieve relevant information (actions), 3) summarize at each step the extracted information to facilitate the final aggregation (observations) and 4) synthesize a final answer by combining the results. The project supports the adoption of Open LLMs, making the library usable freely without the financial burden of using proprietary providers.
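The four-step workflow above can be sketched as a small loop. Here the LLM planner is replaced by a scripted plan, the actions are fetches over a toy in-memory knowledge base, and the observations are trivial per-step summaries; every name and string is illustrative rather than the project's actual API.

```python
# Thought / action / observation loop in miniature.

KNOWLEDGE_BASE = {
    "release date": "The library was released in 2021.",
    "license": "It is distributed under the MIT license.",
}

def fetch(topic):
    """Action: retrieve a passage from the knowledge base."""
    return KNOWLEDGE_BASE.get(topic, "no result")

def summarize(text):
    """Observation: compress the retrieved text (trivial stand-in)."""
    return text.rstrip(".")

def run_agent(plan):
    """plan: the sub-tasks the planner decomposed the question into."""
    observations = [summarize(fetch(topic)) for topic in plan]
    return "; ".join(observations)        # final synthesis step

answer = run_agent(["release date", "license"])
print(answer)
```

In the real system each step is mediated by the LLM: it decides which fetch to issue next based on the observations so far, rather than following a fixed plan.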
QuestLlama: An Autonomous Agent in Minecraft
This project extends QuestLlama, a Voyager-based autonomous agent capable of completing complex in-game tasks in Minecraft through retrieval-augmented generation and code execution. The system leverages open-source Large Language Models (LLMs) to generate and execute Python code, allowing for dynamic interaction with the game environment. A key contribution is the integration of local-model backends via Ollama and OpenAI-compatible APIs, enabling experimentation and deployment without reliance on proprietary cloud providers. This approach demonstrates the feasibility of using lightweight, locally-hosted models for autonomous agent tasks that traditionally require heavy, closed-source infrastructure.
Developer tools
This section illustrates software engineering work aimed at improving developer workflows.
Docker UI
This project presents a web-based interface for the management and orchestration of Docker Compose environments. The system interacts with the Docker daemon via its HTTP API to enable container lifecycle management, log inspection, and environment configuration directly through a browser. Its purpose is to reduce the steep learning curve associated with container management, where a web interface abstracts command-line operations into visual controls. Usability and interface design were refined through iterative prototyping in Figma and validated via user feedback questionnaires.
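Talking to the Docker daemon over its HTTP API can be sketched with only the standard library: the daemon listens on a Unix socket (by default `/var/run/docker.sock`, an assumption that holds on typical Linux installs), and the Engine API exposes paths such as `/containers/json` and `/containers/{id}/logs`. The connection class and helpers below are a sketch, not the project's actual (web-based) client.

```python
import http.client
import socket

DOCKER_SOCKET = "/var/run/docker.sock"   # default daemon socket (assumption)

class DockerConnection(http.client.HTTPConnection):
    """HTTP over the Docker daemon's Unix socket, stdlib only."""
    def __init__(self):
        super().__init__("localhost")    # host header only; socket is local
    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(DOCKER_SOCKET)

def containers_path(all_containers=True):
    """Engine API path for listing containers (running or all)."""
    return "/containers/json" + ("?all=1" if all_containers else "")

def logs_path(container_id, tail=100):
    """Engine API path for a container's recent stdout/stderr."""
    return f"/containers/{container_id}/logs?stdout=1&stderr=1&tail={tail}"

# Example usage (requires a running daemon):
#   conn = DockerConnection()
#   conn.request("GET", containers_path())
#   print(conn.getresponse().read())
print(containers_path(), logs_path("abc123"))
```

A browser UI like the one described would sit in front of such calls, mapping buttons and log panes onto these endpoints instead of `docker` CLI invocations.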