<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Sara Candussio</title><link>https://gaoithee.github.io/saracandussio.github.io/</link><atom:link href="https://gaoithee.github.io/saracandussio.github.io/index.xml" rel="self" type="application/rss+xml"/><description>Sara Candussio</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 24 Oct 2022 00:00:00 +0000</lastBuildDate><image><url>https://gaoithee.github.io/saracandussio.github.io/media/icon_hu7729264130191091259.png</url><title>Sara Candussio</title><link>https://gaoithee.github.io/saracandussio.github.io/</link></image><item><title>Reading Between the Tokens: Uncovering the Semantic Minima of AI Monologues</title><link>https://gaoithee.github.io/saracandussio.github.io/event/clcg-linguistics-lunch/</link><pubDate>Fri, 24 Apr 2026 12:05:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/event/clcg-linguistics-lunch/</guid><description>&lt;p>Invited talk at the &lt;a href="https://www.rug.nl/research/clcg/colloquia_discussiongroups_linguisticevents/linguistics_lunch/" target="_blank" rel="noopener">CLCG Linguistics Lunch&lt;/a> at the University of Groningen.&lt;/p>
&lt;p>Chain-of-Thought prompting asks models to reason step by step — but most of what they write is filler. This talk presents work on identifying the semantic minima of AI reasoning: the tiny subset of tokens that actually carry predictive weight, detectable in real time from the model&amp;rsquo;s internal states. Erasing up to 95% of the output leaves a sparse set of words that still perfectly predicts the correct answer.&lt;/p></description></item><item><title>Distilling Formal Logic into Neural Spaces</title><link>https://gaoithee.github.io/saracandussio.github.io/project/stlenc/</link><pubDate>Thu, 05 Mar 2026 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/project/stlenc/</guid><description>&lt;p>Continuous neural representations of STL specifications via kernel distillation. Accepted at &lt;strong>NeSy 2026&lt;/strong>.&lt;/p></description></item><item><title>Distilling Formal Logic into Neural Spaces: A Kernel Alignment Approach for Signal Temporal Logic</title><link>https://gaoithee.github.io/saracandussio.github.io/publication/distilling-formal-logic/</link><pubDate>Thu, 05 Mar 2026 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/publication/distilling-formal-logic/</guid><description>&lt;p>Accepted at &lt;strong>NeSy 2026&lt;/strong> as an oral presentation 🎉&lt;/p>
&lt;h2 id="the-idea">The Idea&lt;/h2>
&lt;p>Typical knowledge distillation compresses a big neural model into a smaller one. We do something different: our &amp;ldquo;expert&amp;rdquo; isn&amp;rsquo;t a neural network at all — it&amp;rsquo;s a &lt;strong>mathematical kernel built on top of formal logic&lt;/strong>.&lt;/p>
&lt;p>This kernel is provably correct: it captures the true meaning of logical formulas with mathematical guarantees. The problem? It&amp;rsquo;s expensive to compute and doesn&amp;rsquo;t scale.&lt;/p>
&lt;p>So instead of distilling a big neural model into a smaller neural model, we &lt;strong>distill the geometric structure&lt;/strong> (i.e. relative positions) of a symbolic kernel into a Transformer encoder. We&amp;rsquo;re not compressing parameters — we&amp;rsquo;re transferring mathematical meaning into neural space.&lt;/p>
&lt;p>Using a teacher-student setup with a kernel-weighted geometric alignment objective, we train the encoder to mirror the semantic distances defined by the symbolic kernel. Errors are penalized proportionally to their semantic discrepancy — not just their magnitude.&lt;/p>
&lt;h2 id="why-it-matters">Why It Matters&lt;/h2>
&lt;p>The result is a model that:&lt;/p>
&lt;ul>
&lt;li>runs in a &lt;strong>single forward pass&lt;/strong>&lt;/li>
&lt;li>produces &lt;strong>semantically faithful embeddings&lt;/strong> of logical formulas&lt;/li>
&lt;li>can &lt;strong>reconstruct the original formula&lt;/strong> from its embedding&lt;/li>
&lt;li>does all of this without sacrificing the logical guarantees of the original kernel&lt;/li>
&lt;/ul>
&lt;p>This opens up scalable, trustworthy neuro-symbolic reasoning for domains where correctness and efficiency aren&amp;rsquo;t optional:&lt;/p>
&lt;ul>
&lt;li>🚗 Autonomous driving &amp;amp; robotics (real-time monitoring of safety specs)&lt;/li>
&lt;li>🏥 Healthcare (checking patient signals against clinical guidelines)&lt;/li>
&lt;li>⚙️ Cyber-physical systems (fast verification &amp;amp; control synthesis)&lt;/li>
&lt;/ul>
&lt;p>And beyond STL, this approach generalizes to &lt;strong>any domain with a meaningful but expensive similarity function&lt;/strong> — think genomic sequences, molecular graphs, or structured data where semantics matter more than surface form.&lt;/p>
&lt;p>Co-authored with &lt;a href="https://gsarti.com/" target="_blank" rel="noopener">Gabriele Sarti&lt;/a>, &lt;a href="https://scholar.google.com/citations?user=Gaia_Saveri" target="_blank" rel="noopener">Gaia Saveri&lt;/a>, and &lt;a href="https://ai-lab.units.it/?page_id=139" target="_blank" rel="noopener">Luca Bortolussi&lt;/a>.&lt;/p>
&lt;p>If you want to explore how to transfer a slow, expensive similarity function into Transformers, let&amp;rsquo;s connect!&lt;/p></description></item><item><title>A Dialectic Pipeline for Improving LLM Robustness</title><link>https://gaoithee.github.io/saracandussio.github.io/publication/dialectic-pipeline/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/publication/dialectic-pipeline/</guid><description>&lt;p>Can LLMs improve their accuracy without further training, simply by questioning themselves dialectically, as Hegel&amp;rsquo;s method suggests?&lt;/p>
&lt;p>This was the core question behind my Master&amp;rsquo;s thesis. The short answer: &lt;strong>yes, and by a lot&lt;/strong>.&lt;/p>
&lt;h2 id="the-idea">The Idea&lt;/h2>
&lt;p>Inspired by Hegelian dialectics, the pipeline structures reasoning into three stages:&lt;/p>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/dialectic.png"
alt="The thesis–antithesis–synthesis pipeline.">&lt;figcaption>
&lt;p>The thesis–antithesis–synthesis pipeline.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;ol>
&lt;li>&lt;strong>Thesis&lt;/strong> — the model produces an initial answer given the question, context, and options.&lt;/li>
&lt;li>&lt;strong>Antithesis&lt;/strong> — the model challenges its own answer, now also seeing the thesis.&lt;/li>
&lt;li>&lt;strong>Synthesis&lt;/strong> — the model produces a final answer, having seen both thesis and antithesis.&lt;/li>
&lt;/ol>
&lt;p>No fine-tuning. No domain-specific verifiers. Just structured self-dialogue.&lt;/p>
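&lt;p>The three stages above can be sketched as three sequential prompts. This is a minimal illustration, not the thesis implementation: the function name &lt;code>dialectic_answer&lt;/code>, the &lt;code>generate&lt;/code> callable, and the prompt wording are all assumptions made for this example.&lt;/p>
&lt;pre>&lt;code class="language-python">from typing import Callable

# Hypothetical interface: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


def dialectic_answer(
    generate: Generate, question: str, context: str, options: list[str]
) -> dict[str, str]:
    """Run thesis, antithesis and synthesis as three sequential prompts."""
    # Shared preamble: the question, its context, and the candidate options.
    base = f"Context: {context}\nQuestion: {question}\nOptions: {', '.join(options)}\n"

    # Thesis: an initial answer from question, context, and options only.
    thesis = generate(base + "Answer the question and justify your choice.")

    # Antithesis: the model now also sees the thesis and challenges it.
    antithesis = generate(
        base + f"Proposed answer: {thesis}\n"
        "Challenge this answer and argue for an alternative if it is wrong."
    )

    # Synthesis: the final answer, given both thesis and antithesis.
    synthesis = generate(
        base + f"Proposed answer: {thesis}\nCritique: {antithesis}\n"
        "Weigh both and give the final answer."
    )
    return {"thesis": thesis, "antithesis": antithesis, "synthesis": synthesis}
&lt;/code>&lt;/pre>
&lt;p>Any function that maps a prompt string to text can be passed as &lt;code>generate&lt;/code>, so the same sketch works with a local model or an API client, and no weights are updated at any step.&lt;/p>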
&lt;h2 id="results">Results&lt;/h2>
&lt;p>The pipeline was evaluated on multi-hop QA benchmarks (HotpotQA, WikiHop) across five open-source models under 20B parameters (Phi-mini, Phi-medium, Gemma-2B, Gemma-9B, Llama-8B).&lt;/p>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/results-table.png"
alt="Accuracy improvements across models on HotpotQA.">&lt;figcaption>
&lt;p>Accuracy improvements across models on HotpotQA.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/results-donut.png"
alt="From 53.4% to 80.7% on HotpotQA with Phi-mini (&amp;#43;27.3 percentage points).">&lt;figcaption>
&lt;p>From 53.4% to 80.7% on HotpotQA with Phi-mini (+27.3 percentage points).&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;p>Improvements of &lt;strong>up to 30%&lt;/strong> on complex multi-hop questions — beating standard Chain-of-Thought prompting.&lt;/p>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/results-wikihop.png"
alt="CoT vs. pipeline on WikiHop across all models.">&lt;figcaption>
&lt;p>CoT vs. pipeline on WikiHop across all models.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;h2 id="key-takeaways">Key Takeaways&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Self-debating is the main driver&lt;/strong>: letting models reflect on and contrast their own reasoning significantly boosts performance, especially as question complexity increases.&lt;/li>
&lt;li>&lt;strong>Instruction following matters&lt;/strong>: models that strictly follow instructions (Llama, Phi) benefit more than those that get &amp;ldquo;too creative&amp;rdquo; (Gemma-2).&lt;/li>
&lt;li>&lt;strong>Smart filtering &amp;gt; summarization&lt;/strong>: when dealing with long contexts, filtering for relevant information beats summarization, which can hurt deductive reasoning.&lt;/li>
&lt;li>&lt;strong>Avoid overthinking&lt;/strong>: for simpler tasks, too much deliberation can introduce errors. A touch of &amp;ldquo;impulsivity&amp;rdquo; sometimes helps.&lt;/li>
&lt;/ul>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/results-context.png"
alt="Original vs. summarized vs. filtered context on WikiHop.">&lt;figcaption>
&lt;p>Original vs. summarized vs. filtered context on WikiHop.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;p>This work also received an &lt;strong>Honorable Mention at the Emanuele Pianta Award (AILC)&lt;/strong> for the best Italian NLP Master&amp;rsquo;s thesis at CLiC-it 2025. 🏆&lt;/p>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/clicit.png"
alt="CLiC-it 2025, Cagliari.">&lt;figcaption>
&lt;p>CLiC-it 2025, Cagliari.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;p>If you&amp;rsquo;re interested in agentic reasoning, small language models, or multi-hop QA — feel free to reach out!&lt;/p></description></item><item><title>Large Language Models: Potenzialità, Limiti e Sistemi Multi-Agent</title><link>https://gaoithee.github.io/saracandussio.github.io/event/novalia-llm-workshop/</link><pubDate>Mon, 15 Dec 2025 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/event/novalia-llm-workshop/</guid><description>&lt;figure>&lt;img src="novalia.png"
alt="Workshop at Novalia, Trieste — December 2025.">&lt;figcaption>
&lt;p>Workshop at Novalia, Trieste — December 2025.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;p>Part of a two-session seminar series on digital transformation, co-organized with &lt;a href="https://www.ip4fvg.it/" target="_blank" rel="noopener">IP4FVG&lt;/a> and the University of Trieste.&lt;/p>
&lt;p>This session covered the inner workings of Large Language Models, multi-agent architectures, and fine-tuning strategies — with an eye toward practical business applications and a frank discussion of current limitations.&lt;/p>
&lt;p>The core message: the challenge isn&amp;rsquo;t just adopting AI, but integrating it strategically to augment rather than replace human potential.&lt;/p></description></item><item><title>Bridging Logic and Learning: Decoding Temporal Logic Embeddings via Transformers</title><link>https://gaoithee.github.io/saracandussio.github.io/event/dagstuhl-rtg-symposium/</link><pubDate>Mon, 24 Nov 2025 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/event/dagstuhl-rtg-symposium/</guid><description>&lt;p>Poster presentation of our ECML-PKDD 2025 paper at the RTG Symposium, held at the legendary &lt;a href="https://www.dagstuhl.de/" target="_blank" rel="noopener">Schloss Dagstuhl&lt;/a>: a week-long gathering of PhD researchers from the Max Planck Institute and Saarland University, with guest talks by &lt;a href="https://mega.seas.harvard.edu/" target="_blank" rel="noopener">Mor Geva Pipek&lt;/a> and &lt;a href="https://ai-lab.units.it/?page_id=139" target="_blank" rel="noopener">Luca Bortolussi&lt;/a>.&lt;/p></description></item><item><title>Bridging Logic and Learning: Decoding Temporal Logic Embeddings via Transformers</title><link>https://gaoithee.github.io/saracandussio.github.io/publication/bridging-logic-and-learning/</link><pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/publication/bridging-logic-and-learning/</guid><description>&lt;p>This work introduces a Transformer-based decoder that inverts embeddings of Signal Temporal Logic (STL) formulae. Thanks to a small, purpose-built STL vocabulary, the model can generate valid formulae quickly, generalize across semantic structures, and simplify formulae while preserving their meaning. The methodology is evaluated across varying formula complexity and applied to requirement mining tasks, where optimization is performed directly in the semantic space.&lt;/p>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">Create your slides in Markdown - click the &lt;em>Slides&lt;/em> button to check out the example.&lt;/span>
&lt;/div>
&lt;p>Add the publication&amp;rsquo;s &lt;strong>full text&lt;/strong> or &lt;strong>supplementary notes&lt;/strong> here. You can use rich formatting such as including &lt;a href="https://docs.hugoblox.com/content/writing-markdown-latex/" target="_blank" rel="noopener">code, math, and images&lt;/a>.&lt;/p>
&lt;p>If you find overlap with your work or interests, I would be glad to connect and explore possible collaborations.&lt;/p></description></item><item><title>STLDec: Decoding Temporal Logic Embeddings via Transformers</title><link>https://gaoithee.github.io/saracandussio.github.io/project/stldec/</link><pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/project/stldec/</guid><description>&lt;p>Transformer-based decoder for inverting semantic embeddings of Signal Temporal Logic (STL) formulae. Published at &lt;strong>ECML-PKDD 2025&lt;/strong>.&lt;/p></description></item><item><title>Probabilistic Machine Learning</title><link>https://gaoithee.github.io/saracandussio.github.io/teaching/probabilistic-ml/</link><pubDate>Sat, 01 Mar 2025 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/teaching/probabilistic-ml/</guid><description>&lt;p>Teaching assistant for the &lt;strong>Probabilistic Machine Learning&lt;/strong> MSc course at the University of Trieste (Spring 2025).&lt;/p>
&lt;p>Topics covered: ERM, PAC learnability, Probabilistic Graphical Models, Hidden Markov Models, Bayesian Classification and Regression, Sampling‑based Inference, Expectation‑Maximization, Variational Inference, Generative Modeling (VAEs, Diffusion Models), Gaussian Processes.&lt;/p></description></item><item><title>OverRef: Studying Over-Refusal in Large Language Models</title><link>https://gaoithee.github.io/saracandussio.github.io/project/overref/</link><pubDate>Wed, 01 Jan 2025 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/project/overref/</guid><description>&lt;p>Ongoing project on &lt;strong>over-refusal&lt;/strong> in LLMs: studying when and why models refuse legitimate user queries, with benchmarking and dataset resources.&lt;/p></description></item><item><title>Projects</title><link>https://gaoithee.github.io/saracandussio.github.io/projects/</link><pubDate>Sun, 19 May 2024 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/projects/</guid><description/></item><item><title>Experience</title><link>https://gaoithee.github.io/saracandussio.github.io/experience/</link><pubDate>Tue, 24 Oct 2023 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/experience/</guid><description/></item></channel></rss>