<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Thesis | Sara Candussio</title><link>https://gaoithee.github.io/saracandussio.github.io/publication_types/thesis/</link><atom:link href="https://gaoithee.github.io/saracandussio.github.io/publication_types/thesis/index.xml" rel="self" type="application/rss+xml"/><description>Thesis</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 28 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://gaoithee.github.io/saracandussio.github.io/media/icon_hu7729264130191091259.png</url><title>Thesis</title><link>https://gaoithee.github.io/saracandussio.github.io/publication_types/thesis/</link></image><item><title>A Dialectic Pipeline for Improving LLM Robustness</title><link>https://gaoithee.github.io/saracandussio.github.io/publication/dialectic-pipeline/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><guid>https://gaoithee.github.io/saracandussio.github.io/publication/dialectic-pipeline/</guid><description>&lt;p>Can LLMs improve their accuracy without further training, just through a dialectic way of questioning themselves — as Hegel suggested?&lt;/p>
&lt;p>This was the core question behind my Master&amp;rsquo;s thesis. The short answer: &lt;strong>yes, and by a lot&lt;/strong>.&lt;/p>
&lt;h2 id="the-idea">The Idea&lt;/h2>
&lt;p>Inspired by Hegelian dialectics, the pipeline structures reasoning into three stages:&lt;/p>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/dialectic.png"
alt="The thesis–antithesis–synthesis pipeline.">&lt;figcaption>
&lt;p>The thesis–antithesis–synthesis pipeline.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;ol>
&lt;li>&lt;strong>Thesis&lt;/strong>: the model produces an initial answer given the question, context, and options.&lt;/li>
&lt;li>&lt;strong>Antithesis&lt;/strong>: the model challenges its own answer, with the thesis now added to its input.&lt;/li>
&lt;li>&lt;strong>Synthesis&lt;/strong>: the model produces a final answer, having seen both the thesis and the antithesis.&lt;/li>
&lt;/ol>
&lt;p>No fine-tuning. No domain-specific verifiers. Just structured self-dialogue.&lt;/p>
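&lt;p>The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the thesis: the &lt;code>generate&lt;/code> callable, the function name, and the prompt wording are all assumptions, standing in for whatever model call and prompts you prefer.&lt;/p>

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


def dialectic_answer(
    generate: Generate, question: str, context: str, options: List[str]
) -> Dict[str, str]:
    """Run thesis, antithesis, and synthesis with one model; return all three outputs."""
    # Shared task description (prompt wording is illustrative, not from the thesis).
    task = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Options: {', '.join(options)}\n"
    )

    # Stage 1: initial answer from question, context, and options only.
    thesis = generate(task + "\nPick the best option and justify it briefly.")

    # Stage 2: the same model challenges the answer, now also seeing the thesis.
    antithesis = generate(
        task
        + f"\nProposed answer:\n{thesis}\n"
        + "\nChallenge this answer: point out errors or argue for another option."
    )

    # Stage 3: final answer after seeing both the thesis and the antithesis.
    synthesis = generate(
        task
        + f"\nProposed answer:\n{thesis}\n"
        + f"\nCritique:\n{antithesis}\n"
        + "\nWeigh both and state the final option."
    )

    return {"thesis": thesis, "antithesis": antithesis, "synthesis": synthesis}
```

&lt;p>Passing the model call in as a plain function keeps the sketch independent of any inference library, and makes it easy to swap in a stub when testing.&lt;/p>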
&lt;h2 id="results">Results&lt;/h2>
&lt;p>The pipeline was tested on multi-hop QA benchmarks (HotpotQA, WikiHop) across five open-source models under 20B parameters (Phi-mini, Phi-medium, Gemma-2B, Gemma-9B, LLaMA-8B).&lt;/p>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/results-table.png"
alt="Accuracy improvements across models on HotpotQA.">&lt;figcaption>
&lt;p>Accuracy improvements across models on HotpotQA.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/results-donut.png"
alt="From 53.4% to 80.7% on HotpotQA with Phi-mini (&amp;#43;27.3 percentage points).">&lt;figcaption>
&lt;p>From 53.4% to 80.7% on HotpotQA with Phi-mini (+27.3 percentage points).&lt;/p>
&lt;/figcaption>
&lt;/figure>
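&lt;p>The 27.3 figure in the caption above is an absolute difference between two accuracies. As a quick check, the snippet below recomputes it from the two reported numbers and also shows the relative gain, which is a different quantity. The variable names are just illustrative.&lt;/p>

```python
start_accuracy = 53.4  # Phi-mini on HotpotQA, first value in the caption (%)
final_accuracy = 80.7  # Phi-mini on HotpotQA, second value in the caption (%)

# Absolute gain: difference between the two accuracies, in percentage points.
absolute_gain = round(final_accuracy - start_accuracy, 1)

# Relative gain: the same difference expressed as a share of the starting accuracy.
relative_gain = round(100 * (final_accuracy - start_accuracy) / start_accuracy, 1)

print(f"absolute gain: {absolute_gain} percentage points")  # 27.3
print(f"relative gain: {relative_gain}% of the starting accuracy")  # 51.1
```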
&lt;p>Accuracy improves by &lt;strong>up to 30%&lt;/strong> on complex multi-hop questions, outperforming standard Chain-of-Thought prompting.&lt;/p>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/results-wikihop.png"
alt="CoT vs. pipeline on WikiHop across all models.">&lt;figcaption>
&lt;p>CoT vs. pipeline on WikiHop across all models.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;h2 id="key-takeaways">Key Takeaways&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Self-debating is the main driver&lt;/strong>: letting models reflect on and contrast their own reasoning significantly boosts performance, especially as question complexity increases.&lt;/li>
&lt;li>&lt;strong>Instruction following matters&lt;/strong>: models that strictly follow instructions (LLaMA, Phi) benefit more than those that get &amp;ldquo;too creative&amp;rdquo; (Gemma-2).&lt;/li>
&lt;li>&lt;strong>Smart filtering &amp;gt; summarization&lt;/strong>: when dealing with long contexts, filtering for relevant information beats summarization, which can hurt deductive reasoning.&lt;/li>
&lt;li>&lt;strong>Avoid overthinking&lt;/strong>: for simpler tasks, too much deliberation can introduce errors. A touch of &amp;ldquo;impulsivity&amp;rdquo; sometimes helps.&lt;/li>
&lt;/ul>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/results-context.png"
alt="Original vs. summarized vs. filtered context on WikiHop.">&lt;figcaption>
&lt;p>Original vs. summarized vs. filtered context on WikiHop.&lt;/p>
&lt;/figcaption>
&lt;/figure>
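&lt;p>To make the filtering idea concrete, here is a toy sketch that keeps only the passages most related to the question and leaves their text untouched, rather than rewriting them as a summary would. The scoring rule (counting shared words) and the names &lt;code>content_words&lt;/code> and &lt;code>filter_context&lt;/code> are assumptions for illustration. The filter used in the thesis is not described in this post and may work quite differently.&lt;/p>

```python
import re
from typing import List, Set


def content_words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def filter_context(question: str, passages: List[str], keep: int = 2) -> List[str]:
    """Return the `keep` passages sharing the most words with the question.

    Word overlap is a stand-in relevance score (an assumption, not the thesis method).
    Kept passages are returned verbatim and in their original order.
    """
    question_words = content_words(question)
    # Rank passage indices by overlap; ties keep their original order (stable sort).
    ranked = sorted(
        range(len(passages)),
        key=lambda i: len(question_words.intersection(content_words(passages[i]))),
        reverse=True,
    )
    kept = sorted(ranked[:keep])
    return [passages[i] for i in kept]


passages = [
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
    "France borders Spain and Italy.",
]
filtered = filter_context("Which countries border France?", passages, keep=2)
print(filtered)
```

&lt;p>Because the surviving passages are passed through unchanged, every fact the model needs for a deduction is still stated in its original wording, which is one plausible reason filtering holds up better than summarization here.&lt;/p>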
&lt;p>This work also received an &lt;strong>Honorable Mention at the Emanuele Pianta Award (AILC)&lt;/strong> for the best Italian NLP Master&amp;rsquo;s thesis at CLiC-it 2025. 🏆&lt;/p>
&lt;figure>&lt;img src="https://gaoithee.github.io/saracandussio.github.io/saracandussio.github.io/publication/dialectic-pipeline/clicit.png"
alt="CLiC-it 2025, Cagliari.">&lt;figcaption>
&lt;p>CLiC-it 2025, Cagliari.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;p>If you&amp;rsquo;re interested in agentic reasoning, small language models, or multi-hop QA — feel free to reach out!&lt;/p></description></item></channel></rss>