Well… that’s a bit much…
It’s safer to treat modules you haven’t built yet as uncertain in terms of actual functionality, and to structure the presentation around the parts you’ve already built and fully understand.
Treat it as: 1 hour for results + examples → 2 hours for slides → 30–40 minutes for practice.
No new modeling, just packaging what you already have.
1. Lock your story (10–15 minutes)
Write this on a page or in a doc:
- One-sentence project description
“I fine-tuned RoBERTa on the ContractNLI dataset to analyze NDAs as an NLI task (entailed / contradicted / neutral) for hypotheses like survival, retention, and CI sharing, and evaluated how well it works on real NDAs.”
- What you did
- Used ContractNLI as the main training+test dataset. (datasets-benchmarks-proceedings.neurips.cc)
- Fine-tuned RoBERTa-base for 3-way NLI.
- Handled long NDAs with 512-token chunking (a rough code sketch of this follows at the end of this section).
- Tested on some real NDAs to see how it behaves.
- Main finding
- On ContractNLI: OK performance (~70%).
- On your NDAs: noticeable errors and over-confidence for survival / retention / CI sharing.
- Main idea for improvement (future work)
- Move from raw chunking to clause-level retrieval + NLI and add calibration.
If you have this clearly in front of you, making slides is easy.
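If it helps to have the baseline in front of you while writing the story, here is a minimal sketch of the chunking approach described above: split the contract into 512-token chunks, run NLI on each (chunk, hypothesis) pair, and keep the highest-scoring chunk. The checkpoint path and label order are placeholders; adjust them to match your own fine-tuned model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CKPT = "path/to/your-roberta-contractnli"            # hypothetical path to your fine-tuned checkpoint
LABELS = ["entailment", "contradiction", "neutral"]  # assumed label order; check your model config

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSequenceClassification.from_pretrained(CKPT).eval()

def predict_over_chunks(nda_text: str, hypothesis: str, max_len: int = 512):
    """Chunk the contract, run NLI per (chunk, hypothesis) pair, keep the max-score label."""
    # Reserve room in each chunk for the hypothesis and the special tokens of a pair input.
    hyp_len = len(tokenizer(hypothesis, add_special_tokens=False)["input_ids"])
    chunk_len = max_len - hyp_len - 4

    ids = tokenizer(nda_text, add_special_tokens=False)["input_ids"]
    best_label, best_conf = None, 0.0
    for start in range(0, len(ids), chunk_len):
        chunk_text = tokenizer.decode(ids[start:start + chunk_len])
        enc = tokenizer(chunk_text, hypothesis, truncation=True,
                        max_length=max_len, return_tensors="pt")
        with torch.no_grad():
            probs = torch.softmax(model(**enc).logits, dim=-1)[0]
        conf, idx = probs.max(dim=-1)
        if conf.item() > best_conf:
            best_conf, best_label = conf.item(), LABELS[idx.item()]
    return best_label, best_conf
```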
2. Get one clean baseline result + 3 example NDAs (60 minutes)
2.1 One clean number on ContractNLI (20–30 minutes)
Run your current code once and write down:
- Accuracy (or macro-F1) on ContractNLI dev/test.
That’s your benchmark number. Don’t overthink it.
Guides on ML project presentations all emphasize the same point: you must be able to say clearly “here is the metric, on which dataset, and how I got it”; you do not need to achieve SOTA. A minimal snippet for computing that number is sketched below.
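If you want to (re)compute that number quickly, here is a minimal sketch using scikit-learn, assuming you already have the gold labels and your model’s predictions for the ContractNLI dev/test split as two parallel lists of label strings (the lists below are just placeholders):

```python
# Compute the single benchmark number: accuracy and macro-F1 over the dev/test split.
from sklearn.metrics import accuracy_score, f1_score

gold  = ["entailment", "neutral", "contradiction"]   # placeholder: your gold labels
preds = ["entailment", "neutral", "neutral"]         # placeholder: your model's predictions

print(f"Accuracy: {accuracy_score(gold, preds):.3f}")
print(f"Macro-F1: {f1_score(gold, preds, average='macro'):.3f}")
```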
2.2 3–4 “story” examples from your NDAs (30–40 minutes)
Pick just a few NDAs and focus on the three key hypotheses:
- Survival
- Retention
- CI sharing
For each hypothesis, find:
- 1 NDA where the model works.
- 1 NDA where it fails clearly (wrong label or clearly over-confident).
For each chosen example, write down:
- Hypothesis text.
- The relevant NDA clause (copy the main paragraph; highlight the key phrase in bold).
- Ground truth label (your judgment).
- Model prediction (E/C/N + confidence/probability; the snippet after this subsection shows one way to read these off).
- One simple sentence: “Why this is wrong / interesting.”
That’s enough material to show concrete strengths/weaknesses and to “bring the model to life,” which is exactly what good technical presentations do.
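For the prediction + confidence fields, it helps to look at the full E/C/N probability distribution rather than just the argmax, so you can spot the over-confident failures. A minimal sketch, assuming the same hypothetical fine-tuned checkpoint and label order as in the chunking snippet above; the clause, hypothesis, and 0.9 threshold are illustrative placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CKPT = "path/to/your-roberta-contractnli"            # same placeholder checkpoint as above
LABELS = ["entailment", "contradiction", "neutral"]  # assumed label order; check your config
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSequenceClassification.from_pretrained(CKPT).eval()

def label_probs(clause: str, hypothesis: str) -> dict:
    """Return the E/C/N probability distribution for one (clause, hypothesis) pair."""
    enc = tokenizer(clause, hypothesis, truncation=True, max_length=512,
                    return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)[0]
    return {label: round(p.item(), 3) for label, p in zip(LABELS, probs)}

# Example usage: paste the clause and hypothesis from one of your example cards.
probs = label_probs("<relevant NDA clause>", "<hypothesis text, e.g. about survival>")
print(probs)
if max(probs.values()) > 0.9:   # arbitrary threshold, just to flag over-confidence candidates
    print("High confidence: worth checking whether the label is actually right.")
```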
3. Build a very simple slide deck of about 10–12 slides (90 minutes)
You don’t need fancy design. Just clear structure and minimal text, as common technical-presentation advice recommends.
Slide skeleton
1. Title
- Title, your name, course, date.
2. Motivation
- One slide: “Why NDAs?” (short bullet list: frequent, risk, boring to read manually).
3. Task = NLI
4. Data
- ContractNLI: NDAs from EDGAR, fixed hypotheses, E/C/N + evidence spans.
- Your NDAs: X documents you annotated for survival / retention / CI sharing.
5. Model & baseline
- “RoBERTa-base fine-tuned on ContractNLI.”
- “Long NDAs handled with 512-token chunking (split contract, run NLI per chunk, take max score).”
6. Results: numbers
- The ContractNLI accuracy (or macro-F1) from step 2, plus one line on how you measured it.
7–9. Example slides (most important part)
For each:
- Slide title: “Example – Survival (Failure)”.
- Show:
  - Hypothesis (1 line).
  - Clause (3–6 lines, with key words bold).
  - Ground truth vs model prediction (and confidence).
- One or two bullets explaining what went wrong (e.g., clause in another section, exception language, over-confidence).
Do the same for retention and CI sharing.
(If time is short, 2 examples are enough.)
10. Why chunking isn’t enough
- 2–3 bullets:
  - Splits clauses across chunks.
  - Model sees irrelevant text and misses the one critical clause.
  - No explicit evidence, just a label.
11. Proposed better pipeline (future work)
- Simple boxes: NDA PDF → sections & clauses → retrieve relevant clauses → NLI per clause → aggregate → calibration. (A rough code stub of this pipeline is sketched below, after the skeleton.)
- One bullet: “Inspired by ContractNLI’s span-based evidence and clause retrieval datasets like CUAD/ACORD.” (datasets-benchmarks-proceedings.neurips.cc)
12. Conclusion
- 3 bullets:
  - “NDA review can be framed as NLI using ContractNLI.”
  - “RoBERTa + simple chunking gives okay benchmark performance but fails on real NDAs in specific ways (survival, retention, CI sharing).”
  - “A clause-level retrieval + NLI + calibration pipeline is a more realistic path forward.”
That’s it. You don’t need more slides.
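For your own notes (not for the deck), here is a rough stub of how the stages of that proposed pipeline could fit together. Every function below is a hypothetical placeholder, not existing code; it only mirrors the boxes on the slide, including a simple temperature-scaling step for calibration.

```python
from typing import List

def extract_clauses(nda_pdf_path: str) -> List[str]:
    """Parse the PDF into sections/clauses (PDF parser + heading/numbering rules)."""
    raise NotImplementedError  # placeholder

def retrieve_relevant(clauses: List[str], hypothesis: str, k: int = 5) -> List[str]:
    """Rank clauses by relevance to the hypothesis (e.g. BM25 or embedding similarity)."""
    raise NotImplementedError  # placeholder

def nli_per_clause(clauses: List[str], hypothesis: str) -> List[List[float]]:
    """Run the fine-tuned NLI model on each (clause, hypothesis) pair; return per-clause logits."""
    raise NotImplementedError  # placeholder

def aggregate(per_clause_logits: List[List[float]]) -> List[float]:
    """Combine per-clause results into one document-level set of logits."""
    raise NotImplementedError  # placeholder

def calibrate(logits: List[float], temperature: float = 1.5) -> List[float]:
    """Temperature scaling: divide logits by T so confidences better match observed accuracy."""
    raise NotImplementedError  # placeholder

def analyze_nda(nda_pdf_path: str, hypothesis: str) -> List[float]:
    """NDA PDF -> clauses -> retrieve -> NLI per clause -> aggregate -> calibrate."""
    clauses = retrieve_relevant(extract_clauses(nda_pdf_path), hypothesis)
    return calibrate(aggregate(nli_per_clause(clauses, hypothesis)))
```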
4. Quick rehearsal and logistics (30–40 minutes)
Follow very basic day-before advice from standard presentation guides: focus on clarity, not perfection.
- Run through your slides once out loud
- Aim to say 1–2 sentences per bullet.
- If you go way over time, remove text or merge slides (don’t cram more bullets).
- Check your environment
- Slides saved in PDF and PPTX/Keynote.
- Model/demos: only if you really need them; otherwise rely on screenshots/text you already copied.
- Pick 2–3 likely questions and think of 1–2 sentence answers
- “Why does performance drop on real NDAs?”
- “Why is 512-token chunking not ideal?”
- “What is the most important next step?”
Simple, honest answers are fine.
Super-short version (if you’re really tired)
If you want an even shorter checklist:
- Get one clean metric on ContractNLI and one on your NDA set.
- Choose 3 good examples (correct, wrong, over-confident) for survival/retention/CI sharing.
- Make ~10 slides:
  - Task, data, model, numbers, examples, what goes wrong, future pipeline, conclusion.
- Practice once end-to-end.
If you do just that, you will have a clear, coherent presentation that shows:
- you understand the task,
- you built and evaluated a real model,
- you identified real failure cases, and
- you know the next steps to improve it.