📢 For those experimenting with large language models in the clinical domain, this might be relevant. We participated in Task 4 of the ArchEHR-QA 2026 shared task, which requires aligning answers to patient questions with clinical evidence. Our approach encodes clinical reasoning principles directly into the prompt and is evaluated in a zero-shot setting.
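Roughly, the idea can be sketched as follows. Note this is a minimal illustration only: the principle wording, function names, and example note sentences are all assumptions, not the actual prompt from the repository.

```python
# Hypothetical sketch: encode clinical reasoning principles into a
# zero-shot prompt for evidence-grounded question answering.

PRINCIPLES = (
    "1. Ground every claim in the provided clinical note.\n"
    "2. Cite the IDs of the supporting note sentences.\n"
    "3. Do not speculate beyond the documented findings.\n"
)

def build_prompt(question: str, note_sentences: list[str]) -> str:
    """Assemble a zero-shot prompt: principles, evidence, then the question."""
    evidence = "\n".join(f"[{i}] {s}" for i, s in enumerate(note_sentences, 1))
    return (
        f"Clinical reasoning principles:\n{PRINCIPLES}\n"
        f"Clinical note:\n{evidence}\n\n"
        f"Patient question: {question}\n"
        "Answer with cited evidence:"
    )

prompt = build_prompt(
    "Why was my father started on blood thinners?",
    ["Patient admitted with atrial fibrillation.",
     "Warfarin initiated for stroke prophylaxis."],
)
print(prompt)
```

The prompt string is then sent unchanged to each model under comparison, so any score difference reflects the model rather than the instructions.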
Results: +2% improvement with ChatGPT-5.2. No measurable gain with Llama-3 70B, indicating the approach benefits larger-scale models.
🌟 Code: https://github.com/nicolay-r/ArchEHR-QA-2026-Task-4-MedEvi-NS
📊 Shared task: https://www.codabench.org/competitions/13528/