You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation

πŸ“– arXiv Paper (Accepted to EMNLP 2025 Main πŸŽ‰) | πŸ€— Model | πŸ€— Dataset | πŸ› οΈ Github |


We explore the use of large language models (LLMs) for annotating document utility in training retrieval and retrieval-augmented generation (RAG) systems, aiming to reduce dependence on costly human annotations. We address the gap between retrieval relevance and generative utility by employing LLMs to annotate document utility. Using the Qwen2.5-32B model and Qwen3-32B, we annotate utility on the MS MARCO dataset and NQ dataset.

πŸ“¦ Utility-Focused Annotation for IR and RAG Dataset

Framework

We introduce Utility-Focused Annotation for IR and RAG, a large-scale LLM-annotated retrieval data.

  • MS MARCO: About 500K queries
  • NQ: About 50K queries.

⭐️ Training

More training deatils are shown in Github. This repository shows different checkpoints trained on different annotation data.

πŸ‘‹ Citation

If you find our paper and code useful in your research, please cite our paper.

@inproceedings{zhang2025utility,
  title={Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation},
  author={Zhang, Hengran and Tang, Minghao and Bi, Keping and Guo, Jiafeng and Liu, Shihao and Shi, Daiting and Yin, Dawei and Cheng, Xueqi},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={1683--1702},
  year={2025}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for hengranZhang/Utility_focused_annotation

Base model

Shitao/RetroMAE
Finetuned
(1)
this model

Dataset used to train hengranZhang/Utility_focused_annotation