InPars+: Supercharging Synthetic Data Generation for IR
Published in arXiv preprint, 2025
This work revisits and extends synthetic query generation pipelines for Neural Information Retrieval (NIR) by leveraging the InPars Toolkit, a reproducible, end-to-end framework for generating training data using large language models (LLMs). We first assess the reproducibility of the original InPars, InPars-V2, and Promptagator pipelines on the SciFact benchmark and validate their effectiveness using open-source reranker and generator models. Building on this foundation, we introduce two key extensions to the pipeline: (1) fine …
Recommended citation: Krastev, M., Hamar, M., Toapanta, D., Brouwers, J., & Lei, Y. (2025). InPars+: Supercharging Synthetic Data Generation for Information Retrieval Systems. arXiv preprint arXiv:2508.13930.