Efficient In-Memory Inverted Indexes: Theory and Practice

¹The University of Queensland, ²University of Glasgow, ³Pinecone, ⁴MongoDB, Inc.

Date and Location: SIGIR 2025, Padua, Italy - Sunday July 13th, 09.00-12.30

Abstract

Inverted indexes are the backbone of most large-scale information retrieval systems. Although they are conceptually simple, building high-performance inverted indexes requires a deep understanding of low-level system optimizations, compression techniques, and traversal strategies. With the widespread adoption of in-memory search engines, the rise of learned sparse retrieval (LSR), and the increasing complexity of ranking pipelines, the design space for efficient indexing and retrieval systems has expanded significantly.

This tutorial addresses a critical knowledge gap between textbook-style explanations and the advanced techniques required for efficient retrieval. It aims to equip researchers and practitioners with a comprehensive understanding of how modern in-memory search systems are designed, built, and optimized for high-performance retrieval across large-scale document collections.

Practical Component

For the practical component, we will need to download some data and have the tutorial instructions ready.

Prerequisite: Your own machine with Docker installed.
If you do not have access to one, I will step through the tutorial on my own machine.

You can also work through the tutorial on your own at any later time.

Tutorial: https://shorturl.at/VExpG
Full link: https://github.com/pisa-engine/pisa/blob/main/test/docker/tutorial/instructions.md

Slide Deck: https://pisa-engine.github.io/static/SIGIR2025-2.pdf


Joel Mackenzie¹	Sean MacAvaney²	Antonio Mallia³	Michal Siedlaczek⁴