Retrieval Algorithms

This is the list of the supported query processing algorithms.

Unranked

PISA implements two unranked algorithms, meaning they return a full list of documents matching the query in the order of their appearance in the posting lists.

Intersection

The intersection algorithm (and) returns only the documents that match all query terms.

Union

The union algorithm (or) returns all the documents that match any query term.

Top-k Retrieval

Top-k retrieval returns the top-k highest scored documents with respect To the given query.

Document-at-a-time (DaaT)

Document-at-a-time algorithms traverse one document at a time. They rely on posting lists being sorted by document IDs, and scan them in step to retrieve all frequencies for a document right away.

Conjunctive processing

Conjunctive processing (ranked_and) returns the top k documents that contain all of the query terms. This is an exhaustive algorithm, meaning all documents must be scored.

Disjunctive processing

Conjunctive processing (ranked_or) returns the top k documents that contain any of the query terms. This is an exhaustive algorithm, meaning all documents must be scored.

MaxScore

MaxScore (maxscore) uses precomputed maximum partial scores for each term to avoid calculating all scores. It is especially suitable for longer queries (high term count), short posting lists, or high values of k (number of returned top documents).

Howard Turtle and James Flood. 1995. Query evaluation: strategies and optimizations. Inf. Process. Manage. 31, 6 (November 1995), 831-850. DOI=http://dx.doi.org/10.1016/0306-4573(95)00020-H

WAND

Similar to MaxScore, WAND (wand) uses precomputed maximum partial scores for each term to avoid calculating all scores. Its performance is sensitive to the term count, so it may not be the best choice for long queries. It may also take a performance hit when k is very high, in which case MaxScore may prove more efficient.

Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Zien. 2003. Efficient query evaluation using a two-level retrieval process. In Proceedings of the twelfth international conference on Information and knowledge management (CIKM '03). ACM, New York, NY, USA, 426-434. DOI: https://doi.org/10.1145/956863.956944

BlockMax WAND

BlockMax WAND (block_max_wand) builds on top of WAND. It uses additional precomputed scores for ranges of documents in posting lists, which allows for skipping entire blocks of documents if their max score is low enough.

Shuai Ding and Torsten Suel. 2011. Faster top-k document retrieval using block-max indexes. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (SIGIR '11). ACM, New York, NY, USA, 993-1002. DOI=http://dx.doi.org/10.1145/2009916.2010048

Variable BlockMax WAND

Variable BlockMax WAND is the same algorithm as block_max_wand at query time. The difference is in precomputing the block-max scores. Instead having even block sizes, each block can have a different size, to optimize the effectiveness of skips.

Antonio Mallia, Giuseppe Ottaviano, Elia Porciani, Nicola Tonellotto, and Rossano Venturini. 2017. Faster BlockMax WAND with Variable-sized Blocks. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17). ACM, New York, NY, USA, 625-634. DOI: https://doi.org/10.1145/3077136.3080780

BlockMax MaxScore

BlockMax MaxScore (block_max_maxscore) is a MaxScore implementation with additional block-max scores, similar to BlockMax WAND.

BlockMax AND

BlockMax AND (block_max_ranked_and) is a conjunctive algorithm using block-max scores.

Term-at-a-time (TaaT)

Term-at-a-time algorithms traverse one posting list at a time. Thus, they cannot rely on all frequencies for a given document being known at the time of their processing. This requires an accumulator structure to keep partial scores.

Disjunctive TaaT processing

Disjunctive TaaT (ranked_or_taat) is a simple algorithm that accumulates document scores while traversing postings one list at a time. ranked_or_taat_lazy is a variant that uses an accumulator array that initializes lazily.