compress_inverted_index

Usage

Compresses an inverted index
Usage: ../../../build/bin/compress_inverted_index [OPTIONS]

Options:
  -h,--help                   Print this help message and exit
  -c,--collection TEXT REQUIRED
                              Uncompressed index basename
  -o,--output TEXT REQUIRED   Output inverted index
  --check                     Check the correctness of the index
  -e,--encoding TEXT REQUIRED Index encoding
  -w,--wand TEXT Needs: --scorer
                              WAND data filename
  -s,--scorer TEXT Needs: --wand --quantize
                              Scorer function
  --bm25-k1 FLOAT Needs: --scorer
                              BM25 k1 parameter.
  --bm25-b FLOAT Needs: --scorer
                              BM25 b parameter.
  --pl2-c FLOAT Needs: --scorer
                              PL2 c parameter.
  --qld-mu FLOAT Needs: --scorer
                              QLD mu parameter.
  --quantize UINT Needs: --scorer
                              Quantizes the scores using this many bits
  -L,--log-level TEXT:{critical,debug,err,info,off,trace,warn} [info] 
                              Log level
  --config                    Configuration .ini file
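The `Needs:` annotations in the help output above describe dependencies between options. For example, --scorer requires both --wand and --quantize. The helper below is a hypothetical sketch and is not part of PISA. It builds an argument list using the long option names and rejects flag combinations that break those annotations. It assumes the binary is reachable as `compress_inverted_index` on the PATH.

```python
from typing import Dict, List, Optional

# "Needs:" constraints transcribed from the help text above:
# option -> options that must also be present on the command line.
NEEDS: Dict[str, List[str]] = {
    "--wand": ["--scorer"],
    "--scorer": ["--wand", "--quantize"],
    "--bm25-k1": ["--scorer"],
    "--bm25-b": ["--scorer"],
    "--pl2-c": ["--scorer"],
    "--qld-mu": ["--scorer"],
    "--quantize": ["--scorer"],
}


def build_compress_command(
    collection: str,
    output: str,
    encoding: str,
    extra: Optional[Dict[str, object]] = None,
    check: bool = False,
    program: str = "compress_inverted_index",  # assumed to be on PATH
) -> List[str]:
    """Return an argv list, raising ValueError if a "Needs:" rule is violated."""
    extra = dict(extra or {})
    # The three REQUIRED options are always emitted.
    args = [program, "--collection", collection, "--output", output,
            "--encoding", encoding]
    if check:
        args.append("--check")
    for flag, value in extra.items():
        missing = [dep for dep in NEEDS.get(flag, []) if dep not in extra]
        if missing:
            raise ValueError(f"{flag} needs {', '.join(missing)}")
        args += [flag, str(value)]
    return args
```

With this sketch, `build_compress_command("coll", "coll.idx", ENCODING, {"--scorer": "bm25", "--wand": "coll.wand", "--quantize": 8})` produces a quantized-index invocation. Here `ENCODING` stands in for one of the valid --encoding values. The returned list can be passed to `subprocess.run`.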

Description

Compresses an inverted index from the uncompressed format using one of the integer encodings.
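The sketch below illustrates in general terms what an integer encoding does to a posting list. A sorted list of document IDs is turned into gaps (d-gaps), which are small numbers. Those gaps are then written with a variable-length code. Variable-byte (VByte) coding is used here only because it is simple. This is not a claim about which values --encoding accepts or how PISA implements any of its encodings.

```python
from itertools import accumulate
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Replace a strictly increasing docID list by differences (first value kept)."""
    return [d - prev for prev, d in zip([0] + doc_ids[:-1], doc_ids)]


def from_gaps(gaps: List[int]) -> List[int]:
    """Prefix-sum the gaps back into docIDs."""
    return list(accumulate(gaps))


def vbyte_encode(values: List[int]) -> bytes:
    """Write each value in 7-bit groups, low bits first; high bit marks the last byte."""
    out = bytearray()
    for v in values:
        while v >= 128:
            out.append(v & 0x7F)  # 7 payload bits, more bytes follow
            v >>= 7
        out.append(v | 0x80)      # final byte of this value
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            values.append(current | ((byte & 0x7F) << shift))
            current, shift = 0, 0
        else:
            current |= byte << shift
            shift += 7
    return values
```

For the docIDs `[3, 7, 300, 100000]`, the gaps are `[3, 4, 293, 99700]` and the encoding takes 7 bytes. Storing four raw 32-bit integers would take 16 bytes.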

Input

The input to this command is the uncompressed version of the inverted index, in the format described in the documentation of the uncompressed index. The --collection option takes the basename of that uncompressed index.
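The reader below sketches how the files behind a --collection basename could be inspected. It assumes a specific binary layout that this page does not spell out, so treat it as an assumption rather than a format specification. Under that assumption, the basename expands to `<basename>.docs`, `<basename>.freqs` and `<basename>.sizes`. Each file is a concatenation of sequences, and each sequence is stored as a 32-bit unsigned length followed by that many 32-bit unsigned integers. Byte order is assumed to be little-endian, as on x86. The function names are hypothetical.

```python
import struct
from typing import Iterator, List


def collection_files(basename: str) -> List[str]:
    # Assumed file names derived from the --collection basename.
    return [basename + ext for ext in (".docs", ".freqs", ".sizes")]


def read_sequences(data: bytes) -> Iterator[List[int]]:
    """Yield each length-prefixed sequence of little-endian uint32 values."""
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        offset += 4
        values = struct.unpack_from(f"<{length}I", data, offset)
        offset += 4 * length
        yield list(values)
```

Usage, under the same assumptions: `list(read_sequences(open(collection_files("coll")[0], "rb").read()))` returns the posting lists of the `.docs` file as Python lists.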

Encoding

The postings are compressed using one of the available integer encodings, selected with the --encoding option. The available encoding values are:

Precomputed Quantized Scores

When compressing the index, you can replace term frequencies with precomputed, quantized scores. To do so, pass the --quantize flag with the number of bits used to quantize each score, plus the following options:

  • --scorer: scoring function used to calculate the scores (bm25, dph, pl2, qld)
  • --wand: path to the WAND metadata file
  • --bm25-k1, --bm25-b, --pl2-c, --qld-mu (optional): parameters of the corresponding scoring function
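The sketch below outlines what "quantized precomputed scores" can look like for bm25. A BM25 score is computed for one posting, using one common formulation with the k1 and b parameters. That score is then mapped linearly onto `bits`-bit integers, which play the role of the --quantize value. This is an illustrative assumption and not necessarily the tool's exact formula or quantizer. Several choices are hypothetical: the IDF variant, the clamp to a minimum of 1 (so that no posting gets a zero score), and the use of a known maximum score.

```python
import math


def bm25(tf: int, doc_len: int, avg_doc_len: float, df: int, num_docs: int,
         k1: float, b: float) -> float:
    """One common BM25 formulation (assumed here, not taken from the tool)."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # Document-length normalization controlled by b.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)


def quantize(score: float, max_score: float, bits: int) -> int:
    """Linearly map a score in [0, max_score] to an integer in [1, 2**bits - 1]."""
    levels = (1 << bits) - 1
    q = round(score / max_score * levels)
    return max(1, min(levels, q))  # clamp; keep every posting non-zero
```

With `bits=8`, a score equal to `max_score` maps to 255, and a score at 20% of it maps to 51. The `max_score` argument is assumed to be known in advance, for example as the largest score in the collection.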