compress_inverted_index
Usage
Compresses an inverted index
Usage: ../../../build/bin/compress_inverted_index [OPTIONS]
Options:
-h,--help Print this help message and exit
-c,--collection TEXT REQUIRED
Uncompressed index basename
-o,--output TEXT REQUIRED Output inverted index
--check Check the correctness of the index
-e,--encoding TEXT REQUIRED Index encoding
-w,--wand TEXT Needs: --scorer
WAND data filename
-s,--scorer TEXT Needs: --wand --quantize
Scorer function
--bm25-k1 FLOAT Needs: --scorer
BM25 k1 parameter.
--bm25-b FLOAT Needs: --scorer
BM25 b parameter.
--pl2-c FLOAT Needs: --scorer
PL2 c parameter.
--qld-mu FLOAT Needs: --scorer
QLD mu parameter.
--quantize UINT Needs: --scorer
Quantizes the scores using this many bits
-L,--log-level TEXT:{critical,debug,err,info,off,trace,warn} [info]
Log level
--config Configuration .ini file
Description
Compresses an inverted index from the uncompressed format using one of the integer encodings.
Input
The input to this command is an uncompressed version of the inverted
index described here.
The --collection option takes the basename of the uncompressed
index.
Encoding
The postings are compressed using one of the available integer
encodings, selected with --encoding. The available encoding values are:
block_interpolative: Binary Interpolative Coding
ef: Elias-Fano
block_maskedvbyte: MaskedVByte
block_optpfor: OptPForDelta
pef: Partitioned Elias-Fano
block_qmx: QMX
block_simdbp: SIMD-BP128
block_simple8b: Simple8b
block_simple16: Simple16
block_streamvbyte: StreamVByte
block_varintg8iu: Varint-G8IU
block_varintgb: Varint-GB
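As a minimal sketch of how the required options fit together, the script below assembles a compression command for one encoding and adds --check. The paths path/to/inv and path/to/inv.block_simdbp are placeholders chosen for illustration, and the script assumes compress_inverted_index is on PATH. If the binary is not found, it only prints the command.

```shell
#!/bin/sh
set -e

# Placeholder values (assumptions for illustration only).
collection="path/to/inv"        # basename of the uncompressed index
encoding="block_simdbp"         # one of the encoding values listed above
output="${collection}.${encoding}"

# Load the command into the positional parameters so that each argument
# stays a separate, safely quoted word.
set -- compress_inverted_index \
    --collection "$collection" \
    --encoding "$encoding" \
    --output "$output" \
    --check

# Keep a printable copy of the command and show it.
cmdline="$*"
echo "$cmdline"

# Run the command only when the binary can be found.
if command -v compress_inverted_index >/dev/null 2>&1; then
    "$@"
fi
```

Switching to another encoding only requires changing the encoding variable. The output name is derived from it here purely as a naming convention for this sketch.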
Precomputed Quantized Scores
When compressing the index, you can replace frequencies with
quantized precomputed scores. To do so, you must pass the --quantize
option, plus some additional options:
--scorer: scoring function used to calculate the scores (bm25, dph, pl2, qld)
--wand: path to the WAND metadata file
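The help text above states that --scorer needs both --wand and --quantize, so the three options are passed together. The following is a sketch under assumptions rather than a reference invocation: the paths, the 8-bit width, and the helper function quantize_args are all hypothetical and serve only to show the shape of the command.

```shell
#!/bin/sh
set -e

# Hypothetical helper: print the scoring options for a scorer name,
# a WAND data file, and a number of quantization bits.
# Fails if the scorer is not one of the four names listed above.
quantize_args() {
    case "$1" in
        bm25|dph|pl2|qld) ;;
        *) echo "unknown scorer: $1" >&2; return 1 ;;
    esac
    echo "--scorer $1 --wand $2 --quantize $3"
}

# Placeholder values (assumptions for illustration only).
args=$(quantize_args bm25 path/to/inv.wand 8)
cmdline="compress_inverted_index --collection path/to/inv --encoding block_simdbp --output path/to/inv.block_simdbp.quantized $args"
echo "$cmdline"

# Run the command only when the binary can be found.
if command -v compress_inverted_index >/dev/null 2>&1; then
    # Unquoted on purpose: the placeholder values contain no spaces,
    # so word splitting yields exactly one word per argument.
    $cmdline
fi
```

Scorer-specific parameters such as --bm25-k1 or --bm25-b can be appended in the same way when the defaults are not suitable.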