Querying
The queries command treats each line of standard input (or of the file given
with -q) as a separate query. A query line contains a whitespace-delimited
list of tokens. If --terms is given, the tokens are interpreted as terms and
resolved to term IDs using that file; otherwise, they are interpreted directly
as term IDs. Optionally, a query can be prefixed with a query ID, separated by
a colon:
Q1:one two three
^^ ^^^^^^^^^^^^^
ID terms
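A minimal Python sketch of the query-line format described above, assuming a plain dictionary stands in for the --terms lexicon. The function name parse_query_line is hypothetical, and silently skipping unknown terms is an assumption about how unresolved terms are handled, not documented behavior.

```python
from typing import Dict, List, Optional, Tuple


def parse_query_line(
    line: str, lexicon: Optional[Dict[str, int]] = None
) -> Tuple[Optional[str], List[int]]:
    """Split an optional "ID:" prefix off a query line; return (query_id, term_ids).

    With a lexicon (term -> term ID), tokens are treated as terms and looked up
    (unknown terms are skipped, an assumption). Without one, tokens are parsed
    as integer term IDs.
    """
    query_id: Optional[str] = None
    # Only the first colon separates the query ID from the query body.
    if ":" in line:
        query_id, line = line.split(":", 1)
    tokens = line.split()  # any run of whitespace delimits tokens
    if lexicon is None:
        return query_id, [int(tok) for tok in tokens]
    return query_id, [lexicon[tok] for tok in tokens if tok in lexicon]
```

For example, parse_query_line("Q1:one two three", {"one": 0, "two": 1, "three": 2}) yields ("Q1", [0, 1, 2]), while a line without a colon gets None as its ID.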
For example:
$ ./bin/queries \
-e opt \
-a and \
-i test_collection.index.opt \
-w test_collection.wand \
-q ../test/test_data/queries
Here -e selects the index encoding, -a the retrieval algorithm, -i the index
path, -w the WAND metadata file, and -q the query input file.
The command above runs conjunctive (and) queries. Other operators can be used
in place of and (see Query algorithms). To run multiple algorithms, pass -a
multiple times (for example, -a and -a or -a wand).
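To make the difference between the and and or operators concrete, here is a set-at-a-time Python sketch over sorted, uncompressed docID lists. It only illustrates the semantics; intersect and union are hypothetical names, and the tool's real operators run over posting lists stored in the chosen index encoding.

```python
from typing import List


def intersect(postings: List[List[int]]) -> List[int]:
    """Conjunctive (and) evaluation: docIDs that appear in every posting list."""
    if not postings:
        return []
    # Process the shortest list first so the candidate set stays small.
    ordered = sorted(postings, key=len)
    result = ordered[0]
    for plist in ordered[1:]:
        merged: List[int] = []
        i = j = 0
        # Classic two-pointer merge of two sorted lists.
        while i < len(result) and j < len(plist):
            if result[i] == plist[j]:
                merged.append(result[i])
                i += 1
                j += 1
            elif result[i] < plist[j]:
                i += 1
            else:
                j += 1
        result = merged
    return result


def union(postings: List[List[int]]) -> List[int]:
    """Disjunctive (or) evaluation: docIDs that appear in at least one list."""
    return sorted(set().union(*postings))
```

With the lists [1, 3, 5, 7], [3, 4, 5] and [0, 3, 5, 9], intersect returns [3, 5], whereas union returns every docID from all three lists.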
The tool outputs a JSON object with query execution statistics: the mean, the
median (q50), and the 90th, 95th, and 99th percentiles (q90, q95, q99),
reported for each per-query aggregation of the run times (none, min, mean,
median, max).
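One plausible reading of these statistics, sketched in Python: each query's run times are first reduced by the chosen aggregation ("none" keeps every run as a separate sample), and the mean and percentiles are then taken across the resulting values. The nearest-rank percentile method, the function names, and the dictionary keys are assumptions for illustration, not the tool's exact computation or JSON schema.

```python
import math
import statistics
from typing import Callable, Dict, List

# Hypothetical table of the per-query aggregations ("none" is handled separately).
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
}


def percentile(sorted_values: List[float], q: float) -> float:
    """Nearest-rank percentile (0 < q <= 100) of a non-empty, sorted list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(times_per_query: List[List[float]], aggregation: str) -> Dict[str, float]:
    """Aggregate each query's run times, then summarize across queries."""
    if aggregation == "none":
        # Every individual run counts as one sample.
        values = [t for runs in times_per_query for t in runs]
    else:
        agg = AGGREGATIONS[aggregation]
        values = [agg(runs) for runs in times_per_query]
    values.sort()
    return {
        "mean": statistics.mean(values),
        "q50": percentile(values, 50),
        "q90": percentile(values, 90),
        "q95": percentile(values, 95),
        "q99": percentile(values, 99),
    }
```

With two queries timed at [1.0, 2.0, 3.0] and [4.0, 5.0, 6.0], the "min" aggregation summarizes the values [1.0, 4.0], while "none" summarizes all six runs.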
If the WAND file is compressed, add the --compressed-wand flag.
Supported algorithms
The following algorithms are available via the -a option:
- and
- or
- or_freq
- wand
- block_max_wand
- block_max_maxscore
- ranked_and
- block_max_ranked_and
- ranked_or
- maxscore
- ranked_or_taat
- ranked_or_taat_lazy
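For intuition about the ranked algorithms, here is a Python sketch of exhaustive top-k retrieval in the term-at-a-time style suggested by the name ranked_or_taat: scores are accumulated per document across all posting lists, and a min-heap keeps the k best. It is an illustrative assumption, not the tool's implementation; posting lists are modeled as hypothetical docID-to-precomputed-score dictionaries. Dynamic-pruning algorithms such as wand, maxscore, and the block_max_* variants are designed to return the same top k while skipping documents that cannot enter it.

```python
import heapq
from typing import Dict, List, Tuple


def ranked_or(postings: List[Dict[int, float]], k: int) -> List[Tuple[int, float]]:
    """Exhaustive disjunctive top-k: score every document found in any list."""
    # Term-at-a-time: add each term's contribution into a per-document accumulator.
    scores: Dict[int, float] = {}
    for plist in postings:
        for doc, s in plist.items():
            scores[doc] = scores.get(doc, 0.0) + s
    # Min-heap of (score, doc); its root is the current k-th best score.
    heap: List[Tuple[float, int]] = []
    for doc, s in scores.items():
        if len(heap) < k:
            heapq.heappush(heap, (s, doc))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc))
    # Return (doc, score) pairs, best first.
    return [(doc, s) for s, doc in sorted(heap, reverse=True)]
```

For two terms with postings {1: 1.0, 2: 0.5} and {2: 1.0, 3: 0.2}, ranked_or(..., k=2) returns document 2 (score 1.5) followed by document 1 (score 1.0).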
Additional options
- --runs <N>: number of runs per query (default: 3)
- -o, --output <FILE>: output file for per-run query timing data
- --safe: rerun a query if pruning returns too few results (requires --thresholds)
- --quantized: use quantized scores
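The --safe behavior can be sketched as follows, assuming the file given to --thresholds supplies a per-query initial score threshold for top-k pruning and that the rerun simply drops that threshold. safe_query and run are hypothetical names; run stands for any top-k algorithm that discards documents scoring at or below the threshold.

```python
import math
from typing import Callable, List, Tuple

Result = Tuple[int, float]  # (docID, score)


def safe_query(
    run: Callable[[float], List[Result]], k: int, threshold: float
) -> List[Result]:
    """Query with an initial score threshold; rerun without it if too few results survive."""
    results = run(threshold)
    if len(results) < k:
        # The threshold was too high, so pruning may have discarded documents
        # that belong in the top k: rerun with no threshold at all.
        results = run(-math.inf)
    return results
```

The design trades an occasional second evaluation for the guarantee that an overly optimistic threshold never silently shrinks the result list.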
Build additional data
To run BM25 queries, you first need to build an additional file containing the parameters needed to compute the scores, such as the document lengths. The file can be built with the following command:
$ ./bin/create_wand_data \
-c ../test/test_data/test_collection \
-o test_collection.wand
To compress the file, append --compress to the command.
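Document lengths are needed because BM25 normalizes each term frequency by the length of its document. The sketch below computes one term's BM25 contribution using the common Lucene-style IDF; the function name is illustrative, and k1 = 1.2, b = 0.75 are textbook defaults, not necessarily the values the tool uses.

```python
import math


def bm25_term_score(
    tf: int,
    df: int,
    doc_len: int,
    avg_doc_len: float,
    num_docs: int,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """BM25 contribution of one term to one document (Lucene-style IDF)."""
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    # Length normalization: longer-than-average documents are penalized,
    # which is why per-document lengths must be stored.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)
```

For a document of exactly average length and tf = 1, the length factor cancels and the score equals the IDF; the same term frequency in a longer document scores lower.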
When using variable-size blocks (for VBMW) via the --variable-block
parameter, you can also specify lambda with the -l <float> or
--lambda <float> flag. The value of lambda affects the mean size of the
resulting variable-size blocks. See the VBMW paper (listed below) for more
details. With fixed-size blocks, which are the default, you can set the
block size with the -b <UINT> or --block-size <UINT> argument.
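Block-max algorithms such as block_max_wand and block_max_maxscore skip groups of postings using a per-block upper bound on the score. The rough sketch below shows what the fixed block size controls, assuming each block records its last docID and its maximum term score. The function name is hypothetical, the tool's actual file layout is not shown, and variable-size (VBMW) partitioning is not covered.

```python
from typing import List, Tuple


def block_max_scores(
    docids: List[int], scores: List[float], block_size: int
) -> List[Tuple[int, float]]:
    """Split a posting list into fixed-size blocks; return (last docID, max score) per block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks: List[Tuple[int, float]] = []
    for start in range(0, len(docids), block_size):
        end = min(start + block_size, len(docids))
        # The last block may be shorter than block_size.
        blocks.append((docids[end - 1], max(scores[start:end])))
    return blocks
```

Smaller blocks give tighter upper bounds, and therefore more skipping, at the cost of storing more metadata per posting list.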