Querying
The index can now be queried. The queries command treats each line of
the standard input (or of the file given with -q) as a separate query.
A query line contains a whitespace-delimited list of tokens. If --terms
is given, the tokens are interpreted as terms and resolved to term IDs
using it; otherwise, they are interpreted directly as term IDs.
Optionally, a query line can start with a query ID, separated from the
terms by a colon:
Q1:one two three
^^ ^^^^^^^^^^^^^
query ID terms
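The query-line format can be illustrated with a small POSIX shell sketch. It assumes that everything before the first colon is the query ID and that a line without a colon has no ID. The file name queries.txt and the helper split_query are hypothetical. This only demonstrates the format and is not how queries parses its input.

```shell
# Hypothetical query file: two queries, only the first with a query ID.
printf 'Q1:one two three\nfour five\n' > queries.txt

# split_query (hypothetical helper): print the query ID ("-" if absent),
# the number of terms, and the terms of a single query line.
split_query() {
    q=$1
    case $q in
        *:*) id=${q%%:*}        # text before the first colon
             terms=${q#*:} ;;   # text after the first colon
        *)   id=-
             terms=$q ;;
    esac
    set -f          # disable globbing so a term such as * stays literal
    set -- $terms   # split the terms on whitespace
    set +f
    echo "id=$id terms=$#: $*"
}

while IFS= read -r line; do
    split_query "$line"
done < queries.txt
```

Run as-is, this prints "id=Q1 terms=3: one two three" for the first line and "id=- terms=2: four five" for the second.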
For example:
$ ./bin/queries \
-e opt \
-a and \
-i test_collection.index.opt \
-w test_collection.wand \
-q ../test/test_data/queries
Here -e selects the index encoding, -a the retrieval algorithm, -i the
index path, -w the metadata (WAND) file, and -q the query input file.
This runs conjunctive (and) queries. Other operators can be used in
place of and (see Query algorithms). Several operators can also be
given, separated by colons (e.g., and:or:wand), in which case the
queries are run in multiple passes, one per algorithm.
If the WAND file is compressed, append the --compressed-wand flag.
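The options above can be combined in one invocation. The following sketch only prints the command line (a dry run) instead of executing it; the helper queries_cmd and its argument conventions are hypothetical, while the flags and paths are the ones used in this section.

```shell
# queries_cmd (hypothetical helper): print a queries command line.
#   $1: algorithm(s), e.g. "and" or "and:or:wand" (one pass per algorithm)
#   $2: "yes" if the WAND file is compressed
queries_cmd() {
    cmd="./bin/queries -e opt -a $1 -i test_collection.index.opt"
    cmd="$cmd -w test_collection.wand -q ../test/test_data/queries"
    if [ "$2" = yes ]; then
        cmd="$cmd --compressed-wand"   # required for a compressed WAND file
    fi
    echo "$cmd"
}

queries_cmd and:or:wand yes
```

Once the printed command looks right, it can be executed, for example with eval "$(queries_cmd and no)".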
Build additional data
To perform BM25 queries, you first need to build an additional file containing the parameters required to compute the scores, such as the document lengths. Build it with the following command:
$ ./bin/create_wand_data \
-c ../test/test_data/test_collection \
-o test_collection.wand
To compress the file, append --compress at the end of the command.
When using variable-size blocks (for VBMW) via the --variable-block
parameter, you can also set lambda with the -l <float> or
--lambda <float> flags. The value of lambda affects the mean size of the
output variable blocks; see the VBMW paper (listed below) for details.
With fixed-size blocks, which are the default, you can set the block
size with the -b <UINT> or --block-size <UINT> arguments.
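A dry-run sketch that prints the create_wand_data command for either block mode may help keep the flag combinations straight. The helper wand_cmd and its arguments are hypothetical, and the values 64 and 0.5 in the usage lines are arbitrary placeholders, not recommended settings; the flags themselves are the ones described above.

```shell
# wand_cmd (hypothetical helper): print a create_wand_data command line.
#   $1: "fixed" or "variable" block mode
#   $2: block size (fixed) or lambda (variable); placeholder values only
#   $3: "yes" to compress the output file
wand_cmd() {
    cmd="./bin/create_wand_data -c ../test/test_data/test_collection"
    cmd="$cmd -o test_collection.wand"
    case $1 in
        fixed)    cmd="$cmd -b $2" ;;                  # fixed-size blocks (default)
        variable) cmd="$cmd --variable-block -l $2" ;; # variable-size blocks for VBMW
        *) echo "unknown block mode: $1" >&2; return 1 ;;
    esac
    if [ "$3" = yes ]; then
        cmd="$cmd --compress"   # appended at the end of the command
    fi
    echo "$cmd"
}

wand_cmd fixed 64 no
wand_cmd variable 0.5 yes
```

A file built with --compress must later be queried with --compressed-wand.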