Querying
Now it is possible to query the index. The queries command treats each line of the standard input (or of a file, if -q is present) as a separate query. A query line contains a whitespace-delimited list of tokens. These tokens are interpreted either as terms (if --terms is defined, in which case it is used to resolve term IDs) or as term IDs (if --terms is not defined). Optionally, a query can start with a query ID followed by a colon:
Q1:one two three
^^ ^^^^^^^^^^^^^
query ID terms
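The line format described above can be illustrated with a short Python sketch. It is not the parser used by queries: parse_query_line and Query are hypothetical names, and the sketch assumes that the first colon on a line, if present, separates the query ID from the tokens.

```python
from typing import List, NamedTuple, Optional


class Query(NamedTuple):
    query_id: Optional[str]  # None when the line has no "ID:" prefix
    tokens: List[str]        # terms or term IDs, kept as strings


def parse_query_line(line: str) -> Query:
    """Split one query line into an optional ID and its tokens.

    Assumption: the first colon, if any, ends the query ID.
    """
    head, sep, tail = line.partition(":")
    if sep:
        return Query(head.strip(), tail.split())
    return Query(None, line.split())


if __name__ == "__main__":
    print(parse_query_line("Q1:one two three"))
    print(parse_query_line("12 7 40"))
```

Whether a token is a term or a term ID is not decided here; as described above, that depends on whether --terms is defined.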
For example, the following command sets the index encoding (-e), the retrieval algorithm (-a), the index path (-i), the metadata file (-w), and the query input file (-q):
$ ./bin/queries \
-e opt \
-a and \
-i test_collection.index.opt \
-w test_collection.wand \
-q ../test/test_data/queries
This performs conjunctive queries (and). In place of and, other operators can be used (see Query algorithms). Multiple operators can also be given, separated by colons (and:or:wand), which runs multiple passes, one per algorithm.
If the WAND file is compressed, append the --compressed-wand flag.
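When the command is driven from a script, the colon-separated algorithm list and the optional --compressed-wand flag can be assembled programmatically. The following is a minimal sketch, assuming only the flags shown in this section; build_queries_argv is a hypothetical helper, not part of the tool.

```python
from typing import List, Sequence


def build_queries_argv(
    encoding: str,
    algorithms: Sequence[str],
    index_path: str,
    wand_path: str,
    queries_path: str,
    compressed_wand: bool = False,
    binary: str = "./bin/queries",
) -> List[str]:
    """Assemble an argument list using only the flags described above."""
    if not algorithms:
        raise ValueError("at least one algorithm is required")
    argv = [
        binary,
        "-e", encoding,
        "-a", ":".join(algorithms),  # e.g. "and:or:wand", one pass per algorithm
        "-i", index_path,
        "-w", wand_path,
        "-q", queries_path,
    ]
    if compressed_wand:
        argv.append("--compressed-wand")
    return argv


if __name__ == "__main__":
    argv = build_queries_argv(
        "opt", ["and", "or", "wand"],
        "test_collection.index.opt", "test_collection.wand",
        "../test/test_data/queries",
    )
    print(" ".join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) from the Python standard library, which avoids shell quoting issues.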
Build additional data
To perform BM25 queries it is necessary to build an additional file containing the parameters needed to compute the score, such as the document lengths. The file can be built with the following command:
$ ./bin/create_wand_data \
-c ../test/test_data/test_collection \
-o test_collection.wand
If you want to compress the file, append --compress at the end of the command. When using variable-sized blocks (for VBMW) via the --variable-block parameter, you can also specify lambda with the -l <float> or --lambda <float> flags. The value of lambda affects the mean size of the variable blocks that are output. See the VBMW paper (listed below) for more details. If using fixed-sized blocks, which is the default, you can supply the desired block size using the -b <UINT> or --block-size <UINT> arguments.
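The block options above can be summarized in a small Python sketch that assembles the create_wand_data arguments. build_wand_data_argv is a hypothetical helper. It assumes that --variable-block is a switch taking no value and that the fixed and variable block options are not combined; neither assumption is confirmed here, and the numeric values in the example are arbitrary.

```python
from typing import List, Optional


def build_wand_data_argv(
    collection: str,
    output: str,
    compress: bool = False,
    block_size: Optional[int] = None,
    variable_block: bool = False,
    lambda_: Optional[float] = None,
    binary: str = "./bin/create_wand_data",
) -> List[str]:
    """Assemble an argument list from the options described above."""
    if variable_block and block_size is not None:
        raise ValueError("choose either fixed-size or variable-size blocks")
    if lambda_ is not None and not variable_block:
        raise ValueError("lambda only applies to variable-size blocks")
    argv = [binary, "-c", collection, "-o", output]
    if variable_block:
        argv.append("--variable-block")  # assumed to be a switch with no value
        if lambda_ is not None:
            argv += ["--lambda", str(lambda_)]
    elif block_size is not None:
        argv += ["--block-size", str(block_size)]
    if compress:
        argv.append("--compress")  # appended at the end of the command
    return argv


if __name__ == "__main__":
    print(" ".join(build_wand_data_argv(
        "../test/test_data/test_collection", "test_collection.wand",
        block_size=64, compress=True,
    )))
```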