parse_collection

Usage

parse_collection - parse collection and store as forward index.
Usage: ../../../build/bin/parse_collection [OPTIONS] [SUBCOMMAND]

Options:
  -h,--help                   Print this help message and exit
  -L,--log-level TEXT:{critical,debug,err,info,off,trace,warn} [info] 
                              Log level
  -j,--threads UINT           Number of threads
  --tokenizer TEXT:{english,whitespace} [english] 
                              Tokenizer
  -H,--html                   Strip HTML
  -F,--token-filters TEXT:{krovetz,lowercase,porter2} ...
                              Token filters
  --stopwords TEXT            Path to file containing a list of stop words to filter out
  --config                    Configuration .ini file
  -o,--output TEXT REQUIRED   Forward index filename
  -b,--batch-size INT [100000] 
                              Number of documents to process in one thread
  -f,--format TEXT [plaintext] 
                              Input format

Subcommands:
  merge                       Merge previously produced batch files. When parsing process was killed during merging, use this command to finish merging without having to restart building batches.