Crate trec_text

Parsing TREC Text format.

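TREC Text is a text format containing a sequence of records. Each record is delimited by <DOC> and </DOC> and holds a document identifier in <DOCNO>, followed by the text body: <DOC> <DOCNO> 0 </DOCNO> Text body </DOC>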

Examples

Typically, document records are read from files. A Parser can be constructed from any type that implements Read, and it implements Iterator.

Because parsing a record can fail, due to either an IO error or a corrupted record, the iterator yields elements of type Result<DocumentRecord>.

use std::io::Cursor;
use trec_text::Parser;

let input = r#"
<DOC> <DOCNO> 0 </DOCNO> zero </DOC>
CORRUPTED <DOCNO> 1 </DOCNO> ten </DOC>
<DOC> <DOCNO> 2 </DOCNO> ten nine </DOC>
   "#;
let mut parser = Parser::new(Cursor::new(input));

let document = parser.next().unwrap()?;
assert_eq!(String::from_utf8_lossy(document.docno()), "0");
assert_eq!(String::from_utf8_lossy(document.content()), " zero ");

assert!(parser.next().unwrap().is_err());

let document = parser.next().unwrap()?;
assert_eq!(String::from_utf8_lossy(document.docno()), "2");
assert_eq!(String::from_utf8_lossy(document.content()), " ten nine ");

assert!(parser.next().is_none());
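
Since the parser is an iterator over Result values, a corrupted record does not have to stop processing: the remaining records can still be read. The following is a minimal sketch of that pattern using only the standard library. The helper name `split_results` is hypothetical and not part of this crate; it accepts any iterator of Result items, so it is assumed to work with a Parser as well.

```rust
/// Hypothetical helper (not part of trec_text): walks an iterator of
/// `Result` items, collecting successes and recording the position and
/// error of every failure instead of stopping at the first one.
fn split_results<T, E, I>(items: I) -> (Vec<T>, Vec<(usize, E)>)
where
    I: IntoIterator<Item = Result<T, E>>,
{
    let mut ok = Vec::new();
    let mut failed = Vec::new();
    for (index, item) in items.into_iter().enumerate() {
        match item {
            // A successfully parsed item is kept as-is.
            Ok(value) => ok.push(value),
            // A failed item is stored together with its zero-based position.
            Err(error) => failed.push((index, error)),
        }
    }
    (ok, failed)
}
```

Applied to the parser from the example above, a call such as `split_results(parser)` would be expected to return the two valid records in the first vector and the corrupted one, at position 1, in the second.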

Structs

DocumentRecord

TREC Text record data.

Parser

TREC Text format parser.

Type Definitions

Result

Result<T, Error>