Crate trec_text
Parsing the TREC Text format.
TREC Text is a text format containing a sequence of records:
<DOC>
<DOCNO> 0 </DOCNO>
Text body
</DOC>
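To make the record layout concrete, here is a minimal std-only sketch that splits a single record string into its DOCNO and body. It is not the crate's implementation. The function name `split_record` is hypothetical. The choice to trim the DOCNO but keep the body verbatim is an assumption, chosen to match the assertions in the crate's example below.

```rust
/// Hypothetical helper (not part of trec_text): splits one
/// `<DOC> <DOCNO> id </DOCNO> body </DOC>` record into (docno, body).
/// Returns `None` if any of the expected tags is missing.
fn split_record(record: &str) -> Option<(&str, &str)> {
    let inner = record
        .trim()
        .strip_prefix("<DOC>")?
        .strip_suffix("</DOC>")?;
    let after_open = inner.trim_start().strip_prefix("<DOCNO>")?;
    let (docno, body) = after_open.split_once("</DOCNO>")?;
    // Assumption: the DOCNO is trimmed, the body is returned verbatim
    // (surrounding whitespace included).
    Some((docno.trim(), body))
}

fn main() {
    let record = "<DOC> <DOCNO> 0 </DOCNO> Text body </DOC>";
    let (docno, body) = split_record(record).expect("well-formed record");
    println!("docno={docno:?} body={body:?}");
}
```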
Examples
Typically, document records are read from files.
Parser can be constructed from any type that implements Read, and it implements Iterator.
Because parsing a record can fail, either due to an IO error or a corrupted record,
the iterator yields items of type Result<DocumentRecord>.
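use std::io::Cursor;
use trec_text::Parser;

// Three records: well-formed, corrupted (its opening `<DOC>` is replaced by `CORRUPTED`), well-formed.
let input = r#"
<DOC> <DOCNO> 0 </DOCNO> zero </DOC>
CORRUPTED <DOCNO> 1 </DOCNO> ten </DOC>
<DOC> <DOCNO> 2 </DOCNO> ten nine </DOC>
"#;
let mut parser = Parser::new(Cursor::new(input));

// `?` assumes an enclosing function that returns a compatible `Result` (not shown).
let document = parser.next().unwrap()?;
assert_eq!(String::from_utf8_lossy(document.docno()), "0");
assert_eq!(String::from_utf8_lossy(document.content()), " zero ");

// The corrupted record is reported as an error, and parsing continues after it.
assert!(parser.next().unwrap().is_err());

let document = parser.next().unwrap()?;
assert_eq!(String::from_utf8_lossy(document.docno()), "2");
assert_eq!(String::from_utf8_lossy(document.content()), " ten nine ");

// The iterator is exhausted after the last record.
assert!(parser.next().is_none());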
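The crate's internals are not shown in this documentation. As a rough, std-only illustration of the design described above (construct from any Read, yield one Result per record so that a corrupted record does not end iteration), here is a hypothetical toy parser. `ToyParser`, `ToyRecord`, `parse_record` and the `String` error type are inventions for this sketch. It reads the whole input up front and treats everything up to the next `</DOC>` as one record, and the real Parser may work differently on both counts.

```rust
use std::io::{Cursor, Read};

/// Hypothetical record type for this sketch (not the crate's `DocumentRecord`).
#[derive(Debug, PartialEq)]
struct ToyRecord {
    docno: String,
    content: String,
}

/// Hypothetical toy parser: reads the whole input up front, then yields one
/// `Result` per record so that a corrupted record does not stop iteration.
struct ToyParser {
    input: String,
    pos: usize,
    io_error: Option<String>,
}

impl ToyParser {
    fn new<R: Read>(mut reader: R) -> Self {
        let mut input = String::new();
        // Keep a read failure so it can be reported as the first item.
        let io_error = reader.read_to_string(&mut input).err().map(|e| e.to_string());
        ToyParser { input, pos: 0, io_error }
    }
}

/// Parses one record, i.e. the text from `<DOC>` up to (not including) `</DOC>`.
fn parse_record(record: &str) -> Result<ToyRecord, String> {
    let body = record
        .strip_prefix("<DOC>")
        .ok_or("record does not start with <DOC>")?;
    let body = body
        .trim_start()
        .strip_prefix("<DOCNO>")
        .ok_or("missing <DOCNO>")?;
    let (docno, content) = body.split_once("</DOCNO>").ok_or("missing </DOCNO>")?;
    Ok(ToyRecord {
        docno: docno.trim().to_string(), // DOCNO is trimmed
        content: content.to_string(),    // body is kept verbatim
    })
}

impl Iterator for ToyParser {
    type Item = Result<ToyRecord, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(err) = self.io_error.take() {
            self.pos = self.input.len(); // stop after reporting the IO error
            return Some(Err(err));
        }
        let rest = self.input[self.pos..].trim_start();
        if rest.is_empty() {
            return None;
        }
        let start = self.input.len() - rest.len();
        // Treat everything up to the next `</DOC>` as one record, valid or not,
        // so that parsing resumes after a corrupted record.
        match rest.find("</DOC>") {
            Some(end) => {
                let result = parse_record(&rest[..end]);
                self.pos = start + end + "</DOC>".len();
                Some(result)
            }
            None => {
                self.pos = self.input.len();
                Some(Err("missing </DOC>".to_string()))
            }
        }
    }
}

fn main() {
    let input = "\n<DOC> <DOCNO> 0 </DOCNO> zero </DOC>\nCORRUPTED <DOCNO> 1 </DOCNO> ten </DOC>\n<DOC> <DOCNO> 2 </DOCNO> ten nine </DOC>\n";
    for item in ToyParser::new(Cursor::new(input)) {
        match item {
            Ok(record) => println!("{} -> {:?}", record.docno, record.content),
            Err(e) => eprintln!("skipping corrupted record: {e}"),
        }
    }
}
```

Fed the same three records as the crate's example, this sketch yields record `0`, an error for the corrupted record, record `2`, and then `None`. Matching on each item, as in `main`, lets a caller log and skip corrupted records instead of aborting the whole file.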
Structs
DocumentRecord: TREC Text record data.
Parser: TREC Text format parser.
Type Definitions
Result