Terrier, IR Platform v3.5 - 16/06/2011
Terrier 3.5, the next version of the open source IR platform from the University of Glasgow (Scotland), has been released.
Significant updates:
- Added Document-at-a-time (DAAT) retrieval for large indices
- Refactored tokenisation for enhanced multi-language support
- Upgraded Hadoop support to version 0.20 (NB: Terrier now requires Java 1.6)
- Added synonym support in query language and retrieval
- Added out-of-the-box support for query-biased summaries and improved example web-based interface
- Added new, 2nd generation DFR models as well as other recent effective information-theoretic models
- Included many more JUnit tests (now 300+)
Terrier 3.0 indices are compatible with Terrier 3.5.
Indexing
- TR-117: Improve fields support by SimpleXMLCollection
- TR-120: Error loading an additional MetaIndex structure (contributed by Javier Ortega, Universidad de Sevilla)
- TR-106: Pipeline Query/Doc Policy Lifecycle (contributed by Giovanni Stilo, University degli Studi dell'Aquila and Nestor Laboratory - University of Rome "Tor Vergata")
- TR-116: Lexicon not properly renamed on Windows
- TR-118: SimpleXMLCollection - the term near the closing tag is ignored (contributed by Damien Dudognon, Institut de Recherche en Informatique de Toulouse)
- TR-123: Null pointer exception while trying to index simple document (contributed by Ilya Bogunov)
- TR-126: Logging improvements
- TR-124: When processing docid tag in MEDLINE format XML file, xml context path is needed
- TR-127: Easier refactoring of SinglePass indexers (contributed by Jonathon Hare, University of Southampton)
- TR-108: Some indexers do not set the IterablePosting class for the DirectIndex (contributed by Richard Eckart de Castilho, Darmstadt University of Technology)
- TR-136: Hadoop indexing misbehaves when terrier.index.prefix is not "data"
- TR-137: TRECCollection cannot add properties from the document tags to the meta index at indexing time
- TR-150: TRECCollection parse DOCHDR tags, including URLs should they exist (see TRECWebCollection)
- TR-138: IndexUtil.copyStructure fails when source and destination indices are same
- TR-140: Indexing support for query-biased summarisation
- TR-144: CollectionRecordReader.next should not be recursive
- TR-146, TR-148: Tokenisation should be done separately from Document parsing (the tokeniser can be set using the property tokeniser; see "Non English language support in Terrier" for more information on changing the tokenisation used by Terrier); Refactor Document implementations (e.g. TRECDocument and HTMLDocument are now deprecated in favour of the new TaggedDocument)
- TR-147: Allow various Collection implementations to use different Document implementations
- TR-158: Single pass indexing with default configuration doesn't ever flush memory
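To illustrate the idea behind TR-146 and TR-148 (tokenisation kept separate from document parsing), here is a minimal sketch. It is not Terrier's actual API: the Tokeniser interface, SimpleTokeniser and parseTaggedText below are hypothetical names invented for this example. The parser only strips markup, and the tokeniser alone decides what counts as a term, so a different tokeniser can be swapped in for another language without touching the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokeniserSketch {
    /** Hypothetical interface (not Terrier's API): turns raw text into terms. */
    interface Tokeniser {
        List<String> tokenise(String text);
    }

    /** Splits on runs of characters that are neither Unicode letters nor digits, then lowercases. */
    static final class SimpleTokeniser implements Tokeniser {
        public List<String> tokenise(String text) {
            List<String> terms = new ArrayList<String>();
            for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
                if (!t.isEmpty()) {
                    terms.add(t.toLowerCase(Locale.ROOT));
                }
            }
            return terms;
        }
    }

    /** Hypothetical parser: removes tags only and delegates term splitting to the tokeniser. */
    static List<String> parseTaggedText(String markup, Tokeniser tokeniser) {
        String text = markup.replaceAll("<[^>]*>", " ");
        return tokeniser.tokenise(text);
    }
}
```

Calling parseTaggedText with a different Tokeniser implementation changes how terms are produced while the tag handling stays the same, which is the separation the two issues describe.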
Retrieval
- TR-157: Remove TRECQuerying scripting files: trec.models, qemodels, trec.topics.list and trec.qrels - use properties in TRECQuerying instead.
- TR-156: Deploy a DAAT matching strategy - see org.terrier.matching.daat (partially contributed by Nicola Tonellotto, CNR)
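For readers unfamiliar with document-at-a-time matching (TR-156), the following is a minimal, generic sketch of the technique. It does not reproduce the code in org.terrier.matching.daat: the DaatSketch and PostingList classes are hypothetical, and each posting is assumed to carry a precomputed score contribution. All posting lists are advanced in parallel in docid order, so each document is fully scored before the next one is touched and no per-document accumulator table is needed, which is why the approach suits large indices.

```java
import java.util.ArrayList;
import java.util.List;

public class DaatSketch {
    /** Hypothetical posting list for one query term; docids must be sorted ascending. */
    static final class PostingList {
        final int[] docids;
        final double[] scores; // assumed precomputed contribution of each posting
        int pos = 0;           // cursor into the list

        PostingList(int[] docids, double[] scores) {
            this.docids = docids;
            this.scores = scores;
        }

        boolean exhausted() { return pos >= docids.length; }
        int currentDocid() { return docids[pos]; }
    }

    /** Returns {docid, score} pairs in ascending docid order. */
    static List<double[]> score(List<PostingList> lists) {
        List<double[]> results = new ArrayList<double[]>();
        while (true) {
            // Find the smallest docid under any cursor.
            int min = Integer.MAX_VALUE;
            boolean any = false;
            for (PostingList pl : lists) {
                if (!pl.exhausted()) {
                    any = true;
                    min = Math.min(min, pl.currentDocid());
                }
            }
            if (!any) {
                break; // every list is exhausted
            }
            // Sum the contributions of all lists positioned on that docid, then advance them.
            double total = 0.0;
            for (PostingList pl : lists) {
                if (!pl.exhausted() && pl.currentDocid() == min) {
                    total += pl.scores[pl.pos];
                    pl.pos++;
                }
            }
            results.add(new double[] { min, total });
        }
        return results;
    }
}
```

A production implementation would typically keep only the top-k results in a heap rather than returning every scored document; that detail is left out here to keep the traversal itself visible.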
Fuller change log at http://terrier.org/docs/current/whats_new.html