
Sunday, June 19, 2011

Terrier 3.5 released

Terrier, IR Platform v3.5 - 16/06/2011

http://terrier.org/

Terrier 3.5, the next version of the open-source IR platform from the University of Glasgow (Scotland), has been released.

Significant updates:
  • Added document-at-a-time (DAAT) retrieval for large indices
  • Refactored tokenisation for enhanced multi-language support
  • Upgraded Hadoop support to version 0.20 (NB: Terrier now requires Java 1.6)
  • Added synonym support in the query language and retrieval
  • Added out-of-the-box support for query-biased summaries, and improved the example web-based interface
  • Added new, second-generation DFR models, as well as other recent effective information-theoretic models
  • Included many more JUnit tests (now 300+)

Terrier 3.0 indices are compatible with Terrier 3.5.
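
For readers new to the term: DAAT retrieval walks all of the query terms' posting lists in parallel, in docid order, and scores each document completely before moving on to the next. Only the current top-k results need to be kept in memory, rather than one score accumulator per candidate document, which is why the approach suits large indices. The following is a minimal generic sketch in modern Java (16+, for records). It is not Terrier's org.terrier.matching.daat code; the PostingList and Result types, and the assumption that each posting already carries its score contribution, are hypothetical simplifications for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Generic document-at-a-time (DAAT) top-k scoring sketch (hypothetical types, not Terrier's API). */
public class DaatSketch {

    /** One query term's postings: ascending docids, each with an assumed precomputed score contribution. */
    record PostingList(int[] docIds, double[] weights) {}

    /** A fully scored document. */
    record Result(int docId, double score) {}

    static List<Result> topK(List<PostingList> lists, int k) {
        int[] cursor = new int[lists.size()]; // current position in each posting list
        // Min-heap on score, so the weakest of the current top-k is evicted first.
        PriorityQueue<Result> heap = new PriorityQueue<>((a, b) -> Double.compare(a.score(), b.score()));
        while (true) {
            // Find the smallest docid under any cursor: the next document to score.
            int current = Integer.MAX_VALUE;
            for (int i = 0; i < lists.size(); i++) {
                int[] ids = lists.get(i).docIds();
                if (cursor[i] < ids.length) current = Math.min(current, ids[cursor[i]]);
            }
            if (current == Integer.MAX_VALUE) break; // every list is exhausted
            // Score that document completely, advancing each cursor positioned on it.
            double score = 0.0;
            for (int i = 0; i < lists.size(); i++) {
                PostingList pl = lists.get(i);
                if (cursor[i] < pl.docIds().length && pl.docIds()[cursor[i]] == current) {
                    score += pl.weights()[cursor[i]];
                    cursor[i]++;
                }
            }
            heap.add(new Result(current, score));
            if (heap.size() > k) heap.poll(); // keep only the k best documents seen so far
        }
        List<Result> out = new ArrayList<>(heap);
        out.sort((a, b) -> Double.compare(b.score(), a.score())); // best first
        return out;
    }

    public static void main(String[] args) {
        PostingList t1 = new PostingList(new int[]{1, 3, 7}, new double[]{0.5, 1.0, 0.2});
        PostingList t2 = new PostingList(new int[]{3, 4, 7}, new double[]{0.8, 0.3, 0.9});
        for (Result r : topK(List.of(t1, t2), 2)) {
            System.out.println(r.docId() + " " + r.score());
        }
    }
}
```

With the two toy lists in main, documents 3 and 7 match both terms and are the two retained results; documents 1 and 4 are evicted from the heap as better candidates arrive. Real DAAT implementations typically add skipping or dynamic pruning on top of this basic loop.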

Indexing
  • TR-117: Improve fields support in SimpleXMLCollection
  • TR-120: Error loading an additional MetaIndex structure (contributed by Javier Ortega, Universidad de Sevilla)
  • TR-106: Pipeline Query/Doc Policy Lifecycle (contributed by Giovanni Stilo, Università degli Studi dell'Aquila and Nestor Laboratory - University of Rome "Tor Vergata")
  • TR-116: Lexicon not properly renamed on Windows
  • TR-118: SimpleXMLCollection - the term near the closing tag is ignored (contributed by Damien Dudognon, Institut de Recherche en Informatique de Toulouse)
  • TR-123: Null pointer exception while trying to index a simple document (contributed by Ilya Bogunov)
  • TR-126: Logging improvements
  • TR-124: When processing the docid tag in a MEDLINE-format XML file, the XML context path is needed
  • TR-127: Easier refactoring of SinglePass indexers (contributed by Jonathon Hare, University of Southampton)
  • TR-108: Some indexers do not set the IterablePosting class for the DirectIndex (contributed by Richard Eckart de Castilho, Darmstadt University of Technology)
  • TR-136: Hadoop indexing misbehaves when terrier.index.prefix is not "data"
  • TR-137: TRECCollection cannot add properties from the document tags to the meta index at indexing time
  • TR-150: TRECCollection should parse DOCHDR tags, including any URLs they contain (see TRECWebCollection)
  • TR-138: IndexUtil.copyStructure fails when the source and destination indices are the same
  • TR-140: Indexing support for query-biased summarisation
  • TR-144: CollectionRecordReader.next should not be recursive
  • TR-146, TR-148: Tokenisation should be done separately from Document parsing (the tokeniser can be set using the property tokeniser; see Non-English language support in Terrier for more information on changing the tokenisation used by Terrier); Refactor Document implementations (e.g. TRECDocument and HTMLDocument are now deprecated in favour of the new TaggedDocument)
  • TR-147: Allow various Collection implementations to use different Document implementations
  • TR-158: Single-pass indexing with the default configuration never flushes memory
Retrieval
  • TR-16, TR-166: Extending the query language and Matching to support synonyms
  • TR-157: Remove TRECQuerying scripting files (trec.models, qemodels, trec.topics.list and trec.qrels); use properties in TRECQuerying instead
  • TR-156: Deploy a DAAT matching strategy - see org.terrier.matching.daat (partially contributed by Nicola Tonellotto, CNR)
  • TR-113: The LGD Loglogistic weighting model (contributed by Gianni Amati, FUB)
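
As background for TR-113: the LGD model comes from the family of information-based models, where a term's weight in a document is the information content -log P(T ≥ tfn | λ) of its normalised frequency tfn. Under a log-logistic distribution this probability is λ / (tfn + λ), which gives a per-term score of qtf · log((tfn + λ) / λ), with λ = (documents containing the term) / (documents in the collection) and tfn = tf · log(1 + c · avgDocLen / docLength). The sketch below follows that published formulation and assumes the natural logarithm and a free length-normalisation parameter c. It is a hypothetical standalone helper, not Terrier's LGD weighting model class, whose log base, defaults and parameter names may differ.

```java
/** Standalone sketch of the log-logistic (LGD) per-term score; hypothetical helper, not Terrier's API. */
public class LgdSketch {

    /**
     * @param tf        occurrences of the term in the document
     * @param docLength document length in tokens
     * @param avgDocLen average document length in the collection
     * @param docFreq   number of documents containing the term
     * @param numDocs   number of documents in the collection
     * @param qtf       occurrences of the term in the query
     * @param c         length-normalisation parameter (assumed free parameter)
     */
    static double score(double tf, double docLength, double avgDocLen,
                        double docFreq, double numDocs, double qtf, double c) {
        double lambda = docFreq / numDocs;                            // term's document-frequency rate
        double tfn = tf * Math.log(1.0 + c * avgDocLen / docLength);  // length-normalised tf
        return qtf * Math.log((tfn + lambda) / lambda);               // -log of log-logistic P(T >= tfn)
    }

    public static void main(String[] args) {
        // A term occurring 3 times in an average-length document, present in 10 of 1000 documents.
        System.out.println(score(3, 100, 100, 10, 1000, 1, 1.0));
    }
}
```

The score is zero when the term is absent, grows with tf, and shrinks both for longer documents and for commoner terms (larger λ).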

Fuller change log at http://terrier.org/docs/current/whats_new.html
