Hi Ryan,
Why not preprocess your documents with tools like Apache UIMA, GATE or
OpenNLP before indexing them in Lucene? GATE, for instance, has FST-based
gazetteers which would be perfect for your place names. AFAIK there is also
a Dictionary component for UIMA which would be a good match.
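The gazetteer idea can be illustrated with a toy Java sketch. It is neither GATE's FST-based gazetteer nor the UIMA Dictionary component: it swaps in a plain HashSet with greedy longest-match lookup over lowercased tokens, and the class `GazetteerSketch` and its methods are hypothetical names. The place names it returns would then be stored in a separate field (a hypothetical `place` field, say) when the document is indexed in Lucene.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Toy gazetteer (sketch only): greedy longest-match lookup of
 * multi-word names stored in a HashSet. Not the GATE or UIMA API.
 */
public class GazetteerSketch {
    private final Set<String> entries = new HashSet<>();
    private int maxLen = 1; // length of the longest entry, in tokens

    public GazetteerSketch(List<String> names) {
        for (String name : names) {
            List<String> toks = tokenize(name);
            if (toks.isEmpty()) {
                continue; // skip blank entries
            }
            // Normalise entries exactly like the text so that lookups match.
            entries.add(String.join(" ", toks));
            maxLen = Math.max(maxLen, toks.size());
        }
    }

    /** Lowercases and splits on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns the gazetteer entries found in the text, left to right. */
    public List<String> annotate(String text) {
        List<String> tokens = tokenize(text);
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest span first so a longer entry beats a shorter
            // one starting at the same token.
            for (int len = Math.min(maxLen, tokens.size() - i); len > 0; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (entries.contains(candidate)) {
                    found.add(candidate);
                    matched = len;
                    break;
                }
            }
            // Skip past the match, or advance one token if nothing matched.
            i += Math.max(matched, 1);
        }
        return found;
    }

    public static void main(String[] args) {
        GazetteerSketch gazetteer =
            new GazetteerSketch(List.of("New York", "York", "San Francisco"));
        // These names could then be indexed in a separate field, e.g. "place".
        System.out.println(gazetteer.annotate("Flights from New York to San Francisco via York."));
    }
}
```

Running `main` should print `[new york, san francisco, york]`: "york" inside "New York" is not reported separately because the longer match consumes it.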
BTW, I don't remember anyone on the Nutch list suggesting that you use Carrot
for this (see: http://search-lucene.com/?q=luan+carrot) or classify at
query time.
What I suggested in http://search-lucene.com/m/JWZTj1q4lB92 was about
classifying during the parsing or indexing and generating a
Hi,
Tools like GATE (http://www.gate.ac.uk) or Apache UIMA would be good
candidates for what you are trying to achieve.
HTH
--
DigitalPebble Ltd
http://www.digitalpebble.com
2010/1/14 Ortelli, Gian Luca gianluca.orte...@truvo.com
Well, the exact definition we're going to find out
Hi,
you should also have a look at GATE (http://gate.ac.uk), which comes with an
NER application called ANNIE. You could use it to analyse your docs before
indexing them with Lucene or SOLR.
As Grant mentioned, UIMA can also be used for that, as there are a number of
NER annotators available for it.
Hi Thomas,
Have a look at SOLR (*lucene.apache.org/solr*). It is based on Lucene and
provides additional functionality, including faceted search.
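To make the link between a histogram and faceting concrete, here is a minimal Java sketch of what a facet count is: for the hits of a query, count how many documents carry each value of a field. It is an in-memory illustration only, not Solr's implementation; the `Hit` class, the `category` field and the class name `FacetCountSketch` are hypothetical. In Solr itself, facet counts are requested with the `facet=true` and `facet.field` query parameters.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of facet counting over a result list (not Solr's implementation). */
public class FacetCountSketch {
    /** Hypothetical search hit carrying one value of the faceted field. */
    static final class Hit {
        final String title;
        final String category;

        Hit(String title, String category) {
            this.title = title;
            this.category = category;
        }
    }

    /** Counts matching hits per category value, sorted by value. */
    static Map<String, Integer> facetCounts(List<Hit> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit hit : hits) {
            counts.merge(hit.category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("doc-1", "books"),
            new Hit("doc-2", "music"),
            new Hit("doc-3", "books"));
        System.out.println(facetCounts(hits)); // {books=2, music=1}
    }
}
```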
Best,
Julien
2008/10/13 Thomas Birnbaum [EMAIL PROTECTED]
hi...
currently we are using a proprietary search engine which supports a
histogram.
Hello Romain,
I'm asking myself a few questions, mainly about speed (indexing time) and
document parsing (a way to index the most commonly used office documents).
For document parsing, I'm planning to use different open source libraries.
The company I'm doing this for will be indexing a few
Hi Raphael,
We initially tried to do the same but ended up developing our own API for
querying the Web 1T. You can find more details on
http://digitalpebble.com/resources.html
There could be a way to reuse elements from Lucene, e.g. the term index only,
but I could not find an obvious way to