Re: Detect term occurrences

2015-09-13 Thread Francisco Andrés Fernández
Thanks again. For the moment I think it won't be a problem. I have ~500 documents. Regards, Francisco On Fri, Sep 11, 2015 at 6:08 PM, simon wrote: > +1 on Sujit's recommendation: we have a similar use case (detecting drug > names / disease entities / MeSH terms) and have bee

Re: Detect term occurrences

2015-09-11 Thread simon
+1 on Sujit's recommendation: we have a similar use case (detecting drug names / disease entities / MeSH terms) and have been using the SolrTextTagger with great success. We run a separate Solr instance as a tagging service and add the detected tags as metadata fields to a document before it is i

Re: Detect term occurrences

2015-09-11 Thread Francisco Andrés Fernández
Thanks! On Fri, Sep 11, 2015 at 14:39, Sujit Pal wrote: > Hi Francisco, > > >> I have many drug product leaflets, each corresponding to 1 product. On > the > other hand we have a medical dictionary with about 10^5 terms. > I want to detect all the occurrences of those terms for any leaflet > do

Re: Detect term occurrences

2015-09-11 Thread Sujit Pal
Hi Francisco, >> I have many drug product leaflets, each corresponding to 1 product. On the other hand we have a medical dictionary with about 10^5 terms. I want to detect all the occurrences of those terms for any leaflet document. Take a look at SolrTextTagger for this use case. https://github.

Re: Detect term occurrences

2015-09-11 Thread Alexandre Rafalovitch
Assuming the medical dictionary is constant, I would do a copyField of text into a separate field and have that separate field use: http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html with words coming from the dictionary (normalized).
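To make the effect of a keep-words field concrete, here is a minimal Python sketch of the idea only. It is not Solr configuration and not the filter's implementation: the function name `keep_words`, the naive tokenizer, and the sample dictionary are all assumptions for illustration. It also only handles single-token terms, so multi-word dictionary entries would need a different treatment.

```python
import re


def keep_words(text, dictionary):
    """Return the tokens of `text` that appear in `dictionary`.

    Hypothetical illustration of what a keep-words style filter leaves
    behind in the copied field: every token not in the (normalized)
    dictionary is dropped, and matching tokens are kept in order.
    """
    keep = {word.lower() for word in dictionary}  # normalize the dictionary
    tokens = re.findall(r"\w+", text.lower())      # naive tokenizer, lowercased
    return [token for token in tokens if token in keep]


if __name__ == "__main__":
    leaflet = "Take Ibuprofen with food; ibuprofen may cause nausea."
    print(keep_words(leaflet, ["ibuprofen", "Nausea"]))
```

In Solr itself the equivalent behaviour would come from the analysis chain of the target field, with the dictionary supplied as the filter's word list.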

Re: Detect term occurrences

2015-09-11 Thread Francisco Andrés Fernández
Many thanks pals. I will walk some of those ways (and return with new questions) ;) Best regards, Francisco On Fri, Sep 11, 2015 at 5:41 AM, Upayavira wrote: > It sounds to me like you are wanting to *filter* your document to only > include terms within that medical dictionar

Re: Detect term occurrences

2015-09-11 Thread Upayavira
It sounds to me like you are wanting to *filter* your document to only include terms within that medical dictionary. Or to have a keyword field based upon those of your 100k terms that appear in that doc. Synonyms are your saviour, if that's the case. Create a synonyms list for your terms, they ca

Re: Detect term occurrences

2015-09-10 Thread Erick Erickson
_Assuming_ this isn't a high throughput _and_ the leaflet text isn't too big... Index the thesaurus and fire all the terms of the query in a big OR clause against the index as a _query_. Perhaps turn highlighting on and highlight the entire leaflet text. Note, this is just "off the top of my head
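A rough sketch of the "big OR clause" idea, assuming the dictionary has been indexed one term per document in a field called `term` (a hypothetical name) and that a leaflet's words are sent as the query. The helper `or_query` and its tokenizer are illustrative assumptions, not anything prescribed in this thread.

```python
import re


def or_query(leaflet_text, field="term"):
    """Build a single OR query from the distinct words of a leaflet.

    The field name is a placeholder for whatever field holds the indexed
    dictionary terms. Tokens are lowercased word characters only, so no
    query-syntax escaping is needed for them.
    """
    tokens = re.findall(r"\w+", leaflet_text.lower())
    unique = list(dict.fromkeys(tokens))  # dedupe, keeping first-seen order
    if not unique:
        raise ValueError("leaflet text contains no tokens")
    return f"{field}:({' OR '.join(unique)})"


if __name__ == "__main__":
    print(or_query("Aspirin and aspirin relieve pain"))
```

A long leaflet can produce more distinct words than Solr's `maxBooleanClauses` limit allows (1024 by default), so the query may need to be split into batches or the limit raised.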

Re: Detect term occurrences

2015-09-10 Thread Francisco Andrés Fernández
Yes. I have many drug product leaflets, each corresponding to 1 product. On the other hand we have a medical dictionary with about 10^5 terms. I want to detect all the occurrences of those terms for any leaflet document. Could you give me a clue about the best way to do this? Perhaps, th

Re: Detect term occurrences

2015-09-10 Thread Walter Underwood
Doing a query for each term should work well. Solr is fast for queries. Write a script. I assume you only need to do this once. Running all the queries will probably take less time than figuring out a different approach. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.o
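A script along these lines might look like the following sketch. The host, the core name `leaflets`, the field name `text`, and the helper names are all assumptions; only the standard `/select` parameters (`q`, `fl`, `rows`, `wt`) and the `response.docs` shape of the JSON reply are relied on.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical values: adjust to your own Solr host, core, and field names.
SOLR_SELECT = "http://localhost:8983/solr/leaflets/select"
TEXT_FIELD = "text"


def phrase_query(field, term):
    """Build a quoted phrase query such as text:"heart attack"."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def build_url(term, base=SOLR_SELECT, field=TEXT_FIELD):
    """URL asking Solr for the IDs of leaflets that contain the term."""
    params = {"q": phrase_query(field, term), "fl": "id", "rows": 1000, "wt": "json"}
    return base + "?" + urlencode(params)


def leaflets_containing(term):
    """Run the query against a live Solr and return the matching IDs."""
    with urlopen(build_url(term)) as resp:
        body = json.load(resp)
    return [doc["id"] for doc in body["response"]["docs"]]
```

Looping `leaflets_containing` over the dictionary file gives a term-to-leaflets mapping. With about 10^5 terms that is about 10^5 requests, which fits the "only need to do this once" assumption.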

RE: Detect term occurrences

2015-09-10 Thread Markus Jelsma
If you are interested in just the number of occurrences of an indexed term, the TermsComponent will give that answer. Markus -Original message- > From:Francisco Andrés Fernández > Sent: Thursday 10th September 2015 15:58 > To: solr-user@lucene.apache.org > Subject: Detect term occurrenc
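As a sketch of how such a request could be built, assuming a `/terms` request handler is configured on the core (it is in Solr's sample configuration, but not necessarily in yours). The host, core name `leaflets`, and field name `text` are placeholders.

```python
from urllib.parse import urlencode


def terms_url(prefix, base="http://localhost:8983/solr/leaflets/terms",
              field="text", limit=10):
    """URL listing indexed terms in `field` that start with `prefix`."""
    params = {
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only terms starting with this prefix
        "terms.limit": limit,    # maximum number of terms to return
        "wt": "json",
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(terms_url("aspir"))
```

The count returned next to each term is its document frequency, that is, the number of documents containing the term, not the total number of occurrences across all text.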

Re: Detect term occurrences

2015-09-10 Thread Alexandre Rafalovitch
Can you tell us a bit more about the business case? Not the current technical one. Because it is entirely possible Solr can solve the higher level problem out of the box without you doing manual term comparisons. In which case, your problem scope is not quite right. Regards, Alex. Solr Anal