Re: Detect term occurrences

2015-09-13 Thread Francisco Andrés Fernández
Thanks again. For the moment I think it won't be a problem. I have ~500 documents. Regards, Francisco On Fri, Sep 11, 2015 at 6:08 PM, simon wrote: > +1 on Sujit's recommendation: we have a similar use case (detecting drug > names / disease entities / MeSH

Re: Detect term occurrences

2015-09-11 Thread Sujit Pal
Hi Francisco, >> I have many drug product leaflets, each corresponding to 1 product. On the other hand we have a medical dictionary with about 10^5 terms. I want to detect all the occurrences of those terms for any leaflet document. Take a look at SolrTextTagger for this use case.

Re: Detect term occurrences

2015-09-11 Thread Upayavira
It sounds to me like you are wanting to *filter* your document to only include terms within that medical dictionary. Or to have a keyword field based upon those of your 100k terms that appear in that doc. Synonyms are your saviour, if that's the case. Create a synonyms list for your terms, they

Re: Detect term occurrences

2015-09-11 Thread simon
+1 on Sujit's recommendation: we have a similar use case (detecting drug names / disease entities / MeSH terms) and have been using the SolrTextTagger with great success. We run a separate Solr instance as a tagging service and add the detected tags as metadata fields to a document before it is
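
The tagging idea described above (find dictionary terms in a text, record where they occur, then attach them to the document as metadata) can be illustrated with a small standalone sketch. This is not the SolrTextTagger API; it is a naive longest-match tagger in plain Python, and the function name `tag_terms` and the sample data are made up for illustration.

```python
import re

def tag_terms(text, dictionary):
    """Return (start, end, term) for each dictionary term found in text.

    Naive longest-match lookup over word tokens; illustrative only,
    not how SolrTextTagger is implemented.
    """
    # Map each term's lowercased token sequence to the original term.
    index = {tuple(term.lower().split()): term for term in dictionary}
    max_len = max((len(key) for key in index), default=0)
    # Keep character offsets so matches can be mapped back to the text.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    tags = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in index:
                tags.append((tokens[i][1], tokens[i + n - 1][2], index[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags

# Hypothetical leaflet text and dictionary terms.
text = "Patients with renal failure should avoid ibuprofen."
tags = tag_terms(text, ["renal failure", "ibuprofen", "failure"])
```

Because the longest phrase wins, "renal failure" is reported once rather than also as "failure". The resulting term list could then be stored in a multi-valued metadata field on the document.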

Re: Detect term occurrences

2015-09-11 Thread Francisco Andrés Fernández
Many thanks, pals. I will walk some of those ways (and return with new questions) ;) Best regards, Francisco On Fri, Sep 11, 2015 at 5:41 AM, Upayavira wrote: > It sounds to me like you are wanting to *filter* your document to only > include terms within

Re: Detect term occurrences

2015-09-11 Thread Francisco Andrés Fernández
Thanks! On Fri, Sep 11, 2015 14:39, Sujit Pal wrote: > Hi Francisco, > > >> I have many drug product leaflets, each corresponding to 1 product. On > the > other hand we have a medical dictionary with about 10^5 terms. > I want to detect all the occurrences of those

Re: Detect term occurrences

2015-09-11 Thread Alexandre Rafalovitch
Assuming the medical dictionary is constant, I would do a copyField of text into a separate field and have that separate field use: http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html with words coming from the dictionary (normalized).
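
As a rough illustration of what a keep-words filter does to the copied field, the sketch below mimics the effect in plain Python: tokens not present in the normalized dictionary are dropped, the rest survive. It is not Solr code; the real filter is configured in the field type's analyzer chain, and the function name, tokenizer, and sample data here are assumptions for illustration.

```python
import re

def keep_dictionary_tokens(text, dictionary):
    """Return the tokens of `text` that appear in `dictionary`, in order."""
    # Normalize the dictionary the same way as the text (lowercasing here).
    keep = {term.lower() for term in dictionary}
    # Simple word tokenizer standing in for the analyzer's tokenizer.
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok in keep]

# Hypothetical leaflet text and dictionary terms.
leaflet = "Ibuprofen may cause nausea. Do not combine ibuprofen with aspirin."
terms = ["Ibuprofen", "Aspirin", "Nausea", "Warfarin"]
kept = keep_dictionary_tokens(leaflet, terms)
```

Note that this works token by token, so multi-word dictionary terms would need extra handling (for example, a phrase-aware step before the filter).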

Re: Detect term occurrences

2015-09-10 Thread Walter Underwood
:Francisco Andrés Fernández <fra...@gmail.com> >> Sent: Thursday 10th September 2015 15:58 >> To: solr-user@lucene.apache.org >> Subject: Detect term occurrences >> >> Hi all, I'm new to Solr. >> I want to detect all occurrences of terms existing in a thesa

RE: Detect term occurrences

2015-09-10 Thread Markus Jelsma
e.org > Subject: Detect term occurrences > > Hi all, I'm new to Solr. > I want to detect all occurrences of terms existing in a thesaurus in 1 or > more documents. > What's the best strategy to do it? > Doing a query for each term doesn't seem to be the best way. > Many thanks, > > Francisco >

Re: Detect term occurrences

2015-09-10 Thread Alexandre Rafalovitch
Can you tell us a bit more about the business case? Not the current technical one. Because it is entirely possible Solr can solve the higher level problem out of the box without you doing manual term comparisons. In which case, your problem scope is not quite right. Regards, Alex. Solr

Detect term occurrences

2015-09-10 Thread Francisco Andrés Fernández
Hi all, I'm new to Solr. I want to detect all occurrences of terms existing in a thesaurus in 1 or more documents. What's the best strategy to do it? Doing a query for each term doesn't seem to be the best way. Many thanks, Francisco

Re: Detect term occurrences

2015-09-10 Thread Erick Erickson
_Assuming_ this isn't a high throughput _and_ the leaflet text isn't too big... Index the thesaurus and fire all the terms of the query in a big OR clause against the index as a _query_. Perhaps turn highlighting on and highlight the entire leaflet text. Note, this is just "off the top of my
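
A minimal sketch of the "big OR clause plus highlighting" idea, assuming the leaflet text lives in a field called `leaflet_text` (a hypothetical name). It only builds the request parameters and sends nothing to Solr; `q`, `hl`, `hl.fl` and `hl.fragsize` are standard Solr parameters, while the helper names are invented for illustration.

```python
def build_or_query(field, terms):
    """Build field:("t1" OR "t2" ...) with each term quoted as a phrase."""
    clauses = []
    for term in terms:
        # Escape backslashes and double quotes inside the quoted phrase.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(clauses)})"

def build_highlight_params(field, terms):
    """Return query parameters that OR the terms and highlight the field."""
    return {
        "q": build_or_query(field, terms),
        "hl": "true",
        "hl.fl": field,
        # 0 asks the highlighter to use the whole field value, not fragments.
        "hl.fragsize": "0",
    }

# Hypothetical field name and terms.
params = build_highlight_params("leaflet_text", ["ibuprofen", "renal failure"])
```

With about 10^5 terms the clause count would exceed Solr's default `maxBooleanClauses` limit of 1024, so the terms would probably have to be sent in batches or the limit raised.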

Re: Detect term occurrences

2015-09-10 Thread Francisco Andrés Fernández
Yes. I have many drug product leaflets, each corresponding to 1 product. On the other hand we have a medical dictionary with about 10^5 terms. I want to detect all the occurrences of those terms for any leaflet document. Could you give me a clue about the best way to do it? Perhaps,