Re: Detect term occurrences

Sujit Pal Fri, 11 Sep 2015 10:40:19 -0700

Hi Francisco,

> I have many drug products leaflets, each corresponding to 1 product. In the
> other hand we have a medical dictionary with about 10^5 terms.
> I want to detect all the occurrences of those terms for any leaflet
> document.

Take a look at SolrTextTagger for this use case.
https://github.com/OpenSextant/SolrTextTagger


A dictionary of 10^5 entries is not that large; I am currently using it
with much larger dictionaries, with very good results.

It's a project built (at least originally) by David Smiley, who is also
quite active in this group.
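
For anyone who wants a feel for what dictionary tagging involves, here is a
minimal sketch in Python. It is not how SolrTextTagger is implemented or
called; it is a toy longest-match tagger over a token trie, assuming simple
lowercased word tokens. The names tokenize, build_trie, tag and the END
marker, as well as the sample terms, are made up for illustration.

```python
import re

TOKEN_RE = re.compile(r"\w+")
END = "$end"  # hypothetical marker key; cannot collide with a \w+ token


def tokenize(text):
    """Return (lowercased token, start offset, end offset) triples."""
    return [(m.group().lower(), m.start(), m.end())
            for m in TOKEN_RE.finditer(text)]


def build_trie(terms):
    """Build a nested-dict trie keyed by token, one path per dictionary term."""
    root = {}
    for term in terms:
        tokens = tokenize(term)
        if not tokens:
            continue  # skip terms with no word characters
        node = root
        for tok, _, _ in tokens:
            node = node.setdefault(tok, {})
        node[END] = term  # remember which term ends at this node
    return root


def tag(text, trie):
    """Return (start, end, term) for each longest, non-overlapping match."""
    tokens = tokenize(text)
    hits = []
    i = 0
    while i < len(tokens):
        node = trie
        best = None
        j = i
        # Walk the trie as far as the text allows, remembering the last
        # position at which a complete term ended.
        while j < len(tokens) and tokens[j][0] in node:
            node = node[tokens[j][0]]
            j += 1
            if END in node:
                best = (j, node[END])
        if best is None:
            i += 1
        else:
            end_idx, term = best
            hits.append((tokens[i][1], tokens[end_idx - 1][2], term))
            i = end_idx  # continue after the match, so matches never overlap
    return hits


if __name__ == "__main__":
    trie = build_trie(["heart attack", "heart", "aspirin", "renal failure"])
    leaflet = "Aspirin may reduce the risk of heart attack but not renal failure."
    for start, end, term in tag(leaflet, trie):
        print(start, end, term)
```

The text is scanned once, whatever the dictionary size, which is why one
query per term is unnecessary. A real tagger also has to deal with
stemming, synonyms and overlapping matches, which this sketch ignores.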

-sujit


On Fri, Sep 11, 2015 at 7:29 AM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Assuming the medical dictionary is constant, I would do a copyField of
> text into a separate field and have that separate field use:
>
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html
> with words coming from the dictionary (normalized).
>
> That way that new field will ONLY have your dictionary terms from the
> text. Then you can facet against that field, or do anything else with
> it. Even searching it would be a lot more efficient.
>
> The main issue would be a gigantic filter, which may mean speed and/or
> memory issues. Solr has some ways to deal with such large set matches
> by compiling them into a state machine (used for auto-complete), but I
> don't know if that's exposed for your purpose.
>
> But it could be a fun custom filter to build.
>
> Regards,
>    Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 10 September 2015 at 22:21, Francisco Andrés Fernández
> <fra...@gmail.com> wrote:
> > Yes.
> > I have many drug products leaflets, each corresponding to 1 product. In the
> > other hand we have a medical dictionary with about 10^5 terms.
> > I want to detect all the occurrences of those terms for any leaflet
> > document.
> > Could you give me a clue about the best way to do it?
> > Perhaps, the best way is (as Walter suggests) to do all the queries every
> > time, as needed.
> > Regards,
> >
> > Francisco
> >
> > On Thu, Sep 10, 2015 at 11:14 AM, Alexandre Rafalovitch <
> > arafa...@gmail.com> wrote:
> >
> >> Can you tell us a bit more about the business case? Not the current
> >> technical one. Because it is entirely possible Solr can solve the
> >> higher level problem out of the box without you doing manual term
> >> comparisons. In which case, your problem scope is not quite right.
> >>
> >> Regards,
> >>    Alex.
> >> ----
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 10 September 2015 at 09:58, Francisco Andrés Fernández
> >> <fra...@gmail.com> wrote:
> >> > Hi all, I'm new to Solr.
> >> > I want to detect all occurrences of terms existing in a thesaurus in 1 or
> >> > more documents.
> >> > What's the best strategy to do it?
> >> > Doing a query for each term doesn't seem to be the best way.
> >> > Many thanks,
> >> >
> >> > Francisco
> >>
>
