Thanks again.
For the moment I think it won't be a problem. I have ~500 documents.
Regards,

Francisco

On Fri, Sep 11, 2015 at 6:08 PM, simon <mtnes...@gmail.com> wrote:

> +1 on Sujit's recommendation: we have a similar use case (detecting drug
> names / disease entities / MeSH terms) and have been using the
> SolrTextTagger with great success.
>
> We run a separate Solr instance as a tagging service and add the detected
> tags as metadata fields to a document before it is ingested into our main
> Solr collection.
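>
> For reference, a tag request looks roughly like this (a sketch: the
> collection name and parameter values are illustrative, so check them
> against the SolrTextTagger docs for the version you run):
>
>   curl -XPOST \
>     'http://localhost:8983/solr/tagger/tag?overlaps=NO_SUB&tagsLimit=5000&fl=id,name&wt=json' \
>     -H 'Content-Type: text/plain' --data-binary @leaflet.txt
>
> The response lists each matched dictionary entry with its start/end
> offsets in the posted text; we copy those matches into metadata fields
> before indexing the document into the main collection.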
>
> How many documents/product leaflets do you have? The tagger is very fast
> at the Solr level, but I'm seeing quite a bit of HTTP overhead.
>
> best
>
> -Simon
>
> On Fri, Sep 11, 2015 at 1:39 PM, Sujit Pal <sujit....@comcast.net> wrote:
>
> > Hi Francisco,
> >
> > >> I have many drug product leaflets, each corresponding to one product.
> > >> On the other hand, we have a medical dictionary with about 10^5 terms.
> > >> I want to detect all the occurrences of those terms in any leaflet
> > >> document.
> > Take a look at SolrTextTagger for this use case.
> > https://github.com/OpenSextant/SolrTextTagger
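> >
> > The setup is roughly two snippets (from memory, so treat this as a
> > sketch and check the project README for your version; field names here
> > are made up):
> >
> >   <!-- solrconfig.xml: register the tagger handler -->
> >   <requestHandler name="/tag"
> >       class="org.opensextant.solrtexttagger.TaggerRequestHandler">
> >     <lst name="defaults">
> >       <str name="field">name_tag</str>
> >     </lst>
> >   </requestHandler>
> >
> >   <!-- schema.xml: one document per dictionary term -->
> >   <field name="name_tag" type="tag" indexed="true" stored="false"/>
> >
> > The "tag" fieldType needs the project's ConcatenateFilterFactory in its
> > index-time analyzer; the README has the full chain.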
> >
> > 10^5 entries is not that large; I am using it for much larger
> > dictionaries at the moment with very good results.
> >
> > It's a project built (at least originally) by David Smiley, who is also
> > quite active in this group.
> >
> > -sujit
> >
> >
> > On Fri, Sep 11, 2015 at 7:29 AM, Alexandre Rafalovitch
> > <arafa...@gmail.com> wrote:
> >
> > > Assuming the medical dictionary is constant, I would do a copyField of
> > > text into a separate field and have that separate field use:
> > >
> > > http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html
> > >
> > > with words coming from the dictionary (normalized).
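> > >
> > > Something like this (a sketch; the field names, file name, and
> > > analyzer chain are just illustrative):
> > >
> > >   <copyField source="text" dest="dict_terms"/>
> > >   <field name="dict_terms" type="dict_only" indexed="true"
> > >       stored="false" multiValued="true"/>
> > >
> > >   <fieldType name="dict_only" class="solr.TextField">
> > >     <analyzer>
> > >       <tokenizer class="solr.StandardTokenizerFactory"/>
> > >       <filter class="solr.LowerCaseFilterFactory"/>
> > >       <!-- drop every token not listed in the dictionary file -->
> > >       <filter class="solr.KeepWordFilterFactory"
> > >           words="medical_terms.txt" ignoreCase="true"/>
> > >     </analyzer>
> > >   </fieldType>
> > >
> > > Note this only catches single-token terms as written; multi-word
> > > dictionary entries would need shingling or a different approach.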
> > >
> > > That way the new field will ONLY have your dictionary terms from the
> > > text. Then you can facet against that field, or anything else. Or
> > > even search it and be a lot more efficient.
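> > >
> > > For example, to see which dictionary terms occur and how often:
> > >
> > >   q=*:*&rows=0&facet=true&facet.field=dict_terms&facet.limit=-1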
> > >
> > > The main issue would be the gigantic filter word list, which may mean
> > > speed and/or memory problems. Solr has ways to deal with such large
> > > set matches by compiling them into a state machine (as used for
> > > auto-complete), but I don't know if that's exposed for your purpose.
> > >
> > > But it could make a fun custom filter to build.
> > >
> > > Regards,
> > >    Alex.
> > > ----
> > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > http://www.solr-start.com/
> > >
> > >
> > > On 10 September 2015 at 22:21, Francisco Andrés Fernández
> > > <fra...@gmail.com> wrote:
> > > > Yes.
> > > > I have many drug product leaflets, each corresponding to one product.
> > > > On the other hand, we have a medical dictionary with about 10^5
> > > > terms.
> > > > I want to detect all the occurrences of those terms in any leaflet
> > > > document.
> > > > Could you give me a clue about the best way to perform this?
> > > > Perhaps the best way is (as Walter suggests) to do all the queries
> > > > every time, as needed.
> > > > Regards,
> > > >
> > > > Francisco
> > > >
> > > > On Thu, Sep 10, 2015 at 11:14 AM, Alexandre Rafalovitch
> > > > <arafa...@gmail.com> wrote:
> > > >
> > > >> Can you tell us a bit more about the business case? Not the current
> > > >> technical one. Because it is entirely possible Solr can solve the
> > > >> higher-level problem out of the box without you doing manual term
> > > >> comparisons. In which case, your problem scope is not quite right.
> > > >>
> > > >> Regards,
> > > >>    Alex.
> > > >> ----
> > > >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > >> http://www.solr-start.com/
> > > >>
> > > >>
> > > >> On 10 September 2015 at 09:58, Francisco Andrés Fernández
> > > >> <fra...@gmail.com> wrote:
> > > >> Hi all, I'm new to Solr.
> > > >> I want to detect all occurrences of terms from a thesaurus in one
> > > >> or more documents.
> > > >> What's the best strategy to do this?
> > > >> Doing a query for each term doesn't seem to be the best way.
> > > >> > Many thanks,
> > > >> >
> > > >> > Francisco
> > > >>
> > >
> >
>
