Re: Detect term occurrences

Alexandre Rafalovitch Fri, 11 Sep 2015 07:36:36 -0700

Assuming the medical dictionary is constant, I would copyField the
text into a separate field and have that field's analyzer use
KeepWordFilterFactory:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html
with the word list coming from the (normalized) dictionary.


That way, the new field will contain ONLY the dictionary terms found
in the text. You can then facet on that field or do anything else with
it. You can even search against it, which should be a lot more
efficient.
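
To make the effect concrete, here is a rough Python sketch of what such
a field would end up holding. This is NOT Solr code, only an imitation
of a keep-words filter; the three-word dictionary, the sample leaflet
and the keep_words() helper are all made up for illustration. Note that
it works token by token, so a multi-word dictionary term would not be
caught this way.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-in for the normalized medical dictionary.
DICTIONARY = {"aspirin", "ibuprofen", "headache"}

def keep_words(text, dictionary):
    """Lowercase and tokenize the text, keeping only dictionary tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t in dictionary]

# Made-up leaflet text standing in for one document.
leaflet = "Aspirin relieves headache. Do not combine aspirin with ibuprofen."

kept = keep_words(leaflet, DICTIONARY)  # what the copied field would hold
counts = Counter(kept)                  # roughly what faceting would report

print(kept)
print(counts)
```

Everything that is not in the dictionary is dropped, so counting or
searching only ever touches dictionary terms.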

The main issue would be a gigantic filter, which may mean speed and/or
memory issues. Solr has some ways to deal with such large set matches
by compiling them into a state machine (used for auto-complete), but I
don't know if that's exposed for your purpose.

But it could be a fun custom filter to build.
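
For a feel of the "compile the terms into a state machine" idea, here
is a small Python sketch using a plain token trie. It is only an
assumption about how such a custom filter might work, not how Solr
does it internally; build_trie(), find_terms() and the sample terms
are hypothetical names invented for this example.

```python
END = ""  # marker key; never collides with a token, since split() yields no empty strings

def build_trie(terms):
    """Compile space-separated terms into a nested-dict trie keyed by token."""
    root = {}
    for term in terms:
        node = root
        for token in term.lower().split():
            node = node.setdefault(token, {})
        node[END] = term  # a complete dictionary term ends at this node
    return root

def find_terms(tokens, trie):
    """Return (start_index, term) pairs, taking the longest match at each position."""
    matches = []
    i = 0
    while i < len(tokens):
        node, j, best = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (j, node[END])
        if best:
            matches.append((i, best[1]))
            i = best[0]  # continue after the matched term
        else:
            i += 1
    return matches

# Made-up dictionary terms and leaflet text.
trie = build_trie(["heart attack", "heart", "aspirin"])
tokens = "aspirin after a heart attack".split()
found = find_terms(tokens, trie)
print(found)
```

The document is scanned once no matter how many terms the dictionary
holds, and multi-word terms are matched as a unit.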

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 10 September 2015 at 22:21, Francisco Andrés Fernández
<fra...@gmail.com> wrote:
> Yes.
> I have many drug product leaflets, each corresponding to 1 product. On the
> other hand, we have a medical dictionary with about 10^5 terms.
> I want to detect all the occurrences of those terms in any leaflet
> document.
> Could you give me a clue about the best way to do this?
> Perhaps the best way is (as Walter suggests) to run all the queries every
> time, as needed.
> Regards,
>
> Francisco
>
> On Thu, 10 Sep 2015 at 11:14 a.m., Alexandre Rafalovitch <
> arafa...@gmail.com> wrote:
>
>> Can you tell us a bit more about the business case? Not the current
>> technical one. Because it is entirely possible Solr can solve the
>> higher level problem out of the box without you doing manual term
>> comparisons. In which case, your problem scope is not quite right.
>>
>> Regards,
>>    Alex.
>> ----
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 10 September 2015 at 09:58, Francisco Andrés Fernández
>> <fra...@gmail.com> wrote:
>> > Hi all, I'm new to Solr.
>> > I want to detect all occurrences of terms from a thesaurus in 1 or
>> > more documents.
>> > What's the best strategy for doing this?
>> > Doing a query for each term doesn't seem to be the best way.
>> > Many thanks,
>> >
>> > Francisco
>>
