Hi Roman,

Here is my use case:

*schema.xml*:

<field name="name" type="text_autophrase" indexed="true" stored="true"/>

<fieldType name="text_autophrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory" phrases="autophrases.txt" includeTokens="false" replaceWhitespaceWith="X"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

*solrconfig.xml*:

<requestHandler name="/autophrase" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">name</str>
    <str name="defType">autophrasingParser</str>
  </lst>
</requestHandler>

<queryParser name="autophrasingParser" class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
  <str name="phrases">autophrases.txt</str>
  <str name="replaceWhitespaceWith">X</str>
</queryParser>

*synonyms.txt* (a single line):

PEG-20 SORBITAN LAURATE,POLYOXYETHYLENE 20 SORBITAN MONOLAURATE,TWEEN 20,POLYSORBATE 20 [USAN],POLYSORBATE 20 [INCI],POLYSORBATE 20 [II],POLYSORBATE 20 [HSDB],TWEEN-20,PEG-20 SORBITAN,PEG-20 SORBITAN [VANDF],POLYSORBATE-20,POLYSORBATE 20,SORETHYTAN MONOLAURATE,T-MAZ 20,POLYOXYETHYLENE (20) SORBITAN MONOLAURATE,SORBITAN MONODODECANOATE,POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE,POLYOXYETHYLENE SORBITAN MONOLAURATE,POLYSORBATE 20 [MART.],SORBIMACROGOL LAURATE 300,POLYSORBATE 20 [FHFI],FEMA NO. 2915,POLYSORBATE 20 [FCC],POLYSORBATE 20 [WHO-DD],POLYSORBATE 20 [VANDF]

*autophrases.txt*: contains all of the above phrases, one per line.

*Indexed document*:

<doc>
  <field name="id">31</field>
  <field name="name">Polysorbate 20</field>
</doc>

So when I query the /autophrase handler for "tween 20" or "FEMA NO. 2915", I expect to get back the record containing "Polysorbate 20". That is,

http://localhost:8983/solr/collection1/autophrase?q=tween+20&wt=json&indent=true

should have retrieved it, but it doesn't. What could I be doing wrong?

On Wed, Apr 29, 2015 at 2:10 AM, Roman Chyla <roman.ch...@gmail.com> wrote:
> I'm not sure I understand - the autophrasing filter will allow the
> parser to see all the tokens, so that they can be parsed (and
> multi-token synonyms identified). So if you are using the same
> analyzer at query and index time, they should be able to see the same
> stuff.
>
> Are you using multi-token synonyms, or just entries that look like
> multi-token synonyms? In the first case, the tokens are separated by a
> null byte; in the second case, they are just strings, even with
> whitespace, so your synonym file must contain exactly the same entries
> as your analyzer sees them (and in the same order; or you have to use
> the same analyzer to load the synonym files).
>
> Can you post the relevant part of your schema.xml?
>
> Note: I can confirm that multi-token synonym expansion can be made to
> work, even in complex cases - we do it - but if you need multi-token
> synonyms, you will likely also need a smarter query parser.
> Sometimes your users will use query strings that contain overlapping
> synonym entries. To handle that, you will have to know how to generate
> all possible 'readings', for example:
>
> synonyms:
>
> foo bar, foobar
> hey foo, heyfoo
>
> user input:
>
> hey foo bar
>
> possible readings:
>
> ((hey foo) +bar) OR (hey +(foo bar))
>
> I'm simplifying it here; the fun starts when you are seeing a phrase
> query :)
>
> On Tue, Apr 28, 2015 at 10:31 AM, Kaushik <kaushika...@gmail.com> wrote:
> > Hi there,
> >
> > I tried the solution provided in
> > https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > The mentioned solution works when the indexed data does not contain
> > alphanumerics or special characters. But in my case the synonyms are
> > something like the below:
> >
> > T-MAZ 20
> > POLYOXYETHYLENE (20) SORBITAN MONOLAURATE
> > SORBITAN MONODODECANOATE
> > POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE
> > POLYOXYETHYLENE SORBITAN MONOLAURATE
> > POLYSORBATE 20 [MART.]
> > SORBIMACROGOL LAURATE 300
> > POLYSORBATE 20 [FHFI]
> > FEMA NO. 2915
> >
> > They have alphanumerics, special characters, spaces, etc. Is there a way
> > to implement synonyms even in such a case?
> >
> > Thanks,
> > Kaushik
> >
> > On Mon, Apr 20, 2015 at 11:03 AM, Davis, Daniel (NIH/NLM) [C]
> > <daniel.da...@nih.gov> wrote:
> >
> >> Handling MeSH descriptor preferred terms and such is similar. I
> >> encountered this during an evaluation of Solr for a project here at
> >> NLM. We decided to use Solr for different projects instead. I
> >> considered the following approaches:
> >> - use a custom tokenizer at index time that indexed all of the multiple
> >> term alternatives.
> >> - index the data, and then have an enrichment process that queries on
> >> each source synonym and generates an update to add the target synonyms.
> >> Follow this with an optimize.
> >> - during the indexing process, but before sending the data to Solr,
> >> process the data to tokenize it and add synonyms to another field.
> >>
> >> Both the custom tokenizer and the enrichment process share the feature
> >> that they use Solr's own tokenizer rather than duplicating it. The
> >> enrichment process seems to me workable only in environments where you
> >> can re-index all data periodically, i.e. where there is no continuous
> >> stream of data that must be indexed relatively quickly once it is
> >> generated. The last method, pre-processing the data, seems the least
> >> desirable to me from a blue-sky perspective, but it is probably the
> >> easiest to implement and the most independent of Solr.
> >>
> >> Hope this helps,
> >>
> >> Dan Davis, Systems/Applications Architect (Contractor),
> >> Office of Computer and Communications Systems,
> >> National Library of Medicine, NIH
> >>
> >> -----Original Message-----
> >> From: Kaushik [mailto:kaushika...@gmail.com]
> >> Sent: Monday, April 20, 2015 10:47 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Mutli term synonyms
> >>
> >> Hello,
> >>
> >> Reading up on synonyms, it looks like there is no real solution for
> >> multi-term synonyms. Is that right? I have a use case where I need to
> >> map one multi-term phrase to another, e.g. Tween 20 needs to be
> >> translated to Polysorbate 20.
> >>
> >> Any thoughts as to how this can be achieved?
> >>
> >> Thanks,
> >> Kaushik
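Roman's point that the synonym file must contain entries exactly as the analyzer sees them can be illustrated with a small Python sketch. It assumes (without checking the Lucidworks filter's source) that the index chain above lowercases the text and that the auto-phrasing filter then joins the words of a phrase listed in autophrases.txt with the replaceWhitespaceWith string, "X" here. Under that assumption, "TWEEN 20" reaches the SynonymFilter as "tweenX20", not "tween 20", so a synonyms.txt line written with spaces never matches. The helper names are hypothetical. The sketch also escapes literal commas, because synonyms.txt uses the comma as its separator: an unescaped entry such as POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE is split into two terms.

```python
import re
from typing import Iterable

# Assumption: mirrors replaceWhitespaceWith="X" from the schema above.
REPLACE_WHITESPACE_WITH = "X"


def as_analyzer_sees_it(phrase: str) -> str:
    """Approximate LowerCaseFilter followed by auto-phrasing: lowercase the
    phrase, then join its words with the replacement string."""
    return re.sub(r"\s+", REPLACE_WHITESPACE_WITH, phrase.strip().lower())


def synonym_line(phrases: Iterable[str]) -> str:
    """Build one equivalence line for synonyms.txt from raw phrases.
    Literal commas inside a phrase are backslash-escaped so the synonym
    parser does not treat them as separators."""
    return ",".join(
        as_analyzer_sees_it(p).replace(",", "\\,") for p in phrases
    )


if __name__ == "__main__":
    print(synonym_line(["TWEEN 20", "POLYSORBATE 20", "FEMA NO. 2915"]))
    # prints: tweenX20,polysorbateX20,femaXno.X2915
```

With ignoreCase="true" on the SynonymFilterFactory, the case of the "X" should not matter for matching. This is only an approximation of the chain; Solr's Analysis screen shows what each filter actually emits for a given input.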
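Roman's "possible readings" example can be made concrete with a short Python sketch. It enumerates every way to cut the user's tokens into single tokens and known multi-token synonym entries, then ORs together the readings that use at least one multi-token entry. The function names are illustrative, and the output notation is a simplified stand-in for Roman's, not Lucene query syntax. A real parser would also have to deal with phrase queries, which Roman notes is where the difficulty starts.

```python
from functools import lru_cache
from typing import FrozenSet, Tuple

Chunk = Tuple[str, ...]
Reading = Tuple[Chunk, ...]


def readings(tokens: Tuple[str, ...],
             phrases: FrozenSet[Chunk]) -> Tuple[Reading, ...]:
    """Return every segmentation of `tokens` into consecutive chunks, where
    each chunk is a single token or a multi-token synonym entry."""

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> Tuple[Reading, ...]:
        if i == len(tokens):
            return ((),)  # exactly one way to segment nothing
        out = []
        for j in range(i + 1, len(tokens) + 1):
            chunk = tokens[i:j]
            if len(chunk) == 1 or chunk in phrases:
                out.extend((chunk,) + rest for rest in from_pos(j))
        return tuple(out)

    return from_pos(0)


def to_query(all_readings: Tuple[Reading, ...]) -> str:
    """OR together the readings that use at least one multi-token entry,
    wrapping each multi-token chunk in parentheses."""
    def render(reading: Reading) -> str:
        return " ".join(f"({' '.join(c)})" if len(c) > 1 else c[0]
                        for c in reading)
    return " OR ".join(f"({render(r)})" for r in all_readings
                       if any(len(c) > 1 for c in r))


if __name__ == "__main__":
    synonyms = frozenset({("foo", "bar"), ("hey", "foo")})
    print(to_query(readings(("hey", "foo", "bar"), synonyms)))
    # prints: (hey (foo bar)) OR ((hey foo) bar)
```

A query builder would then replace each parenthesized chunk with its single-token form (foobar, heyfoo) or its expanded synonyms. Memoizing on the start position avoids recomputing shared suffixes, but the number of readings itself can still grow quickly for long inputs.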
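Dan's third option, expanding synonyms before the data reaches Solr, could look roughly like the following Python sketch. It assumes synonyms.txt holds comma-separated equivalence groups, one per line, with literal commas escaped as `\,`. For each document, it copies the full group matching the `name` value into a second field. The field name `name_synonyms` and both helpers are hypothetical, and that field would need its own definition in schema.xml. This is simplified: it matches the whole field value rather than tokenizing it and finding phrases inside longer text.

```python
import re
from typing import Dict, Iterable, List


def load_synonym_groups(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each lower-cased term to its whole equivalence group.
    Splits on commas not preceded by a backslash, then unescapes '\\,'."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group = [t.strip().replace("\\,", ",")
                 for t in re.split(r"(?<!\\),", line) if t.strip()]
        for term in group:
            groups[term.lower()] = group
    return groups


def enrich(doc: Dict[str, object], groups: Dict[str, List[str]],
           source: str = "name",
           target: str = "name_synonyms") -> Dict[str, object]:
    """Return a copy of `doc` with the synonym group of `source` stored in
    the (hypothetical) `target` field, or an empty list if none matches."""
    key = str(doc.get(source, "")).strip().lower()
    enriched = dict(doc)
    enriched[target] = list(groups.get(key, []))
    return enriched


if __name__ == "__main__":
    groups = load_synonym_groups(["TWEEN 20,POLYSORBATE 20,FEMA NO. 2915"])
    print(enrich({"id": "31", "name": "Polysorbate 20"}, groups))
```

Searching `name_synonyms` for "tween 20" would then find document 31 without any query-time synonym handling. The trade-off Dan describes applies: the logic lives outside Solr and must be rerun whenever synonyms.txt changes.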