Two other things I noticed:

1. You probably don't want to store your copyField destinations (stopwords_removed and expanded). The stored value is literally going to be the same text that is already stored in original each time.
2. Your expectation "the pre-processed version of the text is added to the index" may be incorrect. Anything done in <analyzer type="query"> sections actually happens at query time, not at index time. Not sure if that's significant for you.

Michael Della Bitta

On Wed, Mar 25, 2015 at 4:27 PM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:

> Hi Martin,
>
> fq means filter query. Maybe you want to use the qf (query fields) parameter
> of edismax?
>
>
> On Wednesday, March 25, 2015 9:23 PM, Martin Wunderlich <martin...@gmx.net>
> wrote:
> Hi all,
>
> I am wondering what the process is for applying Tokenizers and Filters (as
> defined in the FieldType definition) to field contents that result from
> CopyFields. To be more specific, in my Solr instance, I would like to
> support query expansion by two means: removing stop words and adding
> inflected word forms as synonyms.
>
> To use a specific example, let's say I have the following sentence to be
> indexed (from a Wittgenstein manuscript):
>
> "Was zum Wesen der Welt gehört, kann die Sprache nicht ausdrücken."
>
> This sentence will be indexed in a field called "original" that is defined
> as follows:
>
> <field name="original" type="text_original" indexed="true" stored="true"
>        required="true"/>
>
> <fieldType name="text_windex_original" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
> Then, in order to create fields for the two types of query expansion, I
> have set up specific fields for this:
>
> - one field where stopwords are removed both on the indexed content and
> the query. So, if the user is searching for a phrase like "der Sprache",
> Solr should still find the segment above, because the determiners ("der"
> and "die") are removed prior to indexing and prior to querying,
> respectively. This field is defined as follows:
>
> <field name="stopwords_removed" type="text_stopwords_removed"
>        indexed="true" stored="true" required="true"/>
>
> <fieldType name="text_stopwords_removed" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de.txt" format="snowball"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de.txt" format="snowball"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> - a second field where synonyms are added to the query so that more
> segments will be found.
> For instance, if the user is searching for the plural form "Sprachen",
> Solr should return the segment above, due to this entry in the synonyms
> file: "Sprache,Sprach,Sprachen". This field is defined as follows:
>
> <field name="expanded" type="text_multiplied" indexed="true" stored="true"
>        required="true"/>
>
> <fieldType name="text_expanded" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de.txt" format="snowball"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de.txt" format="snowball"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="synonyms_de.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Finally, to avoid having to specify three fields with identical content in
> the import documents, I am defining the two fields for query expansion as
> copyFields:
>
> <copyField source="original" dest="stopwords_removed"/>
> <copyField source="original" dest="expanded"/>
>
> Now, my expectation would be as follows:
> - during import, two temporary fields are created by copying content from
> the original field
> - these two temporary fields are then pre-processed as per the definitions
> above
> - the pre-processed version of the text is added to the index
> - then, the user can search for "Sprache", "sprache", "Sprachen" or "der
> Sprache" and will always get the segment above as a matching result.
>
> However, what actually happens is that I get matches only for "Sprache"
> and "sprache".
>
> The other thing that strikes me as odd is that when I restrict the search to
> one of the fields only using the "fq" parameter, I get no results.
> For instance:
>
> http://localhost:8983/solr/windex/select?q=Sprache&fq=original&wt=json&indent=true
>
> will return no matches. I would have expected that using the fq parameter the
> user can specify what type of search (s)he would like to carry out: a
> standard search (field original) or an expanded search (one of the other
> two fields).
>
> For debugging, I have checked the analysis and the results seem OK (posted
> below).
> Apologies for the long post, but I am really a bit stuck here (even after
> doing a lot of reading and googling). It is probably something simple that
> I am missing.
> Thanks a lot in advance for any help.
>
> Cheers,
>
> Martin
>
>
> ST   Was zum Wesen der Welt gehört kann die Sprache nicht ausdrücken
> SF   Was zum Wesen Welt gehört kann die Sprache nicht ausdrücken
> LCF  was zum wesen welt gehört kann die sprache nicht ausdrücken
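
To make the two points at the top of this reply a bit more concrete, here is a rough, untested sketch of how the field declarations might look if only the source field is stored. It reuses the field names from the quoted schema. One assumption to flag: the quoted schema declares the "expanded" field with type="text_multiplied" while the fieldType is named "text_expanded", and a field's type attribute has to match the name of a defined fieldType, so the sketch assumes "text_expanded" is the intended name. The same applies to "text_original" versus "text_windex_original"; the sketch simply keeps the name used on the field.

```xml
<!-- Source field: stored, so the original text can be returned in results. -->
<field name="original" type="text_original" indexed="true" stored="true"
       required="true"/>

<!-- copyField destinations: indexed so they can be searched, but not stored,
     because the stored value would only duplicate what "original" holds.
     Assumption: "text_expanded" is the intended fieldType name. -->
<field name="stopwords_removed" type="text_stopwords_removed"
       indexed="true" stored="false" required="true"/>
<field name="expanded" type="text_expanded"
       indexed="true" stored="false" required="true"/>

<!-- copyField copies the raw input text; each destination then runs its own
     index-time analyzer on that text. -->
<copyField source="original" dest="stopwords_removed"/>
<copyField source="original" dest="expanded"/>
```

With fields like these, picking the type of search per request could follow Ahmet's qf suggestion, for example q=Sprachen&defType=edismax&qf=expanded for the expanded search and qf=original for the standard one. An fq value is itself parsed as a query (something like fq=original:Sprache), so fq=original on its own most likely just searches for the term "original" and filters everything out.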