Sebastian,

You may want to try adding autoGeneratePhraseQueries="true" to the fieldType. With that setting, a query for 978-3-8052-5094-8 will behave just like "978 3 8052 5094 8" (with the quotes).
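Here is a minimal sketch of that suggestion. It assumes the text_general definition quoted further down in this thread stays otherwise unchanged, and it only adds the autoGeneratePhraseQueries attribute to the fieldType element:

```xml
<!-- text_general as quoted below, plus autoGeneratePhraseQueries="true" -->
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After reloading the core, run a query for 978-3-8052-5094-8 with debugQuery=true. The parsed query should show a single phrase such as text:"978 3 8052 5094 8" instead of five OR-ed terms. Check that output before relying on it, especially for inputs with a trailing wildcard.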
A few notes about autoGeneratePhraseQueries:
a) It used to be set to true by default, but that was changed several years ago.
b) It does NOT require a reindex, so it is very easy to test.
c) It is apparently not recommended for languages that are not whitespace-delimited (CJK, etc.), but maybe that's not an issue in your use case.
d) I'm unsure how it will affect wildcard queries on that field. E.g., will 978-3-8052* match 978-3-8052-5094-8? At the very least, partial ISBNs (e.g. 978-3-8052) would match the full ISBN without needing the wildcard. I'm just not sure what happens if the user includes the wildcard.

Josh

On Thu, Jan 5, 2017 at 1:41 PM Sebastian Riemer <s.rie...@littera.eu> wrote:

> Thank you very much for taking the time to help me!
>
> I'll definitely have a look at the link you've posted.
>
> @ShawnHeisey Thanks too for shedding light on the wildcard behaviour!
>
> Allow me one further question:
> - Assuming that I define a separate field for storing the ISBNs, using the
> awesome analyzer provided by Mr. Bill Dueber, how do I get that field
> copied into my general text field, which is used by my QuickSearch input?
> Won't that field be processed again by the analyzer defined on the text
> field?
> - Should I alternatively add more fields to the q parameter? For now, I
> always set q=text:<whatever_I_want_to_search_here>, but I guess one
> could try something like
> q=text:<whatever_i_want_to_search>+isbnspeciallookupfield:<whatever_i_want_to_search>
>
> I'm not sure about that last idea, though, since the searches are
> probably OR-combined, which is not what I want.
>
> A third option would be, of course, to decide in my application where to
> look in Solr. I.e. for everything matching a regex containing only
> numbers and hyphens with length 13 -> don't query the field text;
> use the field isbnspeciallookupfield instead.
>
>
> Many thanks again, and have a nice day!
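This sketch addresses the copyField and q-parameter questions above. copyField copies the raw source value before any analysis, so an ISBN copied into text is tokenized again by the text_general analyzer. To search both fields without plain OR semantics, one option is the edismax parser. Set qf to list both fields and set mm=100%, so that every query term must match in at least one of them. The field name isbn, its string type, and the handler name /quicksearch are hypothetical:

```xml
<!-- schema (hypothetical field): copyField sends the raw value into "text",
     where the text_general analyzer tokenizes it again -->
<field name="isbn" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="isbn" dest="text"/>

<!-- solrconfig.xml (hypothetical handler): search both fields at once -->
<requestHandler name="/quicksearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- each query term may match in either field -->
    <str name="qf">text isbn</str>
    <!-- require every query term to match somewhere, instead of plain OR -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

Because copyField is applied at index time, existing documents must be reindexed after adding it. A request like /quicksearch?q=978-3-8052-5094-8&debugQuery=true should then show how the input is parsed against each field.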
> Sebastian
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Thursday, January 5, 2017 19:10
> To: solr-user@lucene.apache.org
> Subject: Re: Search for ISBN-like identifiers
>
> Sebastian -
>
> There's some precedent out there for ISBNs. Bill Dueber and the
> UMICH/code4lib folks have done amazing work; check it out here:
>
> https://github.com/mlibrary/umich_solr_library_filters
>
> - Erik
>
>
> > On Jan 5, 2017, at 5:08 AM, Sebastian Riemer <s.rie...@littera.eu> wrote:
> >
> > Hi folks,
> >
> >
> > TL;DR: Is there an easy way to copy ISBNs with hyphens to the general
> > text field, or alternatively to configure the analyzer on that field, so
> > that a search for the hyphenated ISBN returns exactly the matching
> > document?
> >
> > Long version:
> > I've defined a field "text" of type "text_general", into which I copy all
> > my other fields, so that I can do a "quick search" by setting
> > q=text:<query>.
> >
> > The definition of the type text_general looks like this:
> >
> > <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> > I now face the problem that searching for a book with
> > text:978-3-8052-5094-8* does not return the single result I expect.
> > However, searching for text:9783805250948* instead returns a result.
> > Note that I am adding a wildcard at the end automatically, to further
> > broaden the result set. Note also that it does not seem to matter
> > whether I put backslashes in front of the hyphens or not (to be exact,
> > when sending via SolrJ from my application, I put in the backslashes,
> > but I don't see a difference when using Solr Admin, as I guess Solr Admin
> > automatically inserts backslashes if needed?)
> >
> > When storing ISBNs, I store them twice, once with hyphens
> > (978-3-8052-5094-8) and once without (9783805250948). A pure phrase search
> > on either of those values also returns the single document.
> >
> > I learned that the StandardTokenizer splits up field values at index
> > time, and I've also learned that I can use the Solr Admin analysis screen
> > and debugQuery to help understand what is going on. From the analysis
> > screen I see that the value 9783805250948 at index time and
> > 9783805250948* at query time both end up as the unchanged value
> > 9783805250948.
> > When given the value 978-3-8052-5094-8 for "Field Value (Index)" and
> > 978-3-8052-5094-8* for "Field Value (Query)", I can see how the ISBN is
> > tokenized into 5 parts. Again, the values match on both sides (Index and
> > Query).
> >
> > How does the left side correlate with the right side? My guess: the left
> > side means, "Values stored in field text will be tokenized while indexing
> > as shown here on the left". The right side means, "When querying on the
> > field text, I'll tokenize the entered value like this, and see if I find
> > something in the index". Is this correct?
> >
> > Another question: when querying and investigating the single document in
> > Solr Admin, the contents I see in the column text represent the _stored_
> > value of the field text, right?
> > And am I correct that this actually has nothing to do with what is
> > actually stored in the index for searching?
> >
> > When storing the value 978-3-8052-5094-8, are only the tokenized values
> > stored for search, or is the "whole word" also stored? Is there a way to
> > actually see all the values which are stored for search?
> > When searching text:"978-3-8052-5094-8" I get the single result, so I
> > guess the value as a whole must also be stored in the index for searching?
> >
> > One more thing which confuses me:
> > Searching for text:978-3-8052-5094-8 gives me 72 results, because it
> > leads to searching for "parsedquery_toString":"text:978 text:3
> > text:8052 text:5094 text:8", but searching for
> > text:978-3-8052-5094-8* gives me 0 results; this leads to
> > "parsedquery_toString":"text:978-3-8052-5094-8*".
> >
> > Why does the appended wildcard change the behaviour so radically? I'd
> > rather expect to get something like "parsedquery_toString":"text:978 text:3
> > text:8052 text:5094 text:8*", and thus even more results.
> >
> > Btw, I've found and read an interesting blog post about storing ISBNs and
> > the like here:
> > http://robotlibrarian.billdueber.com/2012/03/solr-field-type-for-numericish-ids/
> > However, I already store my ISBN in a separate field as well, of type
> > string, which works fine when I use this field for searching.
> >
> > Best regards, sorry for the enormously long question, and thank you for
> > listening.
> >
> > Sebastian
> >
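This relates to storing each ISBN twice and to the numeric-ish IDs blog post mentioned above. One option is a field type that strips everything except digits and the X check digit at both index and query time, so that both spellings end up as the same single token. This is a sketch built from stock Solr factories, not the UMICH filters, and the type name isbn_normalized is hypothetical:

```xml
<!-- hypothetical type: a single analyzer, used for both indexing and querying -->
<fieldType name="isbn_normalized" class="solr.TextField">
  <analyzer>
    <!-- keep the whole input as one token (no splitting on hyphens) -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- remove hyphens, spaces, etc.; keep digits and the X check digit -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^0-9Xx]" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this type, 978-3-8052-5094-8 and 9783805250948 should both index as 9783805250948, which the Analysis screen can confirm. Wildcard terms are not run through the full analyzer, so whether a trailing-wildcard query is normalized the same way should be checked with debugQuery before relying on it.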