Hi,

The goal is to provide a Google-like search field for our databases (one 
simple search field on a web page). That is why we copy everything into the 
_text_ field, so that everything is searchable. (Is there a better way to 
achieve something like this?)

I should have been clearer before: the different numbers I gave as examples 
are all in different Solr documents, with only one number per Solr document, 
so there is no need for this field to be multi-valued. Sorry about that.

Here is my text_general definition (a direct copy from the DIH example that 
comes with Solr 7.2.1):
-----------8<------------
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.FlattenGraphFilterFactory"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
-----------8<------------

In the analysis screen I can see that the text is indeed broken into 'EO' 
(alphanumeric) and '1954.53.1' (numeric).
Searching without a wildcard also returns zero results...

As I mentioned before, we are still testing all of this, so we are not really 
up to speed with the why-does-this-do-that yet, although I am trying to learn.

Thanks for any other pointers you can provide.
Greetings,
Roel

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, 16 March 2018 17:08
To: solr-user
Subject: Re: question regarding wildcard-searches

If your goal is to search prefixes only, I'd move away from the _text_ field 
altogether and use a "string" type. This will mean you need to
1> make it multiValued=true
2> split this up (either on your client or with a
FieldMutatingUpdateProcessor, probably RegexReplaceProcessorFactory) into 
separate entries, i.e.
'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3'
becomes three separate entries in the field:
'EO.1954.53.1'
'EO.1954.53.2'
'EO.1954.53.3'

At that point, searches like 'EO.1954.53.*' will work just fine.

NOTE: String types do zero analysis, so you have to handle things like casing 
yourself. That is, 'eO.1954.53.*' would _not_ match. You can probably use 
something like KeywordTokenizerFactory + LowerCaseFilterFactory in that case.
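For the single-value case Roel describes, the KeywordTokenizerFactory + LowerCaseFilterFactory suggestion might look like the schema fragment below. It is a sketch, not a tested configuration: the type name `string_lowercase` and the field name `inventarisnummer_lc` are hypothetical, and it assumes the schema file is edited directly and the collection is reindexed afterwards.

```xml
<!-- Hypothetical type: the whole value stays one token, only lowercased -->
<fieldType name="string_lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field; single-valued because each document holds one number -->
<field name="inventarisnummer_lc" type="string_lowercase" indexed="true" stored="false"/>
<copyField source="genormaliseerdInventarisnummer" dest="inventarisnummer_lc"/>
```

A query such as inventarisnummer_lc:EO.1954.53.* should then match 'EO.1954.53.1' through 'EO.1954.53.19': wildcard terms are not tokenized, but Solr still applies multi-term-aware filters such as LowerCaseFilterFactory to them. To keep a single search box, this field could be searched alongside _text_, for example through the edismax qf parameter.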

All this makes _much_ more sense if you use the admin UI>>analysis page 
(probably uncheck the "verbose" checkbox; there'll be less clutter).

Best,
Erick

On Fri, Mar 16, 2018 at 8:35 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Roel,
> As mentioned, the _text_ field probably does not contain the complete
> “EO.1954.53.1” but only its parts. You can verify that using the analysis
> screen in the admin console. What you can try is searching for the phrase
> “EO.1954.53” without a wildcard, or, if you are using WordDelimiterTokenFilter
> in your analysis chain, you can set preserveOriginal="1" and reindex.
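Roel's text_general (quoted at the top of this thread) has no word delimiter filter, so the preserveOriginal suggestion would need a variant chain. A hypothetical sketch, assuming the type name `text_general_wdgf` and swapping StandardTokenizerFactory for WhitespaceTokenizerFactory so that 'EO.1954.53.1' reaches the filter in one piece:

```xml
<!-- Hypothetical variant; requires a full reindex after the change -->
<fieldType name="text_general_wdgf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- splits on whitespace only, so "EO.1954.53.1" stays intact here -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- emits the parts (EO, 1954, 53, 1) plus the original token -->
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"/>
    <!-- graph filters must be flattened at index time -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With the original token indexed as eo.1954.53.1, a prefix query like EO.1954.53.* has a complete token to match against. The stop filter from the original chain is left out only to keep the sketch short.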
>
> Can you share what your text_general looks like?
>
> HTH,
> Emir
>
>
>
>> On 16 Mar 2018, at 14:05, Paesen Roel <roel.pae...@africamuseum.be> wrote:
>>
>> Hi,
>>
>> Unfortunately that also gives no results (and it would not be
>> practical anyway: for this example the numbering only goes up to 19, but
>> others go up into the thousands, etc.)
>>
>> Anybody with a pointer on this?
>>
>> Thanks already,
>> Roel
>>
>>
>> -----Original Message-----
>> From: jagdish vasani [mailto:jagdisht.vas...@gmail.com]
>> Sent: Friday, 16 March 2018 12:41
>> To: solr-user@lucene.apache.org
>> Subject: Re: question regarding wildcard-searches
>>
>> Hi Paesen,
>>
>> The value EO.1954.53.1 is indexed as the tokens below:
>> Eo
>> 1954
>> 53
>> 1
>> The dots are removed. Try the single-character wildcard ?,
>> e.g. EO.1954.53.?? if there are only two digits at the end.
>>
>> I have not tried it, but please check.
>> I hope it solves your problem.
>>
>> Thanks,
>> Jagdish
>> On 16-Mar-2018 3:51 pm, "Paesen Roel" <roel.pae...@africamuseum.be> wrote:
>>
>>> Hi everybody,
>>>
>>> We are experimenting with Solr, and I have a (I think) basic-level
>>> question:
>>> we have multiple fields, all copied into a generic field so we can
>>> search everything at once.
>>> However, we have a (for us) strange situation doing wildcard searches
>>> for the contents of one specific field.
>>>
>>> Given in the schema:
>>>
>>> <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
>>>
>>> <field name="genormaliseerdInventarisnummer" type="string" indexed="true" stored="true"/>
>>> <copyField source="genormaliseerdInventarisnummer" dest="_text_" />
>>> and a lot of other fields exactly like 'genormaliseerdInventarisnummer'.
>>>
>>>
>>> Now, we are certain that the field 'genormaliseerdInventarisnummer'
>>> contains entries like 'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3',
>>> all the way up to '.19'. We can query these directly by passing
>>> these exact texts to the query on field '_text_' (our default search field).
>>> The problem is that wildcard searches for these don't work: 'EO.1954.53.*',
>>> for example, returns zero results.
>>>
>>> Why is that?
>>> What needs to be adjusted? (and how?)
>>>
>>> Thanks already,
>>> Roel
>>>
>>>
>
