Re: question regarding wildcard-searches

Erick Erickson Fri, 16 Mar 2018 09:08:38 -0700

If your goal is to search prefixes only, I'd move away from the _text_
field altogether and use a "string" type. This will mean you need to
1> make it multiValued=true
2> split this up (either on your client or use a
FieldMutatingUpdateProcessor, probably RegexReplaceProcessorFactory)
into separate entries, i.e.
'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3'
becomes three separate entries in the field
'EO.1954.53.1'
'EO.1954.53.2'
'EO.1954.53.3'
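
A minimal sketch of what such a field could look like in the schema. The
field name below is hypothetical (it is not in the original schema), and
"string" is assumed to be the stock solr.StrField type:

```xml
<!-- Hypothetical field name; holds one untokenized entry per inventory number. -->
<field name="inventarisnummer_exact" type="string" indexed="true"
       stored="true" multiValued="true"/>
```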


At that point, searches like: 'EO.1954.53.*'

will work just fine. NOTE: String types do zero analysis, so you have
to handle things like casing yourself. That is, 'eO.1954.53.*' would
_not_ match. If you need case-insensitive matching, you can probably use
a TextField-based type with KeywordTokenizerFactory +
LowerCaseFilterFactory instead of "string".
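
A minimal sketch of such a type, assuming it is added to the schema under
the made-up name "string_lowercase". It has to be a solr.TextField, since
the "string" type (solr.StrField) does not run an analysis chain:

```xml
<!-- Hypothetical type name. The whole value stays a single token
     (KeywordTokenizerFactory), which is then lowercased. -->
<fieldType name="string_lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The analysis page should show whether a value such as 'eO.1954.53.1'
comes out as the single token 'eo.1954.53.1' with this type.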

All this makes _much_ more sense if you use the admin UI>>analysis
page (probably uncheck the "verbose" checkbox, there'll be less
clutter).

Best,
Erick

On Fri, Mar 16, 2018 at 8:35 AM, Emir Arnautović
<emir.arnauto...@sematext.com> wrote:
> Hi Roel,
> As mentioned, the _text_ field probably does not contain the complete
> “EO.1954.53.1” but only its parts. You can verify that using the analysis
> screen in the admin console. What you can try is searching for the phrase
> “EO.1954.53” without a wildcard or, if you are using a word delimiter
> filter (WordDelimiterFilterFactory or WordDelimiterGraphFilterFactory) in
> your analysis chain, you can set preserveOriginal=“1” and reindex.
>
> Can you share what your text_general looks like?
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 16 Mar 2018, at 14:05, Paesen Roel <roel.pae...@africamuseum.be> wrote:
>>
>> Hi,
>>
>> Unfortunately that also gives no results (and it would not be practical:
>> in this example the numbering only goes up to 19, but others go up into
>> the thousands, etc.)
>>
>> Anybody with a pointer on this?
>>
>> Thanks already,
>> Roel
>>
>>
>> -----Original Message-----
>> From: jagdish vasani [mailto:jagdisht.vas...@gmail.com]
>> Sent: vrijdag 16 maart 2018 12:41
>> To: solr-user@lucene.apache.org
>> Subject: Re: question regarding wildcard-searches
>>
>> Hi paesen,
>>
>> The value EO.1954.53.1 is indexed as the tokens below:
>> Eo
>> 1954
>> 53
>> 1
>> The dot is removed. Try the single-character wildcard '?',
>> like EO.1954.53.?? if the last part has only 2 digits.
>>
>> I have not tried this myself, so please check it.
>> Hope it will solve your problem.
>>
>> Thanks,
>> Jagdish
>> On 16-Mar-2018 3:51 pm, "Paesen Roel" <roel.pae...@africamuseum.be> wrote:
>>
>>> Hi everybody,
>>>
>>> We are experimenting with solr, and I have a (I think) basic-level
>>> question:
>>> we have multiple fields, all copied into a generic field so we can
>>> search everything at once.
>>> However we have a (for us) strange situation doing wildcard searches
>>> for the contents of one specific field.
>>>
>>> Given in the schema:
>>>
>>> <field name="_text_" type="text_general" indexed="true" stored="false"
>>> multiValued="true"/>
>>>
>>> <field name="genormaliseerdInventarisnummer" type="string" indexed="true"
>>> stored="true"/>
>>> <copyField source="genormaliseerdInventarisnummer" dest="_text_" />
>>> and a lot of other fields exactly like 'genormaliseerdInventarisnummer'.
>>>
>>>
>>> Now, we are certain that the field 'genormaliseerdInventarisnummer'
>>> contains entries like 'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3',
>>> all the way up to '.19'. We can query these directly by passing these
>>> exact texts to the query on field '_text_' (our default search field).
>>> Problem is: wildcard searches for these don't work, like 'EO.1954.53.*'
>>> for example returns zero results.
>>>
>>> Why is that?
>>> What needs to be adjusted? (and how?)
>>>
>>> Thanks already,
>>> Roel
>>>
>>>
>
