Hi Edwin,
I don’t know eastern languages well enough to say what number you should 
expect when you ask for the string length. Maybe you can try some of the regex 
Unicode settings and see if you get what you need: try setting the Unicode 
flag with (?U), or try using regex groups and ranges. If you provide an example 
string and the expected length, maybe we can suggest a regex.
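
To illustrate the difference between counting characters and counting bytes, 
here is a minimal Python sketch. It uses Python's own regex engine, not Solr, so 
it only shows the general idea; the sample string and the helper names are made 
up for illustration, and whether Solr's regex behaves the same way is an 
assumption I have not verified.

```python
import re


def char_length(text: str) -> int:
    """Number of Unicode code points, which is what Python's regex counts."""
    return len(text)


def utf8_byte_length(text: str) -> int:
    """Number of bytes the same text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def has_at_least(text: str, n: int) -> bool:
    """True if text is at least n characters long, checked with a .{n,} regex."""
    return re.fullmatch(r".{%d,}" % n, text, flags=re.DOTALL) is not None


# Hypothetical sample value, not taken from the actual index.
sample = "中文标题"

print(char_length(sample))       # length in characters
print(utf8_byte_length(sample))  # length in UTF-8 bytes
print(has_at_least(sample, 5))   # regex length check, counted in characters
```

With this sample the string has 4 characters but 12 UTF-8 bytes, so a regex 
check for 5 or more characters fails even though the byte count is well above 5.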

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Jan 2018, at 04:37, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> 
> Hi Emir,
> 
> So this would likely be different from what the operating system counts, as
> the operating system may count each Chinese character as 3 to 4 bytes.
> That is probably why I could not find any record with subject:/.{255,}.*/
> 
> Are there other tools that we can use to query the length of data that is
> already indexed and is not in standard English? (E.g.
> Chinese, Japanese, etc.)
> 
> Regards,
> Edwin
> 
> On 3 January 2018 at 23:51, Emir Arnautović <emir.arnauto...@sematext.com>
> wrote:
> 
>> Hi Edwin,
>> I do not know, but my guess would be that each character is counted as 1
>> in regex regardless of how many bytes it takes in the encoding used.
>> 
>> Regards,
>> Emir
>> 
>> 
>> 
>>> On 3 Jan 2018, at 16:43, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>> 
>>> Thanks for the reply.
>>> 
>>> I am doing the search on existing data that has already been indexed, and
>>> it is likely to be a one time thing.
>>> 
>>> This  subject:/.{255,}.*/  works for English characters. However, there are
>>> Chinese characters in some of the records. Their length seems to be more than
>>> 255, but those records do not show up in the results.
>>> 
>>> Do you know how the length is determined for Chinese characters and other
>>> languages?
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> 
>>> On 3 January 2018 at 23:01, Alexandre Rafalovitch <arafa...@gmail.com>
>>> wrote:
>>> 
>>>> Do that during indexing as Emir suggested. Specifically, use an
>>>> UpdateRequestProcessor chain, probably with the Clone and FieldLength
>>>> processors:
>>>> http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html
>>>> 
>>>> Regards,
>>>>  Alex.
>>>> 
>>>> On 31 December 2017 at 22:00, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> I would like to check whether it is possible to query for records where
>>>>> a field holds data of more than a certain length.
>>>>> 
>>>>> For example, I want to find records where the field subject is longer
>>>>> than 255 bytes. Is it possible?
>>>>> 
>>>>> I am currently using Solr 6.5.1.
>>>>> 
>>>>> Regards,
>>>>> Edwin
>>>> 
>> 
>> 
