A regex search at query time would also leave room for denial-of-service 
attacks (e.g., a regex can easily be crafted that ties up the Solr server 
indefinitely).
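
For reference, a query-time regex in Solr is applied per field, e.g. (with 
an assumed field name "content"):

    q=content:/[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}/

Note that Lucene regexes are matched against individual indexed terms, not 
the stored text. With the standard tokenizer, a number like 
"1234 5678 9012 3456" is split into four separate tokens, so no single term 
can ever match a 16-digit pattern; that is likely why the regex in the 
question below returns nothing. A pattern with a leading .* additionally 
forces a walk over the entire term dictionary.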

If the field is stored, you can also use a cursor to page through all 
entries and reindex each doc based on that field:

https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html
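
A minimal sketch of that loop in Python, assuming the uniqueKey is "id", 
the text lives in a single-valued stored field called "content", and using 
Walter's has_credit_card_number flag from below (the base URL, collection 
names, and the regex are placeholders to adapt):

    import re
    import requests

    SOLR = "http://localhost:8983/solr"   # assumed Solr base URL
    SRC = "collection_old"                # existing collection (placeholder)
    DST = "collection_new"                # parallel collection (placeholder)
    CC_RE = re.compile(r"\b(?:[0-9]{4}[ -]?){3}[0-9]{4}\b")

    cursor = "*"
    while True:
        # Cursor paging requires a sort that includes the uniqueKey
        # field (assumed to be "id") as a tiebreaker.
        resp = requests.get(f"{SOLR}/{SRC}/select", params={
            "q": "*:*",
            "rows": 500,
            "sort": "id asc",
            "fl": "*",
            "cursorMark": cursor,
        }).json()
        docs = resp["response"]["docs"]
        for doc in docs:
            doc.pop("_version_", None)  # avoid optimistic-locking clashes
            text = doc.get("content", "")  # assumed single-valued field
            doc["has_credit_card_number"] = bool(CC_RE.search(text))
        if docs:
            requests.post(f"{SOLR}/{DST}/update", json=docs)
        # The cursor is exhausted once nextCursorMark stops changing.
        if resp["nextCursorMark"] == cursor:
            break
        cursor = resp["nextCursorMark"]

    requests.get(f"{SOLR}/{DST}/update", params={"commit": "true"})

The two details that matter are spelled out on the page above: the sort 
must include the uniqueKey as a tiebreaker, and the loop terminates when 
nextCursorMark stops changing.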

This also implies that you have the other fields stored; otherwise, reindex 
from the original source. You can do this in parallel to the existing index 
and, once finished, simply repoint the alias for the collection (no 
downtime for the users, but of course you need the corresponding disk 
space).
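
Once the new collection has caught up, the swap is a single Collections API 
call (assuming clients query through an alias, here called "myalias"):

    http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=myalias&collections=collection_new

CREATEALIAS repoints an existing alias, so queries switch over without a 
restart.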

> On Jul 28, 2020, at 9:06 PM, lstusr 5u93n4 <lstusr...@gmail.com> wrote:
> 
> Possible... yes. Agreed that this is the right approach. But if we already
> have a big index that we're searching through? Any way to "hack it"?
> 
>> On Tue, 28 Jul 2020 at 14:55, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>> 
>> I’d do that at index time. Add an update request processor script that
>> does the regex and adds a field has_credit_card_number:true.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>>> On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4 <lstusr...@gmail.com> wrote:
>>> 
>>> Let's say I have a text field that's been indexed with the standard
>>> tokenizer, and I want to match the docs that have credit card numbers in
>>> them (this is for altruistic purposes, not nefarious ones!). What's the
>>> best way to build a search that will do this?
>>> 
>>> Searching for "???? ???? ???? ????" seems to return inconsistent results.
>>> 
>>> Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
>>> should work, but that's not matching the docs I think it should either...
>>> 
>>> Any suggestions?
>>> 
>>> Thanks In Advance!
>> 
>> 
