Maybe the online tools work properly and the regex does not do what you expect?
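One way to check this without an online tool is to run the pattern directly against java.util.regex. The following is only a minimal sketch: the class name NewlineCollapseSketch and the method toHtmlBreaks are made up for illustration, the test strings are typed by hand from Example 2, and nothing here goes through Solr itself. The second input is purely hypothetical: it assumes a non-breaking space (U+00A0) sits between the line breaks, following Paul's point about non-printing characters. In Java, neither [ \t] nor \s matches U+00A0 by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NewlineCollapseSketch {
    // Pattern from the thread: two or more line breaks,
    // each optionally preceded by spaces or tabs.
    static final Pattern MULTI = Pattern.compile("([ \\t]*\\r?\\n){2,}");
    // Any single line break that is left after the first pass.
    static final Pattern SINGLE = Pattern.compile("[ \\t]*\\r?\\n");

    // Two passes, as suggested in the thread: collapse runs first,
    // then replace the remaining single line breaks.
    // quoteReplacement mirrors literalReplacement=true in the Solr config.
    public static String toHtmlBreaks(String text) {
        String collapsed = MULTI.matcher(text)
                .replaceAll(Matcher.quoteReplacement("<br><br>"));
        return SINGLE.matcher(collapsed)
                .replaceAll(Matcher.quoteReplacement("<br>"));
    }

    public static void main(String[] args) {
        // Example 2 from the thread, typed with plain ASCII spaces.
        String plain = "exalted \n \n\n Psalm 89:17 \n\n \n\n 3 Choa Chu Kang Avenue 4";
        System.out.println(toHtmlBreaks(plain));

        // Hypothetical variant: a non-breaking space between the line breaks.
        String withNbsp = "Psalm 89:17 \n\n\u00A0\n\n 3 Choa Chu Kang Avenue 4";
        System.out.println(toHtmlBreaks(withNbsp));
    }
}
```

With plain spaces, each run of line breaks collapses into a single <br><br>. With the hypothetical U+00A0, the run is split in two and the output contains <br><br>, the non-breaking space, and another <br><br>, which has the same shape as the index content in Examples 2 and 3. That does not prove this is what is in the data, but dumping the code points of the stored field value would confirm or rule it out.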
> On 20.02.2019 at 08:12, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>
> Hi,
>
> Thanks for the reply.
>
> Do you know of any online regex tool that works correctly for Java regex?
> I tried to find some, but they are not working properly.
>
> Yes, our plan is to replace more than one \n with <br><br>, and a single \n
> with a single <br>.
>
> Regards,
> Edwin
>
>> On Wed, 20 Feb 2019 at 14:59, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>> Solr uses Java regex matching, so I doubt there is a bug - it would then
>> be in the JDK. Try your solution out in an online regex tool that supports
>> Java regex.
>>
>> I believe you want to have 2 regex processor factories:
>> one that deals with a single \n and one that deals with more than one \n
>>
>>> On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We have tried the following pattern ([ \t]*\r?\n){2,} and
>>> configuration:
>>>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">([ \t]*\r?\n){2,}</str>
>>>   <str name="replacement"><br><br></str>
>>>   <bool name="literalReplacement">true</bool>
>>> </processor>
>>>
>>> However, the issue is still occurring.
>>>
>>> Is anyone else able to help?
>>>
>>> Regards,
>>> Edwin
>>>
>>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> For your info, this issue is occurring in Solr 7.7.0 as well.
>>>>
>>>> Regards,
>>>> Edwin
>>>>
>>>> On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Should we report this as a bug in Solr?
>>>>>
>>>>> Regards,
>>>>> Edwin
>>>>>
>>>>> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> Regarding the regex (\n\s*){2,} that we are using, when we try it on
>>>>>> https://regex101.com/, it gives us the correct result for all
>>>>>> the examples (i.e. all of them only have <br><br>, and not more than
>>>>>> that, unlike what we are getting in Solr in our earlier examples).
>>>>>>
>>>>>> Could there be a possibility of a bug in Solr?
>>>>>>
>>>>>> Regards,
>>>>>> Edwin
>>>>>>
>>>>>> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> We have tried it with the space preceding the \n, i.e. <str
>>>>>>> name="pattern">(\s*\n){2,}</str>, with the following configuration:
>>>>>>>
>>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>>>>>   <str name="fieldName">content</str>
>>>>>>>   <str name="pattern">(\s*\n){2,}</str>
>>>>>>>   <str name="replacement"><br><br></str>
>>>>>>> </processor>
>>>>>>>
>>>>>>> However, we are still getting exactly the same results as in the earlier
>>>>>>> Examples 1, 2 and 3.
>>>>>>>
>>>>>>> As for your point 2, that the data may contain other (non-printing)
>>>>>>> characters than \n, we have found that there are no non-printing
>>>>>>> characters. It is just a line break with a space. You can refer to the
>>>>>>> original content in the same examples below.
>>>>>>>
>>>>>>> Example 1: A sentence where the above regex pattern works
>>>>>>> correctly
>>>>>>> *Original content in EML file:*
>>>>>>> Dear Sir,
>>>>>>>
>>>>>>>
>>>>>>> I am terminating
>>>>>>> *Original content:* Dear Sir, \n\n \n \n\n I am terminating
>>>>>>> *Index content:* Dear Sir, <br><br>I am terminating
>>>>>>>
>>>>>>> Example 2: A sentence where the above regex pattern works only
>>>>>>> partially (as you can see, instead of 2 <br>, there are 4 <br>)
>>>>>>> *Original content in EML file:*
>>>>>>>
>>>>>>> *exalted*
>>>>>>>
>>>>>>> *Psalm 89:17*
>>>>>>>
>>>>>>>
>>>>>>> 3 Choa Chu Kang Avenue 4
>>>>>>> *Original content:* exalted \n \n\n Psalm 89:17 \n\n \n\n 3
>>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>>>>> *Index content:* exalted <br><br>Psalm 89:17 <br><br> <br><br>3
>>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>>>>>
>>>>>>> Example 3: A sentence where the above regex pattern works only
>>>>>>> partially (as you can see, instead of 2 <br>, there are 4 <br>)
>>>>>>> *Original content in EML file:*
>>>>>>>
>>>>>>> http://www.concordpri.moe.edu.sg/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Dec 18, 2018 at 10:07 AM
>>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/ \n\n \n\n \n
>>>>>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n On Tue,
>>>>>>> Dec 18, 2018 at 10:07 AM
>>>>>>> *Index content:* http://www.concordpri.moe.edu.sg/ <br><br>
>>>>>>> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
>>>>>>>
>>>>>>> We would appreciate any other ideas or suggestions that you may have.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Edwin
>>>>>>>
>>>>>>>> On Thu, 7 Feb 2019 at 22:49, <paul.d...@ub.unibe.ch> wrote:
>>>>>>>>
>>>>>>>> Hi Edwin
>>>>>>>>
>>>>>>>> 1. Sorry, the pattern was wrong; the space should precede the \n,
>>>>>>>> i.e. <str name="pattern">(\s*\n){2,}</str>
>>>>>>>> 2. Perhaps in the data you have other (non-printing) characters
>>>>>>>> than \n?
>>>>>>>>
>>>>>>>> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
>>>>>>>> Windows 10
>>>>>>>>
>>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>>>>>> Sent: Thursday, 7 February 2019 15:23
>>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>>>>>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>>>>>>
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> We have tried the suggested regex pattern as follows:
>>>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>>>>>>   <str name="fieldName">content</str>
>>>>>>>>   <str name="pattern">(\n\s*){2,}</str>
>>>>>>>>   <str name="replacement"><br><br></str>
>>>>>>>> </processor>
>>>>>>>>
>>>>>>>> But we still have exactly the same problem with Examples 1, 2 and 3
>>>>>>>> below.
>>>>>>>>
>>>>>>>> Example 1: A sentence where the above regex pattern works
>>>>>>>> correctly
>>>>>>>> *Original content:* Dear Sir, \n\n \n \n\n I am terminating
>>>>>>>> *Index content:* Dear Sir, <br><br>I am terminating
>>>>>>>>
>>>>>>>> Example 2: A sentence where the above regex pattern works only
>>>>>>>> partially (as you can see, instead of 2 <br>, there are 4 <br>)
>>>>>>>> *Original content:* exalted \n \n\n Psalm 89:17 \n\n \n\n 3
>>>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>>>>>> *Index content:* exalted <br><br>Psalm 89:17 <br><br> <br><br>3
>>>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>>>>>>
>>>>>>>> Example 3: A sentence where the above regex pattern works only
>>>>>>>> partially (as you can see, instead of 2 <br>, there are 4 <br>)
>>>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/ \n\n \n\n \n
>>>>>>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n On Tue,
>>>>>>>> Dec 18, 2018 at 10:07 AM
>>>>>>>> *Index content:* http://www.concordpri.moe.edu.sg/ <br><br>
>>>>>>>> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
>>>>>>>>
>>>>>>>> Any further suggestions?
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Edwin
>>>>>>>>
>>>>>>>>> On Thu, 7 Feb 2019 at 22:20, <paul.d...@ub.unibe.ch> wrote:
>>>>>>>>>
>>>>>>>>> To avoid the «\n+\s*» matching too many \n and then failing on the
>>>>>>>>> {2,} part you could try
>>>>>>>>>
>>>>>>>>> <str name="pattern">(\n\s*){2,}</str>
>>>>>>>>>
>>>>>>>>> If you also want to match CRLF then
>>>>>>>>>
>>>>>>>>> <str name="pattern">(\r?\n\s*){2,}</str>
>>>>>>>>>
>>>>>>>>> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
>>>>>>>>> Windows 10
>>>>>>>>>
>>>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>>>>>>> Sent: Thursday, 7 February 2019 15:10
>>>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>>>>>>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>>>>>>>
>>>>>>>>> Hi Paul,
>>>>>>>>>
>>>>>>>>> Thanks for your reply.
>>>>>>>>>
>>>>>>>>> When I use this pattern:
>>>>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>>>>>>>   <str name="fieldName">content</str>
>>>>>>>>>   <str name="pattern">(\n+\s*){2,}</str>
>>>>>>>>>   <str name="replacement"><br><br></str>
>>>>>>>>> </processor>
>>>>>>>>>
>>>>>>>>> It works for some sentences within the same content and not for
>>>>>>>>> others.
>>>>>>>>> Please see below for one that is working and another
>>>>>>>>> that is not working (partially working):
>>>>>>>>>
>>>>>>>>> Example 1: A sentence where the above regex pattern works
>>>>>>>>> correctly
>>>>>>>>> *Original content:* Dear Sir, \n\n \n \n\n I am terminating
>>>>>>>>> *Index content:* Dear Sir, <br><br>I am terminating
>>>>>>>>>
>>>>>>>>> Example 2: A sentence where the above regex pattern works only
>>>>>>>>> partially (as you can see, instead of 2 <br>, there are 4 <br>)
>>>>>>>>> *Original content:* exalted \n \n\n Psalm 89:17 \n\n \n\n 3
>>>>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>>>>>>> *Index content:* exalted <br><br>Psalm 89:17 <br><br> <br><br>3
>>>>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>>>>>>>
>>>>>>>>> Example 3: A sentence where the above regex pattern works only
>>>>>>>>> partially (as you can see, instead of 2 <br>, there are 4 <br>)
>>>>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/ \n\n \n\n \n
>>>>>>>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n On Tue,
>>>>>>>>> Dec 18, 2018 at 10:07 AM
>>>>>>>>> *Index content:* http://www.concordpri.moe.edu.sg/ <br><br>
>>>>>>>>> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
>>>>>>>>>
>>>>>>>>> We would appreciate your help in finding out what is wrong.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Edwin
>>>>>>>>>
>>>>>>>>>> On Thu, 7 Feb 2019 at 21:24, <paul.d...@ub.unibe.ch> wrote:
>>>>>>>>>>
>>>>>>>>>> You don’t say what happens, just that it is not working. I assume
>>>>>>>>>> nothing is replaced? Perhaps the pattern should be
>>>>>>>>>>
>>>>>>>>>> <str name="pattern">"(\n\s*){2,}"</str>
>>>>>>>>>>
>>>>>>>>>> ??
>>>>>>>>>>
>>>>>>>>>> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
>>>>>>>>>> Windows 10
>>>>>>>>>>
>>>>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>>>>>>>> Sent: Thursday, 7 February 2019 14:08
>>>>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>>>>>>>> Subject: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am trying to use the RegexReplaceProcessorFactory to remove two
>>>>>>>>>> or more \n with any number of spaces between them (e.g. \n\n, \n \n,
>>>>>>>>>> \n \n \n \n), and replace them with two <br>.
>>>>>>>>>>
>>>>>>>>>> I use the following regex pattern and it works when I test it on
>>>>>>>>>> regex101.com. But it does not work when I put it inside the
>>>>>>>>>> RegexReplaceProcessorFactory as below:
>>>>>>>>>>
>>>>>>>>>> <updateRequestProcessorChain name="removeCode">
>>>>>>>>>>   <processor class="solr.RegexReplaceProcessorFactory">
>>>>>>>>>>     <str name="fieldName">content</str>
>>>>>>>>>>     <str name="pattern">"(\\n\s*){2,}"</str>
>>>>>>>>>>     <str name="replacement"><br><br></str>
>>>>>>>>>>   </processor>
>>>>>>>>>> </updateRequestProcessorChain>
>>>>>>>>>>
>>>>>>>>>> To explain my regex pattern further: \s* instructs the regex to
>>>>>>>>>> match any whitespace that follows a \n, and {2,} instructs the
>>>>>>>>>> regex to match 2 or more occurrences of that group (\n followed by
>>>>>>>>>> optional whitespace).
>>>>>>>>>>
>>>>>>>>>> Please kindly let me know what is wrong and how I should do it.
>>>>>>>>>>
>>>>>>>>>> I am using Solr 7.6.0.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Edwin