On 11/30/22 13:44, Matthew Castrigno wrote:
Using SOLR 9.0 and the ScriptUpdateProcessor, it appears SOLR is erroneously adding
",&#8203 " in the middle of a string field.

The script just logs the fields. If you compare the curl request with what is 
logged you see the addition of many instances of ,&#8203  in the content field.

This only happens on the Logging tab of the admin UI.  In the JavaScript file at server/solr-webapp/webapp/js/angular/controllers/logging.js, I found the following line:

          event.message = event.message.replace(/,/g, ',&#8203;');

HTML character code 8203 is the Unicode "zero width space" character.  I think the admin UI code is trying to make long comma-separated lists in log entries word-wrap better, and somehow the browser is treating the entity as literal text rather than as an HTML entity.  This text is NOT in the data being indexed; it only appears in the log display.  It's definitely a display bug, but it doesn't affect the data being indexed.
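
To illustrate the difference I am guessing at, here is a minimal sketch in plain JavaScript.  The function names are hypothetical and are not taken from the Solr source.  It contrasts inserting the HTML entity as a string, which shows up verbatim whenever the message is rendered as plain text, with inserting the actual U+200B character, which stays invisible either way.  I have not tested this against the admin UI, so treat it as an illustration rather than a proposed patch.

```javascript
// Hypothetical helpers for illustration only; not part of Solr.
const ZERO_WIDTH_SPACE = '\u200B'; // Unicode code point 8203

// Inserts the HTML entity as a string after every comma.
// If the result is rendered as plain text (not parsed as HTML),
// the user sees the literal characters "&#8203;".
function addBreaksAsEntity(message) {
  return message.replace(/,/g, ',&#8203;');
}

// Inserts the real zero width space character after every comma.
// It is invisible whether the result is rendered as text or as HTML.
function addBreaksAsCharacter(message) {
  return message.replace(/,/g, ',' + ZERO_WIDTH_SPACE);
}

const sample = 'id,title,content';
console.log(addBreaksAsEntity(sample).length);    // each comma gains 7 visible characters
console.log(addBreaksAsCharacter(sample).length); // each comma gains 1 invisible character
```

In both cases the change is applied only to the string being displayed, which is consistent with the indexed data being untouched.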

Here you can see the same thing happening with my server running 9.2.0-SNAPSHOT:

https://www.dropbox.com/s/77yc9bovxwaauu6/solr-logging-html-8203.png?dl=0

I checked solr.log and that text is NOT there.  I bet if you check solr.log you will also find that it is not there.

Requests to the URL in my screenshot that do not come from specific IP addresses are blocked.  Those requests never get beyond the reverse proxy.

Thanks,
Shawn
