I left a small mismatch in the field type: the fields I am trying to clean are all "text_general" (class solr.TextField).
On Tue, Feb 20, 2024 at 09:38, Gino Rodrigues <[email protected]> wrote:

> Hello everyone,
>
> I am trying to clean source fields from HTML markup before indexing, using
> an Update Request Processor.
>
> But no variation I try seems to work, and HTML markup is still being
> indexed.
>
> Would anyone have an idea about it?
>
> Thanks in advance!
>
> *indexing command*
> curl -X POST -H "Content-Type: application/csv" --data-binary @myfile.csv "http://localhost:8983/solr/mycore/update?commit=true"
>
> *managed-schema.xml*
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer name="standard"/>
>     <filter words="stopwords.txt" ignoreCase="true" name="stop"/>
>     <filter name="lowercase"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer name="standard"/>
>     <filter words="stopwords.txt" ignoreCase="true" name="stop"/>
>     <filter name="synonymGraph" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter name="lowercase"/>
>   </analyzer>
> </fieldType>
> <field name="body" type="text_pt" indexed="true" stored="true"/>
> <copyField source="body" dest="catchall"/>
>
> *solrconfig.xml*
> <updateRequestProcessorChain>
>   <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
>     <str name="typeClass">solr.TextField</str>
>   </processor>
> </updateRequestProcessorChain>
>
> References
>
> https://solr.apache.org/guide/solr/9_4/configuration-guide/update-request-processors.html
>
> https://solr.apache.org/docs/9_4_1/core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
>
> https://solr.apache.org/docs/9_4_1/core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
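
For comparison with the quoted solrconfig.xml, here is a minimal sketch of a chain that is explicitly named, marked as the default, and terminated with solr.RunUpdateProcessorFactory. It is an illustration under assumptions, not a confirmed fix for the problem described above: the chain name "strip-html" is hypothetical, and whether the missing name, default flag, or final processor is what prevents the stripping in this setup has not been verified.

```xml
<!-- solrconfig.xml: sketch only; the chain name "strip-html" is hypothetical -->
<updateRequestProcessorChain name="strip-html" default="true">
  <!-- Strip HTML markup from values of every field whose type class is solr.TextField -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="typeClass">solr.TextField</str>
  </processor>
  <!-- Optional: log each update command that passes through the chain -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- Executes the actual update; custom chains normally end with this processor -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

If the chain should not be the default, default="true" can be dropped and the chain selected per request by adding the update.chain=strip-html parameter to the update URL used in the indexing command.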
