Is there a way to force content extraction with a given encoding

lala Thu, 07 Nov 2019 22:48:10 -0800

I am using the /update/extract request handler to push documents into Solr,
but some text documents that are encoded as windows-1255 (Arabic texts) are
not extracted properly: the extracted text is unreadable.


I searched the web and the Solr documentation and found nothing. If possible,
I need to send the file encoding as a parameter so that the Tika parser knows
which encoding to use.

Is there a way to achieve that?
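
For illustration only, here is a minimal sketch of a client-side fallback in
case no such parameter exists: convert the file to UTF-8 before posting it, so
that the extraction step no longer has to guess a legacy code page. The function
name transcode_to_utf8 is hypothetical, "cp1255" is Python's codec name for
windows-1255, and nothing here is a Solr or Tika API. It assumes the source
encoding is known in advance, and it does not by itself guarantee that Tika
will detect the result correctly.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str = "cp1255") -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8.

    Hypothetical helper: source_encoding must be supplied by the caller,
    since guessing it is exactly what goes wrong during extraction.
    """
    # Strict decoding (the default) raises UnicodeDecodeError if the bytes
    # are not valid in source_encoding, instead of silently producing garbage.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")
```

The converted bytes would then be sent to /update/extract in place of the
original file.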



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
