Re: Is there a way to force content extraction with a given encoding

2019-11-07 Thread Jörn Franke
I would convert them to UTF-8 before posting and use UTF-8 in your application. Most of the web and most applications use UTF-8. If you use other encodings you will always run into problems. > On 08.11.2019 at 07:47, lala wrote: > > I am using the /update/extract request handler to push documents
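
The conversion suggested above could look roughly like the following. This is a minimal sketch, not something from the thread: the function name transcode_to_utf8 and the file paths are hypothetical, and it assumes you already know the source encoding (windows-1255 is taken from the original question).

```python
from pathlib import Path


def transcode_to_utf8(src: Path, dst: Path, source_encoding: str = "windows-1255") -> None:
    """Re-encode a text file as UTF-8 (hypothetical helper, not part of Solr)."""
    # Strict decoding raises UnicodeDecodeError if the bytes do not match the
    # assumed source encoding, which is safer than silently indexing garbage.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    transcode_to_utf8(Path("input-1255.txt"), Path("input-utf8.txt"))
```

The UTF-8 copy would then be the file posted to /update/extract instead of the original.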

Is there a way to force content extraction with a given encoding

2019-11-07 Thread lala
I am using the /update/extract request handler to push documents into Solr, but some text documents that are encoded as windows-1255 (arabic texts) are not extracted properly; the extracted text is not readable. I searched the web and the Solr documentation and found nothing. I need to send the file

Re: How to start troubleshooting a content extraction issue

2011-08-10 Thread Jayendra Patil
You can test the standalone content extraction with the tika-app.jar.
Command to output in text format:
  java -jar tika-app-0.8.jar --text file_path
For more options:
  java -jar tika-app-0.8.jar --help
Use the correct tika-app version jar matching the Solr build. Regards, Jayendra On Wed, Aug
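
If several files need checking, the tika-app command quoted above can be wrapped in a small script. This is only a sketch under assumptions: the helper names tika_text_command and extract_text are made up for illustration, the jar is assumed to sit in the current directory, and java is assumed to be on the PATH. The --text flag and the jar name come from the message itself.

```python
import subprocess
from typing import List


def tika_text_command(file_path: str, jar: str = "tika-app-0.8.jar") -> List[str]:
    """Build the argument list for the standalone Tika text extraction."""
    return ["java", "-jar", jar, "--text", file_path]


def extract_text(file_path: str, jar: str = "tika-app-0.8.jar") -> str:
    """Run tika-app and return whatever it prints (hypothetical helper)."""
    # check=True raises CalledProcessError if Tika exits with an error,
    # which points at the file or the jar as the source of the problem.
    result = subprocess.run(
        tika_text_command(file_path, jar),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

If the standalone run already fails or prints unreadable text, the problem is likely in the extraction step rather than in Solr.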

How to start troubleshooting a content extraction issue

2011-08-10 Thread Tim AtLee
es/default/files/nodefiles/533/June 30, 2011.xltm* to Solr "0" Status: Communication Error". I am looking for some help in figuring out where to troubleshoot this. I assume it's this file, but I guess I'd like to be sure - so how can I submit this file for content extr

Re: Solr Cell: Content extraction problem with ContentStreamUpdateRequest and multiple files

2011-03-09 Thread Karthik Shiraly
In case the exact problem was not clear to somebody: the problem with FileUpload interpreting file data as regular form fields is that Solr thinks there are no content streams in the request and throws a "missing_content_stream" exception. On Thu, Mar 10, 2011 at 10:59 AM, Karthik Shiraly < karth

Solr Cell: Content extraction problem with ContentStreamUpdateRequest and multiple files

2011-03-09 Thread Karthik Shiraly
Hi, I'm using Solr 1.4.1. The scenario involves a user uploading multiple files. These have content extracted using SolrCell, then indexed by Solr along with other information about the user. ContentStreamUpdateRequest seemed like the right choice for this - use addFile() to send file data, and use

Re: Content Extraction

2010-02-26 Thread Lee Smith
Hi Erik, I posted more details yesterday with no response. I have a screenshot of what it does: http://screencast.com/t/MGRiZTU5M After running it I did a query, which returned 0 results, and checked how many docs are indexed, which is also 0. Hope you can shed some more l

Re: Content Extraction

2010-02-26 Thread Erick Erickson
You really have to provide more details of a> what you did. b> what the results were. Have you looked at your index with the admin page and/or Luke? Have you tried querying in the admin page? Have you examined the logs to see what they report? Best Erick On Fri, Feb 26, 2010 at 7:54 AM, Lee Smi

Content Extraction

2010-02-26 Thread Lee Smith
Hey all, hope someone can advise. I followed the example in the wiki on how to extract an HTML page, i.e. curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true' -F "myfi...@tutorial.html" And it displayed an HTML page but with a 404 and