one example. You
>>>> should run Tika separately as it's entirely possible for it to fail to
>>>> parse a PDF and crash - and if you're running it in DIH & Solr it then
>>>> brings down everything. Separate your PDF processing from your Solr
's JVM's memory footprint. For example, the
following will limit it to 2 GB:
> java -Xmx2048m -jar tika-server-1.24.jar
- H
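If the indexing application is itself written in Java, it can start the same command as a separate process, so a crash or out-of-memory error stays inside the Tika JVM. This is only an illustrative sketch. It assumes Java 11+, `java` on the `PATH` and the jar in the working directory. The names `TikaServerLauncher`, `buildCommand` and `start` are hypothetical, and restarting the server after a crash is left out.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs tika-server in its own JVM with a capped heap,
// mirroring "java -Xmx2048m -jar tika-server-1.24.jar".
class TikaServerLauncher {

    /** Builds the command line, e.g. [java, -Xmx2048m, -jar, tika-server-1.24.jar]. */
    static List<String> buildCommand(String javaBin, String jarPath, int maxHeapMb) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeapMb + "m"); // hard cap on the Tika JVM's heap
        cmd.add("-jar");
        cmd.add(jarPath);
        return cmd;
    }

    /** Starts the server; stdout and stderr are appended to one log file. */
    static Process start(String jarPath, int maxHeapMb, File logFile) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand("java", jarPath, maxHeapMb));
        pb.redirectErrorStream(true);                               // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile)); // keep a log for post-mortems
        return pb.start();
    }
}
```

The returned `Process` can be polled with `isAlive()` to detect that the server has died.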
-----Original Message-----
From: Jan Høydahl [mailto:jan@cominvent.com]
Sent: August 26, 2020 6:19 AM
To: solr-user
Subject: [EXT] Re: PDF extraction using Tika
W
e for it to fail to parse a PDF
>>> and crash - and if you're running it in DIH & Solr it then brings down
>>> everything. Separate your PDF processing from your Solr indexing.
>>>
>>>
>>> Cheers
>>>
>>> Charlie
>>>
>
a PDF and crash - and if you're running it in DIH & Solr it
then brings down everything. Separate your PDF processing from your
Solr indexing.
Cheers
Charlie
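One way to separate PDF processing from Solr indexing is to run tika-server as its own process and call it over HTTP. Each file's failure is caught individually, so one bad PDF is skipped instead of ending the whole import. This is a minimal sketch. It assumes Java 11+ and a tika-server listening on its default port 9998, where `PUT /tika` with `Accept: text/plain` returns the extracted text. `TikaClient`, `Extractor`, `extractAll` and `httpExtractor` are hypothetical names and are not part of Tika or Solr.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TikaClient {

    // Assumption: tika-server on localhost with its default port.
    static final URI TIKA_ENDPOINT = URI.create("http://localhost:9998/tika");

    /** Anything that turns a file into text and may fail (hypothetical helper type). */
    interface Extractor {
        String extract(Path file) throws Exception;
    }

    /** PUT the raw document bytes to /tika and ask for plain text back. */
    static HttpRequest buildRequest(byte[] documentBytes) {
        return HttpRequest.newBuilder(TIKA_ENDPOINT)
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(documentBytes))
                .build();
    }

    /** Extractor backed by tika-server; any non-200 status counts as a failure. */
    static Extractor httpExtractor(HttpClient client) {
        return file -> {
            HttpResponse<String> resp = client.send(
                    buildRequest(Files.readAllBytes(file)),
                    HttpResponse.BodyHandlers.ofString());
            if (resp.statusCode() != 200) {
                throw new IOException("tika-server returned HTTP " + resp.statusCode() + " for " + file);
            }
            return resp.body();
        };
    }

    /** Extracts every file in order; failures are recorded, never thrown. */
    static Map<Path, String> extractAll(List<Path> files, Extractor extractor,
                                        Map<Path, Exception> failures) {
        Map<Path, String> texts = new LinkedHashMap<>();
        for (Path file : files) {
            try {
                texts.put(file, extractor.extract(file));
            } catch (Exception e) {
                failures.put(file, e); // skip this file; the rest of the batch continues
            }
        }
        return texts;
    }
}
```

For a real run, pass `TikaClient.httpExtractor(HttpClient.newHttpClient())` as the extractor and review the `failures` map afterwards. If the Tika JVM itself dies, the remaining files fail with connection errors rather than taking Solr down, and they can be retried once the server is restarted.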
Thanks,
Srinivas Kashyap
-----Original Message-----
From: Alexandre Rafalovitch
Sent: 24 August 2020 20:54
To: solr-user
Subj
Thanks Phil,
I will modify it according to the need.
Thanks,
Srinivas
-----Original Message-----
From: Phil Scadden
Sent: 26 August 2020 02:44
To: solr-user@lucene.apache.org
Subject: RE: PDF extraction using Tika
Code for SolrJ is going to be very dependent on your needs, but the beating
Admin", password);
UpdateResponse ur = req.process(solr,"prindex");
req.commit(solr, "prindex");
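The snippet above uses SolrJ. For comparison, the sketch below sends one extracted document to Solr's JSON update handler over plain HTTP with basic authentication, using only the JDK (Java 11+). It is a swapped-in alternative to SolrJ and is not the code quoted above. The base URL `http://localhost:8983/solr`, the `id` and `content` field names and the credentials are assumptions. `prindex` is the core name from the snippet, and `SolrJsonIndexer` is a hypothetical class name.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class SolrJsonIndexer {

    /** Escapes a string for use inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // other control characters become four-digit hex escapes
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** A one-document JSON array; "id" and "content" are assumed schema fields. */
    static String toJson(String id, String content) {
        return "[{\"id\":\"" + jsonEscape(id) + "\",\"content\":\"" + jsonEscape(content) + "\"}]";
    }

    /** POST to {solrBase}/{core}/update?commit=true with HTTP basic auth. */
    static HttpRequest buildUpdate(String solrBase, String core, String user, String password,
                                   String id, String content) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(solrBase + "/" + core + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + token)
                .POST(HttpRequest.BodyPublishers.ofString(toJson(id, content), StandardCharsets.UTF_8))
                .build();
    }
}
```

Committing on every request is expensive. For a batch, it is usually better to drop `commit=true` from the per-document URL and commit once at the end, as the SolrJ snippet does with a single `req.commit`.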
-----Original Message-----
From: Srinivas Kashyap
Sent: Tuesday, 25 August 2020 17:04
To: solr-user@lucene.apache.org
Subject: RE: PDF extraction usi
r
Subject: Re: PDF extraction using Tika
The issue seems to be more with a specific file and at the level way
below Solr's or possibly even Tika's:
Caused by: java.io.IOException: expected='>' actual='
' at offset 2383
at org.apache.pdfbox.pdfpar
from PDF and pushes into solr?
Thanks,
Srinivas Kashyap
-----Original Message-----
From: Alexandre Rafalovitch
Sent: 24 August 2020 20:54
To: solr-user
Subject: Re: PDF extraction using Tika
The issue seems to be more with a specific file and at the level way
below Solr's or possibly even Tika's:
Caused by: java.io.IOException: expected='>' actual='
' at offset 2383
at org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045)
Are you indexing the sa
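Because the error points at a single file, one quick check is to look at the raw bytes around the reported offset. The sketch assumes that PDFBox's `offset 2383` is a byte position in the original file. That may not hold if the failure happened inside a compressed object stream. `PdfOffsetInspector` and `window` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PdfOffsetInspector {

    /**
     * Renders bytes from offset-radius to offset+radius (inclusive, clipped to the data).
     * Printable ASCII is shown as-is, other bytes as [HH] hex, and "@@" marks the offset.
     */
    static String window(byte[] data, int offset, int radius) {
        int start = Math.max(0, offset - radius);
        int end = Math.min(data.length, offset + radius + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            if (i == offset) sb.append("@@"); // the byte the parser complained about
            int b = data[i] & 0xFF;
            if (b >= 0x20 && b < 0x7F) sb.append((char) b);
            else sb.append(String.format("[%02X]", b));
        }
        return sb.toString();
    }

    /** Convenience overload that reads the whole file first. */
    static String window(Path pdf, int offset, int radius) throws IOException {
        return window(Files.readAllBytes(pdf), offset, radius);
    }
}
```

For example, `PdfOffsetInspector.window(Path.of("problem.pdf"), 2383, 40)` (the file name is a placeholder) returns 40 bytes on either side of the reported position. A line feed appears as `[0A]`, which would match the `actual='` followed by a line break in the exception message.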
Hello,
We are using TikaEntityProcessor to extract the content out of PDFs and make the
content searchable.
When Jetty is run on a Windows-based machine, we are able to successfully load
documents using a full import with DIH (Tika entity). Here the PDFs are stored on the
Windows file system.
But when Jetty