spinergywmy wrote:
I have posted this question before, and this time I found that it could be
a PDFBox problem; the PDFBox version I downloaded doesn't use log4j.jar.
Indexing an approximately 2.13 MB PDF file took 17 s, and the total time to
upload the file is 18 s.
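Putting the reported numbers side by side shows where the time goes: almost all of the 18 s upload is spent in the indexing step, which in this setup presumably also covers PDFBox text extraction. A small sketch of the arithmetic (the class and method names are illustrative only):

```java
// Rough breakdown of the timings reported above: 17 s indexing, 18 s total upload,
// for a file of about 2.13 MB. Names are illustrative, not from any library.
public class UploadTimingShare {
    // Percentage of the total upload time spent in the indexing step.
    static double indexingShare(double indexSeconds, double totalSeconds) {
        return 100.0 * indexSeconds / totalSeconds;
    }

    // Indexing throughput in megabytes per second.
    static double throughputMBps(double sizeMB, double seconds) {
        return sizeMB / seconds;
    }

    public static void main(String[] args) {
        System.out.printf("indexing share: %.1f%%%n", indexingShare(17.0, 18.0)); // prints "indexing share: 94.4%"
        System.out.printf("throughput: %.3f MB/s%n", throughputMBps(2.13, 17.0)); // prints "throughput: 0.125 MB/s"
    }
}
```

At roughly 0.125 MB/s, the upload itself is not the bottleneck; the remaining question is how the 17 s splits between parsing the PDF and adding it to the index.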
Re: PDFBox.
I have a 2.5Mb test file that
Grant Ingersoll wrote:
On Nov 30, 2006, at 10:54 AM, spinergywmy wrote:
In my scenario, every time a user uploads a single file, I need to index
that particular file. Previously it was because the previous version of
PDFBox was integrated with log4j.jar, and I believe it is the
log4j.j
Hi Grant,
Thanks for the tips. I will take your advice and look into the link that
you sent me.
In my scenario, every time a user uploads a single file, I need to index
that particular file. Previously it was because the previous version of
PDFBox was integrated with log4j.jar and
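A minimal sketch of the per-upload flow described above, assuming a pluggable text extractor and an index that accepts one document at a time. `TextExtractor`, `DocumentIndex` and `UploadIndexer` are hypothetical names; the stubs in `main` only stand in for a real PDF parser and a Lucene index.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: index a single uploaded file at upload time.
public class UploadIndexer {
    // Converts raw file bytes to plain text (a PDF parser in a real system).
    interface TextExtractor {
        String extract(byte[] fileBytes) throws Exception;
    }

    // Stores one extracted document (stands in for a Lucene index writer).
    interface DocumentIndex {
        void add(String id, String text);
    }

    private final TextExtractor extractor;
    private final DocumentIndex index;

    UploadIndexer(TextExtractor extractor, DocumentIndex index) {
        this.extractor = extractor;
        this.index = index;
    }

    // Called once per upload: extract the text, then add just that one document.
    void onUpload(String fileId, byte[] fileBytes) throws Exception {
        String text = extractor.extract(fileBytes);
        index.add(fileId, text);
    }

    public static void main(String[] args) throws Exception {
        List<String> stored = new ArrayList<>();
        // Demo stub: treats the bytes as UTF-8 text instead of parsing a real PDF.
        TextExtractor stub = bytes -> new String(bytes, StandardCharsets.UTF_8);
        UploadIndexer indexer = new UploadIndexer(stub, (id, text) -> stored.add(id + ":" + text));
        indexer.onUpload("report.pdf", "hello world".getBytes(StandardCharsets.UTF_8));
        System.out.println(stored); // prints [report.pdf:hello world]
    }
}
```

Keeping the extractor behind an interface means the PDF parser can be replaced by another converter without changing the indexing code.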
http://lucene.apache.org/java/docs/contributions.html lists several
PDF alternatives, but I can't speak to their performance. I am sure
if you googled PDF converters you could find a fair number of hits.
Perhaps w/ some more details about your app we might be able to find
a workaround. We
spinergywmy wrote:
Hi,
I am having a performance issue indexing PDF files. It takes me more
than 10 s to index a PDF file of about 200 KB. Is it because I only have
one segment file? How can I improve indexing performance?
If you're using the log4j PDFBox jar file, you must make sure
You may want to use something like pdftotext, part of XPDF
(http://www.foolabs.com/xpdf/download.html). It produces a text
extract of a PDF. Indexing will then work like a breeze, without the
memory consumption of PDFBox.
Regards,
Ioan
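A rough sketch of calling pdftotext from Java and reading the extracted text, assuming pdftotext is installed and on the PATH. Passing `-` as the output file sends the text to standard output, and `-enc UTF-8` requests UTF-8 output. The class name `PdfToTextExtractor` is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: run XPDF's pdftotext as an external process and capture its stdout.
// Assumes the pdftotext binary is available on the PATH.
public class PdfToTextExtractor {
    // Builds: pdftotext -enc UTF-8 <pdf> -   ("-" means write the text to stdout)
    static List<String> command(String pdfPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftotext");
        cmd.add("-enc");
        cmd.add("UTF-8");
        cmd.add(pdfPath);
        cmd.add("-");
        return cmd;
    }

    static String extract(String pdfPath) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(pdfPath))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // pdftotext errors go to our stderr
                .start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = p.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with status " + exit);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No file given: just show the command that would be run.
            System.out.println(String.join(" ", command("example.pdf")));
            return;
        }
        System.out.println(extract(args[0]));
    }
}
```

Because the parsing runs in a separate process, its memory use stays outside the JVM that does the indexing; the returned string can then be indexed as ordinary text.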
Have you measured how much of your time is spent indexing and how much is
spent just parsing the file? You need to do this before you have any idea
what you need to make faster.
Erick
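A minimal sketch of the measurement Erick suggests: time text extraction and indexing separately for the same file. The `parse` and `index` functions are hypothetical stand-ins; in a real application they would wrap the PDF parser and the Lucene call that adds the document.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Sketch: measure parsing time and indexing time separately.
public class ParseVsIndexTimer {
    // The two measured durations, in milliseconds.
    static final class Timings {
        final long parseMillis;
        final long indexMillis;

        Timings(long parseMillis, long indexMillis) {
            this.parseMillis = parseMillis;
            this.indexMillis = indexMillis;
        }
    }

    // Runs parse, then index on the extracted text, timing each step with nanoTime.
    static <T> Timings measure(T input, Function<T, String> parse, Consumer<String> index) {
        long t0 = System.nanoTime();
        String text = parse.apply(input);
        long t1 = System.nanoTime();
        index.accept(text);
        long t2 = System.nanoTime();
        return new Timings((t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Demo stand-ins: a slow "parser" and a fast "indexer".
        Timings t = measure("dummy.pdf",
                name -> { pause(200); return "extracted text of " + name; },
                text -> pause(20));
        System.out.println("parse: " + t.parseMillis + " ms, index: " + t.indexMillis + " ms");
    }
}
```

Run it over a few representative files; if `parseMillis` dominates, the PDF parser rather than Lucene is the place to look.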
On 11/10/06, Daniel Naber <[EMAIL PROTECTED]> wrote:
On Friday 10 November 2006 12:18, spinergywmy wrote:
> I am having a performance issue indexing PDF files. It takes me more
> than 10 s to index a PDF file of about 200 KB. Is it because I only have
> one segment file? How can I improve indexing performance?
PDFBox (which I assume you are