Re: indexing performance issue

2006-11-30 Thread Antony Bowesman
spinergywmy wrote: I have posted this question before, and this time I found that it could be a PDFBox problem; the PDFBox I downloaded doesn't use log4j.jar. Indexing the approx. 2.13 MB PDF file took me 17 s, and the total time to upload a file is 18 s. Re: PDFBox. I have a 2.5 MB test file that

Re: indexing performance issue

2006-11-30 Thread Antony Bowesman
Grant Ingersoll wrote: On Nov 30, 2006, at 10:54 AM, spinergywmy wrote: In my scenario, every time a user uploads a single file, I need to index that particular file. Previously, it was because the previous version of PDFBox integrated with the log4j.jar file, and I believe it is the log4j.j

Re: indexing performance issue

2006-11-30 Thread Grant Ingersoll
On Nov 30, 2006, at 10:54 AM, spinergywmy wrote: Hi Grant, Thanks for the tips. I will take your advice and look into the link that you sent me. In my scenario, every time a user uploads a single file, I need to index that particular file. Previously, it was because the

Re: indexing performance issue

2006-11-30 Thread spinergywmy
Hi Grant, Thanks for the tips. I will take your advice and look into the link that you sent me. In my scenario, every time a user uploads a single file, I need to index that particular file. Previously, it was because the previous version of PDFBox integrated with the log4j.jar file and

Re: indexing performance issue

2006-11-30 Thread Grant Ingersoll
http://lucene.apache.org/java/docs/contributions.html lists several PDF alternatives, but I can't speak to their performance. I am sure that if you googled PDF converters you would find a fair number of hits. Perhaps with some more details about your app we might be able to find a workaround. We

Re: Indexing Performance issue

2006-11-16 Thread Antony Bowesman
spinergywmy wrote: Hi, I am having a performance issue indexing PDF files. It took me more than 10 seconds to index a PDF file of about 200 KB. Is it because I only have a single segment file? How can I make the indexing performance better? If you're using the log4j PDFBox jar file, you must make sure

Re: Indexing Performance issue

2006-11-10 Thread Ioan Cocan
You may want to use something like pdftotext, part of Xpdf (http://www.foolabs.com/xpdf/download.html). It will produce a text extract for a PDF. Indexing will work like a breeze, without the memory consumption of PDFBox. Regards, Ioan spinergywmy wrote: Hi, I am having a performance issue indexing PDF files
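A minimal sketch of how the pdftotext suggestion above might be wired into a Java application, assuming the pdftotext binary is installed and on the PATH. The class and method names (PdfToTextRunner, buildCommand, extract) are hypothetical and do not come from the thread; the "-enc UTF-8" option and the "-" output argument (write text to standard output) are standard pdftotext usage.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs the external pdftotext tool and returns the text.
public class PdfToTextRunner {

    // Builds the command line: pdftotext -enc UTF-8 <pdf> -
    // The trailing "-" asks pdftotext to write the text to standard output.
    public static List<String> buildCommand(Path pdf) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Runs pdftotext and collects its standard output as a String.
    // Assumes Java 9+ for InputStream.readAllBytes().
    public static String extract(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf))
                // Forward the tool's error messages instead of buffering them.
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfToTextRunner <file.pdf>");
            return;
        }
        String text = extract(Paths.get(args[0]));
        System.out.println(text.length() + " characters extracted");
    }
}
```

The returned string could then be handed to whatever indexing code the application already uses, so the PDF parsing happens outside the JVM.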

Re: Indexing Performance issue

2006-11-10 Thread Erick Erickson
Have you measured how much of your time is spent indexing and how much is spent just parsing the file? You need to do this before you can know what you need to make faster. Erick On 11/10/06, Daniel Naber <[EMAIL PROTECTED]> wrote: On Friday 10 November 2006 12:18, spinergywmy wrote: > I
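A rough sketch of the measurement Erick describes: time the parsing step and the indexing step separately, so it is clear which one dominates. Everything here is an assumption for illustration. PhaseTiming, Timed, extractText and indexText are hypothetical names, and the two phase methods are trivial stand-ins that would be replaced by the real PDF extraction call and the real index-writing call.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical timing harness: measures each phase on its own.
public class PhaseTiming {

    // Holds a phase's result together with its elapsed wall-clock time.
    public static final class Timed<T> {
        public final T value;
        public final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    // Runs one phase and records how long it took.
    public static <T> Timed<T> time(Supplier<T> phase) {
        long start = System.nanoTime();
        T value = phase.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    // Stand-in for the parsing phase (the real code would call the PDF library).
    public static String extractText(byte[] pdfBytes) {
        return new String(pdfBytes, StandardCharsets.ISO_8859_1);
    }

    // Stand-in for the indexing phase: counts whitespace-separated tokens.
    public static int indexText(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        byte[] fakePdf = "some extracted words".getBytes(StandardCharsets.ISO_8859_1);

        Timed<String> parsed = time(() -> extractText(fakePdf));
        Timed<Integer> indexed = time(() -> indexText(parsed.value));

        System.out.printf("parse: %d us, index: %d us, tokens: %d%n",
                parsed.nanos / 1000, indexed.nanos / 1000, indexed.value);
    }
}
```

If the parse figure accounts for most of the total, tuning the index itself is unlikely to help much; a single run is also noisy, so repeating the measurement a few times gives a more reliable picture.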

Re: Indexing Performance issue

2006-11-10 Thread Daniel Naber
On Friday 10 November 2006 12:18, spinergywmy wrote: > I am having a performance issue indexing PDF files. It took me more > than 10 seconds to index a PDF file of about 200 KB. Is it because I only have a > single segment file? How can I make the indexing performance better? PDFBox (which I assume you are