Re: Need advice: what pdf lib to use?

2004-10-25 Thread iouli . golovatyi
Ben, many thanks for your comprehensive answer. Unfortunately I cannot send the problem PDFs because they are the property of the company and are top secret :) Regards, J. Ben Litchfield <[EMAIL PROTECTED]> 22.10.2004 14:40 Please respond to "Lucene Users List" To: Lucene Users

Re: Need advice: what pdf lib to use?

2004-10-25 Thread iouli . golovatyi
PDFBox also stumbles with "class java.io.IOException with message: - You do not have permission to extract text" when the document is copy/print protected. I have now tested the commercial Snowtide product, and it looks like it can process these files as well. Performance was also not bad. Unfortu

Re: Need advice: what pdf lib to use?

2004-10-25 Thread sergiu gordea
[EMAIL PROTECTED] wrote: Hi Iouli, If you don't think it is illegal, you can hack the PDFBox code to remove the protection ... Sergiu PDFbox stumbles also with "class java.io.IOException with message: - You do not have permission to extract text" in case the doc is copy/print protected. I tes

Re: Search PDF ???

2004-10-25 Thread Zilverline info
Hi Eric, Try Zilverline. Michael Eric Chow wrote: Hello, 1. Is it possible to use Lucene to search PDF contents? 2. Can it search PDF files with Chinese content? Eric

Re: Need advice: what pdf lib to use?

2004-10-25 Thread Ben Litchfield
PDFBox does not 'stumble' when it gives that message; that is correct functionality if that permission is not allowed. If your company is willing to pay a 'fortune', why not sponsor a change to an open source project for half a fortune? Ben http://www.pdfbox.org On Mon, 25 Oct 2004 [EMAIL PROTEC

Re: Need advice: what pdf lib to use?

2004-10-25 Thread iouli . golovatyi
Yes Ben, you are right. This would be correct functionality from a technical perspective. But look at it my way, through the eyes of an application programmer reporting to the big boss that about 30% of the documents we handle could not be indexed because of this stupid limitation. Neither he nor I have any influence on pdf ow

Re: Need advice: what pdf lib to use?

2004-10-25 Thread iouli . golovatyi
As far as > > I need a piece of advice/experience. > > What PDF parser (written in Java) would you recommend? > > I played now with PDFBox-0.6.7a and would not say I was too satisfied with it. > > On certain PDFs (not well formatted but still readable with Acrobat) it run

Re: Need advice: what pdf lib to use?

2004-10-25 Thread iouli . golovatyi
Ben, As far as the dead loop problem is concerned, it looks like I experienced a slightly different problem. I published it under the same tracking path. Regards, J. > > I need a piece of advice/experience. > > What PDF parser (written in Java) would you recommend? > > I played now with PDFBox-0.6

Need advice: what Word/Excel/PowerPoint lib to use?

2004-10-25 Thread iouli . golovatyi
Hello all, I need a piece of advice/experience again. Which MS Word/Excel/PowerPoint parsers (written in Java) would you recommend? Thanks in advance, J.

Re: Need advice: what Word/Excel/PowerPoint lib to use?

2004-10-25 Thread sergiu gordea
POI, of course, for open source. There are also some commercial products based on POI. For Word, consider textmining.org. For XLS, POI does anything you need. For PowerPoint there is one commercial product (it's about $1000), but you can also find some source code in the archives. All the best, Sergiu [EMAIL P

Re: Need advice: what pdf lib to use?

2004-10-25 Thread Ben Litchfield
In order to write software that consumes PDF documents you must agree to a list of conditions. One of those conditions is that permissions specified by the author of the PDF document are respected. PDFBox complies with this condition; if there is software that does not, then they are in violation
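
To make the permission check concrete, here is a minimal sketch, assuming the standard security handler described in the PDF Reference, where the encryption dictionary's P entry is a 32-bit flag word and bit 5 (counting from 1) governs copying or extracting text and graphics. This is not PDFBox's actual code; the class and method names (PdfPermissionSketch, canExtractText) are hypothetical.

```java
public class PdfPermissionSketch {
    // Bit 5 (1-based) of the P entry: copy or otherwise extract text and graphics.
    static final int EXTRACT_BIT = 1 << 4;

    // True when the author's permission flags allow text extraction.
    static boolean canExtractText(int permissionFlags) {
        return (permissionFlags & EXTRACT_BIT) != 0;
    }

    public static void main(String[] args) {
        int allAllowed = 0xFFFFFFFC;                  // illustrative value, all permission bits set
        int noExtract = allAllowed & ~EXTRACT_BIT;    // same flags with extraction cleared

        System.out.println(canExtractText(allAllowed)); // prints "true"
        System.out.println(canExtractText(noExtract));  // prints "false"
    }
}
```

A parser that respects the author's settings would refuse to extract text in the second case, which is the situation behind the "You do not have permission to extract text" message quoted earlier in the thread.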

Re: Need advice: what pdf lib to use?

2004-10-25 Thread sergiu gordea
Ben Litchfield wrote: In order to write software that consumes PDF documents you must agree to a list of conditions. One of those conditions is that permissions specified by the author of the PDF document are respected. PDFBox complies with this statement, if there is software that does not then t

about snowball and *

2004-10-25 Thread Wermus Fernando
Luceners, If I have a Spanish word such as "oportunidad", Lucene indexes it as "oportuni" when using a Spanish snowball analyzer. When I search for a document it works well. But if the user writes oportunidad* to look up n>1 words, it doesn't match the document I have indexed. Analyzer instead
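
A likely cause, offered as an assumption because the message is cut off: Lucene's QueryParser does not pass wildcard or prefix terms through the analyzer, so the prefix "oportunidad" is compared as-is against the stemmed term "oportuni" stored in the index. The sketch below uses plain Java strings rather than the Lucene API; the names PrefixVsStem and prefixMatches are hypothetical.

```java
import java.util.List;

public class PrefixVsStem {
    // Prefix-query semantics: an indexed term matches when it starts with the prefix.
    static boolean prefixMatches(String prefix, List<String> indexedTerms) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The stemmed form from the message above is the only term in the index.
        List<String> indexedTerms = List.of("oportuni");

        // The unstemmed prefix is longer than the stored stem, so nothing matches.
        System.out.println(prefixMatches("oportunidad", indexedTerms)); // prints "false"

        // A prefix no longer than the stem does match.
        System.out.println(prefixMatches("oportuni", indexedTerms));    // prints "true"
    }
}
```

One possible workaround is to shorten or stem the user's input before appending the `*`, though stemming a partial word can give surprising results and would need testing against the real analyzer.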

Re: Need advice: what Word/Excel/PowerPoint lib to use?

2004-10-25 Thread Genty Jean-Paul
At 17:05 25/10/2004, you wrote: of course POI, for open source. There are some commercial products based on POI also. for WORD consider textmining.org for XLS, POI does anything you need for powerpoint there is one commercial (it's about 1000$), but you can also find some source code in archives.

Re: Need advice: what Word/Excel/PowerPoint lib to use?

2004-10-25 Thread Sergiu Gordea
Genty Jean-Paul wrote: At 17:05 25/10/2004, you wrote: of course POI, for open source. There are some commercial products based on POI also. for WORD consider textmining.org for XLS, POI does anything you need for powerpoint there is one commercial (it's about 1000$), but you can also find some s

Re: Need advice: what Word/Excel/PowerPoint lib to use?

2004-10-25 Thread Genty Jean-Paul
At 19:42 25/10/2004, you wrote: At 17:05 25/10/2004, you wrote: of course POI, for open source. There are some commercial products based on POI also. for WORD consider textmining.org for XLS, POI does anything you need for powerpoint there is one commercial (it's about 1000$), but you can also fi

Re: Need advice: what Word/Excel/PowerPoint lib to use?

2004-10-25 Thread Ryan Ackley
Their API is amazing. However, you run into the same problems that you do when you automate MS Office using VBA: instability, and everything is single-threaded. You are basically automating a GUI application. -Ryan - Original Message - From: "Genty Jean-Paul" <[EMAIL PROTECTED]

Re: Need advice: what Word/Excel/PowerPoint lib to use?

2004-10-25 Thread Andrzej Bialecki
Ryan Ackley wrote: Their API is amazing. However, you run into the same problems that you do when you automate MS Office using VBA: instability, and everything is single-threaded. You are basically automating a GUI application. AFAIK they don't provide a separate converters' API, which

Exception in thread "main" java.lang.NoClassDefFoundError

2004-10-25 Thread Rob Hailey
I am using Lucene version 1.4.2 but am consistently getting an error when I run this: java -verbose -classpath /Users/rob/Desktop/lucene/lucene.jar:/Users/rob/Desktop/lucene/lucene-demos.jar:. org.apache.lucene.demos.IndexFiles /Users/rob/Desktop/lucene/src/ The error I get is: Exception

BooleanQuery - TooManyClauses

2004-10-25 Thread Angelov, Rossen
Hi, Why is there a limit on the number of clauses, and is there any harm in setting MaxClauseCount to Integer.MAX_VALUE? I'm using a Range Query on a field that represents dates and getting a BooleanQuery$TooManyClauses exception. This is the query - +/article/createddateiso8601:[2003010100 TO
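
For context: in Lucene 1.4 a range query is rewritten into a BooleanQuery with one clause per distinct indexed term inside the range, and the default limit is 1024 clauses. The sketch below is not Lucene code. It only estimates how many terms a range could cover, assuming the field stores values as yyyyMMddHH (a guess from the ten-digit lower bound) and using a hypothetical upper bound, since the original query is cut off. The names RangeClauseEstimate, hourlyTerms and dailyTerms are illustrative.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class RangeClauseEstimate {
    // Default BooleanQuery clause limit in Lucene 1.4.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Upper bound on distinct hourly terms in the inclusive range [from, to].
    static long hourlyTerms(LocalDateTime from, LocalDateTime to) {
        return ChronoUnit.HOURS.between(from, to) + 1;
    }

    // Upper bound on distinct daily terms in the inclusive range [from, to].
    static long dailyTerms(LocalDateTime from, LocalDateTime to) {
        return ChronoUnit.DAYS.between(from.toLocalDate(), to.toLocalDate()) + 1;
    }

    public static void main(String[] args) {
        LocalDateTime from = LocalDateTime.of(2003, 1, 1, 0, 0);
        LocalDateTime to = LocalDateTime.of(2003, 12, 31, 23, 0); // hypothetical upper bound

        long hourly = hourlyTerms(from, to);
        long daily = dailyTerms(from, to);

        System.out.println(hourly + " hourly terms, over limit: " + (hourly > DEFAULT_MAX_CLAUSES));
        System.out.println(daily + " daily terms, over limit: " + (daily > DEFAULT_MAX_CLAUSES));
    }
}
```

Under these assumptions a one-year range at hourly granularity can expand to 8760 clauses, well past the limit, while daily granularity stays at 365. Raising the limit removes the exception but not the memory and time cost of a very large BooleanQuery, so indexing a coarser date field is often the gentler fix.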

Re: Need advice: what pdf lib to use?

2004-10-25 Thread Chris Fraschetti
I recently started to work on a project which needed to parse many documents, including PDFs, very quickly and on a large scale. PDFBox looks like the best choice except for its obvious speed issue. Eventually I took the time to go into the PDFBox source and rip out the individual string