Re: Lucene parsing for PDF

2005-12-29 Thread Klaus
Hi, I think the easiest way is to exclude the pages while you are parsing the PDF document, so that you only pass the necessary pages to Lucene. Another solution is to create a separate document for each page. Each of these should have a field "pagenumber", and then you can delete the document from the index
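The two suggestions above can be modeled without any libraries. This is a minimal sketch under stated assumptions, not Lucene or PDFBox code. A "document" is a plain field-name to value map standing in for a Lucene Document, and the index is an in-memory list. The class name, the "file" field, and the method names are hypothetical. With the real libraries, the first approach roughly corresponds to limiting PDFBox's PDFTextStripper via setStartPage/setEndPage (or extracting page by page and skipping unwanted pages). The second corresponds to deleting by field value, for example with IndexWriter.deleteDocuments in recent Lucene versions. The extra "file" field is an assumption: in an index holding several PDFs, a page number alone would match that page in every file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, dependency-free model of the two approaches from the thread.
// Each map stands in for a Lucene Document; the list stands in for the index.
class PageIndexSketch {

    private final List<Map<String, String>> docs = new ArrayList<>();

    // Approach 1: skip unwanted pages while parsing, so they are never indexed.
    // pageTexts.get(i) holds the text of page i + 1 (PDF pages are 1-based).
    static String joinWantedPages(List<String> pageTexts, Set<Integer> excludedPages) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pageTexts.size(); i++) {
            int pageNumber = i + 1;
            if (excludedPages.contains(pageNumber)) {
                continue; // excluded page: its text never reaches the index
            }
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(pageTexts.get(i));
        }
        return sb.toString();
    }

    // Approach 2: one document per page, tagged with a "pagenumber" field.
    // The "file" field is a hypothetical addition to tell PDFs apart.
    void addPages(String fileName, List<String> pageTexts) {
        for (int i = 0; i < pageTexts.size(); i++) {
            Map<String, String> doc = new HashMap<>();
            doc.put("file", fileName);
            doc.put("pagenumber", Integer.toString(i + 1));
            doc.put("contents", pageTexts.get(i));
            docs.add(doc);
        }
    }

    // Remove one page later by matching both the file and the page number.
    // Returns how many documents were removed.
    int deletePage(String fileName, int pageNumber) {
        String page = Integer.toString(pageNumber);
        int before = docs.size();
        docs.removeIf(d -> fileName.equals(d.get("file")) && page.equals(d.get("pagenumber")));
        return before - docs.size();
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        List<String> pages = List.of("intro", "body", "appendix");
        // Excluding page 3 at parse time leaves only the first two pages.
        System.out.println(joinWantedPages(pages, Set.of(3)));

        PageIndexSketch index = new PageIndexSketch();
        index.addPages("report.pdf", pages);
        System.out.println(index.deletePage("report.pdf", 2) + " removed, " + index.size() + " left");
    }
}
```

Running main prints "intro" and "body" on separate lines, then "1 removed, 2 left". The per-page layout costs more documents in the index, but it lets a single page be removed later without re-parsing the whole PDF.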

Re: Lucene parsing for PDF

2005-12-29 Thread Erik Hatcher
Shyam - I moderated your message through, so please subscribe to the list to send to it in the future. Please provide us with some details - a standalone RAMDirectory-using JUnit TestCase is the most ideal way to share an issue like this and have someone else take a look at it. And frequen

Lucene parsing for PDF

2005-12-29 Thread Shyam Bhaskaran
Hi, I am working on a search project using Lucene, and I am currently working on parsing PDF documents. I was successful in implementing my parser using Lucene and PDFBox. I have a question about how to exclude (or maybe delete) pages from the index. I am not sure how to do this. I mean when exactly it