Beginner: Best way to index and display original text of PDFs in search results

2008-12-12 Thread maxmil
Hi, this is the first time I am using Lucene. I need to index PDFs with very few fields (title, date and body, which is a long field) for a web-based search. The results I need to display have to show not only the documents found but, for each document, a snapshot of the text where the search term has been

2008-12-12 Thread Ian Lea
Hi, Lucene can store the original text of the document; you set up the Lucene fields to do what you need. Have a look at the apidocs for Field.Store and you'll see that you've got three choices: YES, NO or COMPRESS. For your display snapshots, have a look at the Lucene highlighter package. And a
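To see roughly what a highlighter does with a stored body, here is a plain-Java sketch that needs no Lucene at all: it finds the first case-insensitive occurrence of a single search term in the stored text and returns a short window around it with the match wrapped in <b> tags. The class and the snippet() helper are hypothetical and the sketch assumes a single-term query; the real highlighter package works from the analyzer's tokens and scores candidate fragments, so its API and output differ.

```java
public class SnippetSketch {

    // Index of the first case-insensitive match of term in text, or -1.
    // regionMatches(true, ...) keeps indices aligned with the original text.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical helper (not the Lucene highlighter API): returns up to
    // `context` characters on each side of the first match, with the match
    // wrapped in <b>...</b> and "..." where the body was cut.
    // Returns null if the term is empty or does not occur.
    static String snippet(String body, String term, int context) {
        if (term.isEmpty()) {
            return null;
        }
        int hit = indexOfIgnoreCase(body, term);
        if (hit < 0) {
            return null;
        }
        int end = hit + term.length();
        int from = Math.max(0, hit - context);
        int to = Math.min(body.length(), end + context);

        StringBuilder sb = new StringBuilder();
        if (from > 0) {
            sb.append("...");
        }
        sb.append(body, from, hit)
          .append("<b>").append(body, hit, end).append("</b>")
          .append(body, end, to);
        if (to < body.length()) {
            sb.append("...");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "Lucene can store the original text of the document, "
                + "so a search page can show where the query matched.";
        System.out.println(snippet(body, "original", 15));
    }
}
```

Calling snippet(body, "original", 15) on a stored body gives the matched word with up to 15 characters of context on each side, which is the kind of "snapshot" asked about in the first message; a real highlighter would also respect word boundaries and pick the best-scoring fragment.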

2008-12-12 Thread maxmil
Thanks very much. It looks like Field.Store.COMPRESS is what I want. I'll also look into the highlighter and getting a copy of Lucene in Action.
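Field.Store.COMPRESS keeps the stored copy of a long body smaller on disk, at the cost of CPU time when indexing and each time the field is read back. That round trip can be illustrated with the JDK's zlib classes. This is only a sketch, assuming Deflater/Inflater and UTF-8 text; compress() and decompress() are hypothetical helpers, not Lucene's own storage code or file format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressSketch {

    // Hypothetical helper: encode text as UTF-8 and deflate it (zlib format).
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of compress(): inflate the bytes and decode them as UTF-8.
    // Corrupt or truncated input is reported as IllegalArgumentException.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                // No progress and no way to make any: input is incomplete.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated compressed data");
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("A long PDF body with plenty of repeated words. ");
        }
        String body = sb.toString();
        byte[] packed = compress(body);
        System.out.println(body.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + body.equals(decompress(packed)));
    }
}
```

The gain depends on how repetitive the extracted PDF text is; short or already-dense fields may not be worth the extra work on every retrieval.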

2008-12-12 Thread Paul Libbrecht
I also came across these options of the Field constructor, but I never found a way to be sure that the field is really not loaded into RAM and is only returned through Field.reader(). There seems to be no such contract in the javadoc. Moreover, the reader access methods went away between 1.9 and 2.2 if I

2008-12-12 Thread Sudarsan, Sithu D.