Re: WildcardQuery and SpanQuery

2007-07-17 Thread Paul Elschot
On Wednesday 18 July 2007 05:58, Cedric Ho wrote: > Hi everybody, > > We recently needed to support the wildcard search terms "*" and "?" together > with SpanQuery. It seems that there's no SpanWildcardQuery available. > After looking into the Lucene source code for a while, I guess we can > either: > > 1

Re: Does Index have a Tokenizer Built into it

2007-07-17 Thread Chris Hostetter
: After indexing I have been able to retrieve the TermPositionVector from the : index and it has all of the data, but I cannot find a way where given a : position I can retrieve the term at that position. Which is how I was hoping : to create my contextual snippets. there is no easy way to go fro

Re: getting problem while indexing pdf files with pdfbox

2007-07-17 Thread neetika
Hi Erick, I am able to get the result fine. The problem was, I forgot to close the writer and so the index file (.cfs) was not getting generated. Thanks a lot for the timely help. Regards, Neetika Erick Erickson wrote: > > You have NOT supplied an example of the text you extracted > from th

WildcardQuery and SpanQuery

2007-07-17 Thread Cedric Ho
Hi everybody, We recently needed to support the wildcard search terms "*" and "?" together with SpanQuery. It seems that there's no SpanWildcardQuery available. After looking into the Lucene source code for a while, I guess we can either: 1. Use SpanRegexQuery, or 2. Write our own SpanWildcardQuery, and
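
For option 1 above, one possible route is to translate the wildcard term into a regular expression before handing it to a regex-based query. The sketch below is plain Java with no Lucene dependency. The class and method names are hypothetical, and it assumes that the regex engine behind the query accepts java.util.regex syntax (including \Q...\E quoting), that "*" means any run of characters, and that "?" means exactly one character.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: converts a wildcard term
// such as "te?t*" into a java.util.regex pattern string.
final class WildcardToRegex {

    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                // Emit any pending literal text, quoted so that regex
                // metacharacters in the term (".", "+", ...) stay literal.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }
}
```

The returned string is intended to match a whole term, as Pattern.matches does. Whether a particular regex query anchors the pattern at both ends is worth checking before relying on it.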

Re: getting problem while indexing pdf files with pdfbox

2007-07-17 Thread Erick Erickson
You have NOT supplied an example of the text you extracted from the document. But let's assume that the interesting string is exactly what you expect. Have you looked at your index with Luke to see if the data is there? I *strongly* suggest you get a copy of Luke (google lucene luke) to examine i

Re: getting problem while indexing pdf files with pdfbox

2007-07-17 Thread neetika
Hi Erick, Before indexing I printed the doc, and I have given the output as well. It prints fine. Please check my post again, as follows... "System.out.println(doc); //Following code is for making index" and the corresponding output is... Document > Offhand I'd ass

Re: Does Index have a Tokenizer Built into it

2007-07-17 Thread John Paul Sondag
Hi, I've been looking into indexing documents with term vectors that store terms and positions to solve my problem. However, I've run into a bit of a snag. After indexing I have been able to retrieve the TermPositionVector from the index and it has all of the data, but I cannot find a way where g
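
One way to look up a term by position is to invert the per-term position lists into a position-to-term map once per document. The sketch below is plain Java with hypothetical class and method names. It assumes the terms and their position arrays have already been copied out of the vector into a String[] and an int[][] (one position array per term), and it is not presented as the approach recommended in the thread.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds a position -> term lookup from
// per-term position arrays and uses it to assemble a snippet.
final class PositionIndex {

    // terms[i] occurs at every position listed in positions[i].
    static Map<Integer, String> invert(String[] terms, int[][] positions) {
        Map<Integer, String> byPosition = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            for (int p : positions[i]) {
                byPosition.put(p, terms[i]);
            }
        }
        return byPosition;
    }

    // Joins the terms found within `radius` positions of `center`.
    // Positions with no stored term (for example removed stop words)
    // are skipped.
    static String snippet(Map<Integer, String> byPosition, int center, int radius) {
        StringBuilder sb = new StringBuilder();
        for (int p = center - radius; p <= center + radius; p++) {
            String term = byPosition.get(p);
            if (term == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term);
        }
        return sb.toString();
    }
}
```

Note that the snippet is built from analyzed terms, so it reflects whatever lowercasing or stemming the analyzer applied rather than the original text.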

Re: getting problem while indexing pdf files with pdfbox

2007-07-17 Thread Erick Erickson
Offhand I'd assume that your problem is using PDFbox. Have you tried printing out the docText string you get back from docText = stripper.getText(new PDDocument(cosDoc))? I'd recommend you assure yourself that you get valid text back from the PDF document before worrying about indexing it. Bes

getting problem while indexing pdf files with pdfbox

2007-07-17 Thread neetika
http://www.nabble.com/file/p11647342/DRra0026.pdf Hi all, I am able to convert a PDF into a text file using PDFBox, and this is the code that I used, but I am not able to index it: // code for parsing and making index public Document getDocument(InputStream is

Re: search through all fields

2007-07-17 Thread Mathieu Lecarme
http://www.opensymphony.com/compass/ The project is free and follows Lucene versions closely, the forum is great, and the lead developer is quick to respond. M. Mohammad Norouzi wrote: > Mathieu, > I need an object mapper for Lucene. Would you please give me the > Compass web > site? Is it open so

Re: search through all fields

2007-07-17 Thread Mohammad Norouzi
Mathieu, I need an object mapper for Lucene. Would you please give me the Compass web site? Is it open source? Thanks. On 7/17/07, Mathieu Lecarme <[EMAIL PROTECTED]> wrote: Sorry, I use Compass, an object mapper for Lucene, and it provides a special field "all", I thought it was a Lucene featur

Re: search through all fields

2007-07-17 Thread Mathieu Lecarme
Sorry, I use Compass, an object mapper for Lucene, and it provides a special field "all"; I thought it was a Lucene feature. M. Renaud Waldura wrote: > Often documents can be divided into "metadata" and "contents" sections. Say > you're indexing Web pages, you could index them with HEAD data all
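
The idea of a catch-all field can be approximated without an object mapper by concatenating the values of the other fields into one extra field at indexing time. The sketch below is plain Java with hypothetical names. It models a document as a field-name-to-text map, and it is only an illustration of the idea, not a description of how Compass builds its "all" field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: adds an "all" entry whose value is the
// space-joined text of every other field, in insertion order.
final class CatchAllField {

    static Map<String, String> withAllField(Map<String, String> fields) {
        Map<String, String> out = new LinkedHashMap<>(fields);
        out.put("all", String.join(" ", fields.values()));
        return out;
    }
}
```

A query against the single "all" field then covers every field's text, at the cost of storing the indexed terms a second time.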