Re: Supported File Formats - PDF, MHT

2008-02-12 Thread Jan Peter Stotz
Naman Gupta wrote: Does Lucene support files in PDF and MHT formats? I wasn't able to retrieve any results after creating an index of such files. Well, the answer is simple: Lucene itself does not support any file format. You need a file parser that converts your files to a plain text
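
The point of the reply above (parse first, then index plain text) can be sketched in plain Java. This is only an illustration under stated assumptions: ExtractorRegistry and TextExtractor are hypothetical names, not part of Lucene, and a real PDF or MHT extractor would need a third-party parser that is not shown here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not a Lucene class: picks a text extractor by file extension.
public class ExtractorRegistry {

    /** Hypothetical interface: turns raw file bytes into plain text for indexing. */
    public interface TextExtractor {
        String extract(byte[] content);
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Registers an extractor for an extension such as "txt" or "pdf". */
    public void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the extracted text, or null when no parser is registered for the file type. */
    public String extract(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = byExtension.get(ext);
        return extractor == null ? null : extractor.extract(content);
    }
}
```

Only the returned string would be handed to the indexer; a null result signals that the file type has no parser yet and should be skipped or reported instead of being indexed as raw bytes.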

Supported File Formats - PDF, MHT

2008-02-12 Thread Naman Gupta
Hey, does Lucene support files in PDF and MHT formats? I wasn't able to retrieve any results after creating an index of such files. This is the first time I am using Lucene. Thanks, Naman K Gupta

Re: Lucene multiple field search performance

2008-02-12 Thread Michael Stoppelman
Did your index size increase drastically? As a first step I would recommend optimizing your index if you haven't already. -M On Feb 12, 2008 7:42 PM, Cesar Ronchese <[EMAIL PROTECTED]> wrote: I was doing normal queries happily, seeing the results statistics come in about 0.02 seconds.

Lucene multiple field search performance

2008-02-12 Thread Cesar Ronchese
I was doing normal queries happily, seeing the results statistics come in at about 0.02 seconds. But then I added an extra field to search together with the normal query, and the statistic went up to 0.35 seconds. That was a lot. Example: normal query: some test (it returns quickly) extra field que

Re: update field boost

2008-02-12 Thread Jay
My bad. Thanks for the link! Jay Chris Hostetter wrote: : Do you know why FieldNormModifier is removed from Lucene 2.3? : thanks. it wasn't... http://lucene.apache.org/java/2_3_0/api/contrib-misc/org/apache/lucene/index/FieldNormModifier.html ...it's in the "miscellaneous" contrib though so

Re: Getting the number of indexed fields in an index

2008-02-12 Thread Grant Ingersoll
I think this is what you are asking: http://lucene.apache.org/java/2_3_0/api/core/org/apache/lucene/index/IndexReader.html#getFieldNames(org.apache.lucene.index.IndexReader.FieldOption) On Feb 12, 2008, at 11:13 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED] > wrote: Hi, Does anyone have a

Re: update field boost

2008-02-12 Thread Chris Hostetter
: Do you know why FieldNormModifier is removed from Lucene 2.3? : thanks. it wasn't... http://lucene.apache.org/java/2_3_0/api/contrib-misc/org/apache/lucene/index/FieldNormModifier.html ...it's in the "miscellaneous" contrib though so you'll need to use that jar explicitly. -Hoss --

RE: Lukes document hitlist display

2008-02-12 Thread spring
OK, understood. Maybe a little hint in the legend, like "Only for stored fields". -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Tuesday, 12 February 2008 19:13 To: java-user@lucene.apache.org Subject: Re: Lukes document hitlist display [EMAIL PR

Re: Lukes document hitlist display

2008-02-12 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Hi, using Luke 0.7.1. The document hitlist has a column header ITSVop0LBC. When I add a field like this: new Field("CONTENT", contentReader, TermVector.WITH_OFFSETS) Luke shows only "--". Why? Shouldn't it be "IT-Vo-"? It should, but this information i

Lukes document hitlist display

2008-02-12 Thread spring
Hi, using Luke 0.7.1. The document hitlist has a column header ITSVop0LBC. When I add a field like this: new Field("CONTENT", contentReader, TermVector.WITH_OFFSETS) Luke shows only "--". Why? Shouldn't it be "IT-Vo-"? Thank you

Re: update field boost

2008-02-12 Thread Jay
Do you know why FieldNormModifier is removed from Lucene 2.3? thanks. Jay Chris Hostetter wrote: : I read the doc for the api indexreader.setNorm() after I posted the question : earlier. To use that setNorm() to modify the field boost, it seems to me that : one has to know how the boost is fold

Re: update field boost

2008-02-12 Thread Jay
It'd be helpful if there were an API for getting the norm of a given field in a given doc. Thanks for the pointers. Jay Chris Hostetter wrote: : I read the doc for the api indexreader.setNorm() after I posted the question : earlier. To use that setNorm() to modify the field boost, it seems to me

Re: Inverted letters

2008-02-12 Thread Patrick
Did you take a look at the org.apache.lucene.analysis.ngram.NGramTokenFilter? Or another n-gram implementation? Works great for us. Patrick Ulrich Vachon wrote: Hi all, Is it possible to simply use Lucene (without Java preprocessing, if possible) to find items under these constraints: I have
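
To see why an n-gram approach can help with the misspellings from this thread, here is a small plain-Java sketch. It does not use Lucene or NGramTokenFilter; BigramOverlap is a hypothetical class that only counts how many distinct character bigrams two words share, which is the kind of overlap an n-gram index could match on.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class, not part of Lucene.
public class BigramOverlap {

    /** Returns the distinct character n-grams of the given size contained in the word. */
    public static Set<String> ngrams(String word, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Counts the distinct n-grams that both words have in common. */
    public static int shared(String a, String b, int n) {
        Set<String> common = ngrams(a, n);
        common.retainAll(ngrams(b, n));
        return common.size();
    }

    public static void main(String[] args) {
        String indexed = "clamoxyle";
        // The three query spellings from the original question.
        for (String query : new String[] {"claomxyle", "clamoxile", "camoxyle"}) {
            System.out.println(query + " shares " + shared(indexed, query, 2)
                    + " bigrams with " + indexed);
        }
    }
}
```

With bigrams, "claomxyle" shares 5 of the 8 bigrams of "clamoxyle", while "clamoxile" and "camoxyle" share 6 each, so all three misspellings still overlap substantially with the indexed word. How such overlap is scored in a real index depends on the analyzer and query setup, which this sketch does not model.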

Getting the number of indexed fields in an index

2008-02-12 Thread marc.dumontier
Hi, Does anyone have a code snippet which would allow me to ask my index how many instances of a field are indexed? Thanks, Marc Dumontier

RE: TermPositionVector

2008-02-12 Thread spring
This would be really nice! -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Tuesday, 12 February 2008 16:41 To: java-user@lucene.apache.org Subject: Re: TermPositionVector [EMAIL PROTECTED] wrote: Hi, could somebody please explain wha

Re: TermPositionVector

2008-02-12 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Hi, could somebody please explain what the difference between positions and offsets is? And: is there a trick to show this information in Luke? Not yet :) Funny thing, I've been thinking about adding this to Luke, but ran out of time before the last release. Perhaps I'l

Re: Has SpanRegexQuery been deprecated in lucene 2.3.0?

2008-02-12 Thread Erik Hatcher
Erica - it has never been in the core JAR. It should be available in the lucene-regex-2.3.0.jar Erik On Feb 12, 2008, at 10:01 AM, Mitchell, Erica wrote: Hi, I've downloaded Lucene 2.3.0 and the jar lucene-core-2.3.0.jar does not contain the SpanRegexQuery class. Has this bee

Has SpanRegexQuery been deprecated in lucene 2.3.0?

2008-02-12 Thread Mitchell, Erica
Hi, I've downloaded Lucene 2.3.0 and the jar lucene-core-2.3.0.jar does not contain the SpanRegexQuery class. Has this been deprecated? Thanks, Erica

Re: Inverted letters

2008-02-12 Thread Erick Erickson
You should probably think about synonym analyzers, both at index time and query time, because I think you have a problem here. Let's say you can do what you ask: at query time, transform any of your three options into "clamoxyle". Would it really be satisfactory to your users to then NOT get any

RE: TermPositionVector

2008-02-12 Thread spring
TermA TermB: TermA has position 0 and offset 0; TermB has position 1 and offset 6. Right? -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Tuesday, 12 February 2008 15:16 To: java-user@lucene.apache.org Subject: Re: TermPositionVector Position is jus

Re: TermPositionVector

2008-02-12 Thread Grant Ingersoll
Position is just relative to other tokens (Token.getPositionIncrement()), offsets are character offsets (Token.startOffset(), Token.endOffset()) -Grant On Feb 12, 2008, at 8:31 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote: Hi, could somebody please explain what the difference between
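
The distinction described above can be made concrete with the "TermA TermB" example from this thread. The following is a plain-Java sketch, not Lucene code: PositionsAndOffsets and Tok are hypothetical names, the tokenizer simply splits on whitespace, and every token is assumed to advance the position by one (a position increment of 1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Lucene.
public class PositionsAndOffsets {

    /** A token with its ordinal position and its character offsets in the input. */
    public static final class Tok {
        public final String text;
        public final int position;    // index among tokens: 0, 1, 2, ...
        public final int startOffset; // index of the first character in the input
        public final int endOffset;   // index one past the last character

        Tok(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on whitespace, recording position and character offsets for each token. */
    public static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume the token's characters.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            tokens.add(new Tok(input.substring(start, i), position++, start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("TermA TermB")) {
            System.out.println(t.text + " position=" + t.position
                    + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

For the input "TermA TermB" this gives TermA position 0 with start offset 0, and TermB position 1 with start offset 6, matching the numbers proposed in the follow-up question. Positions count tokens, so they are what phrase-style matching relies on; offsets count characters, so they are what a highlighter would use to locate the text.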

TermPositionVector

2008-02-12 Thread spring
Hi, could somebody please explain what the difference between positions and offsets is? And: is there a trick to show this information in Luke? Thank you.

Inverted letters

2008-02-12 Thread Ulrich Vachon
Hi all, Is it possible to simply use Lucene (without Java preprocessing, if possible) to find items under these constraints? I have indexed this word: clamoxyle. I want to find it with these queries: claomxyle, clamoxile, camoxyle. Is it possible? Thank you, Ulrich.