phrase query highlighter spans matching

2010-05-18 Thread Li Li
hi all, I read lucene in action 2nd Ed. It says SimpleSpanFragmenter will "make fragments that always include the spans matching each document". And also a SpanScorer existed for this use. But I can't find any class named SpanScorer in lucene 3.0.1. And the result of HighlighterTest class in c

Re: Stemming Problem

2010-05-18 Thread Erick Erickson
You can construct your own analyzer by creating it from a pre-existing Tokenizer (e.g. WhitespaceTokenizer) and any number of TokenFilters (e.g. TokenFilter). You can string any number of TokenFilters together to get many different effects. But I have to ask, why do you want to keep capitalization?
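
To illustrate the chaining idea described above, here is a minimal sketch in plain Java. It does not use the Lucene API: `TokenChainSketch`, `analyze`, `STRIP_PLURAL_S` and `LOWER_CASE` are hypothetical names invented for this example, and the "stemmer" is a toy that only strips a trailing "s". It assumes a whitespace split followed by filters applied in order, which is the shape of the tokenizer-plus-filters design, not its implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Conceptual sketch only: mimics "one tokenizer + a chain of filters".
// None of these names are Lucene classes.
class TokenChainSketch {
    // Toy stand-in for a stemming filter: drops one trailing "s", keeps case.
    static final UnaryOperator<String> STRIP_PLURAL_S =
        t -> (t.length() > 1 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;

    // Optional filter: leave it out of the chain to preserve capitalization.
    static final UnaryOperator<String> LOWER_CASE = t -> t.toLowerCase(Locale.ROOT);

    // Split on whitespace, then pass every token through each filter in order.
    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // happens only when the input is blank
            }
            String token = raw;
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        // Capitalization kept: only the toy stemmer is in the chain.
        System.out.println(analyze("Big Cats run", Arrays.asList(STRIP_PLURAL_S)));
        // prints [Big, Cat, run]

        // Adding a second filter changes the effect without touching the first.
        System.out.println(analyze("Big Cats run", Arrays.asList(STRIP_PLURAL_S, LOWER_CASE)));
        // prints [big, cat, run]
    }
}
```

Whether capitalization survives depends only on which filters are placed in the chain, which is why the composition approach fits the original question.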

RE: Stemming Problem

2010-05-18 Thread Christopher Condit
Hi Larry- > Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having > problems with stemming. Does anyone have a recommendation for other > text analyzers that handle stemming and also keep capitalization, stop words, > and punctuation? Have you tried the SnowballFilter? You co

Re: How to achieve this kind of document ordering

2010-05-18 Thread Erick Erickson
I just skimmed your message, but Lucene provides for multiple sorts. You can construct a Sort object from an arbitrary number of fields, and any documents that all sort equally for fields 1..k will be resolved by considering field k+1. The performance impact when searching is mostly upon the very
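
The tie-breaking rule described above (documents equal on fields 1..k are ordered by field k+1) can be sketched with a chained comparator in plain Java. This is an illustration of the ordering rule only, not of Lucene's Sort class; `MultiFieldSortSketch`, `Doc`, `fieldA`, `fieldB` and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of multi-field ordering: the second field is consulted only
// when two documents compare equal on the first.
class MultiFieldSortSketch {
    // Hypothetical stand-in for a search hit with two sortable fields.
    static final class Doc {
        final int id;
        final int fieldA;
        final String fieldB;

        Doc(int id, int fieldA, String fieldB) {
            this.id = id;
            this.fieldA = fieldA;
            this.fieldB = fieldB;
        }
    }

    // Returns document ids ordered by fieldA ascending, ties broken by fieldB.
    static List<Integer> sortedIds(List<Doc> docs) {
        Comparator<Doc> order = Comparator
            .comparingInt((Doc d) -> d.fieldA)
            .thenComparing((Doc d) -> d.fieldB);
        List<Doc> copy = new ArrayList<>(docs); // leave the caller's list untouched
        copy.sort(order);
        List<Integer> ids = new ArrayList<>();
        for (Doc d : copy) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc(1, 102, "B"),
            new Doc(2, 101, "A"),
            new Doc(3, 102, "A"));
        // Docs 1 and 3 tie on fieldA (102), so fieldB decides between them.
        System.out.println(sortedIds(docs)); // prints [2, 3, 1]
    }
}
```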

Re: Sorting and Empty (non-existing) Fields

2010-05-18 Thread Rob Bygrave
BTW: Saw this in the SOLR docs... - If sortMissingLast="false" and sortMissingFirst="false" (the default), * then default lucene sorting will be used which places docs without the field first in an ascending sort and last in a descending sort.* On Wed, May 19, 2010 at 4
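
The default behaviour quoted above (documents without the field come first in an ascending sort and last in a descending sort) is the same ordering you get by treating a missing value as smaller than every real value. A minimal plain-Java sketch of that ordering follows, assuming a missing field is modelled as `null`; `MissingValueSortSketch` and its method names are hypothetical and this is not Lucene or Solr code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: a missing value (null) sorts below everything else, so it is
// first when ascending and last when the same order is reversed.
class MissingValueSortSketch {
    private static final Comparator<String> MISSING_IS_LOWEST =
        Comparator.nullsFirst(Comparator.<String>naturalOrder());

    static List<String> ascending(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(MISSING_IS_LOWEST);
        return copy;
    }

    static List<String> descending(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(MISSING_IS_LOWEST.reversed());
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("b", null, "a");
        System.out.println(ascending(values));  // prints [null, a, b]
        System.out.println(descending(values)); // prints [b, a, null]
    }
}
```

Getting "nulls high" in an ascending sort would mean swapping in `Comparator.nullsLast` in this sketch; how to get the equivalent effect inside Lucene is the open question in this thread.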

Re: Sorting and Empty (non-existing) Fields

2010-05-18 Thread Rob Bygrave
I'm not a Lucene Guru so hopefully you get a more definitive response. I believe this means you want a way to specify ... "Nulls High" / "Nulls Low" for your field (in this case you want Nulls High I believe). I haven't seen support for that (but it might exist). Looking at StringValComparator I'

Stemming Problem

2010-05-18 Thread Larry Hendrix
Hi, Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having problems with stemming. Does anyone have a recommendation for other text analyzers that handle stemming and also keep capitalization, stop words, and punctuation? Thanks, Larry Larry A. Hendrix, Graduate Student C

Sorting and Empty (non-existing) Fields

2010-05-18 Thread comparis . ch - Roman Baeriswyl
Hi All I've got a problem I'm trying to solve the whole day: Let's say I have an index with two fields, the first one is always filled and the second one only sometimes. Now I want to search something on the first field and want the results sorted by relevance, then by the first field, then by

How to achieve this kind of document ordering

2010-05-18 Thread Dragan Jotanovic
Hi, I need to sort results by two fields. First one is numeric and sorting should be in ascending order. Second one should be ordered in a "levels" structure. Here is the example: Unsorted: DocId SortFieldA SortFieldB 1101A 2102B 3102A

Re: Will doc ids ever change if nothing is deleted?

2010-05-18 Thread Michael McCandless
If you never delete docs, then w/ the default merge policy, the docIDs should never change. But... this should be considered an impl detail of Lucene. In theory someday this could change. EG there's an issue open (LUCENE-1076) to allow a merge policy to select out-of-order merges, which they can

Re: Deciding memory requirements for Lucene indexes proactively -- How to?

2010-05-18 Thread Ian Lea
> Is there a way (perhaps a formula) to accurately > judge the memory requirement for a Lucene index? > (Maybe based on number of documents or index > size etc?) The short answer is no, although there are some things you can estimate based on the number of fields, terms etc. Sorting will use m

Re: Lock obtain timed out

2010-05-18 Thread Saurabh Agarwal
ummm i am just toying with Lucene and katta, so to have apples to apples comparison I am using NFS mount for lucene and same FS as a filesystem for Katta/HADOOP Saurabh Agarwal On Tue, May 18, 2010 at 1:46 PM, Uwe Schindler wrote: > Then why use NFS? > > - > Uwe Schindler > H.-H.-Meier-Alle

RE: Using Lucene to Query File properties in Windows

2010-05-18 Thread Uwe Schindler
Hi, this works fine with Lucene. Use NumericField and NumericRangeQuery to index the file date (File.lastModified as NumericField.setLongValue()). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: vijay r

Re: Using Lucene to Query File properties in Windows

2010-05-18 Thread Ian Lea
Sure. Create an index with fields like name: somefile.whatever creator: james lastmod: 20100518 created: 20100518 ... Make sure that the fields that you want to search on are indexed, create some queries and away you go. You'll need range queries for the before and since tests. Good
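
Dates stored in the `20100518` form shown above have a useful property for the "before" and "since" tests: plain string comparison agrees with chronological order, which is what makes a range over the indexed term work. The sketch below shows that property in plain Java without any Lucene calls; `DateRangeSketch`, `toIndexForm` and `inRange` are hypothetical names, and the inclusive bounds are an assumption of this example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch: yyyyMMdd strings compare lexicographically in date order,
// so a range test needs nothing more than String.compareTo.
class DateRangeSketch {
    // Formats a date as an 8-digit yyyyMMdd string, e.g. 20100518.
    static String toIndexForm(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Inclusive range check on the string form (assumed convention).
    static boolean inRange(String value, String from, String to) {
        return value.compareTo(from) >= 0 && value.compareTo(to) <= 0;
    }

    public static void main(String[] args) {
        String lastmod = toIndexForm(LocalDate.of(2010, 5, 18));
        System.out.println(lastmod);                                   // prints 20100518
        System.out.println(inRange(lastmod, "20100501", "20100531")); // prints true
        System.out.println(inRange(lastmod, "20100519", "20100531")); // prints false
    }
}
```

The fixed-width, zero-padded form is what keeps the comparison correct; a format such as `2010-5-18` would not sort reliably as text.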

RE: Lock obtain timed out

2010-05-18 Thread Uwe Schindler
Then why use NFS? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Saurabh Agarwal [mailto:srbh.g...@gmail.com] > Sent: Tuesday, May 18, 2010 10:13 AM > To: java-user@lucene.apache.org > Subject: Re: Lock

Re: Lock obtain timed out

2010-05-18 Thread Saurabh Agarwal
Thanks :) i am using only one server to create the index Saurabh Agarwal On Tue, May 18, 2010 at 1:41 PM, Ian Lea wrote: > Use SimpleFSLockFactory. The default, NativeFSLockFactory, doesn't > play well with NFS. > > And a warning: lucene does work on NFS but you may run into problems > if you

Re: Lock obtain timed out

2010-05-18 Thread Ian Lea
Use SimpleFSLockFactory. The default, NativeFSLockFactory, doesn't play well with NFS. And a warning: lucene does work on NFS but you may run into problems if your index has a lot of modifications and/or is accessed from different servers. -- Ian. On Tue, May 18, 2010 at 6:56 AM, Saurabh Aga