Re: Sharding Techniques

2011-05-09 Thread Ganesh
We are using a similar technique to yours. We keep smaller indexes and use ParallelMultiSearcher to search across them. Keeping the indexes small is good because indexing and index optimization are faster. There will be a small delay while searching across the indexes. 1. What is your search time?
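The approach described in this reply (several small indexes searched in parallel, results merged afterwards) can be sketched in plain Java. This is only an illustration of the idea, not Lucene code: `ShardedSearchSketch`, `ScoredDoc`, and the in-memory shard lists are hypothetical stand-ins, and real shard searches would of course run a query against an index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java sketch of "search every shard in parallel, then merge".
// ScoredDoc and the in-memory shard lists are hypothetical, not Lucene classes.
final class ShardedSearchSketch {

    static final class ScoredDoc {
        final String id;
        final float score;

        ScoredDoc(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Returns the n best documents of one list, highest score first.
    static List<ScoredDoc> topOf(List<ScoredDoc> docs, int n) {
        List<ScoredDoc> sorted = new ArrayList<ScoredDoc>(docs);
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<ScoredDoc>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Takes the top n of each shard on its own thread, then merges the partial lists.
    static List<String> searchAll(List<List<ScoredDoc>> shards, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<ScoredDoc>>> pending = new ArrayList<Future<List<ScoredDoc>>>();
            for (List<ScoredDoc> shard : shards) {
                Callable<List<ScoredDoc>> task = () -> topOf(shard, topN);
                pending.add(pool.submit(task));
            }
            List<ScoredDoc> merged = new ArrayList<ScoredDoc>();
            for (Future<List<ScoredDoc>> f : pending) {
                merged.addAll(f.get()); // waits for that shard to finish
            }
            List<String> ids = new ArrayList<String>();
            for (ScoredDoc d : topOf(merged, topN)) {
                ids.add(d.id);
            }
            return ids;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<ScoredDoc>> shards = Arrays.asList(
                Arrays.asList(new ScoredDoc("a", 0.9f), new ScoredDoc("b", 0.2f)),
                Arrays.asList(new ScoredDoc("c", 0.7f)),
                Arrays.asList(new ScoredDoc("d", 0.95f), new ScoredDoc("e", 0.1f)));
        System.out.println(searchAll(shards, 3)); // prints [d, a, c]
    }
}
```

The merge step has to wait for the slowest shard, which is one plausible source of the small delay mentioned above.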

Non-English Languages Search

2011-05-09 Thread Provalov, Ivan
We are planning to ingest some non-English content into our application. All content is OCR'ed and there are a lot of misspellings and garbage terms because of this. Each document has one primary language with some exceptions (e.g. a few English terms mixed in with primarily non-English docu

Question on boosting result while searching

2011-05-09 Thread Saurabh Gokhale
Hi All, Can someone please direct me on how to boost the result when specific keywords are found while searching the document? Example: 1. While indexing the documents A, B and C, I do not boost any of these documents. (Field.Store.YES, Field.Index.ANALYZED) and setBoost(1.0) 2. Now I read documen

Expected Behavior from QueryParser and Standard Analyzer with Version.LUCENE_*

2011-05-09 Thread Chris Currens
Hello, I have some questions about what kind of behavior is expected when passing Version.LUCENE_24/29/30 to QueryParser and the StandardAnalyzer when parsing a query. I know that passing the Version to the constructors makes Lucene act like that version, with all features and bugs intact. The be

Re: Sharding Techniques

2011-05-09 Thread Ian Lea
> ... > 1. I've not tested my application with single index as initially (a few > years back) we thought smaller the index size (7 indexes for default 80% > searches) the faster the search time would be ... Possibly. Maybe it will be acceptable to make some searches a bit slower in order to make

Question on the use of Synonym Filter while searching using MoreLikeThis

2011-05-09 Thread Saurabh Gokhale
Hi All, This is my first question for this forum. I am fairly familiar with Lucene and am using 2.9.4 in my project (not using Solr). I have the following question about the use of the synonym filter. While indexing contents, I am using the following analyzer setup [Analyzer1] == StandardTokenizer --> Stand

Re: Sharding Techniques

2011-05-09 Thread Samarendra Pratap
Hi Ian, Thanks for sharing your knowledge and to-the-point answers. 1. I've not tested my application with a single index, as initially (a few years back) we thought the smaller the index size (7 indexes for default 80% searches), the faster the search time would be. Anyway, I'll give it a try and share t

Re: Sharding Techniques

2011-05-09 Thread Ian Lea
30 GB isn't that big by Lucene standards. Have you considered or tried just having one large index? If necessary you could restrict searches to particular "indexes", or groups thereof, by a field in the combined index, preferably used as a filter. If the slow searches have to search across 63 sep

Sharding Techniques

2011-05-09 Thread Samarendra Pratap
Hi list, We have an index directory of 30 GB which is divided into 3 subdirectories (idx1, idx2, idx3), which are in turn divided into 21 sub-subdirectories each (idx1-1, idx1-2, ..., idx2-1, ..., idx3-1, ..., idx3-21). We are running with Java 1.6, Lucene 2.9 (going to upgrade to 3.1 very soon), linu

RE: SpanNearQuery - inOrder parameter

2011-05-09 Thread Gregory Tarr
Attachment didn't work - test below: import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.Term; import org.apache.lucene.search.I

SpanNearQuery - inOrder parameter

2011-05-09 Thread Gregory Tarr
I attach a junit test which shows strange behaviour of the inOrder parameter on the SpanNearQuery constructor, using Lucene 2.9.4. My understanding of this parameter is that true forces the order and false doesn't care about the order. Using true always works. However using false works fine when

RE: Problems with RangeQueries

2011-05-09 Thread Uwe Schindler
Hi, Luke cannot search NumericFields correctly, because the official Lucene QueryParser does not produce numeric range queries: it does not know that the field is numeric. It uses a TermRangeQuery instead, and that may hit random documents. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http
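Why a term (string) range over numeric data can return unexpected documents is easiest to see with a simplified plain-Java comparison. This sketch assumes the values are plain decimal strings, which is not how Lucene's NumericField actually encodes them; `RangeMismatchDemo` and its methods are hypothetical names used only to show that string order and numeric order disagree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration only: values are kept as plain decimal strings here,
// which is NOT how Lucene's NumericField encodes them.
final class RangeMismatchDemo {

    // Numeric range check: parses each value and compares it as a number.
    static List<String> numericRange(List<String> values, double min, double max) {
        List<String> hits = new ArrayList<String>();
        for (String v : values) {
            double d = Double.parseDouble(v);
            if (d >= min && d <= max) {
                hits.add(v);
            }
        }
        return hits;
    }

    // Term-style range check: compares the text of each value lexicographically.
    static List<String> termRange(List<String> values, String lower, String upper) {
        List<String> hits = new ArrayList<String>();
        for (String v : values) {
            if (v.compareTo(lower) >= 0 && v.compareTo(upper) <= 0) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("2.5", "10.0", "100.0", "7.25");
        System.out.println(numericRange(values, 1.0, 20.0));  // prints [2.5, 10.0, 7.25]
        System.out.println(termRange(values, "1.0", "20.0")); // prints [2.5, 10.0, 100.0]
    }
}
```

The string comparison wrongly admits 100.0 and drops 7.25, which is the kind of mismatch that makes range results look random when the query type does not match how the field was indexed.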

Re: How do I sort lucene search results by relevance and time?

2011-05-09 Thread Ian Lea
Well, you can use one of the sorting search methods and pass multiple sort keys including relevance and a timestamp. But I suspect the Google algorithm may be a bit more complex than that. One technique is boosting: set an index time document boost on recent documents. Of course what is recent t
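The "multiple sort keys" idea in this reply can be sketched in plain Java as a two-level comparison: relevance first, timestamp as the tie-breaker. The `Hit` class and field names below are hypothetical and are not Lucene's API; in Lucene itself the equivalent would be expressed through its sorting search methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of sorting by relevance first and recency second.
final class RelevanceThenTimeSort {

    // Hypothetical hit record; not a Lucene class.
    static final class Hit {
        final String id;
        final float score;
        final long timestamp;

        Hit(String id, float score, long timestamp) {
            this.id = id;
            this.score = score;
            this.timestamp = timestamp;
        }
    }

    // Higher score first; among equal scores, newer timestamp first.
    static final Comparator<Hit> BY_SCORE_THEN_NEWEST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        if (byScore != 0) {
            return byScore;
        }
        return Long.compare(b.timestamp, a.timestamp);
    };

    // Returns the ids of the hits in sorted order without modifying the input.
    static List<String> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<Hit>(hits);
        copy.sort(BY_SCORE_THEN_NEWEST);
        List<String> ids = new ArrayList<String>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("old", 1.0f, 100L),
                new Hit("new", 1.0f, 200L),
                new Hit("best", 2.0f, 50L));
        System.out.println(sortedIds(hits)); // prints [best, new, old]
    }
}
```

Because the timestamp only matters when two scores are exactly equal, a strict two-key sort rarely changes the order much; that limitation is one reason the reply goes on to suggest boosting recent documents instead.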

AW: Problems with RangeQueries

2011-05-09 Thread Kolhoff, Jacqueline - ENCOWAY
Oh sorry, you're right, these are hibernate search classes. But I tried to search inside the Luke Tool, and didn't find the right data with the given query (the query is in fact an org.apache.lucene.search.Query, which will be wrapped to hibernate search queries), so I thought this might be a Lu

RE: Problems with RangeQueries

2011-05-09 Thread Uwe Schindler
Hello Jacqueline, I have no idea which Lucene classes you use; the term "FieldBridge" relates more to Hibernate Search, right? So maybe you should ask this question on their mailing list. NumericFieldUtils is also not a Lucene class; to create a numeric query, use NumericRangeQuery.newDoubleRange(fi

Problems with RangeQueries

2011-05-09 Thread Kolhoff, Jacqueline - ENCOWAY
Hi, I have indexed some numeric properties (double) by adding numeric fields like this in a custom FieldBridge: NumericField field = new NumericField(propertyName, Store.YES, true); field.setDoubleValue(propertyValue); document.add(field); This works fine and with my RangeQueries I g

AW: Is there kind of a "NullAnalyzer" ?

2011-05-09 Thread Clemens Wyss
> The same functionality can be achieved per field using > Field.Index.NOT_ANALYZED. ;) > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Monday, 9 May 2011 10:07 > To: java-user@lucene.apache.org > Subject: RE: Is there kind of a "NullAnalyzer" ? > >

RE: Is there kind of a "NullAnalyzer" ?

2011-05-09 Thread Uwe Schindler
Hi, KeywordTokenizer and KeywordAnalyzer. The same functionality can be achieved per field using Field.Index.NOT_ANALYZED. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Clemens Wyss [mailto:clemens...
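The difference between keyword-style indexing and normal analysis can be sketched in plain Java. This is a simplified stand-in, assuming that "analysis" just lowercases and splits on whitespace; `KeywordVsTokenized` and its methods are hypothetical and do not reproduce what Lucene's analyzers actually do.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch: one token per field value versus one token per word.
final class KeywordVsTokenized {

    // Simplified "analyzed" field: lowercased and split on whitespace.
    static List<String> tokenized(String value) {
        return Arrays.asList(value.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Keyword-style field: the whole value is kept as a single, unmodified token.
    static List<String> keyword(String value) {
        return Collections.singletonList(value);
    }

    // A term matches only if it equals one of the indexed tokens exactly.
    static boolean matches(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    public static void main(String[] args) {
        String field = "New York City";
        System.out.println(matches(tokenized(field), "york"));        // prints true
        System.out.println(matches(keyword(field), "york"));          // prints false
        System.out.println(matches(keyword(field), "New York City")); // prints true
    }
}
```

With the keyword-style token, a sub-term such as "york" no longer matches, which is the behavior asked for in the original question.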

AW: Is there kind of a "NullAnalyzer" ?

2011-05-09 Thread Clemens Wyss
Thx! > -Original Message- > From: Federico Fissore [mailto:feder...@fissore.org] > Sent: Monday, 9 May 2011 09:52 > To: java-user@lucene.apache.org > Subject: Re: Is there kind of a "NullAnalyzer" ? > > Clemens Wyss wrote on 09/05/2011 09:42: > > i.e. an analyzer which t

Re: Is there kind of a "NullAnalyzer" ?

2011-05-09 Thread Federico Fissore
Clemens Wyss wrote on 09/05/2011 09:42: i.e. an analyzer which takes the field to be analyzed as is into the index...? The fields I am trying to index have a max length of 3 words and I don't want to match sub terms of these fields. keyword analyzer? https://lucene.apache.org/java/3_0

Is there kind of a "NullAnalyzer" ?

2011-05-09 Thread Clemens Wyss
i.e. an analyzer which takes the field to be analyzed as is into the index...? The fields I am trying to index have a max length of 3 words and I don't want to match sub terms of these fields.