Re: Scale out design patterns

2011-01-20 Thread Anshum
Hi Ganesh, if you have a particular dimension/field on which you could shard your data such that the query/data breakup becomes predictable, that would be a good way to scale out. E.g., if you have users who are equally active/searched, you may want to split their data on a simple mod
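A minimal sketch of the "simple mod" routing described above, in plain Java rather than any Lucene API. The numeric user IDs, the shard count, and the `ModShardRouter` class are hypothetical. Indexing and searching both call the same routing function, so all of a user's documents land on, and are queried from, a single shard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: assigns each user to a shard by user ID modulo the shard count.
class ModShardRouter {
    private final int numShards;

    ModShardRouter(int numShards) {
        if (numShards <= 0) throw new IllegalArgumentException("numShards must be positive");
        this.numShards = numShards;
    }

    // floorMod keeps the result in [0, numShards) even for negative IDs.
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) numShards);
    }

    // Groups user IDs by the shard that would index (and later search) their documents.
    List<List<Long>> partition(List<Long> userIds) {
        List<List<Long>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) shards.add(new ArrayList<>());
        for (long id : userIds) shards.get(shardFor(id)).add(id);
        return shards;
    }
}
```

One trade-off of this scheme: changing the shard count changes the modulus, which remaps most users and typically means re-indexing.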

Re: Please Help

2011-01-20 Thread Anshum
Hi, you could just try the following code to print the term freq for individual terms.
public static void printTermFreq(String indexPath) throws CorruptIndexException, IOException {
    IndexReader ir = IndexReader.open(new NIOFSDirectory(new File(indexPath)));
    TermEnum

Scale out design patterns

2011-01-20 Thread Ganesh
Hello all, could anyone guide me on the various ways we could scale out?
1. Index: add data to the nodes round-robin. Search: query all the nodes and cluster the results using Carrot2.
2. Horizontal partitioning and a shared-nothing architecture. Index: split the data based on
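A rough sketch of option 1: documents are assigned to nodes round-robin, and at search time each node's hits are gathered and merged into one global top-k list. This merges by score instead of clustering with Carrot2, and the `RoundRobinCluster`/`Hit` names are hypothetical. It also assumes scores from different nodes are comparable, which only roughly holds because each Lucene index computes its own term statistics (such as idf).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: round-robin indexing plus a scatter-gather merge of per-node hits.
class RoundRobinCluster {
    // A search hit returned by one node: document ID and relevance score.
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final int numNodes;
    private long documentsAssigned = 0;

    RoundRobinCluster(int numNodes) {
        if (numNodes <= 0) throw new IllegalArgumentException("numNodes must be positive");
        this.numNodes = numNodes;
    }

    // Each new document goes to the next node in turn: 0, 1, ..., numNodes - 1, 0, ...
    int nodeForNextDocument() {
        int node = (int) (documentsAssigned % numNodes);
        documentsAssigned++;
        return node;
    }

    // Keeps the k highest-scoring hits across all nodes, returned in descending score order.
    static List<Hit> mergeTopK(List<List<Hit>> perNodeHits, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore); // min-heap on score
        for (List<Hit> hits : perNodeHits) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) heap.poll(); // evict the current lowest score
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(byScore.reversed());
        return result;
    }
}
```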

Re: Wildcard Case Sensitivity

2011-01-20 Thread Jack Krupansky
Wildcards only work for a single term. At index time the underscore in TEST_TYPE is treated as if it were a space separator, producing two terms. At query time the existence of the wildcard suppresses ALL analysis of the term (although that behavior may vary between query parsers), so that the
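A toy model of the mismatch described above, assuming index-time analysis lowercases the text and splits on the underscore, while the wildcard prefix reaches the index unanalyzed. This is a simplification written for illustration (`WildcardMismatch` is a hypothetical name), not StandardAnalyzer's actual tokenization rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model (not Lucene's StandardAnalyzer) of index-time vs. wildcard query-time handling.
class WildcardMismatch {
    // Index time: lowercase, then treat '_' (and any other non-alphanumeric char) as a separator.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Query time: the wildcard prefix is compared verbatim against the indexed terms.
    static boolean prefixMatches(List<String> indexedTerms, String rawPrefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(rawPrefix)) return true;
        }
        return false;
    }
}
```

Here `analyze("TEST_TYPE")` yields `[test, type]`, so the raw prefix `TEST_` matches neither term. Even if a query parser lowercases the wildcard term, `test_` still matches nothing, because no indexed term contains the underscore.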

Re: Best practices for multiple languages?

2011-01-20 Thread Paul Libbrecht
Isn't this approach somewhat bad for term frequency? Words that appear in several languages would be much more frequent (hence less significant). I still prefer the split-field method with proper query expansion. This way, the term frequency is evaluated on the corpus of one lan
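To make the term-frequency concern concrete, a small sketch using the idf formula from Lucene 3.x's `DefaultSimilarity`, 1 + ln(numDocs / (docFreq + 1)). The corpus is hypothetical: 500 English and 500 German documents, where "die" (a rare English verb but a common German article) appears in 5 English and 400 German documents.

```java
// Sketch of idf dilution when several languages share one field (hypothetical corpus numbers).
class IdfDilution {
    // idf as computed by Lucene 3.x DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // English-only field: "die" in 5 of 500 English docs.
        System.out.printf("per-language field: %.2f%n", idf(5, 500));
        // Shared field: English and German docs pooled, "die" in 5 + 400 of 1000 docs.
        System.out.printf("shared field:       %.2f%n", idf(5 + 400, 500 + 500));
    }
}
```

Running `main` prints about 5.42 for the English-only field and 1.90 for the shared field, so an English match on "die" carries far less weight in the shared field.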

Wildcard Case Sensitivity

2011-01-20 Thread Amin Mohammed-Coleman
Hi, apologies up front if this question has been asked before. I have a document which contains a field that stores an untokenized value such as TEST_TYPE. The analyser used is StandardAnalyzer, and I pass the same analyzer to the query. I perform the following query: fieldName:TEST_*, howe

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-20 Thread Tomás Fernández Löbbe
On Tue, Jan 18, 2011 at 6:04 PM, Grant Ingersoll wrote:
> As devs of Lucene/Solr, due to the way ASF mirrors, etc. works, we really don't have a good sense of how people get Lucene and Solr for use in their application. Because of this, there has been some talk of dropping Maven support for

Please Help

2011-01-20 Thread Ashish Pancholi
Using Lucene 3.0.3, we would like to implement the following: the number of occurrences of a term in the entire index. For example, if we have indexed the following text: amazon, amazon s3, amazon simpledb, amazon aws; then we are supposed to get the results: amazon
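The distinction that matters here is between the number of documents containing a term (what Lucene's `TermEnum.docFreq()` reports) and the total number of occurrences. In Lucene 3.0 the latter can be obtained by iterating `IndexReader.termDocs(term)` and summing `TermDocs.freq()`. The sketch below models both statistics over an in-memory list of tokenized documents; it is not Lucene code, and `TermCounts` is a hypothetical name.

```java
import java.util.List;

// In-memory model (not Lucene) of document frequency vs. total term occurrences.
class TermCounts {
    // Number of documents that contain the term at least once (Lucene's docFreq).
    static int docFreq(List<List<String>> docs, String term) {
        int n = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) n++;
        }
        return n;
    }

    // Total occurrences across all documents: the sum of per-document term frequencies.
    static long totalFreq(List<List<String>> docs, String term) {
        long total = 0;
        for (List<String> doc : docs) {
            for (String t : doc) {
                if (t.equals(term)) total++;
            }
        }
        return total;
    }
}
```

Treating each comma-separated entry in the example as one document, both counts for "amazon" are 4; they diverge only when a term repeats within a document, e.g. `[amazon, amazon, s3]` gives a docFreq of 1 but a total of 2.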

WARNING: re-index all trunk indices

2011-01-20 Thread Michael McCandless
If you are using Lucene's trunk (to be 4.0) builds, read on... I just committed LUCENE-2872, which is a hard break on the index file format. If you are living on Lucene's trunk then you have to remove any previously created indices and re-index, after updating. The change cuts over to a faster o

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-20 Thread Brendan Grainger
> On Tue, Jan 18, 2011 at 6:04 PM, Grant Ingersoll wrote:
>
>> As devs of Lucene/Solr, due to the way ASF mirrors, etc. works, we really don't have a good sense of how people get Lucene and Solr for use in their application. Because of this, there has been some talk of dropping Maven su

Re: AW: Best practices for multiple languages?

2011-01-20 Thread Bill Janssen
Dominique Bejean wrote:
> Hi,
>
> During a recent Solr project we needed to index documents in a lot of languages. The natural solution with Lucene and Solr is to define one field per language. Each field is configured in the schema.xml file to use language-specific processing (tokenizin

Re: NOT_ANALYZED... should be an analyzer

2011-01-20 Thread Robert Muir
On Thu, Jan 20, 2011 at 11:29 AM, Paul Libbrecht wrote:
>
> Hello list,
>
> I am hitting a stupid bug where a unit test shows me that QueryParser analyzes fiercely anything it finds hence... I have to tune the analyzer to not decompose the terms with fields that should be non-analyzed.
>
>

NOT_ANALYZED... should be an analyzer

2011-01-20 Thread Paul Libbrecht
Hello list, I am hitting a stupid bug where a unit test shows me that QueryParser fiercely analyzes anything it finds; hence I have to tune the analyzer not to decompose the terms of fields that should be non-analyzed. For indexing, you can choose to have something NOT_ANALYZED. For query
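One common approach in Lucene 3.x is to combine `PerFieldAnalyzerWrapper` with `KeywordAnalyzer`, so that QueryParser treats NOT_ANALYZED fields as single tokens. The sketch below only models that per-field dispatch in plain Java; the class, the field names, and the toy default analyzer are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Model of per-field analysis: keyword fields keep their whole value as one token.
class PerFieldAnalysis {
    private final Set<String> keywordFields;
    private final Function<String, List<String>> defaultAnalyzer;

    PerFieldAnalysis(Set<String> keywordFields, Function<String, List<String>> defaultAnalyzer) {
        this.keywordFields = keywordFields;
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Keyword fields bypass analysis (mirroring NOT_ANALYZED at index time); others use the default.
    List<String> analyze(String field, String value) {
        if (keywordFields.contains(field)) {
            return Collections.singletonList(value);
        }
        return defaultAnalyzer.apply(value);
    }

    // Toy default analyzer: lowercase, then split on whitespace.
    static List<String> whitespaceLowercase(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }
}
```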

Re: Phrase query on multiple fields

2011-01-20 Thread Ian Lea
No and no. Alternative approaches might include building a general "contents" field holding any/all searchable fields, or building up the query yourself. The latter is quite straightforward:
BooleanQuery bq = new BooleanQuery();
PhraseQuery pq1 = ...;
PhraseQuery pq2 = ...;
bq.add(pq1, ...);
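A plain-Java model of the second approach: one phrase check per field, OR-ed together, which is what a BooleanQuery of per-field PhraseQuery clauses added with `BooleanClause.Occur.SHOULD` expresses. The document representation and the `MultiFieldPhrase` class are hypothetical, and real phrase matching also handles positions, slop, and scoring.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// In-memory model (not Lucene) of an OR over per-field phrase queries.
class MultiFieldPhrase {
    // True if the phrase occurs as consecutive tokens within one field's token list.
    static boolean containsPhrase(List<String> tokens, List<String> phrase) {
        if (phrase.isEmpty()) return false;
        return Collections.indexOfSubList(tokens, phrase) >= 0;
    }

    // A document matches if the phrase matches in at least one of the given fields.
    static boolean matchesAnyField(Map<String, List<String>> doc, List<String> fields, List<String> phrase) {
        for (String field : fields) {
            List<String> tokens = doc.get(field);
            if (tokens != null && containsPhrase(tokens, phrase)) return true;
        }
        return false;
    }
}
```

Because each field is checked separately, a phrase never matches across the boundary between two fields.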

RE: Filter Performance

2011-01-20 Thread comparis.ch - Roman Baeriswyl
Thanks for the answer, that makes sense. It first goes through all available terms (not only those which could pass the filter) and collects every term that matches any of the wildcard queries. That could take quite some time with leading wildcard queries. Guess I'll try another appro

RE: Filter Performance

2011-01-20 Thread Uwe Schindler
The reason is that filters and other boolean clauses are applied during result collection, but a wildcard query first needs to enumerate all matching terms, and this happens before any results are collected. This step is what takes the time (especially before Lucene 4.0). There is no way
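A model of why the filter cannot help here: term expansion walks the term dictionary before any document is collected, so its cost depends on the number of terms, not on how many documents the filter would let through. In this sketch (a hypothetical `TermExpansion` class over a sorted in-memory dictionary), a trailing wildcard can seek to its prefix and stop early, while a leading wildcard has no prefix and must examine every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Model (not Lucene) of wildcard term expansion over a sorted term dictionary.
class TermExpansion {
    static final class Result {
        final List<String> matches = new ArrayList<>();
        int termsExamined = 0;
    }

    // "prefix*": seek to the prefix, stop at the first term that no longer starts with it.
    static Result expandPrefix(NavigableSet<String> dict, String prefix) {
        Result r = new Result();
        for (String term : dict.tailSet(prefix, true)) {
            r.termsExamined++;
            if (!term.startsWith(prefix)) break;
            r.matches.add(term);
        }
        return r;
    }

    // "*suffix": no usable prefix, so every term in the dictionary has to be checked.
    static Result expandSuffix(NavigableSet<String> dict, String suffix) {
        Result r = new Result();
        for (String term : dict) {
            r.termsExamined++;
            if (term.endsWith(suffix)) r.matches.add(term);
        }
        return r;
    }
}
```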

Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch

2011-01-20 Thread Paul Taylor
I'm trying to extend MappingCharFilter so that it only changes a token if the token's length matches the length of singleMatch in NormalizeCharMap (currently singleMatch just has to be found somewhere in the token; I want it to match the whole token). Can this be done? It sounds simple enough, but I c
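A CharFilter operates on the character stream before tokenization, so it has no notion of where a token starts or ends; a whole-token rule fits more naturally after tokenization, e.g. in a custom TokenFilter. The sketch below models only the matching logic in plain Java; the `WholeTokenMapper` class and the "&" to "and" mapping are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Whole-token mapping: a token is replaced only when it equals a mapping key exactly.
class WholeTokenMapper {
    private final Map<String, String> mappings;

    WholeTokenMapper(Map<String, String> mappings) {
        this.mappings = mappings;
    }

    // Tokens without an exact-match entry pass through unchanged.
    List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(mappings.getOrDefault(token, token));
        }
        return out;
    }

    // For contrast: substring replacement, which also rewrites the key inside longer tokens.
    static String substringReplace(String token, String from, String to) {
        return token.replace(from, to);
    }
}
```

With the mapping "&" to "and", the token `&` becomes `and` but `r&b` is left alone, whereas substring replacement would turn `r&b` into `randb`.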

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-20 Thread Stefan Trcek
On Tuesday 18 January 2011 22:04:01 Grant Ingersoll wrote:
Where do you get your Lucene/Solr downloads from?
[x] ASF Mirrors (linked in our release announcements or via the Lucene website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an

Filter Performance

2011-01-20 Thread comparis.ch - Roman Baeriswyl
Hi all, I've got an index with a few hundred thousand documents, and I want to run a rather complex wildcard query (including leading wildcards) on it. The wildcard query takes about 2 seconds to complete. Now I want to limit the items on which the wildcard query will be executed. Let's say I want to limit the i

Re: AW: Best practices for multiple languages?

2011-01-20 Thread Dominique Bejean
Hi, during a recent Solr project we needed to index documents in a lot of languages. The natural solution with Lucene and Solr is to define one field per language. Each field is configured in the schema.xml file to use language-specific processing (tokenizing, stop words, stemmer, ...). Th
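A sketch of the query side of the one-field-per-language design: the user's query is expanded into an OR over every language field. The `text_en`/`text_fr` naming convention and the `LanguageQueryExpander` class are assumptions, and the input is not escaped for QueryParser syntax. In Solr, listing the fields in the dismax `qf` parameter achieves a similar effect.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: expands one user query across per-language fields.
class LanguageQueryExpander {
    // Field names follow an assumed "<prefix>_<language>" convention, e.g. "text_en".
    // Note: userQuery is inserted verbatim; real code should escape QueryParser special characters.
    static String expand(String fieldPrefix, List<String> languages, String userQuery) {
        StringJoiner or = new StringJoiner(" OR ");
        for (String lang : languages) {
            or.add(fieldPrefix + "_" + lang + ":(" + userQuery + ")");
        }
        return or.toString();
    }
}
```

For example, `expand("text", [en, fr], "search engine")` produces `text_en:(search engine) OR text_fr:(search engine)`, so each field's own analyzer can process the terms.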

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-20 Thread Jürgen Jakobitsch
Where do you get your Lucene/Solr downloads from?
[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors