Hi Ganesh,
I'd suggest that if you have a particular dimension/field on which you could
shard your data so that the query/data breakup becomes predictable, that
would be a good way to scale out. E.g. if you have users who are equally
active/searched, then you may want to split their data on a simple mod
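A minimal plain-Java sketch of the mod idea (the numeric user id and the shard count are illustrative assumptions, not from your setup):

```java
// Illustrative only: assumes the dimension you shard on is a numeric
// user id; any stable field with a predictable distribution works.
static int shardFor(long userId, int numShards) {
    // The same user always lands on the same shard, so a per-user
    // query only needs to hit one node.
    return (int) (userId % numShards);
}
```

The win over round-robin is that a query scoped to one user touches one node instead of fanning out to all of them.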
Hi,
You could just try the following code to print the term freq for individual
terms.
public static void printTermFreq(String indexPath) throws
        CorruptIndexException, IOException {
    IndexReader ir = IndexReader.open(new NIOFSDirectory(new File(indexPath)));
    TermEnum terms = ir.terms();
    while (terms.next()) {
        // docFreq = how many documents contain the term
        System.out.println(terms.term().text() + " " + ir.docFreq(terms.term()));
    }
    terms.close();
    ir.close();
}
Hello all,
Could anyone guide me through the various ways we could scale out?
1. Index: Add data to the nodes in round-robin.
Search: Query all the nodes and cluster the results using carrot2.
2. Horizontal partitioning with a shared-nothing architecture.
Index: Split the data based on
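The round-robin assignment in option 1 can be sketched in plain Java (the AtomicLong counter and node count are assumptions for illustration):

```java
// Illustrative round-robin document-to-node assignment for option 1;
// documents spread evenly, but every query must then fan out to all
// nodes and the per-node results must be merged (e.g. via carrot2).
static int nextNode(java.util.concurrent.atomic.AtomicLong counter, int numNodes) {
    return (int) (counter.getAndIncrement() % numNodes);
}
```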
Wildcards only work for a single term. At index time the underscore in
TEST_TYPE is treated as if it were a space separator, producing two terms.
At query time the existence of the wildcard suppresses ALL analysis of the
term (although that behavior may vary between query parsers), so that the
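A toy plain-Java model of that mismatch (this assumes the analyzer lowercases and splits on '_', as described above; it is NOT Lucene's actual tokenizer):

```java
// Toy model, not Lucene's tokenizer: lowercase and split on '_',
// mimicking the index-time analysis described above.
static java.util.List<String> analyze(String value) {
    java.util.List<String> terms = new java.util.ArrayList<String>();
    for (String part : value.toLowerCase().split("_")) {
        terms.add(part);
    }
    return terms;
}
```

Under this model, index time produces the terms "test" and "type", while the query-time wildcard TEST_* is left unanalyzed and looks for a single term starting with "TEST_", which matches neither.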
Isn't this approach somewhat bad for term frequency?
Words that appear in several languages would be a lot more frequent
(hence less significant).
I still prefer the split-field method with a proper query expansion.
This way, the term frequency is evaluated on the corpus of one lan
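A hedged sketch of the query-expansion side of the split-field method in plain Java (the field-naming scheme "text_" + language code is an assumption for illustration, not from the thread):

```java
// Hypothetical helper: expand one query term across per-language
// fields so each language's corpus keeps its own term statistics.
static String expandAcrossLanguages(String term, String[] langs) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < langs.length; i++) {
        if (i > 0) sb.append(" OR ");
        sb.append("text_").append(langs[i]).append(':').append(term);
    }
    return sb.toString();
}
```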
Hi
Apologies up front if this question has been asked before.
I have a document which contains a field that stores an untokenized value such
as TEST_TYPE. The analyzer used is StandardAnalyzer, and I pass the same
analyzer into the query. I perform the following query: fieldName:TEST_*,
howe
On Tue, Jan 18, 2011 at 6:04 PM, Grant Ingersoll wrote:
> As devs of Lucene/Solr, due to the way ASF mirrors, etc. works, we really
> don't have a good sense of how people get Lucene and Solr for use in their
> application. Because of this, there has been some talk of dropping Maven
> support for
Using Lucene 3.0.3, we would like to implement the following:
The number of occurrences of the term in the entire index.
For Example :
If we have indexed the following text: amazon, amazon s3, amazon
simpledb, amazon aws;
Then we are supposed to get these results:
amazon
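As a plain-Java illustration of the expected counts (just the arithmetic on the example texts, not the Lucene API itself):

```java
// Count how often each whitespace-separated term occurs across the
// indexed texts; this is what "occurrences in the entire index"
// would look like for the example above.
static java.util.Map<String, Integer> termCounts(String[] docs) {
    java.util.Map<String, Integer> counts = new java.util.TreeMap<String, Integer>();
    for (String doc : docs) {
        for (String term : doc.split("\\s+")) {
            Integer c = counts.get(term);
            counts.put(term, c == null ? 1 : c + 1);
        }
    }
    return counts;
}
```

For {"amazon", "amazon s3", "amazon simpledb", "amazon aws"} this yields amazon=4 and aws=1, s3=1, simpledb=1.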
If you are using Lucene's trunk (to be 4.0) builds, read on...
I just committed LUCENE-2872, which is a hard break on the index file format.
If you are living on Lucene's trunk then you have to remove any
previously created indices and re-index, after updating.
The change cuts over to a faster o
> On Tue, Jan 18, 2011 at 6:04 PM, Grant Ingersoll wrote:
>
>> As devs of Lucene/Solr, due to the way ASF mirrors, etc. works, we really
>> don't have a good sense of how people get Lucene and Solr for use in their
>> application. Because of this, there has been some talk of dropping Maven
>> su
Dominique Bejean wrote:
> Hi,
>
> During a recent Solr project we needed to index documents in a lot of
> languages. The natural solution with Lucene and Solr is to define one
> field per language. Each field is configured in the schema.xml file
> to use language-specific processing (tokenizin
On Thu, Jan 20, 2011 at 11:29 AM, Paul Libbrecht wrote:
>
> Hello list,
>
> I am hitting a stupid bug where a unit test shows me that QueryParser
> fiercely analyzes anything it finds, hence... I have to tune the analyzer to
> not decompose the terms for fields that should be non-analyzed.
>
>
Hello list,
I am hitting a stupid bug where a unit test shows me that QueryParser fiercely
analyzes anything it finds, hence... I have to tune the analyzer to not
decompose the terms for fields that should be non-analyzed.
For indexing, you can choose to have something not_analyzed.
For query
No and No.
Alternative approaches might include building a general "contents"
field holding any/all searchable fields or building up the query
yourself. The latter is quite straightforward:
BooleanQuery bq = new BooleanQuery();
PhraseQuery pq1 = ...;
PhraseQuery pq2 = ...;
bq.add(pq1, BooleanClause.Occur.SHOULD); // or Occur.MUST, as needed
Thanks for the answer. That does make sense.
It first goes through all available terms (not only those which could pass the
filter) and investigates every term that matches any of the wildcard queries.
And that could take quite some time if I have leading wildcard queries.
Guess I'll try another appro
The reason for this is that the filters and other boolean clauses are
applied during result collection, but a wildcard query first needs to
investigate all terms that match, and this is done before the results are
collected. This step is what takes the time (especially before Lucene 4.0).
There is no way
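A plain-Java caricature of why leading wildcards force a full term scan (the wildcard-to-regex conversion here is a simplification for illustration, not Lucene's implementation):

```java
// Caricature of wildcard term enumeration: with a leading wildcard
// there is no literal prefix to seek to in the sorted dictionary,
// so every term must be tested before any documents are collected.
static java.util.List<String> matchWildcard(java.util.List<String> sortedTerms, String pattern) {
    String regex = pattern.replace("*", ".*").replace("?", ".");
    java.util.List<String> hits = new java.util.ArrayList<String>();
    for (String t : sortedTerms) { // full scan of the term dictionary
        if (t.matches(regex)) {
            hits.add(t);
        }
    }
    return hits;
}
```

With a trailing-only wildcard like s*, a real implementation can seek to the "s" prefix and stop early; a leading wildcard like *zon offers no such shortcut.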
Trying to extend MappingCharFilter so that it only changes a token if
the length of the token matches the length of singleMatch in
NormalizeCharMap (currently the singleMatch just has to be found in the
token; I want it to match the whole token). Can this be done? It sounds
simple enough, but I c
On Tuesday 18 January 2011 22:04:01 Grant Ingersoll wrote:
Where do you get your Lucene/Solr downloads from?
[x] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an
Hi all
I've got an index with a few 100k documents, and I want to run a rather complex
wildcard query (incl. leading wildcards) on it.
The wildcard query takes about 2 seconds to complete.
Now, I want to limit the items on which the wildcard query will be executed.
Let's say, I want to limit the i
Hi,
During a recent Solr project we needed to index documents in a lot of
languages. The natural solution with Lucene and Solr is to define one
field per language. Each field is configured in the schema.xml file to
use language-specific processing (tokenizing, stop words, stemmer,
...). Th
Where do you get your Lucene/Solr downloads from?
[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors