Daniel, thank you very much for the hint!
I stepped through the code and tried some scenarios.
When I type in, with whitespace delimiters:
termA termB
this results in two invocations of getFieldQuery, one for each term.
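This is easy to reproduce outside the debugger too; a minimal sketch, where the field name "content" is only an example:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class TwoTermDemo {
    public static void main(String[] args) throws Exception {
        // Each whitespace-delimited token is handed to getFieldQuery
        // separately, so the result is a BooleanQuery of two TermQuerys.
        QueryParser parser = new QueryParser("content", new WhitespaceAnalyzer());
        Query q = parser.parse("termA termB");
        System.out.println(q); // prints: content:termA content:termB
    }
}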
When I type
Super! Thanks for testing this posting...
Mike
[EMAIL PROTECTED] wrote:
I don't think creating an IndexWriter is very expensive at all.
Ah, OK. I tested it: creating an IndexWriter on an index with 10,000 docs (about 15 MB) takes about 200 ms.
This is a very cheap operation for me ;)
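Roughly how such a measurement looks (the index path is hypothetical):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class WriterOpenTimer {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // false = open the existing index rather than create a new one
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
        System.out.println("opened in "
            + (System.currentTimeMillis() - start) + " ms");
        writer.close();
    }
}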
So, I stepped through the QueryParser code further, and I have now
found the source of this behaviour: the QueryParserTokenManager
System.out.println("This one returns the whole String:");
String strQuery = "home/reuschling";
See TermDocs/TermEnum. Or perhaps TermFreqVector. I admit I haven't
used that last one, but that family of methods ought to fix you up.
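A minimal sketch of that family of methods in the 2.x API (the index path and field name are made up):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

public class TermWalk {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        // Seek to the first term of the "contents" field and walk forward.
        TermEnum terms = reader.terms(new Term("contents", ""));
        do {
            Term t = terms.term();
            if (t == null || !t.field().equals("contents")) break;
            TermDocs docs = reader.termDocs(t);
            while (docs.next()) {
                // docs.doc() is the document id, docs.freq() the in-doc frequency
            }
            docs.close();
        } while (terms.next());
        terms.close();
        reader.close();
    }
}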
What problem are you trying to solve? Perhaps there are better
solutions to suggest.
Best
Erick
On Mon, Feb 25, 2008 at 6:04 PM, Itamar Syn-Hershko [EMAIL
Implementing something like MoreLikeThis for Hebrew. Non-Hebrew
implementations are relevant, but much less accurate, since a word like PURIM
can show up in the actual document with initials (LPURIM, BPURIM, etc.) or
even with 1-4 letters after it, which all refer to the same term, and then
the
Hi,
I have a very large number of documents indexed; one field is Brand
(untokenized). Now I need to find the most popular brand (the brand
used by the most docs). One way is:
1) open IndexReader.
2) call terms() to get all terms, then filter out terms in field Brand.
3) call termDocs(Term) to
Hi
I am looking for a way to improve the search performance of my
application. I've followed every suggestion in the Lucene Wiki, but the
search is still too slow with large indexes. I was wondering whether
there was a way to restrict a search to a specific time period and in
doing so
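One standard answer, as a sketch only: wrap a RangeFilter around the query, assuming dates are indexed as sortable yyyyMMdd strings in a "date" field, and cache the filter if the same window is reused:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeFilter;

public class DateRestrictedSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        // Restrict to Jan-Feb 2008; the cache pays off when the filter is reused.
        Filter window = new CachingWrapperFilter(
            new RangeFilter("date", "20080101", "20080229", true, true));
        Query q = new QueryParser("contents",
            new StandardAnalyzer()).parse("lucene");
        Hits hits = searcher.search(q, window);
        System.out.println(hits.length() + " hits in the window");
    }
}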
Hi,
In Lucene, I'm trying to perform word-level bi-gram query parsing using
NGramAnalyzerWrapper. I couldn't get any word pairs in the parsed query,
and I was wondering what I should do to make this work. I'm using Lucene
2.2.0
I'm using the files from:
Hi List,
I am pretty new to Lucene. It is certainly very exciting. I need to
implement a new Similarity class based on the Term Vector Space Model given
in http://www.miislita.com/term-vector/term-vector-3.html
Although that model is similar to Lucene’s model
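The usual hook for this in Lucene 2.x is a Similarity subclass; a sketch with placeholder formulas (the actual weights from the miislita model would go in the method bodies):

import org.apache.lucene.search.DefaultSimilarity;

public class VectorSpaceSimilarity extends DefaultSimilarity {
    public float tf(float freq) {
        return freq; // e.g. raw term frequency instead of sqrt(freq)
    }
    public float idf(int docFreq, int numDocs) {
        // a classic idf variant; substitute the model's own formula
        return (float) Math.log((double) numDocs / (docFreq + 1));
    }
    public float coord(int overlap, int maxOverlap) {
        return 1.0f; // pure vector-space models usually drop the coord factor
    }
}

It gets plugged in with Searcher.setSimilarity() and, for index-time norms, IndexWriter.setSimilarity().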
Sorry, a slight correction to the code below: I was actually using the
WhitespaceAnalyzer, not the StandardAnalyzer in constructing the
NGramAnalyzerWrapper.
On Tue, 26 Feb 2008, Stanley Xinlei Wang wrote:
Hi,
In Lucene, I'm trying to perform word-level bi-gram query parsing using
Hi,
I am using a simple Java program to test the search speed. The index file is
about 1.93 GB in size. I instantiated an IndexSearcher and built a query using
the query parser: parser.parse("entity:fail"). The initial run took more
than 60 seconds, but the subsequent runs only took 1.5 seconds. This
Hi Stanley,
I modernized the files in LUCENE-400 a bit - you can see the details in
comments I made on the issue. The results, including all files needed to
address the issue, are in the file attached to the issue named
LUCENE-400.patch.
I can tell you aren't using the modernized version
Yes, I've found a tester!
A patch was submitted for this kind of job:
https://issues.apache.org/jira/browse/LUCENE-1190
And here is the SVN work in progress:
https://admin.garambrogne.net/subversion/revuedepresse/trunk/src/java/lexicon
And the web version :
Hi Michael
Perhaps this will help. We are using Lucene to index emails and provide
a search interface to search through those emails. Many of our customers
have 3-5 TB or more of email data. The index size tends to be around 5
GB per million messages. On a 3 GHz Intel Core Duo with standard
Not to ruin your party, but I'm not sure exactly what this Lexicon object is
for and how it should work. Plus, the requirements I have for analyzing
Hebrew (not only for the MoreLikeThis functionality) are far more demanding
than what is needed for French.
But I'm open to any suggestion on this
So you're saying searches are taking 10 seconds on a 5 GB index? If so, that
seems ungodly slow.
If you're on *nix, have you watched your iostat statistics? Maybe something
is hammering your HDs.
Something seems amiss.
What lucene methods were pointed to as hotspots by YourKit?
-M
On Tue, Feb 26,
The first call loads various data structures into memory. The second
takes advantage of those structures being in memory. What you want to
do is warm the searcher by sending some queries to it before making
it available.
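A minimal warm-up sketch (the query term is the one from the original mail; the path handling is made up):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class Warmup {
    public static IndexSearcher openWarmed(String path) throws Exception {
        IndexSearcher searcher = new IndexSearcher(path);
        // Discard the result; the point is to pull the term index and
        // norms into memory before the searcher sees real traffic.
        searcher.search(new TermQuery(new Term("entity", "fail")));
        return searcher;
    }
}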
-Grant
On Feb 26, 2008, at 3:49 PM, fangz wrote:
Hi,
I am
: Thanks for the advice Chris. What I am working on now is extracting the
: matching phrases. The current code for MultiPhraseQuery and SpanQueries
: just returns all matching terms, not matching phrases. I implemented some
: code matching up the TermPositions, but this is pretty slow. Is
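For context, the basic TermPositions walk looks like this (field and term are made up); lining positions up across several terms on top of this loop is the part that gets slow:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PositionsDump {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        TermPositions tp = reader.termPositions(new Term("contents", "lucene"));
        while (tp.next()) {
            System.out.print("doc " + tp.doc() + ":");
            for (int i = 0; i < tp.freq(); i++) {
                System.out.print(" " + tp.nextPosition());
            }
            System.out.println();
        }
        tp.close();
        reader.close();
    }
}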
Hi Michael,
I guess the hotspot of Lucene is
org.apache.lucene.search.IndexSearcher.search()
Hi Jamie,
What's the original text size of a million emails?
I estimate the size of an email at around 100 KB; is this true?
When you did the search, what kind of keywords did you input: words or a
short sentence?
Did you use the same keywords in both calls?
2008/2/27, fangz [EMAIL PROTECTED]:
Hi,
I am using a simple Java program to test the search speed. The index file is
about 1.93 GB in size. I instantiated an IndexSearcher and built a query using
the query parser: parser.parse("entity:fail"). The initial
The Lucene prime directive: don't iterate through all of Hits! It's
horribly inefficient. You must use a HitCollector. Even so, getting
your field values will be slow no matter what if you fetch them for every hit.
You don't want to do this for every hit in a search. But don't loop
through Hits.
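A minimal HitCollector sketch, here just counting matches:

import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class CountCollector {
    public static int count(IndexSearcher searcher, Query query) throws Exception {
        final int[] n = new int[1];
        // collect() is called once per matching doc; no Hits object,
        // no score sorting, no lazy document loading behind your back.
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                n[0]++;
            }
        });
        return n[0];
    }
}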
Hi Jamie,
Are you running concurrent searches on the index, i.e. spawning multiple
threads without managing them?
I have been having similar issues, and I am planning to try out a
workaround using Java's Executor interface.
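A sketch of that workaround, sharing one IndexSearcher (which is thread-safe) behind a fixed-size pool:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearchPool {
    private final IndexSearcher searcher; // share one searcher across threads
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public SearchPool(IndexSearcher searcher) {
        this.searcher = searcher;
    }

    public void submit(final Query query) {
        pool.execute(new Runnable() {
            public void run() {
                try {
                    // bounded concurrency instead of a thread per request
                    searcher.search(query);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }
}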
: 1) open IndexReader.
: 2) call terms() to get all terms, then filter out terms in field Brand.
: 3) call termDocs(Term) to get the docs having each specific Brand.
: 4) count which term is used by most docs from above result.
:
: Is this the most efficient way?
pretty much ... take a look at the
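A sketch of the enumeration; note that TermEnum.docFreq() already gives the per-brand document count, so step 3 can usually be skipped (deleted docs aside):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class TopBrand {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        TermEnum terms = reader.terms(new Term("Brand", ""));
        String best = null;
        int bestCount = -1;
        do {
            Term t = terms.term();
            if (t == null || !t.field().equals("Brand")) break;
            // docFreq() counts the docs per term without a termDocs() pass
            if (terms.docFreq() > bestCount) {
                bestCount = terms.docFreq();
                best = t.text();
            }
        } while (terms.next());
        terms.close();
        reader.close();
        System.out.println(best + " appears in " + bestCount + " docs");
    }
}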
I guess you can implement createBitSet() more efficiently by using a
Filter, not a BooleanQuery.
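createBitSet() is from the earlier code, but the Filter idiom being suggested looks roughly like this (a sketch for a set of term values in one field):

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

// Sets a bit per matching doc instead of scoring a BooleanQuery
// built from one TermQuery per value.
public class TermSetFilter extends Filter {
    private final String field;
    private final String[] values;

    public TermSetFilter(String field, String[] values) {
        this.field = field;
        this.values = values;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        TermDocs td = reader.termDocs();
        for (int i = 0; i < values.length; i++) {
            td.seek(new Term(field, values[i]));
            while (td.next()) {
                bits.set(td.doc());
            }
        }
        td.close();
        return bits;
    }
}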
2008/2/25, Gabriel Landais [EMAIL PROTECTED]:
Gabriel Landais wrote:
How to create a Filter for a field in a Collection<String>? First, split
the Collection into a Collection<Collection<String>> with