I don't think we have a public API for that, but the index is considered
optimized when it contains only a single segment.
Then, we could add the following to IndexReader:
    public boolean isOptimized() {
        return segmentInfos.size() == 1;
    }
I think that should do it.
Otis
- Original Messag
Hi,
For those in or near New York, this coming weekend there will be a geeky event
called BarCampNYC:
http://barcamp.org/index.cgi?BarCampNYC
A few people will be presenting Lucene, Ferret, and related stuff.
I'll be giving away a few copies of Lucene in Action and also presenting Simpy
at ht
I had looked at the document you had listed as well as used a Hex editor to
look at the segment files. That is how I came to know about the lexicographic
sorting. But I was not sure if a BTree is used. If I understand correctly a Binary
tree (i.e. each node has only 2 children) or a high order Ba
Kan,
Some (all?) of what you described will typically be handled for you by the file
system. Yes, the JVM would blow up with an OOM error if the index is too big to
fit in RAM.
Otis
- Original Message
From: Kan Deng <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Cc: Kan Deng <[EM
Hi, there,
In "Lucene in action", it mentions in Section 3.2.3
"reading indexes into memory" that,
"...RAMDirectory's constructor can be used to read a
file system-based index into memory, allowing the
application that accesses it to benefit from the
superior speed of the RAM:
RAMDirectory
A phrase query with slop scores matching documents higher when the
terms are closer together.
"a b c"~1
-Yonik
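
A minimal sketch of why closer terms score higher, assuming the behaviour of DefaultSimilarity.sloppyFreq as I recall it (each match is weighted by 1 / (distance + 1)). The class name SloppyFreqSketch is made up for illustration and is not part of Lucene:

```java
public class SloppyFreqSketch {

    // Weight contributed by one sloppy phrase match.
    // distance is the edit distance of the match; 0 means an exact phrase match.
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // The weight shrinks as the terms drift further apart.
        for (int distance = 0; distance <= 2; distance++) {
            System.out.println("distance " + distance + " -> " + sloppyFreq(distance));
        }
    }
}
```

The summed weights of all matches in a document then feed into tf() in place of a plain term frequency, so a document where the terms sit right next to each other should come out ahead of one where they are a position apart.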
On 1/10/06, Eric Jain <[EMAIL PROTECTED]> wrote:
> Is there an efficient way to determine if two or more terms frequently
> appear next to each other in sequence? For a query like:
>
>
: BooleanQuery query = new BooleanQuery();
: for(Term t: terms)
: {
: query = new TermQuery(t);
: query.add(t, false, false); // is this wrong?
: }
:
: If I construct the query as a string like "A a OR B b OR C" I get much more
: results. I assume that the Boolean query uses an AND oper
: If you can express each phrase as a SpanNearQuery, the occurrences
: of the phrases can be easily obtained by iterating over the result of
: getSpans() on SpanNearQuery.
: It's not as efficient as a specialized PhraseQuery, though.
I think you are misunderstanding his goal.
(Assuming *I* unde
Hi,
I have got another question... How do I construct a BooleanQuery where the
terms within the query are connected with OR?
I have a list of terms representing the high-scoring terms in a document. Here
is my code
BooleanQuery query = new BooleanQuery();
for(Term t: terms)
{
query = new Ter
On Wednesday 11 January 2006 11:33, Eric Jain wrote:
> Paul Elschot wrote:
> > One way that might be better is to provide your own Scorer
> > that works on the term positions of the three or more terms.
> > This would be better for performance because it only uses one
> > term positions object per
Excellent!! Thank you so much!
- Original Message -
From: "Grant Ingersoll" <[EMAIL PROTECTED]>
To:
Sent: Wednesday, January 11, 2006 12:07 PM
Subject: Re: top n words within a results set?
Hey Chris,
There is just such an analyzer, called the PerFieldAnalyzerWrapper. The
trick i
Hey Chris,
There is just such an analyzer, called the PerFieldAnalyzerWrapper. The
trick is the Analyzer always passes in the Field name when it gets the
TokenStream,
-Grant
Chris Brown wrote:
Bear with me, I might be missing something. My documents get
indexed ( writer.addDocument(doc
Click on "Source Repository" off of the main Lucene page.
Here is a pointer to the search package containing TermQuery/Weight/Scorer
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/search/?sortby=file#dirlist
Look in TermQuery for TermWeight (it's an inner class).
Bear with me, I might be missing something. My documents get indexed (
writer.addDocument(doc) ) with one IndexWriter created using one Analyzer
(the SnowballAnalyzer). So unless you can somehow use a different Analyzer
per field I don't see how the second field will help. If I get the
TermF
Thx, but where can I find these classes?
>If you really want to understand how scoring works, I'd suggest also
>looking at TermWeight/TermScorer.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL
On 1/11/06, Klaus <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> do you know how the tf and idf values are computed by the default
> similarity? I mean the exact mathematical equation.
Well, here is the default Similarity:
/** Expert: Default scoring implementation. */
public class DefaultSimilarity ex
Hi all,
do you know how the tf and idf values are computed by the default
similarity? I mean the exact mathematical equation.
Thx,
Klaus
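
For reference, a small sketch of the two formulas in plain Java, assuming the DefaultSimilarity definitions as I remember them from the Lucene source of this period: tf is the square root of the term frequency, and idf is log(numDocs / (docFreq + 1)) + 1. The class name TfIdfSketch is invented for this example; check DefaultSimilarity itself for the authoritative version.

```java
public class TfIdfSketch {

    // tf: dampens raw term frequency with a square root.
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf: rarer terms (small docFreq relative to numDocs) get a larger value.
    // The +1 in the denominator avoids division by zero when docFreq is 0.
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        System.out.println("tf(4)        = " + tf(4f));
        System.out.println("idf(9, 10)   = " + idf(9, 10));
        System.out.println("idf(0, 1000) = " + idf(0, 1000));
    }
}
```

These two values are only part of the final score; the query norm, field norms, boosts and coord factor are applied on top of them.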
Harini - you won't find a custom analyzer that does exactly what
you've described, but building custom analyzers is pretty
straightforward. You can learn a lot about it by looking at the
pieces within Lucene's source code or the examples (and text) from
Lucene in Action.
Reading from an
On Jan 11, 2006, at 7:23 AM, shailesh kumar wrote:
Does Lucene use a BTree kind of structure for storing the index
(at least in memory)? Or is it just a list? Based on the file
format in the index directory (where the terms are
lexicographically sorted in one of the files) I
Hi Erik,
I had a look at the SpansExtractor class by Mark, that can convert any
Query to spans. But I think ultimately the analyzer that is used to
convert the text in to TokenStream is what is more important. I am using
the StandardAnalyzer and it seems to return a stream of Tokens where
each to
Hello dear Lucene users!
Is there an easy way to check whether an index is optimized or not?
Best regards,
Max
I believe the usual solution is to have a separate field on the same
document for display purposes (I am assuming you are trying to display
the values of the indexed field) that is not stemmed. The tradeoff is
in disk space, of course.
Chris Brown wrote:
Okay, I've taken Grant's advice an
Does Lucene use a BTree kind of structure for storing the index (at least in
memory)? Or is it just a list? Based on the file format in the index
directory (where the terms are lexicographically sorted in one of the
files) I am not sure if a BTree is used. (Because constructing a
Okay, I've taken Grant's advice and aggregated the TermFreqVector's for
each term in the applicable field. It works quite well, there's just one
glitch.
Some words like "party" and "picture" appear as "parti" and "pictur". I am
using the SnowballAnalyzer, I suspect that's what's changing the word
Paul Elschot wrote:
One way that might be better is to provide your own Scorer
that works on the term positions of the three or more terms.
This would be better for performance because it only uses one
term positions object per query term (a, b, and c here).
I'm trying to extract the actual phr
Paul Elschot wrote:
On Wednesday 11 January 2006 00:09, Eric Jain wrote:
Is there an efficient way to determine if two or more terms frequently
appear next to each other in sequence? For a query like:
a b c
one or more of the following suggestions could be generated:
"a b c"
"a b" c
a "b c"