Erik Hatcher wrote:
When Doug is cool with re-enabling the redirect, it's fine with me.
I'm cool with it if it works. Why not re-enable it, search for
"site:apache.org lucene" on Google, Yahoo! and MSN, and click on the
first few links. If these work, then I'm okay with the redirect.
As we cha
Henri Yandell wrote:
Redirect of jakarta.apache.org/lucene to lucene.apache.org/java/docs/index.html
I noticed there's a commented out redirect in the .htaccess, so after
adding my own I deleted it again and left the redirect off for the
moment. Unsure if there's a reason the commented out bit is t
Kevin A. Burton wrote:
Doug Cutting wrote:
Wolf Siberski wrote:
So, if anything at all, I would rather opt for making these constants
private :-).
I agree. In general, fields should either be final, or private with
accessor methods. So, we could change this to:
private static int
Kevin A. Burton wrote:
Wolf Siberski wrote:
Kevin A. Burton wrote:
I see following issues with your patch:
- you changed the DEFAULT_... semantics from constant to modifiable,
but didn't adjust the names according to Java conventions
(default_...).
Java doesn't have any naming conventions which
Garrett Rooney wrote:
Actually, currently we've got both lucene4c and java commits going to
[EMAIL PROTECTED], and there was some talk of just leaving it
that way, since it isn't that much traffic and it encourages people to
keep an eye on what's going on in other languages.
I think that's a bad
Henri Yandell wrote:
Your download page is already separate, you're using the global closer.cgi file.
So we need to:
- rename Lucene Java's mailing lists, with forwards put into place.
- add a mailing list page to Lucene Java's website, modelled after
http://jakarta.apache.org/site/mail2.html#Luce
Attached is a patch which delays reading of index terms until it is
first accessed. The cost of this is another file descriptor, until the
terms are accessed, when it is closed. The benefit is that operations
that do not require access to index terms are much faster and use much
less memory.
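The deferral the patch describes can be illustrated with a small lazy-initialization sketch. All names here are hypothetical stand-ins, not the patch's actual classes; the point is only that the expensive load happens on first access rather than at construction:

```java
import java.util.Arrays;
import java.util.List;

public class LazyTerms {
    static int loads = 0;              // counts actual loads, observable below
    private List<String> terms;        // stays null until first accessed

    private List<String> load() {      // stands in for reading the terms file
        loads++;
        return Arrays.asList("apache", "lucene", "search");
    }

    /** Reads the terms on first use, then reuses the cached copy. */
    public synchronized List<String> terms() {
        if (terms == null) {
            terms = load();
        }
        return terms;
    }
}
```

Operations that never call terms() pay nothing; the trade-off in the patch is the extra file descriptor held open until that first access.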
Author: cutting
Date: Fri Feb 25 09:39:02 2005
New Revision: 155349
URL: http://svn.apache.org/viewcvs?view=rev&rev=155349
Log:
Added accessor methods, as suggested by Kevin Burton.
Modified:
lucene/java/trunk/src/java/org/apache/lucene/search/IndexSearcher.java
lucene/java/trunk
Doug Cutting wrote:
public static int getDefaultMergeFactor() {
return mergeFactor;
}
Oops. That should be 'return defaultMergeFactor'.
Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-ma
Wolf Siberski wrote:
So, if anything at all, I would rather opt for making these constants
private :-).
I agree. In general, fields should either be final, or private with
accessor methods. So, we could change this to:
private static int defaultMergeFactor =
Integer.parseInt(System.getPropert
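Completed into a self-contained sketch of the private-field-plus-accessors pattern Doug describes (the system property key and default here are illustrative, not Lucene's actual names):

```java
public class MergeConfig {
    // Private, initialized from a system property; "merge.factor" and the
    // default of 10 are illustrative placeholders.
    private static int defaultMergeFactor =
        Integer.parseInt(System.getProperty("merge.factor", "10"));

    public static int getDefaultMergeFactor() {
        return defaultMergeFactor;   // returns the field, avoiding the slip noted in the thread
    }

    public static void setDefaultMergeFactor(int factor) {
        defaultMergeFactor = factor;
    }
}
```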
Kevin A. Burton wrote:
I *realize* that there are other ways to do this but we have some legacy
code that can't be rewritten right now. Thus the change to protected
and using a reloadable implementation.
Changing Lucene's API to be back-compatible with an altered version of
Lucene is not a com
Kevin A. Burton wrote:
Also, I assume that the reason you make the reader field protected is
because getReader() is not sufficient, i.e., you want to set the
reader. This would stylistically be better done with a setReader()
method, no? Do you only change it at construction, or at runtime? If
Kevin A. Burton wrote:
You know ... the javadoc on the site doesn't include non-public classes
like TermInfosWriter. Confused me for a second.
That's because it's not public. The javadoc on the site is to document
the public api. This is not a bug, but a feature.
Also.. the site doesn't hav
If you add things that will appear in the javadoc then they need javadoc
comments.
Also, I assume that the reason you make the reader field protected is
because getReader() is not sufficient, i.e., you want to set the reader.
This would stylistically be better done with a setReader() method, n
[EMAIL PROTECTED] wrote:
Log:
fix broken links to source
- FileDocument.java
contains
+ http://svn.apache.org/repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/FileDocument.java";>FileDocument.java
contains
This makes the docs point to the current version of the code, rather
than a ver
Wolf Siberski wrote:
The price is an extension (or modification) of the
Searchable interface. I've added corresponding search(Weight...) methods
to the existing search(Query...) methods and deprecated the latter.
I think this is the right solution.
If Searchable is meant to be Lucene internal, then
Paul Elschot wrote:
Would you mind if some pieces of your reply end up in the
javadocs?
Not at all.
Doug
Wolf Siberski wrote:
Now I found another solution which requires more changes, but IMHO is
much cleaner:
- when a query computes its Weight, it caches it in an attribute
- a query can be 'frozen'. A frozen query always returns the cached
Weight when calling Query.weight().
Originally there was no
George Aroush wrote:
Any thoughts on Lucene.Net/dotLucene package name are welcome.
I agree that Lucene.Net is a better name. It's more consistent with
Lucene Java and Lucene4c, the names for other ports of Lucene. I think
it's okay to reclaim the name of an abandoned project, especially if
t
+1
Doug
+1
Doug
Daniel Naber wrote:
could someone (Doug?) make me an administrator for the old Lucene project
at sourceforge?
Done.
Doug
Henri Yandell wrote:
On names, Lucene Java might hit trademark issues I guess. So potential
worry there.
Good point. Although I note that Apache already has projects called
"Xerces Java" and "Xalan Java". Sun says:
http://www.sun.com/policies/trademarks/#20c
So, technically, the fullname of the
[ Please ignore my previous message. I somehow hit "Send" before typing
anything! ]
Oscar Picasso wrote:
However with a relatively high number of random insertions, the cost of the
"new IndexWriter / index.close()" performed for each insertion is too high.
Did you measure that? How much slower
Oscar Picasso wrote:
Hi,
I am currently implementing a Directory backed by a Berkeley DB that I am
willing to release as an open source project.
Besides the internal implementation, it differs from the one in the sandbox in
that it is implemented with the Berkeley DB Java Edition.
Using the Java E
Erik Hatcher wrote:
I've amended my request for e-mail lists here with Doug's preference:
http://issues.apache.org/jira/browse/INFRA-195
Do others agree this is the best approach? I don't mean to be
autocratic. Do we imagine different pools of users and developers for
different Lucene sub-p
Bernhard Messer wrote:
Doug, you placed a copy of the website in the "java" directory. In both,
the original and the java directory the "api" directory is missing. I
can't copy it into because of the access rights :-(
Argh. The group protection is 'lucene', as it should be, but you're not
in 'l
Doug Cutting wrote:
And we also want to try not to break URLs when we move things. For this
reason it's best to move things as few times as possible, so that we
don't end up with a confusing set of redirects.
More to the point, we also want to try not to break email addresses. So
Erik Hatcher wrote:
It also might be a good time to think about mailing list names. There
was a request on infrastructure@ to move [EMAIL PROTECTED] to
[EMAIL PROTECTED], would it make more sense to move it to [EMAIL PROTECTED]
NOW you tell me :)
I think until we have these elusive other l
Erik Hatcher wrote:
I'm really at the limit of my bandwidth - I've got the sandbox
restructuring effort on my plate right now and would like it if someone
could pick up the ball on the web site side of things.
Then perhaps you shouldn't have redirected everything to
lucene.apache.org...
We need
Garrett Rooney wrote:
Agreed. Java Lucene is a subproject of the Lucene TLP, leaving the
existing Java Lucene site there for the time being seems ok, just so we
have something there, but we should endeavour to put up something more
permanent ASAP.
I think, for the present, http://lucene.apache.
Garrett Rooney wrote:
Additionally it would be good to work on updating the disk format
documentation, I've found several cases where the docs are quite out of
date compared to the current code. It's hard to expect the various
different ports to maintain compatibility when the formats are only
Otis Gospodnetic wrote:
lucene.apache.org seems to work now.
Here is the query syntax:
http://lucene.apache.org/queryparsersyntax.html
We should be cautious in promoting lucene.apache.org urls until we have
this structured correctly. Let's stick with calling this
http://jakarta.apache.org/luce
Erik Hatcher wrote:
I have checked out our current site to the lucene.apache.org area, and
I've also set up a redirect from the jakarta.apache.org/lucene area.
Keep in mind, there are two projects here:
1. Porting Java Lucene's site to Forrest. This should be structured as
a sub-project of luce
Erik Hatcher wrote:
Doug - do you have your Forrest work handy? Or has anyone else stepped
up to build the web site?
I don't have anything reusable. I converted Nutch from a different (not
Anakia) XML-based site to Forrest with little difficulty (mostly using
string replace in Emacs).
I start
Paul Elschot wrote:
I learned a lot by adding some javadocs to such classes. I suppose Doug
added the Expert markings, but I don't know their precise purpose.
The "Expert" declaration is meant to indicate that most users should not
need to understand the feature. Lucene's API seeks to be both sim
Erik Hatcher wrote:
Also, we should package a lucene-XX-all.zip/.tar.gz that includes all
the contrib pieces also allowing someone to simply download Lucene and
all the packaged contrib pieces at once.
I'll go further: that should be the only download. We should avoid
having a bunch of differen
So, now that we've got the sandbox in the same source tree let's decide
what we want to do with it. I have previously argued that we should
make sure that sandbox code should be tagged and released in parallel
with core code (http://tinyurl.com/5d6tx). Now that should be easy.
But how should
Author: cutting
Date: Fri Feb 4 11:09:53 2005
New Revision: 151390
URL: http://svn.apache.org/viewcvs?view=rev&rev=151390
Log:
Fix for bug #32847. Use uncached access to norms when merging to
minimize RAM usage.
Modified:
lucene/java/trunk/CHANGES.txt
lucene/java/trunk/src/java
Erik Hatcher wrote:
Hmmm good point. I hadn't considered access control. A migration
will be performed later today, and I think it will initially be a test
migration for me to verify. I'll double-check with Justin, who's doing
the conversion, on how access control will be initially config
Erik Hatcher wrote:
On Feb 1, 2005, at 3:13 PM, Doug Cutting wrote:
I think we want Java Lucene to be a sub-project of Lucene. So the
repository should be something like:
https://svn.apache.org/repos/asf/lucene/java
I already put in the request for this initial svn structure:
/asf/lucene
Mario Alejandro M. wrote:
What is necessary to be part of this? I'm porting Lucene to Delphi ...
Existing projects generally enter Apache through the incubator:
http://incubator.apache.org/
Doug
Erik Hatcher wrote:
We need website work. Doug mentioned (maybe this was in regards to
Nutch, though) that he's used Forrest.
Yes, I used Forrest for the new Nutch site.
The source for the site is in:
http://svn.apache.org/repos/asf/incubator/nutch/trunk/src/site/
The generated site is at:
http://
Erik Hatcher wrote:
The decision was a bit slow to get out, but Lucene has been approved for
TLP.
Thanks for pushing this through!
I
propose we simply import our two CVS repositories in with all of
jakarata-lucene as the root of the repository and jakarta-lucene-sandbox
under "sandbox" in the r
David Spencer wrote:
Let's start with the issue that's been raised so much: whether idf is
better defined with log() or sqrt(log()).
I can redo my page and rebuild indexes if necessary, I just need it
clarified what we want to do, esp -> does the index need to be rebuilt?
The index needs to be r
Chuck Williams wrote:
> So I think this can be implemented using the expansion I proposed
> yesterday for MultiFieldQueryParser, plus something like my
> DensityPhraseQuery and perhaps a few Similarity tweaks.
I don't think that works unless the mechanism is limited to default-AND
(i.e., all
David Spencer wrote:
+(f1:t1^2.0 t1) +(f1:t2^2.0 t2) f1:"t1 t2"~5^3.0 "t1 t2"~2^1.5
(f1:t1^2.0 t1) (f1:t2^2.0 t2) f1:"t1 t2"~5^3.0 "t1 t2"~2^1.5
(f1:t1^2.0 t1) (f1:t2^2.0 t2) (f1:t3^2.0 t3) (f1:t4^2.0 t4) (f1:t5^2.0
t5) f1:"t1 t2 t3 t4 t5"~5^3.0 "t1 t2 t3 t4 t5"~2^1.5
This looks great to me! I'd
Chuck Williams wrote:
Doug Cutting wrote:
> What did you think of my DensityPhraseQuery proposal?
It is a step in the direction of what I have in mind, but I'd like to go
further. How about a query class with these properties:
1. Inputs are:
a. F = list of fields
b. B =
Chuck Williams wrote:
That expansion is scalable, but it only accounts for proximity of all
query terms together. E.g., it does not favor a match where t1 and t2
are close together while t3 is distant over a match where all 3 terms
are distant. Worse, it would not favor a match with t1 and t2 in
David Spencer wrote:
But what is right if there are > 2 terms in terms of the phrases - does
it have a phrase for every pair of terms like this (ignore fields and
boosts and proximity for a sec):
search for "t1 t2 t3" gives you these phrases in addition to the direct
field matches:
"t1 t2"
"t2
Chuck Williams wrote:
I think the differences are pretty clear as the systems stands. Notice
a substantial difference in the idf's in the respective explanations. I
continue to think the current mechanism weights these too high,
primarily due to its squaring.
The other big difference occurs when
Doug Cutting wrote:
It would translate a query "t1 t2" given fields f1 and f2 into
something like:
+(f1:t1^b1 f2:t1^b2)
+(f2:t1^b1 f2:t2^b2)
Oops. The first term on that line should be "f1:t2", not "f2:t1":
+(f1:t2^b1 f2:t2^b2)
f1:"
David Spencer wrote:
I worked w/ Chuck to get up a test page that shows search results with 2
versions of Similarity side by side.
David,
This looks great! Thanks for doing this.
Is the default operator AND or OR? It appears to be OR, but it should
probably be AND. That's become the industry s
Christoph Goller wrote:
The similarity specified for the search has to be modified so that both
idf(...) AND queryNorm(...) always return 1 and as you say everything
except for tf(term,doc)*docNorm(doc) could be precompiled into the boosts
of the rewritten query. coord/tf/sloppyFreq computation wo
Chuck Williams wrote:
Christoph Goller writes:
> You may be right. But I am not completely convinced. I think
> this should be decided based on the proposed benchmark evaluation.
Is that still happening?
Like anything else in an all-volunteer operation, it will only happen if
folks volunteer t
Maybe we should just call it lucene.apache.org, and move the current
Lucene project to lucene.apache.org/java? The other projects we imagine
adding (Nutch, DotLucene, CLucene, etc.) are all Lucene-related, no?
Lucene has a pretty good brand name...
Doug
Otis Gospodnetic wrote:
ir.apache.org is
Erik Hatcher wrote:
The questions still remain, though, and lawyers do want to know the
answers:
- How did JDK code get into Lucene's codebase to begin with?
I put it there in a moment of ignorance way back as a hack in order to
make things run in an older version of the JVM.
http://cvs.sourc
Chuck Williams wrote:
Doug Cutting wrote:
> It would indeed be nice to be able to short-circuit rewriting for
> queries where it is a no-op. Do you have a proposal for how this
could
> be done?
First, this gets into the other part of Bug 31841. I don't believe
MultiSearche
Wolf Siberski wrote:
Doug Cutting wrote:
So, when a query is executed on a MultiSearcher of RemoteSearchables,
the following remote calls are made:
1. RemoteSearchable.rewrite(Query) is called
After that step, are wildcards replaced by term lists?
Yes.
I haven't taken a look at the re
Chuck Williams wrote:
If auto-filters can provide an effective implementation for RangeQuery's
that avoids rewriting, and we can give up MultiTermQuery and PrefixQuery
in the distributed environment, then how about something like this
refinement:
1. No rewriting is done.
It would indeed be nice
Chuck Williams wrote:
It just seems like a lot of IPC activity for each query. As things
stand now, I think you are proposing this?
1. MultiSearcher calls the remote node to rewrite the query,
requiring serialization of the query.
2. The remote node returns the rewritten query to the dispatc
Chuck Williams wrote:
I think there is another problem here. It is currently the Weight
implementations that do rewrite(), which requires access to the index,
not just to the idf's. E.g., RangeQuery.rewrite() must find the terms
in the index within the range. So, the Weight cannot be computed in
Wolf Siberski wrote:
Yes, I agree. I just wanted to point out that the current Weight
implementations need to be modified heavily to introduce the
behaviour you describe above. For example, take a look at
TermQuery.TermWeight.scorer():
[...]
return new TermScorer(this, termDocs, getSimilarity
Wolf Siberski wrote:
Chuck Williams wrote:
This is a nice solution! By having MultiSearcher create the Weight, it
can pass itself in as the searcher, thereby allowing the correct
docFreq() method to be called. This is similar to what I tried to do
with topmostSearcher, but a much better way to do
Chuck Williams wrote:
There needs to be a way to create the aggregate docFreq table and keep
it current under incremental changes to the indices on the various
remote nodes.
I think you're getting ahead of yourself. Searchers are based on
IndexReaders, and hence docFreqs don't change until a new S
Chuck Williams wrote:
I was thinking of the aggressive version with an index-time solution,
although I don't know the Lucene architecture for distributed indexing
and searching well enough to formulate the idea precisely.
Conceptually, I'd like each server that owns a slice of the index in a
distri
Chuck Williams wrote:
This is a nice solution! By having MultiSearcher create the Weight, it
can pass itself in as the searcher, thereby allowing the correct
docFreq() method to be called.
Glad to hear it at least makes sense... Now I hope it works!
I'm still left wondering if having MultiSearcher
Chuck Williams wrote:
As Wolf does, I hope a committer with deep knowledge of Lucene's design
in this area will weigh in on the issue and help to resolve it.
The root of the bug is in MultiSearcher.search(). This should construct
a Weight, weight the query, then score the now-weighted query.
Her
Terry Steichen wrote:
Would it be
possible to optimize the operation to use 1.4 runtime features but
retain the option, if desired to run in a legacy (1.3) environment,
perhaps in a degraded mode?
Lucene 1.4.3 is a "degraded" mode, no?
There are still back-compatibility issues. To be safe, Luce
Sigh. This stuff would get a lot simpler if we were able to use Java
1.4's FileLock. Then locks would be automatically cleared by the OS if
the JVM crashes.
Should we upgrade the JVM requirements to 1.4 for Lucene's 1.9/2.0
releases and update the locking code?
Doug
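A minimal sketch of what JDK 1.4's FileLock buys, assuming a plain file as the lock target (the helper class and file name are hypothetical, not Lucene's locking code): the lock is held at the OS level, so the operating system releases it automatically if the JVM crashes, unlike an on-disk lock file that lingers.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class IndexLock {
    /** Runs body while holding an OS-level lock on f; returns false if the lock is taken. */
    public static boolean withLock(File f, Runnable body) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        try {
            FileLock lock = raf.getChannel().tryLock();  // non-blocking; null if held elsewhere
            if (lock == null) {
                return false;
            }
            try {
                body.run();                              // critical section
            } finally {
                lock.release();                          // the OS also releases it on a crash
            }
            return true;
        } finally {
            raf.close();
        }
    }
}
```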
Luke Shannon wrote:
Here
Paul Elschot wrote:
Filters are more efficient than query terms for many
I think there are two reasons for the performance gain:
- having things in RAM, eg. the bits of a filter after it is computed once,
- being able to search per field instead of per document.
Also, bit-vectors are constant-time
[EMAIL PROTECTED] wrote:
--- Additional Comments From [EMAIL PROTECTED] 2005-01-06 20:13 ---
Patch to IndexSearcher.java to use FilteredQuery
I like where this is going, and want to take it further!
Why not patch Searcher.java instead of IndexSearcher.java? Once that's
done, Filters coul
Bernhard Messer wrote:
Why not implement a small utility class, e.g. CompoundFileUtil.java,
within the org.apache.lucene.index package? This class could be public
and implement the necessary functionality. This is what I would prefer,
because we don't have to change the visibility of CompoundF
Bernhard Messer wrote:
I understand the technical reason for main() there, but logically this
belongs to an external utility class, I think.
Otis, you are right, I already thought about it. It could simply be moved
to a newly created class in the org.apache.lucene.util package. But then we
have to cha
Erik Hatcher wrote:
I think Forrest is the right way to go - but I've no experience with it
myself.
I recently developed a Nutch site for the Incubator using Forrest (not
yet published). It was easy and pleasant. A nice feature is 'forrest
run' which permits one to edit xml source and then vi
markharw00d wrote:
If we intend to make more use of filters this may be an appropriate time
to raise a general question I have on their use. Is there a danger in
tieing them to a specific implementation (java.util.BitSet)?
I do not object in principle to replacing BitSet with an interface, e.g.
Filters are more efficient than query terms for many things. For
example, a RangeFilter is usually more efficient than a RangeQuery and
has no risk of triggering BooleanQuery.TooManyClauses. And Filter
caching (e.g., with CachingWrapperFilter) can make otherwise expensive
clauses almost free, after
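The caching idea behind CachingWrapperFilter can be sketched in a self-contained way. The Filter interface and Object "reader" key below are stand-ins so the example compiles on its own; in Lucene the filter computes a java.util.BitSet per IndexReader and the weak map lets entries go away when a reader is discarded:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

public class CachingFilter {
    public interface Filter { BitSet bits(Object reader); }

    private final Filter inner;
    private final Map<Object, BitSet> cache = new WeakHashMap<Object, BitSet>();
    public int computations = 0;   // exposed so the cache hit is observable

    public CachingFilter(Filter inner) { this.inner = inner; }

    /** Computes the bits once per reader; later calls reuse the cached set. */
    public synchronized BitSet bits(Object reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            computations++;               // only the first call per reader pays
            cached = inner.bits(reader);
            cache.put(reader, cached);
        }
        return cached;
    }
}
```

This is why a cached filter clause is "almost free" after the first query: every subsequent search against the same reader is a map lookup.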
It would be useful to have a command-line utility (i.e., a static
main(String[]) method somewhere) that lists the files and sizes
contained inside a CFS file, and perhaps even an option to unpack it.
Anyone care to contribute this method?
Doug
---
David Spencer wrote:
And my feeling is that in the context of machine-generated pages, Page
Rank doesn't help that much.
It's better than random. It correctly identified overview-summary as
the best "home page" for the collection in both cases. It also
identified some core classes (IndexReader
Garrett Rooney wrote:
The "least effort" way of doing that would be to include both the core
and sandbox under the same trunk, but again, that implies that you
ALWAYS tag and branch them together, and sometimes you may not want to
do that.
I think we should always branch these together. To my t
Chuck Williams wrote:
Finally, I'd suggest picking content that has multiple fields and allow
the individual implementations to decide how to search these fields --
just title and body would be enough. I would like to use my
MaxDisjunctionQuery and see how it compares to other approaches (e.g.,
th
Chuck Williams wrote:
Another issue will likely be the tf() and idf() computations. I have a
similar desired relevance ranking and was not getting what I wanted due
to the idf() term dominating the score. [ ... ]
Chuck has made a series of criticisms of the DefaultSimilarity
implementation. Unfo
Dan Climan wrote:
Shouldn't the call to Similarity.decodeNorm be replaced with a call to
Similarity.getDefault().decodeNorm
decodeNorm is a static method.
Doug
Daniel Naber wrote:
I'm aware that the "Wildcard" name won't
fit well anymore, suggestions for a better name are welcome.
"Expanded"?
Doug
Murray Altheim wrote:
I thought I'd have a go at the
Lucene logo, not to change it markedly but clean it up so that it
is based on an existing font. This potential Lucene logo is based
on an ITC font called Magneto Bold Extended, which you can see here:
http://www.identifont.com/show?72W
I modi
Christoph Goller wrote:
I think we should change BooleanScorer. An easy way would be to sort the
bucket
list before it is used. Do you think that would affect performance
dramatically?
I think it would make it slower.
Otherwise we should reimplement BooleanScorer. I haven't looked into the
Disjun
Christoph Goller wrote:
Doug, could you please move api/ to api.old/ and api.new/ to api/
Done.
Doug
Christoph Goller wrote:
I think i should finally make Release 1.4.3.
Great!
I presume the default.properties no longer exists. I just fill in
"1.4.3" as version in the build.xml before building it. Is this ok?
I build releases with something like:
ant -Dversion=1.4.3 clean dist
So that it doe
Guillermo Payet wrote:
The fact that Lucene stores and indexes (or so it seems) all terms
as Strings and that there is no NumericTerm makes me think that I
might be missing something and that this might be a much bigger deal
than I think?
You could write a HitCollector that uses
FieldCache.get
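The idea behind Doug's suggestion can be sketched without Lucene itself: parse the field's string values into an int[] once per reader (which is what FieldCache does), then compare numerically while collecting hits. All names below are illustrative stand-ins, not Lucene's HitCollector or FieldCache APIs:

```java
public class NumericCollect {
    /** One-time pass: decode a field's stored strings to ints, as FieldCache would. */
    static int[] cacheInts(String[] fieldValues) {
        int[] cached = new int[fieldValues.length];
        for (int i = 0; i < fieldValues.length; i++) {
            cached[i] = Integer.parseInt(fieldValues[i]);
        }
        return cached;
    }

    /** Collector-style numeric range check against the cached values. */
    static int countInRange(int[] cached, int lo, int hi) {
        int count = 0;
        for (int i = 0; i < cached.length; i++) {
            if (cached[i] >= lo && cached[i] <= hi) count++;
        }
        return count;
    }
}
```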
Erik Hatcher wrote:
On Oct 20, 2004, at 12:14 PM, Doug Cutting wrote:
The advantages of a zero-character prefix default are that it's
back-compatible and that it will find more matches, when spelling
differences are in the first characters.
I prefer this default.
Anyone using QueryParser
Chuck Williams wrote:
However, I'm not sure this analysis is completely correct due to MultiSearcher.docFreq() which appears to be trying to redefine the tf's to be the global value across all indices. It wasn't clear to me how this code is ever reached, e.g. from TermQuery --> SegmentTermDocs. I
Dan Climan wrote:
TermEnum terms = ir.terms();
int numTerms = 0;
while (terms.next())
{
    Term t = terms.term();
    if (t.field().equals("FullText"))
        numTerms++;
}
Daniel Naber wrote:
On Tuesday 12 October 2004 17:22, Doug Cutting wrote:
Which is worse: a person who searches for Photokopie~ in a 1000 document
collection does not find documents containing Fotokopie; or a person who
searches for Photokopie~ in a 1M document collection doesn't find
any
Paul Elschot wrote:
I have a DisjunctionScorer based on a PriorityQueue lying around,
but I can't benchmark it myself at the moment. In case there is
interest, I'll gladly adapt it to org.apache.lucene.search and
add it in bugzilla.
This should look a lot like SpanOrQuery.getSpans().
On a related
Christoph Goller wrote:
With the current scorer API one could get rid of buckettable and
advance all subscores only by one document each time. I am not sure
whether the bucketable implementation is really much more efficient.
I only see the advantage of inlining some of the scorer.next and
score.sc
Daniel Naber wrote:
Searching for Photokopie~ on a 230,000 document corpus takes 2.3 seconds here
(AMD Athlon 2600+; other fuzzy terms get similar performance). As the number
of terms doesn't increase so fast with more documents, it will not take 10
seconds for 1 million documents. So fuzzy sear
Daniel Naber wrote:
On Tuesday 12 October 2004 17:22, Doug Cutting wrote:
Which is worse: a person who searches for Photokopie~ in a 1000 document
collection does not find documents containing Fotokopie; or a person who
searches for Photokopie~ in a 1M document collection doesn't find
any
+1
Christoph Goller wrote:
I would like to propose Bernhard as Lucene committer.
Chuck Williams wrote:
That's a good point on how the standard vector space inner product
similarity measure does imply that the idf is squared relative to the
document tf. Even having been aware of this formula for a long time,
this particular implication never occurred to me. Do you know if
anyb
Bernhard Messer wrote:
Christoph Goller wrote:
Bernhard Messer wrote:
Currently there are 3 different methods available to get the field
names from an index.
a) getFieldNames();
b) getFieldNames(boolean indexed);
c) getIndexedFieldNames(boolean storedTermVector);
my proposal is to deprecate a), b