Terry Steichen wrote:
1) Is there a way to set the query boost factor depending not on the presence of a term, but on the presence of two specific terms? For example, I may want to boost the relevance of a document that contains both "iraq" and "clerics", but not boost the relevance of documents t
Francesco Bellomi wrote:
I agree that synchronization in Vector is a waste of time if it isn't
required,
It would be interesting to see if such synchronization actually impairs
overall performance significantly. This would be fairly simple to test.
but I'm not sure if LinkedList is a better (fas
Andrzej Bialecki wrote:
Karl Koch wrote:
I actually wanted to add a large amount of text from an existing
document to
find a close related one. Can you suggest another good way of doing
this.
You should try to reduce the dimensionality by reducing the number of
unique features. In this case, you
Nicolas Maisonneuve wrote:
in the Similarity Javadoc
score(q,d) =Sum [tf(t in d) * idf(t) * getBoost(t.field in d) *
lengthNorm(t.field in d) * coord(q,d) * queryNorm(q) ]
in the FAQ
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) *
coord_q_d
In FAQ | In Javadoc
1 / no
setMaxClauseCount determines the maximum number of clauses, which is not
your problem here. Your problem is with required clauses. There may
only be a total of 31 required (or prohibited) clauses in a single
BooleanQuery. If you need more, then create more BooleanQueries and
combine them wit
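The splitting step Doug describes can be sketched in plain Java. The chunk size of 31 matches the limit quoted above; combining each group into a parent BooleanQuery is Lucene-specific and is left out, so nothing below depends on Lucene itself:

```java
import java.util.ArrayList;
import java.util.List;

public class ClauseChunks {
    // Split a long clause list into groups of at most maxPerQuery clauses,
    // one group per sub-BooleanQuery.
    static <T> List<List<T>> chunk(List<T> clauses, int maxPerQuery) {
        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < clauses.size(); i += maxPerQuery) {
            groups.add(clauses.subList(i, Math.min(i + maxPerQuery, clauses.size())));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Integer> clauses = new ArrayList<>();
        for (int i = 0; i < 100; i++) clauses.add(i);
        List<List<Integer>> groups = chunk(clauses, 31);
        System.out.println(groups.size());        // 4 groups: 31 + 31 + 31 + 7
        System.out.println(groups.get(3).size()); // 7
    }
}
```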
Try calling IndexReader.getFieldNames().
Karl Koch wrote:
How can I get a list of all fields in an index from which I know only the
directory string?
Karl
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-m
Yes, this is correct.
Peter Keegan wrote:
So they are sorted by reverse document number. Is this the 'external'
document number (the one that is adjusted for the segment's base)? If so,
then this means that documents with equal score are returned in the order in
which they were added to the index.
Nicolas Maisonneuve wrote:
i would like to know
in the IndexReader.document(int i)
what is this number i ?
if the first document is the oldest document indexed
and the last the youngest? (so we can sort by date easily)?
Yes, documents with lower numbers were indexed earlier. As documen
Chong, Herb wrote:
what effect and what recommendations are valid for Lucene 1.3?
Same as always: use the defaults and call optimize() only when you know
you won't be changing the index for a while.
If you have lots of RAM, increasing minMergeDocs may increase indexing
speed, but raising it too
[EMAIL PROTECTED] wrote:
I would like to get a word frequency list from a text. How can I achieve
this in the most direct way using Lucene classes?
Can I do it without generating an index?
No, if you want Lucene to compute frequencies, then you need to create
an index.
Doug
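For comparison, when no index is wanted at all, a word-frequency list can be computed in a few lines of plain Java. This is a minimal editorial sketch (it uses simple tokenization, not Lucene's analyzers):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordFreq {
    // Count how often each lower-cased word occurs, splitting on anything
    // that is not a letter or digit.
    static Map<String, Integer> frequencies(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String w : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(frequencies("To be, or not to be"));
        // {be=2, not=1, or=1, to=2}
    }
}
```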
---
A new Lucene release is available.
It can be downloaded from:
http://cvs.apache.org/dist/jakarta/lucene/v1.3-final/
Release notes are at:
http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene/CHANGES.txt?rev=1.65
Happy Holidays!
Doug
Jochen,
Someone else recently made a similar, reasonable complaint. I agree
that this should be fixed. The fastest way to get it fixed would be to
submit a patch to lucene-dev, with a test case, etc.
Doug
Jochen Frey wrote:
Hi!
I hope this is the right forum for this post.
I was wondering
Doug Cutting wrote:
That's true. If you're doing updates (as opposed to just additions)
then you probably want to do something like:
1. keep a single open IndexReader used by all searches
2. Every few minutes, process updates as follows:
a. open a second IndexReader
b.
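The numbered steps above are cut off; one common completion of this pattern (the continuation is an assumption, not Doug's original text) is to open the second reader, apply the pending changes, then atomically swap it in for searchers and close the old one. The swap itself can be modeled with stdlib types:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ReaderSwap {
    // Stand-in for an open IndexReader; the real type would come from Lucene.
    static final class Reader {
        final int version;
        volatile boolean closed;
        Reader(int version) { this.version = version; }
        void close() { closed = true; }
    }

    // All searches read this reference; getAndSet makes the switch atomic,
    // so no search ever sees a half-switched index.
    static final AtomicReference<Reader> current =
            new AtomicReference<>(new Reader(1));

    static void refresh() {
        Reader fresh = new Reader(current.get().version + 1); // open second reader
        Reader old = current.getAndSet(fresh);                // swap it in
        old.close();                                          // retire the old one
    }

    public static void main(String[] args) {
        refresh();
        System.out.println(current.get().version); // 2
    }
}
```

In real code the old reader should only be closed once in-flight searches against it have finished (e.g. via reference counting); that bookkeeping is omitted here.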
Dror Matalon wrote:
There are two issues:
1. Having new searches start using the new index only when it's ready,
not in a "half baked" state, which means that you have to synchronize
the switch from the old index to the new one.
That's true. If you're doing updates (as opposed to just additions)
If you wish to boost the title field for every query then it would be
easiest to boost the title clause of your query, with Query.setBoost().
Field.setBoost() should only be used when you want to give a field
different boosts in different documents, but since you want to boost all
titles by th
It sounds like you want the value of a stored field (a summary) to be
built from the tokens of another field of the same document. Is that
right? This is not presently possible without tokenizing the field
twice, once to produce its summary and once again when indexing.
Doug
Gregor Heinrich
Your setup sounds good to me.
Scott Smith wrote:
I'm not having a problem. The question is whether I picked a reasonable set
of parameters for what I'm doing.
I have an application which receives messages. Each message averages around
4k bytes and I get, on average, 0-10 every minute. So my app
Stored fields should be able to hold values with up to Integer.MAX_VALUE
characters, as should an indexed term. Can you please provide a
complete, self-contained test case?
Doug
Chong, Herb wrote:
i had an UnIndexed field which was 300 bytes. i changed it to 10,000 bytes. i also had a Text fie
I agree. One should provide Lucene with a unique path in the
filesystem, one that is not intended to be used for any other purpose.
All access to that path should be through Lucene's API. The fact that
Lucene decides to create a directory there rather than a single file is
an implementation d
Position increments are for relative token positions. A position
increment of zero means that a token is logically at the same position
as the previous token. A position increment of one means that a token
immediately follows the preceding token in the stream, it's the next
token to the right
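A minimal sketch of how absolute positions fall out of increments (plain Java, no Lucene types; starting the counter at -1 matches the convention that the first token, with increment 1, lands at position 0):

```java
import java.util.ArrayList;
import java.util.List;

public class Positions {
    // A token's absolute position is the previous token's position plus its
    // position increment; an increment of 0 stacks a token on the same
    // position as its predecessor.
    static List<Integer> positions(int[] increments) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (int inc : increments) {
            pos += inc;
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. "the" (inc 1), a synonym stacked on it (inc 0), "fox" (inc 1)
        System.out.println(positions(new int[] {1, 0, 1})); // [0, 0, 1]
    }
}
```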
jt oob wrote:
Can I safely delete those files which do not have the prefix listed in
the segments file?
Have a look at the index file format documentation:
http://jakarta.apache.org/lucene/docs/fileformats.html
The only file besides segments that should exist is the "deleteable"
file, and the
Tatu Saloranta wrote:
Also, shouldn't there be at least 3 methods that take Readers; one for
Text-like handling, another for UnStored, and last for UnIndexed.
How do you store the contents of a Reader? You'd have to double-buffer
it, first reading it into a String to store, and then tokenizing t
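The double-buffering step described here - draining the Reader into a String before anything else can happen - looks like this in plain Java:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class Slurp {
    // Read the entire Reader into a String, so the value can be stored and
    // then handed to a tokenizer a second time.
    static String slurp(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(slurp(new StringReader("hello reader"))); // hello reader
    }
}
```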
This looks like a bug. I think your query contains a term in a field
that is not indexed, and hence has no norm value. Perhaps, as Brisbart
Franck suggests, it is indexed in some documents, but not in others.
But, in a single IndexReader, if it is indexed in any document, it
should have a no
Otis Gospodnetic wrote:
There was discussion about it, yes. I don't think we ever reached any
conclusions, and the powered.html still says 'include the logo'.
Actually, I think we decided to scrap the requirement, but then we never
updated the web site. Here's the message I found:
http://nagoya
Dion Almaer wrote:
Interesting. I implemented an approach which boosted based on the number of months in
the past, and
after tweaking the boost amounts, it seems to do the job. I do a fresh reindex every
night (since
the indexing process takes no time at all... unlike our old search solution!)
I
Kevin A. Burton wrote:
Would there be any performance improvement in query throughput and
latency if locking were disabled for readonly indexes?
The locks are only consulted when opening a new IndexReader. I doubt
very much that you're doing this often enough for this to be significant.
Doug
-
Karsten Konrad wrote:
Now hell would be the place for me where I would have to prove that Lucene's ranking is
exactly equivalent to some transformation of vector space and then using the *cosine* for the
ranking. Can't be really, as Lucene sometimes returns results > 1.0 and only some ruthless
no
Dion Almaer wrote:
The only real item that I still want to tweak more is getting recent results higher in the list.
I was wondering if something like this could work (or if there is a better solution)
At index time, I have the date of the content. I could do some math where the higher
the date
A new Lucene release is available.
It can be downloaded from:
http://cvs.apache.org/dist/jakarta/lucene/v1.3-rc3/
Release notes are at:
http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene/CHANGES.txt?rev=1.58
Enjoy!
Doug
---
Andrzej Bialecki wrote:
Now, I'm wondering how do I encode the weight of keywords... If I do the
following:
Field f = Field.Keyword("kw", "value1");
f.setBoost(10.0);
doc.add(f);
f = Field.Keyword("kw", "value2");
f.setBoost(20.0);
doc.add(f);
Now the question is: what is the boost value for the
Tun Lin wrote:
These are the steps I took:
1) I compile all the files in a particular directory using the command:
java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
, putting all the indexed files in c:\\index.
2) Everytime, I added an additional file in that directory. I need to
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: 17 November 2003 19:51
To: Lucene Users List
Subject: Re: Which operations change document ids?
Tate Avery wrote:
My first question is: should I steer clear of this all together?
No, I think this is appropriate.
If not, I need to kn
bably won't substantially alter the
ranking. Is 100 long enough? Perhaps not. But 1000 is certainly
plenty long.
Doug
Chong, Herb wrote:
any arbitrary number you pick will be broken by some document someone puts into the system.
Herb
-----Original Message-----
From: Doug Cutting [mai
Karsten Konrad wrote:
I was wondering whether we could, while indexing, make a use of this by
increasing the position counter by a large number, let's say 1000,
whenever we encounter a sentence separator (Note, this is not trivial;
not every '.' ends a sentence etc. etc. etc.). Thus, searching
Tate Avery wrote:
My first question is: should I steer clear of this all together?
No, I think this is appropriate.
If not, I need to know which Lucene operations can cause document ids to change.
I am assuming that the following can cause potential changes:
1) Add document
2) Op
Leo Galambos wrote:
There are other (more trivial) problems as well. One geek from UFAL (our
NLP lab) reported, that it was a hard problem to find the boundaries, or
rather, to say whether a dot is a dot or something else, i.e. "blah,
i.e. blah" "i.b.m." "i.p. pavlov" "3.14" "28.10.2003" etc.
O
t have typed capital gains tax. there is psychology of query creation too and that is one thing i am taking advantage of.
Herb
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 3:15 PM
To: Lucene Users List
Subject: Re: inter-term correlati
Chong, Herb wrote:
since i am working now on financial news, here is an example:
capital gains tax
if i just run this query against a million document newswire index, i know i am going to get lots of hits. the phrase "capital gains tax" hits a lot fewer documents, but is overrestrictive. the fact
Jie Yang wrote:
In this case, probably using a single RAMDirectory
would allow me to run parallel searching without worry
about disk access. Well, anyone tried to have a
RAMDirectory of 5G in size?
I don't know of a Java implementation which lets you have a heap larger
than 2GB. In my experience,
Jie Yang wrote:
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
Well, not quite, User normally enters a search string
A that normally returns 1000 out of 2 millions docs. I
then append A with 500 OR conditions... A AND (B or C
or ... or x500).
Are you adding the same 500 terms to each query? Or even
Dan Quaroni wrote:
name:Bob's Discount Furniture AND state:California AND city:San Diego
Now, that query is going to retrieve EVERY Bob's discount furniture, EVERY
company in California, and EVERY city in San Diego and then join them. That
makes the memory requirements for this query far higher t
Dror Matalon wrote:
In there a reason why RODirectory shouldn't just be rolled into lucene?
http://www.csita.unige.it/software/free/lucene/
This just looks like a version of FSDirectory with lock files disabled.
I think it would be better to just make it easier to disable lock
files. Currently
William W wrote:
If I have two indexes and use the MultiSearcher will it be faster than
only one index with all the documents ?
No, in fact it would be slower. However it could be faster if (a)
someone contributes a parallel version of MultiSearcher and (b) you're
either running on a multiple-
Kevin A. Burton wrote:
When I first read this changelog entry:
> 2. Changed file locking to place lock files in
>System.getProperty("java.io.tmpdir"), where all users are
>permitted to write files. This way folks can open and correctly
>lock indexes which are read-only to them.
I
First, note that the approaches you describe will only improve
performance if you have multiple CPUs and/or multiple disks holding the
indexes.
Second, MultiSearcher is currently implemented to search indexes
serially, not each in a separate thread. To implement multi-threaded
searching one c
There was a bug (recently fixed) when creating indexes with over a
couple hundred million documents. So you should use 1.3 RC2, which has
a fix for this bug.
The biggest indexes I've personally created have around 30M documents.
I maintain these as a set of separately updated indexes, then mer
Wilton, Reece wrote:
Does Lucene support exact matching on a tokenized field?
So for example... if I add these three phrases to the index:
- "The quick brown fox"
- "The quick brown fox jumped"
- "brown fox"
I want to be able to do an exact field match so when I search for "brown
fox" I only get t
petite_abeille wrote:
Quick question regarding release note number 11:
What's the difference between IndexWriter.addIndexes(IndexReader[]) and
IndexWriter.addIndexes(Directory[]) beside the fact that one takes an
array of IndexReader and the other an array of Directory? Any functional
differenc
A new Lucene release is available.
It can be downloaded from:
http://cvs.apache.org/dist/jakarta/lucene/v1.3-rc2/
Release notes are at:
http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene/CHANGES.txt?rev=1.56
Enjoy!
Doug
---
Erik Hatcher wrote:
Just for fun, I've written a simple stop filter that bumps the position
increments to account for the stop words removed:
But its practically impossible to formulate a Query that can take
advantage of this. A PhraseQuery, because Terms don't have positional
info (only the t
Tate Avery wrote:
You might have trouble with "too many open files" if you set your mergeFactor too high. For example, on my Win2k, I can go up to mergeFactor=300 (or so). At 400 I get a too many open files error. Note: the default mergeFactor of 10 should give no trouble.
Please note that it is
Wilton, Reece wrote:
The index directory that Lucene created has 2,322 files in it. When I
try to open it I get the dreaded "Too Many Open Files" problem:
java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
files)
The index has about 50,000 docs in it. It was created with a merg
- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 5:51 PM
Subject: Re: Lucene Scoring Behavior
Terry Steichen wrote:
0.03125 = fieldNorm(field=pub_date, doc=90992)
1.0 = fi
Terry Steichen wrote:
0.03125 = fieldNorm(field=pub_date, doc=90992)
1.0 = fieldNorm(field=pub_date, doc=90970)
It looks like the fieldNorm's are what differ, not the IDFs. These are
the product of the document and/or field boost, and 1/sqrt(numTerms)
where numTerms is the number of terms in
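Ignoring boosts, fieldNorm reduces to lengthNorm = 1/sqrt(numTerms), and the two values quoted above can be inverted to recover the implied term counts. This is a sketch; note that Lucene stores norms in a single byte, so counts recovered this way are only approximate:

```java
public class FieldNormMath {
    // With no document or field boost, fieldNorm is just lengthNorm.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Invert lengthNorm to estimate how many terms produced a given norm.
    static long impliedTermCount(double norm) {
        return Math.round(1.0 / (norm * norm));
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(1));             // 1.0     (one-term field)
        System.out.println(lengthNorm(1024));          // 0.03125
        System.out.println(impliedTermCount(0.03125)); // 1024
    }
}
```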
If you're using RangeQuery to do date searching, then you'll likely see
unusual scoring. The IDF of a date, like any other term, is inversely
related to the number of documents with that date. So documents whose
dates are rare will score higher, which is probably not what you intend.
Using a
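The inverse relationship can be illustrated with Lucene's classic idf formula, 1 + ln(numDocs / (docFreq + 1)); whether the 1.3 default used exactly this form is an assumption here:

```java
public class DateIdf {
    // Classic Lucene-style idf: rarer terms (here, rarer dates) get a
    // larger idf and therefore a larger score contribution.
    static double idf(int docFreq, int numDocs) {
        return 1 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        System.out.println(idf(10, numDocs));      // rare date: high idf
        System.out.println(idf(100_000, numDocs)); // common date: low idf
    }
}
```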
Killeen, Tom wrote:
My query would look something like this: LongTitle:killeen AND
LongTitle:state AND StateDistrict:id AND FiledDate:["1997-01-01" TO
"2002-04-04"] and it returned in
5.7 seconds
Does anyone have any suggestions for searching date ranges. Our ranges will
generally be between a 3
[EMAIL PROTECTED] wrote:
Are the endOffset, startOffset fields of a Token used in
proximity search and phrase search?
No. They are not used by indexing or search. Their intent is only to
aid the extraction of matching text snippets when displaying results.
Doug
--
Jon Pither wrote:
I have a requirement whereupon I'd like to pull search results back and
split them up based on some keyword field. So for example, says there's
a field named 'category', I'd like to be able to have the results
displayed as such:
Search Results for Category A:
1,
2,
3,
Search
Leo Galambos wrote:
Example: I use this notation: inverted_list_term:{list of W values, "-"
denotes W=0, for 12 documents in a collection}
A:{23[16]--27}
B:{--[38]}
C:{18[2-]45239812}
If your first query is B, the subset of documents (denoted by brackets -
namely, the 3rd and 4th doc)
Erik Hatcher wrote:
Yes, you're right. Getting the scores of a second query based on the
scores of the first query is probably not trivial, but probably possible
with Lucene. And that combined with a QueryFilter would do the trick I
suspect. Somehow the scores of the first query could be reme
Terry Steichen wrote:
PS: If there is general interest in doing some documentation enhancement,
I'd be happy to participate/contribute.
I think there's always room for better documentation. If you have ideas
about this, and, more importantly, time to contribute, please have a go
at it.
Doug
--
Luke Francl wrote:
According to the jGuru FAQ, QueryParser is not thread safe:
http://www.jguru.com/faq/view.jsp?EID=492389
However, this information is several years old. Is this still true?
The answer to the question suggests using a new parser for every thread,
but the QueryParser.parse(Strin
Leo Galambos wrote:
Isn't it better for Dan to skip the optimization phase before merging? I
am not sure, but he could save some time on this (if he has enough file
handles for that, of course).
It depends. If you have 10 machines, each with a single disk, that you
use for indexing in parallel,
The index should be fine. Lucene index updates are atomic.
Doug
Dan Quaroni wrote:
My index grew about 7 gigs larger than I projected it would, and it ran out
of disk space during optimize. Does lucene have transactions or anything
that would prevent this from corrupting an index, or do I need
As the index grows, disk i/o becomes the bottleneck. The default
indexing parameters do a pretty good job of optimizing this. But if you
have lots of CPUs and lots of disks, you might try building several
indexes in parallel, each containing a subset of the documents, optimize
each index and
IndexWriter.close()
or IndexReader.close() for deletions) then this will not be a problem.
Doug
Morus Walter wrote:
Doug Cutting writes:
Can I have a lucene index on a NFS filesystem without problems
(access is readonly)?
So long as all access is read-only, there should not be a problem. Keep
in mind however that
is not
thread safe in this regard. Here is a quote from Doug Cutting, the creator
of Lucene:
The problems are only when you add documents or optimize an index, and then
search with an IndexReader that was constructed before those changes to the
index were made.
A possible work around is to perform the
Loo-seen.
Danny Sofer wrote:
...and where does the name come from?
It's my wife's middle name, and her maternal grandmother's first name.
Doug
Aviran Mordo wrote:
Is it possible and safe to search an index while another thread adds
documents or optimizes the same index?
Yes.
Morus Walter wrote:
Can I have a lucene index on a NFS filesystem without problems
(access is readonly)?
So long as all access is read-only, there should not be a problem. Keep
in mind however that lock files are known to not work correctly over NFS.
Doug
---
Roger Ford wrote:
I do have another problem: running multi-user tests - four "users"
all firing off queries one after the other - I hit this exception
at the start of one run:
caught a class java.io.IOException
with message: Timed out waiting for
[EMAIL PROTECTED]:\Lucene_Index\Index0001\comm
Wilton, Reece wrote:
Three questions:
- Is it safe to have two IndexWriters open on the same index?
No. It is not safe, and the code makes every attempt to prohibit it.
- Is it safe to have two IndexWriters adding a document concurrently?
No, but you can have two threads adding documents to a sin
Ryan Clifton wrote:
You seem to by implying that it is possible to optimize very large indexes. My index has a couple million records, but more importantly it's about 40 gigs in size. I have tried many times to optimize it and this always results in hitting the Linux file size limit. Is there a
Armbrust, Daniel C. wrote:
If you set your mergeFactor back down to something closer to the default (10) - you probably wouldn't have any problems with file handles. The higher you make it, the more open files you will have. When I set it at 90 for performance reasons, I would run out of file han
Marc Dumontier wrote:
I'm indexing 500 XML files each ~150Mb on an 8 CPU machine.
I'm wondering what the best strategy for making maximum use of resources is. I have the tweaked the single process indexer to index 5000 records (not files) in memory before writing out to disk.
Should i create an I
Jim Hargrave wrote:
I've defined my own collector (I want the raw score before it is normalized between 1.0 and 0.0). For each document I need to know the matching term positions in the document. I've seen the methods in IndexReader, but how can I access them inside my collect method? Are ther
The HitCollector-based search API is not meant to work remotely. To do
so would involve an RPC-callback for every non-zero score, which would
be extremely expensive. Also, just making HitCollector serializable
would not be sufficient. You'd also need to pass in a HitCollector
implementation
This can be done more efficiently if you only want to enumerate the
terms of a particular field. Term enumerations are ordered first by
field, then by the term text. You can also specify the initial position
of a term enumeration. Thus an efficient enumeration of the terms in
"myField" can b
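The ordering Doug describes - terms sorted first by field, then by text - is what makes the seek-and-scan cheap. A stdlib model, using a sorted set of field + '\0' + text keys in place of the real term index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class FieldTermScan {
    // Seek to ("myField", "") and scan forward until the field changes:
    // exactly that field's terms come back, in order, without visiting any
    // other field's entries.
    static List<String> termsIn(TreeSet<String> dictionary, String field) {
        String from = field + '\u0000';
        List<String> out = new ArrayList<>();
        for (String key : dictionary.tailSet(from)) {
            if (!key.startsWith(from)) break; // field changed: stop the scan
            out.add(key.substring(from.length()));
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> dict = new TreeSet<>(List.of(
                "author\u0000smith", "myField\u0000apple",
                "myField\u0000pear", "title\u0000zebra"));
        System.out.println(termsIn(dict, "myField")); // [apple, pear]
    }
}
```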
Eric Jain wrote:
Has anyone ever considered storing binary data into an index? In
particular, serialized objects? This would seem to be a natural solution
in certain situations, and avoids many problems that arise when using a
seperate object store (e.g. Jisp): inconsistencies while updating, and
a
Konrad Kolosowski wrote:
If the index grows to hundred thousand documents, with users simultaneously
searching indexes for different locales, what is the best way to cup the
memory requirement? Limiting number of terms, or number of terms
containing wild cards, or eliminating wild card searches al
Ulrich Mayring wrote:
does anyone know of good stopword lists for use with Lucene? I'm
interested in English and German lists.
The Snowball project has good stop lists.
See:
http://snowball.tartarus.org/
http://snowball.tartarus.org/english/stop.txt
http://snowball.tartarus.org/german/stop
Lixin Meng wrote:
Therefore, it would be preferable to treat all hyphens in the same way:
either as a delimiter or as part of the word (maybe with a flag in the API).
If we change StandardTokenizer in this way then we risk breaking all the
applications that currently use it and depend on its curren
You should look at the output of your analyzer. Just write a simple
test program, something like:
public static void main(String[] args) throws Exception {
  System.out.println("Tokenizing " + args[0]);
  Analyzer analyzer = new MyAnalyzer(...);
  TokenStream ts = analyzer.tokenStream(new StringReader(args[0]));
  Token token;
  while ((token = ts.next()) != null)  // print each token the analyzer emits
    System.out.println(token.termText());
}
Rob Outar wrote:
public synchronized String[] getDocuments() throws IOException {
IndexReader reader = null;
try {
reader = IndexReader.open(this.indexLocation);
int numOfDocs = reader.numDocs();
String[] docs = new String[numOfDocs];
Maik Schreiber wrote:
In an index I have documents with a field that has been constructed using
Field.UnIndexed(). Now I want to switch to Field.Keyword() so I can search for those
fields,
too.
Does it cause any harm if I'm mixing field types like that?
I think this used to throw an exception, bu
There's a new Lucene release available for download.
See the website for details:
http://jakarta.apache.org/lucene/docs/index.html
Doug
Morus Walter wrote:
Searches must be able on any combination of collections.
A typical search includes ~ 40 collections.
Now the question is, how to implement this in lucene best.
Currently I see basically three possibilities:
- create a data field containing the collection name for each document
Rishabh Bajpai wrote:
I am getting a long value between 1 (included) and 0 (excluded, I think), and it makes sense to me logically as well - I wouldn't know what a value greater than 1 would mean, and why should a term that has a score of 0 be returned in the first place! But just to be sure, I want
Ching-Pei Hsing wrote:
Even if we boost the Name by 10 like the following query, It's still the
same.
query = (NAME:inn NAME:comfort NAME:shampoo)^10 (MMNUM:inn MMNUM:shampoo
MMNUM:comfort) (SMNUM:shampoo SMNUM:comfort SMNUM:inn)
In the 1.2 release, I don't think this sort of boosting (of a compl
Joseph Ottinger wrote:
Then this means that my IndexReader.delete(i) isn't working properly. What
would be the common causes for this? My log shows the documents being
deleted, so something's going wrong at that point.
Are you closing the IndexReader after doing the deletes? This is
required for
Joseph Ottinger wrote:
I've got a versioning content system where I want to replace documents in
a lucene repository. To do so, according to the FAQ and the mailing list
archives, I need to open an IndexReader, look for the document in
question, delete it via the IndexReader, and then add it.
This
'll start
experimenting shortly.
Regards,
Terry
- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, February 10, 2003 1:57 PM
Subject: Re: Computing Relevancy Differently
Terry Steichen
These sort of tricks can help things some if index i/o is really your
bottleneck. Are you convinced that it is? When i/o is a bottleneck the
CPU typically spends a large portion of its time idle. Do you see this?
From your description (indexing ~300k 5k documents takes over 24 hours)
I would
I doubt this will make Lucene much faster, since Lucene already
implements buffering in its InputStream and OutputStream classes. So
Lucene already has this optimization built-in.
Doug
Andrzej Bialecki wrote:
Hello,
Since you are trying this anyway, and looking for ways to improve
indexing t
Andrzej Bialecki wrote:
Do you think it would be possible/feasible to modify the
searching classes so that they create Explanations at the same time I'm
running the query?
That's not feasible because it would slow down query execution too much.
Doug
-
Check out the new Explanation API in the latest CVS sources. It permits
one to get a detailed explanation of how a query was scored against a
document. Note that these explanations are designed for user perusal,
not for further computation, and are as expensive to construct as
re-running the
Please send Lucene-related messages to just one of lucene-user or
lucene-dev, *not* both.
The lucene-dev list should be considered a subset of the lucene-user
list that is concerned with the development of lucene. Things that
should be sent to this list are:
. reproducible bug reports, compl
Mailing Lists Account wrote:
Doug Cutting wrote:
That's because Google and most internet search engines never do any
stemming.
Generally speaking, are there any advantages to not applying the stemmer?
Except for certain keywords, I found the use of stemmers helpful.
Generally speaking, ste
Mailing Lists Account wrote:
I use PorterStemmer with my analyzer for indexing the documents.
And I have been using the same analyzer for searching too.
When I search for a phrase like "security" AND database, I would like to
avoid matches for
terms like "secure" or "securities" . I observed tha
The RemoteSearchable class (in the latest CVS) will let you do this. It
uses Java RMI to let you search indexes on other machines. With a
MultiSearcher you can then search a number of independently maintained
indexes on different machines. MultiSearcher searches indexes serially,
but it woul