On Mar 26, 2004, at 2:20 AM, Morus Walter wrote:
Erik Hatcher writes:
Why not do the unique sequential number replacement at index time
rather than query time?
how would you do that? This requires knowing the IDs that will be added in the future.
Let's say you start with strings 'a' and 'b'. Later you
Would these recommendations work for this or should I upgrade to
lucene 1.3.
In doing so, I'm not sure if a rewrite of the docSearcher will be
necessary or
not.
Daniel Naber wrote on 3/26/04:
Try IndexWriter.setUseCompoundFile(true) to limit the number of files.
Erik Hatcher 3/26/2004 2:32
On Mar 26, 2004, at 1:33 PM, Chad Small wrote:
Is this :) serious?
This is open-source. I'm only as serious as it would take for someone
to push it through. I don't know what the timeline is, although lots
of new features are available.
Erik Hatcher 3/26/2004 2:32:16 AM
If you are using Lucene 1.3, try using the index in compound format.
You will have to rebuild (or convert) your index to this format. The
handy utility Luke will convert an index easily.
Erik
On Mar 25, 2004, at 9:34 PM
On Mar 26, 2004, at 7:20 PM, Kevin A. Burton wrote:
Chad Small wrote:
Is this :) serious? Because we have a need/interest in the new field
sorting capabilities
URL to documentation for field sorting?
Geez, you want documentation also? :)
Try the JUnit test cases for starters. That is the
So far so good, Stephane, on the wiki changes - looks good!
As for our book - at this point, early summer seems like when it'll
actually be on the shelves. By the end of April we should have mostly
everything complete, reviewed, and entirely in the publisher's hands.
*ugh* - this process
On Mar 26, 2004, at 8:16 PM, Stephane James Vaucher wrote:
Erik, maybe Otis and yourself should slow down on development. You
wouldn't want your book to discuss lucene-1.3 if you release a version
1.5
before it hits the stores... unless that's your master plan;)
It will cover the new Lucene 1.4
Why not do the unique sequential number replacement at index time
rather than query time?
Erik
On Mar 25, 2004, at 6:26 PM, Eric Jain wrote:
I will need to have a look at the code, but I assume that in
principle it should be possible to replace the strings with
sequential integers once the
On Mar 24, 2004, at 5:58 PM, Morris Mizrahi wrote:
I think the custom analyzer I created is not properly doing what a
KeywordAnalyzer would do.
Erik, could you please post what KeywordAnalyzer should look like?
It should simply tokenize the entire input as a single token. Incze
Lajos posted a
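The single-token idea can be illustrated without Lucene's TokenStream machinery; a minimal pure-Java sketch of the concept (the method names are illustrative, not Lucene's API):

```java
public class TokenizeDemo {
    // Keyword-style analysis: the entire input becomes one token.
    public static String[] keywordTokens(String input) {
        return new String[] { input };
    }

    // Whitespace-style analysis for comparison: split on runs
    // of whitespace, producing one token per word.
    public static String[] whitespaceTokens(String input) {
        return input.trim().split("\\s+");
    }
}
```

A field indexed keyword-style this way matches only on the exact, untokenized value.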
QueryParser and Field.Keyword fields are a strange mix. For some
background, check the archives as this has been covered pretty
extensively.
A quick answer is yes you can use MFQP and QP with keyword fields,
however you need to be careful which analyzer you use.
PerFieldAnalyzerWrapper is a
How exactly would you take advantage of a subclassable Hits class?
On Mar 21, 2004, at 6:01 AM, Terry Steichen wrote:
Does anyone know why the Hits class is final (thus preventing it from
being subclassed)?
Regards,
Terry
-
removing the final
attribute(s)?
Regards,
Terry
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, March 22, 2004 7:06 AM
Subject: Re: Final Hits
How exactly would you take advantage of a subclassable Hits class?
On Mar 21, 2004
On Mar 16, 2004, at 8:39 PM, [EMAIL PROTECTED] wrote:
My experience tells me that CJKAnalyzer needs to be improved
somehow
For example, single word X* search works perfectly, however,
multiple words wildcard XX* never works.
Well, in this case it is QueryParser, not the analyzer, as the
Try setting the slop factor on your phrase query. This should
accomplish what you want. Set it to something like 10 and see what you
get.
Erik
On Mar 16, 2004, at 8:55 PM, Supun Edirisinghe wrote:
I have a field called buisnessname and this field contains keywords
like
Georgian House
Have a look at the Ant index task in the Lucene sandbox. You're on
your own, currently, to build this and understand it, but I use it
frequently. In fact, the sample index from our book is generated with
this:
<index index="${build.dir}/index"/>
To be honest, I'm way out of the loop on the demo, and it needs to be
rewritten. It is on my to-do list!
But, date range and proximity searches most definitely work. Can you
be more specific about what you index and how you searched? Perhaps
even a working test case?
Erik
On Mar 14, 2004,
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Sunday, March 14, 2004 4:00 PM
Subject: Re: Date Range and proximity search
To be honest, I'm way out of the loop on the demo, and it needs to be
rewritten. It is on my to-do list!
But, date range and proximity searches most
On Mar 13, 2004, at 6:02 AM, Morus Walter wrote:
Otis Gospodnetic writes:
Field.Keyword is suitable for storing data like Url. Give that a try.
Hmm. I don't think keyword fields can be used with query parser,
which is probably one of the problems here.
He did try keyword fields.
Look in the
, 2004, at 12:04 PM, Doug Cutting wrote:
Erik Hatcher wrote:
Yes, I saw it. But is there a reason not to just expose HashSet
given that it is the data structure that is most efficient? I bought
into Kevin's arguments that it made sense to just expose HashSet.
Just the general principle that one
is important.
Erik
On Mar 11, 2004, at 5:22 PM, Kevin A. Burton wrote:
Erik Hatcher wrote:
I will refactor again using Set with no copying this time (except for
the String[] and Hashtable constructors). This was my original
preference, but I got caught up in the arguments by Kevin and lost my
I would think your best bet is to index each section as a separate
Document, with a field that refers to the HTML file itself somehow.
Erik
On Mar 11, 2004, at 7:43 PM, Ashwin Shripathi Raj wrote:
Hi,
I have a large HTML document broken up into sections.
On a search, I need to retrieve only
On Mar 9, 2004, at 10:23 PM, Kevin A. Burton wrote:
You need to make it a HashSet:
table = new HashSet( stopTable.keySet() );
Done.
Also... while you're at it... the private variable name is 'table'
which this HashSet certainly is *not* ;)
Well, depends on your definition of 'table' I suppose
On Mar 10, 2004, at 2:59 PM, Kevin A. Burton wrote:
I refuse to expose HashSet... sorry! :) But I did wrap what is
passed in, like above, in a HashSet in my latest commit.
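The wrap-what-is-passed-in approach being debated here is a defensive copy; a minimal sketch of the pattern in plain Java (class and method names are illustrative, this is not Lucene's StopFilter):

```java
public class DefensiveCopyDemo {
    private final java.util.Set<String> stopWords;

    // Copy the caller's collection into a private HashSet so that
    // later mutations by the caller cannot affect this instance.
    public DefensiveCopyDemo(java.util.Collection<String> words) {
        this.stopWords = new java.util.HashSet<String>(words);
    }

    public boolean isStopWord(String word) {
        return stopWords.contains(word);
    }
}
```

The cost being objected to is the one-time copy in the constructor; the benefit is that the filter's behavior cannot change out from under it.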
Hm... You're doing this EVEN if the caller passes a HashSet directly?!
Well it was in the ctor. But I guess I'm not seeing
It means we screwed up the timing somehow and changed the build file
version after we built the binary version, is my guess.
We'll be more careful with the 1.4 release and make sure this doesn't
happen then.
Erik
On Mar 10, 2004, at 8:34 PM, Jeff Wong wrote:
Hello,
I noticed that Lucene
On Mar 10, 2004, at 9:45 PM, Doug Cutting wrote:
Jeff Wong wrote:
I noticed that Lucene 1.3-final source builds a JAR file whose version
number is 1.4-rc1-dev. What does this mean? Will 1.4-final build
as
1.5-rc1-dev?
Probably. If you modify the sources of a 1.3-final release, and build
them,
On Mar 10, 2004, at 10:28 PM, Doug Cutting wrote:
Erik Hatcher wrote:
Also... your HashSet constructor has to copy values from the
original HashSet into the new HashSet... not very clean, and this
can just be removed by forcing the caller to use a HashSet (which
they should).
I've caved
not recompile your own source code against a new Lucene
JAR so I will simply provide another signature too.
Erik
On Mar 9, 2004, at 4:15 AM, Kevin A. Burton wrote:
Erik Hatcher wrote:
I don't see any reason for this to be a Hashtable.
It seems an acceptable alternative to not share analyzer
Kevin - I've made this change and committed it, using a Set.
Let me know if there are any issues with what I've committed - I
believe I've faithfully preserved backwards compatibility.
Erik
p.s. ...
On Mar 9, 2004, at 2:00 PM, Kevin A. Burton wrote:
public StopFilter(TokenStream in,
In the real world, many applications actually just re-run a search and
jump to the appropriate page within the hits; searching is generally
plenty fast enough to alleviate concerns about caching.
However, if you need to cache Hits, you need to be sure to keep around
the originating
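The re-run-and-jump approach needs only a little index arithmetic over hits.length(); a minimal sketch (1-based page numbers and the clamping behavior are arbitrary choices here):

```java
public class PageDemo {
    // 0-based index of the first hit on a 1-based page.
    public static int firstHit(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // One past the last hit index on the page, clamped to the
    // total number of hits so the final page can be short.
    public static int lastHitExclusive(int page, int pageSize, int totalHits) {
        return Math.min(firstHit(page, pageSize) + pageSize, totalHits);
    }
}
```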
My impression is the new term vector support should at least make this
type of comparison feasible in some manner. I'd be interested to see
what you come up with if you give this a try. You will need the latest
CVS codebase.
Erik
On Mar 8, 2004, at 4:37 PM, Michael Giles wrote:
I'm
I don't see any reason for this to be a Hashtable.
It seems an acceptable alternative to not share analyzer/filter
instances across threads - they don't really take up much space, so is
there a reason to share them? Or I'm guessing you're sharing it
implicitly through an IndexWriter, huh?
find quite enjoyable and refreshing.
Your taglib is nicely done.
Erik
Regards,
Iskandar
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, March 08, 2004 7:48 PM
Subject: Re: Lucene Taglib
On Mar 8, 2004, at 3:46 AM
On Mar 7, 2004, at 6:27 AM, [EMAIL PROTECTED] wrote:
On Fri, 5 Mar 2004 19:18:04 -0500, Erik Hatcher
[EMAIL PROTECTED]
wrote:
Thanks for the idea for a good example for the upcoming Lucene in
Action
book... it's been added!
Thanks for mentioning me in the book ;)
Well, I actually already had
On Mar 6, 2004, at 1:17 AM, prasen wrote:
Any tutorial/samples on how to use indices, and use them in your
search ?
Sure, tons. See the articles/resources section of the Lucene website.
Otis has written several. I've written a few articles at java.net on
Lucene. And there are a handful of
I, too, gave up on the sandbox taglib. I apologize for even committing
it without giving it more of a workout. I gave a good effort to fix it
up a couple of months ago, but there was more work to do than I was
willing to put in.
I have not heard from the original contributor, and I
Actually a slop of 1 does guarantee order... it is either an exact
match or 1 term off. It takes a slop of 2 or greater for reverse order
matches.
But it is not exactly 1 term off, which is what Jochen wants. *shrug*
Erik
On Mar 4, 2004, at 6:22 PM, Otis Gospodnetic wrote:
Ah, sorry, I
Kelvin,
In what scenarios does QueryParser fail without throwing a
ParseException?
I think we should fix those cases to ensure a ParseException is thrown.
Erik
On Mar 5, 2004, at 3:21 AM, Kelvin Tan wrote:
Lucene reacts pretty badly to non-wellformed queries, not throwing a
Erik Hatcher said:
Kelvin,
In what scenarios does QueryParser fail without throwing a
ParseException?
I think we should fix those cases to ensure a ParseException is
thrown.
Erik
Sorry, my bad. Was it ever throwing Errors? Probably not, but somehow
I had the
impression
Terms in Lucene are text. If you want to deal with number ranges, you
need to pad them: "0001", for example. Be sure all numbers have the
same width and are zero-padded.
Lucene uses lexicographic ordering, so you must be sure things collate
in this way.
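The padding advice can be checked without Lucene at all, since it is purely about string collation; a minimal sketch (the width of 4 is an arbitrary choice):

```java
public class PadDemo {
    // Left-pad a non-negative number with zeros so that
    // lexicographic (string) order agrees with numeric order.
    public static String pad(int n, int width) {
        StringBuilder sb = new StringBuilder(Integer.toString(n));
        while (sb.length() < width) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }
}
```

Unpadded, "9" sorts after "10"; padded, "0009" sorts before "0010", which is what a range query over text terms needs.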
Erik
On Mar 5, 2004, at 11:46
(oh, say through a common function :).
In fact, this is a great example for LIA. I'll add it! And I'll post
the code back here in a day or so after I write it.
Erik
On Mar 5, 2004, at 12:34 PM, [EMAIL PROTECTED] wrote:
On Friday 05 March 2004 18:01, Erik Hatcher wrote:
0001
On Mar 5, 2004, at 4:16 PM, Erik Hatcher wrote:
Another quite cool option is to subclass QueryParser, and override
getRangeQuery. Do the padding there. This will allow users to type
in normal looking numbers, and the padding happens automatically.
You'll need to be sure that numbers padded
Right, Otis was confused by what you were asking.
Google supports what you are asking for, I believe, although I don't
recall if an '*' indicates one or more or just one.
As far as I know, there is no easy way to do the exact distance like
you desire. You could always clone the PhraseQuery
On Mar 3, 2004, at 4:25 PM, hui wrote:
Another similar issue: if we could have a parameter to control the max
number of files within the index, that would avoid the problem of
running out of file handles.
When the file number within one index reaches the limit, optimization
is
On Mar 2, 2004, at 1:23 PM, Supun Edirisinghe wrote:
Now, one more question: what are the big performance hits from using a
FuzzyQuery? What are some bad cases to use it in (e.g. many words in the
phrase? long strings?)? Would it be better to read up on the
Levenshtein
algorithm or to get into the
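For background, FuzzyQuery's notion of similarity is based on edit distance; here is a minimal pure-Java sketch of the classic Levenshtein dynamic-programming algorithm (not Lucene's actual implementation, which adds its own optimizations):

```java
public class Levenshtein {
    // Minimum number of single-character insertions, deletions,
    // and substitutions needed to turn s into t. Uses two rolling
    // rows of the DP table instead of the full matrix.
    public static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.length()];
    }
}
```

The cost grows with the product of the two string lengths, which is one reason long terms make fuzzy matching expensive.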
On Mar 1, 2004, at 7:05 PM, Supun Edirisinghe wrote:
is there any documentation on FuzzyQuery or articles written on it? ( I
mean besides the API pages.)
I cover it a little in this article:
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
What you are doing is really the job of an Analyzer. You are doing
pre-analysis, when instead you could do all of this within the context
of a custom analyzer and avoid many of these issues altogether.
Do you use the XML only during indexing? If so, you could bypass the
whole conversion to
Lucene's wiki has been migrated to:
http://wiki.apache.org/jakarta-lucene
The old content was migrated to
http://wiki.apache.org/jakarta-lucene/LuceneProjectPages
Doug has gotten on the wiki bandwagon with Nutch also:
http://www.nutch.org/cgi-bin/twiki/view/Main/Nutch
I've started PoweredBy
On Feb 28, 2004, at 5:38 PM, Moray McConnachie (OA) wrote:
- Original Message -
I guess the best way to handle this problem, other than getting the
application to transform values prior to query or indexing, is
actually to
tokenize the field after all, but use the same KeywordAnalyzer to
On Feb 27, 2004, at 5:16 AM, Moray McConnachie wrote:
I note from previous entries on the mailing list and my own
experiments that
you can add many entries to the same field for each document. Example:
a
given document belongs to more than one product, ergo I index the
product
field with values
On Feb 27, 2004, at 7:12 AM, Ankur Goel wrote:
Hi,
In the lucene-1.3-final version's CHANGES.txt it is written that "Fix
StandardTokenizer's handling of CJK characters (Chinese, Japanese and
Korean ideograms)".
Does it mean that for CJK characters we now do not need to use any
separate analyzer,
On Feb 27, 2004, at 10:00 AM, Moray McConnachie wrote:
Are you using QueryParser? Try using a TermQuery("product", "PROD_A")
when indexing as a Keyword and see what you get. If that finds it,
then you are suffering from analysis paralysis. QueryParser, Keyword
fields, and analyzers are a very
Roy,
On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
Document doc = new Document();
doc.add(Field.Text("contents", "the"));
Changing these to Field.Keyword gets it to work. I'm delving a little
bit to understand why, but it seems if you are adding words
individually anyway you'd
of troubleshooting but haven't
figured it out yet. Something in DocumentWriter I presume.
Erik
Roy
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, February 27, 2004 2:12 PM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field
On Feb 27, 2004, at 6:17 PM, Doug Cutting wrote:
I think it's document.add(). Fields are pushed onto the front, rather
than added to the end.
Ah, ok DocumentFieldList/DocumentFieldEnumeration are the culprits.
This is certainly a bug. With things going in reverse order as they
are now, a
in the phrase, the other document matches
the
phrase query.
Roy
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, February 27, 2004 4:34 PM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document
On Feb 27, 2004
On Feb 25, 2004, at 4:01 PM, sam xia wrote:
Or should I build the whole thing into one big segment
and use the filter to do this. There is a DateFilter.
Is there a way to implement a category filter?
What is the best way to accomplish this?
I'd recommend a pool of filters for each category.
On Feb 25, 2004, at 7:58 PM, sam xia wrote:
I'd recommend a pool of filters for each category.
Regenerate them
when the index changes, otherwise leave the
instances alive and reuse
them for queries - this will speed things up pretty
dramatically I'd
guess. There is a QueryFilter you could use,
On Feb 17, 2004, at 6:53 AM, [EMAIL PROTECTED] wrote:
On Monday 16 February 2004 20:56, Erik Hatcher wrote:
On Feb 16, 2004, at 9:50 AM, [EMAIL PROTECTED] wrote:
TokenStream in = new WhitespaceAnalyzer().tokenStream("contents",
new StringReader(doc.getField("contents").stringValue()));
The field
On Feb 17, 2004, at 9:58 AM, [EMAIL PROTECTED] wrote:
On Tuesday 17 February 2004 15:18, Erik Hatcher wrote:
You would do them separately. I'm not clear on what you are trying to
do. The Analyzer does all this during indexing automatically for you,
but it sounds like you are just trying
On Feb 17, 2004, at 11:39 AM, [EMAIL PROTECTED] wrote:
On Tuesday 17 February 2004 16:13, Erik Hatcher wrote:
The words (or terms) are already in the index ready to be read
very
rapidly and accurately. IndexReader is what you want to investigate
if
your fields are indexed.
Look
On Feb 16, 2004, at 6:12 AM, [EMAIL PROTECTED] wrote:
On Monday 16 February 2004 12:02, Viparthi, Kiran (AFIS) wrote:
As mentioned, I didn't use any information from the index, so I didn't
use any TokenStream, but let me check it out.
deprecated:
String description =
On Feb 16, 2004, at 7:59 AM, [EMAIL PROTECTED] wrote:
On Monday 16 February 2004 12:40, Erik Hatcher wrote:
On Feb 16, 2004, at 6:12 AM, [EMAIL PROTECTED] wrote:
String description = doc.getField("contents").stringValue();
What is the value of description here?
The value of the "contents" field.
On Feb 16, 2004, at 9:50 AM, [EMAIL PROTECTED] wrote:
Can somebody explain tokenStream() to me?
You are now venturing under the covers of Lucene's API. This is where
I give the sage advice to get the Lucene source code and surf around it
a bit. (It helps to have a nice IDE where you can click
On Feb 16, 2004, at 10:34 AM, [EMAIL PROTECTED] wrote:
On Monday 16 February 2004 15:16, Erik Hatcher wrote:
And thus the nature of the problem. Try using the WhitespaceAnalyzer
instead to see what you get.
Can I chain multiple analyzers in order to filter common stop words?
You cannot chain
Timo,
You are asking a lot of good questions, but also questions for which
answers already exist. Just dig a little deeper and you will see.
Have a look at my java.net article (titled Lucene Intro) and you will
find utility code that highlights how analyzers work. Tinker with that a
bit,
You must remove and re-add the entire document to perform an update.
Such is the (current) nature of Lucene.
Erik
On Feb 15, 2004, at 10:25 PM, Tim Walters wrote:
Hi,
I'm thinking of using Lucene in an application that might change the
field data without modifying the document. It would be
On Feb 13, 2004, at 7:02 AM, [EMAIL PROTECTED] wrote:
On Friday 13 February 2004 12:18, Julien Nioche wrote:
If you want to limit the set of Documents you're querying, you should
consider using Filter objects and send them to the searcher along with
your
Query.
Hm, hard to find information about
On Feb 13, 2004, at 9:12 AM, [EMAIL PROTECTED] wrote:
On Friday 13 February 2004 15:02, Erik Hatcher wrote:
Use a HitCollector and grab the first one that comes in, then bail
out.
That should do the trick for getting the first hit only.
According to the API docs I ought to use HitCollector only
On Feb 11, 2004, at 5:00 AM, Nicolas Maisonneuve wrote:
Hi,
recently there is a new subdirectory, spans, in the search directory.
What is it and how do I use it?
Have a look at the test cases which use the new features, and also see
the CHANGES file which mentions it.
Erik
In this case, I'd recommend calling out to a Lucene, CLucene, or
PLucene.
Sam Ruby plugged it into his Perl-based blog like this:
http://radio.weblogs.com/0101679/stories/2002/08/13/luceneSearchFromBlosxom.html
On Feb 11, 2004, at 6:23 PM, [EMAIL PROTECTED] wrote:
Hi!
Somewhat off-topic:
On Feb 8, 2004, at 11:13 AM, David Black wrote:
Let's assume I have an object that is composed of the following
fields...
UID: 434 (Keyword/Stored)
TITLE: Java For Dum Dums (Text/Stored)
AUTHOR: Fred Smith - Text/Stored
DESCRIPTION: This would be a big long field -
On Feb 7, 2004, at 5:32 PM, Ramy Hardan wrote:
Is there an efficient way to do search refinement, preferably without
losing the Hits class?
I'm not quite following your Filter questions, but QueryFilter seems to
fit the bill for what you are trying to do. Just keep around the
previous query, and
);
System.out.println("Key: " + result.get("value") + " Desc: "
+ result.get("name"));
}
System.out.println("Finished Search: " + hits.length());
}
Thanks in advance,
Justin
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 05, 2004 6:34 PM
On Feb 5, 2004, at 8:19 PM, Scott Smith wrote:
There is a minor issue I found that I think works as documented, but
wonder why it's that way. If you enter a search string that's a
hyphenated word such as fred-bill (w/o the quotes), the QueryParser
generates a search string to find all documents
On Feb 4, 2004, at 9:07 AM, [EMAIL PROTECTED] wrote:
On Wednesday 04 February 2004 14:48, Otis Gospodnetic wrote:
There is score.
Oops, you are right Hits.score(). But it seems I have to implement a
sorting
iterator on my own :-\
Well, the original design is to have hits sorted by score you
On Feb 4, 2004, at 12:21 PM, William W wrote:
Hi Erik,
How is the book ? ;)
William.
:)
Otis and I are burning the midnight oil to get this thing done as soon
as possible. We are probably 3/4 done with the manuscript. We've been
through one review cycle. The bulk should be done by the end of
Doug asked me to take care of the logistics of pushing the Lucene 1.3
FINAL release to the Apache site properly so that it is mirrored
worldwide. Last weekend I said the right magic incantations and it
looks like it has been successful. So, without further ado, Lucene 1.3
is now completely
On Feb 3, 2004, at 7:12 AM, Erik Hatcher wrote:
Doug asked me to take care of the logistics of pushing the Lucene 1.3
FINAL release to the Apache site properly so that it is mirrored
worldwide. Last weekend I said the right magic incantations and it
looks like it has been successful. So
The best suggestion I have is to look at the code in my first java.net
article (Intro Lucene) and borrow the Analyzer utility code to see what
happens to a sample string as it is analyzed. Then pass that same
string to QueryParser (along with the same analyzer) and see what the
On Feb 1, 2004, at 6:16 AM, [EMAIL PROTECTED] wrote:
There was some third-party SQLDirectory for Lucene 1.2 which was
abandoned over performance concerns. Well, why not load the index into
RAM? Is there some (official) SQLDirectory for 1.3?
If you look back in the list archives a few weeks
On Feb 1, 2004, at 6:19 AM, [EMAIL PROTECTED] wrote:
Hi!
Is there any HTMLDocument out there? The one in the demo package of
Lucene does not handle non-wellformed HTML files (what about NekoHTML?)
and seems to have some other shortcomings and bugs as well (and why
isn't it part of the distro
On Jan 29, 2004, at 5:08 AM, tom wa wrote:
I'm trying to create an index which can also be searched with date
ranges. My first attempt using the Lucene date format ran in to
trouble after my index grew and I couldn't search over more than a few
days.
I saw some other posts explaining why this
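A workaround that comes up for this problem is to index dates as yyyyMMdd strings instead of Lucene's millisecond-granularity date encoding, which keeps the number of distinct terms per day at one and still collates chronologically; a hedged sketch of just the formatting step (the field layout is an assumption, not from the original post):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class DateTermDemo {
    // Format a date as yyyyMMdd so lexicographic term order
    // matches chronological order and each day is one term.
    public static String toTerm(int year, int month, int day) {
        Calendar c = new GregorianCalendar(year, month - 1, day);
        return new SimpleDateFormat("yyyyMMdd").format(c.getTime());
    }
}
```

With terms like these, a text range query from "20040309" to "20040326" covers exactly those days.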
If you know the document id, you can use IndexSearcher.explain() (you
could do a TermQuery to find it to get the number, or get to it more
directly through IndexReader perhaps).
You are affecting the score by adding more to the query as the score is
based on the query itself.
Erik
On Jan
On Jan 29, 2004, at 1:45 PM, Otis Gospodnetic wrote:
--- Weir, Michael [EMAIL PROTECTED] wrote:
Is the CJKAnalyzer the best to use for Japanese? If not, which is?
If so,
from where can I download it?
There is also a ChineseTokenizer/Analyzer in the sandbox as well. It
may have value for
On Jan 29, 2004, at 9:00 AM, Weir, Michael wrote:
I am fairly new to Lucene and I have noticed a difference between
Lucene
1.2RC1 (which came with our build of Cocoon) and the new Lucene
1.3Final.
I am indexing about 400 very small documents, each in 10 languages.
The
document contents are
and eHatcher Solutions would be happy to as well :))
On Jan 29, 2004, at 12:16 PM, Ryan Ackley wrote:
I know of two:
http://superlinksoftware.com
http://jboss.org
- Original Message -
From: Boris Goldowsky [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, January 29, 2004 12:04
On Jan 29, 2004, at 1:56 PM, Dror Matalon wrote:
On Thu, Jan 29, 2004 at 01:46:12PM -0500, Erik Hatcher wrote:
and eHatcher Solutions would be happy to as well :))
Recommended. Erik knows Lucene well and is very responsive.
That should read "very expensive" :)) But we all know you get what you
pay
Lucene is a Java API, and can be used within any type of Java program
(command-line, web, etc).
It is up to you as the developer embedding Lucene to put whatever kind
of interface you want on it. To index local files leverage some of the
code I have put in my java.net articles, or use the Ant
On Jan 28, 2004, at 9:01 AM, Sebastian Fey wrote:
How you present the search results will be up to you and the needs of
your
project.
I've NO experience with Java.
It would be nice to see an example of a web interface that uses Lucene,
to have something to start with.
No offense intended at
Your escape character *is* working to pass it through the parser into
the analyzer.
It is the analyzer that is splitting at the dash. Phrases get analyzed
too.
Erik
p.s. I wish I had a nickel for every Lucene issue that boils down to
QueryParser or Analyzer misunderstanding. :) The two
On Jan 27, 2004, at 2:27 PM, Gabe wrote:
If I have a group of documents and I want to filter on
a category, it is fairly straightforward. I just
create a Field that contains the category and filter
on it.
However, what if I want the field category to have
multiple possible values? Is there a known
On Jan 24, 2004, at 6:44 PM, Pasha Bizhan wrote:
Luke uses the default ctor for the Analyzer, but the RussianAnalyzer
doesn't contain it.
The GermanAnalyzer too: try Luke and the error will be the same.
You can add this code into RussianAnalyzer.java and enjoy:
public RussianAnalyzer() {
this.charset =
On Jan 25, 2004, at 2:53 PM, Pasha Bizhan wrote:
Hi,
I'm not sure that's right, because the Russian Unicode charset, the KOI
charset, and the Win-1251 charset are all in common use; maybe the
Unicode charset is less common. I guess that's why the RussianAnalyzer
has no no-arg constructor.
Pasha - my apologies, but I'm not
On Jan 25, 2004, at 5:36 PM, Pasha Bizhan wrote:
My code is only an example, and RussianCharsets.RussianUnicode too.
We use RussianCharsets.CP1251, but other people can use other charsets.
I think the RussianAnalyzer must not have a no-arg constructor; the
choice of a default charset is not evident.
The
It definitely cannot be done with custom token types. You're probably
aiming for field-specific boosting, so you will need to parse the HTML
into separate fields and use a multi-field search approach.
I'm sure there are other tricks that could be used for boosting, like
inserting the words
On Jan 20, 2004, at 10:22 AM, Terry Steichen wrote:
1) Is there a way to set the query boost factor depending not on the
presence of a term, but on the presence of two specific terms? For
example, I may want to boost the relevance of a document that contains
both iraq and clerics, but not
On Jan 21, 2004, at 10:01 AM, Terry Steichen wrote:
But doesn't the query itself take this into account? If there are
multiple matching terms then the overlap (coord) factor kicks in.
TS==Except that I'd like to be able to choose to do this on a
query-by-query basis. In other words,
it's
On Jan 21, 2004, at 4:21 PM, Terry Steichen wrote:
PS: Is this in the docs? If not, maybe it should be mentioned.
Depends on what you consider the docs. I looked at QueryParser.jj to
see what it parses.
Also, on http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
it has an example of
On Jan 19, 2004, at 5:03 AM, Nicolas Maisonneuve wrote:
I have a report to write about Lucene and I don't know which formula to
write in the paper and how to explain it.
Ultimately the answer lies within the code itself - as we all know
documentation and FAQ's can easily become out of sync from the