Hello All,
My program is indexing a string:
London/Bristol/LondonEast/Scotland using the standard analyzer.
When I search with the word london it doesn't come up in the hits. If I
search for london it is coming.
Where would the problem be?
Does it require a custom tokenizer?
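For anyone wondering what a custom tokenizer would need to produce here, this is a minimal sketch in plain Java. It is not Lucene code and does not show what the standard analyzer actually emits; the class name SlashTokens and the method tokenize are made up for illustration. It only shows the tokens one would want in the index so that a lowercase search for london can match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: splits on anything that is not a
// letter or digit and lowercases each piece.
final class SlashTokens {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // "/" (and any other punctuation or whitespace) acts as a separator.
        for (String part : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [london, bristol, londoneast, scotland]
        System.out.println(tokenize("London/Bristol/LondonEast/Scotland"));
    }
}
```

Comparing this list with the tokens the real analyzer produces for the same string should show whether the slash is being treated as a separator at index time.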
Hi all,
I'm having a problem searching for phrases (example: bucky badger). I
can search for the terms individually (using AND or OR searches
(BooleanQuery)), but can't seem to do a PhraseQuery (within the same boolean
query). See code:
BooleanQuery myquery = new BooleanQuery();
for (int
Hi Eric,
Thanks for the link. I've looked at it and it has some interesting parts
like the stop words and the analyser which I might partially include
(partially since I work with both English and French texts).
Cheers,
Stephane
Eric Isakson wrote:
Don't know if any of the code in this
Actually, I'm just looking to remove accented chars from Java chars
(so Unicode), only for the search (original doc should stay the same as
I display), I'll just implement a TokenFilter to do this. It should be
relatively simple. Just wanted to know if it had already been done
(perhaps in a
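The per-token transformation such a filter would apply could look roughly like the sketch below. It is plain Java using java.text.Normalizer from the JDK, not a Lucene TokenFilter; the class name AccentFolding and the method stripAccents are hypothetical. Wiring it into an analyzer chain is left out, and the stored document text is untouched because only the search tokens would pass through it.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: removes diacritics from one token.
final class AccentFolding {
    static String stripAccents(String token) {
        // NFD splits an accented letter into base letter + combining mark(s).
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        // \p{M} matches the combining marks, which are then dropped.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(stripAccents("\u00e9l\u00e8ve"));     // prints eleve
        System.out.println(stripAccents("fran\u00e7ais"));       // prints francais
    }
}
```

Letters that have no canonical decomposition, such as the ligature in "cœur", pass through unchanged, so a mapping table would still be needed for those.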
Something flexible and elegant would also be a simple FST.
Here is one built for lucene:
http://sourceforge.net/projects/normalizer/
-Original Message-
From: stephane vaucher [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 12, 2002 12:23 PM
To: Lucene Users List
Subject: Re:
So, I have tried this with Lucene:
1) original JavaCC LL(k) HTML parser
2) SWING's HTML parser
In case of (1) I could process about 300K of HTML documents. In case of
(2) more than 400K.
But I cannot process the complete collection (5M) and finish my hard stress
tests of Lucene.
Is there anyone
Look in the Lucene sandbox in CVS. I contributed an Ant task that
indexed HTML documents. It uses JTidy under the covers to parse HTML
into title and body content, and it could be extended to pull other
information such as meta keywords.
Erik
Leo Galambos wrote:
So, I have tried this with
Fair enough, but making it protected would only allow subclasses to
access it. Personally, I would rather not have to use a subclass to
implement my feature. I think the logic behind this is that it's an
intrinsic property of a Term, thus it should be immutable, as any
modifications to this object
On a related note, I've also released a project that I developed for my
book and for presentations that I have been giving on Ant, XDoclet, and
JUnit. This project is a documentation search engine with a web
(Struts) interface. It uses Lucene and the Ant task I mentioned already
to index a
Yeah, Neko is not the most straightforward, but it works.
Sorry, the code is somewhere; can't look for it now.
But you could also look at LARM under Lucene Sandbox, it's got a nice
HTML parser, too.
Otis
--- Leo Galambos [EMAIL PROTECTED] wrote:
So, I have tried this with Lucene:
1)