Hi Sanyi
Could you try XP on your desktop - that would take some variables out. The
problem is that you are comparing operating systems, filesystems and
hardware configs all at once.
Also, unless you take your hyperthreading off, with just one index you are
searching with just one half of the
Hi Venkat
If you want to go against just html pages (maybe with Dublin core tags) then
Swish-e isn't too bad, but it won't be as portable as Lucene, plus it doesn't
seem to be nearly as active on the development side as Lucene (so you'll
get less support in the event of problems). Swish seems ea
Hi Yousef
You are not doing anything wrong - it's just how the Porter stemmer works!
The problem with Porter is that it tries to do everything in a purely algorithmic way
- which doesn't cater for irregular conjugations etc.
Don't worry too much though, as long as you do the same stemming on the
Hi Yousef
If you want to use it for something else then go direct for the Snowball
stemmers; for details see the site:
http://snowball.tartarus.org/
Cheers
Pete
- Original Message -
From: "Yousef Ourabi" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, N
Hi David
I like KStem more than Porter / Snowball. It still has limitations, but it
performs better because it has a dictionary to augment the rules.
Note that KStem will also treat "print" and "printer" as two distinct terms,
probably treating them as verb and noun respectively.
"printer" and submit then the results will be
"print" and "printer" - hence showing that the Porter stemmed versions are
the same as the originals. Therefore they are both distinct terms in their
own right and searches on one will not hit the other.
Cheers
Pete Lewis
ction, hence you don't garbage collect and hence you run out of memory.
Can you check whether or not your garbage collection is being triggered?
Counterintuitively, if this is the case, then by reducing the heap space you
can improve performance and get rid of the out of memory errors.
Cheers
Hi Aad
Use the stemmed result as what you index, but then also remember to stem the
query terms as well - you need to do the same on the way out as on the way
in.
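
As a rough illustration of "the same on the way out as on the way in", here is a minimal Python sketch. It is not Lucene code, and toy_stem is a hypothetical stand-in for whatever stemmer you really use (it is not Porter, Snowball, KStem or MySpell); the only point is that one analyze() function is shared by indexing and searching.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical toy stemmer: strips a few English suffixes.

    Assumption for illustration only; a real stemmer is far more involved.
    """
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of the word after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Shared analysis step: used for documents AND for queries."""
    return [toy_stem(token) for token in text.split()]


def build_index(docs):
    """Build a tiny inverted index mapping stemmed term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Stem the query terms the same way, then look them up."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "printing reports", 2: "the printer jammed"}
index = build_index(docs)

print(search(index, "prints"))   # query stem "print" matches doc 1
print(search(index, "printer"))  # "printer" stays a distinct term, doc 2
```

If the query were looked up unstemmed ("printing"), it would miss document 1, because only the stem "print" was ever indexed.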
We don't use MySpell but we do use our own stemmer in this way, as there are
many examples where Snowball falls down like:
caught ->
Hi
I'd recommend KStem over Porter; it performs much better on English, let
alone when you get to other languages. You can get the source code for
Kstem.jar at the following website:
http://ciir.cs.umass.edu/downloads/
Pete
- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED
Stefan Groschupf" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, November 05, 2003 11:01 AM
Subject: Re: Index entire filesystem
> There is some ongoing work for nutch.org.
> May be we can bundle all work together?!
> Nutc
Hi Stefan
Using OpenOffice will enable you to parse 182 file formats, but it's not a
pure Java solution and you still need an alternate solution for PDFs.
I'd be interested in knowing whether anyone is working on a pure Java
solution that would give us a single method for handling MS Office
documm
Might want two demos, one for Unix environments and one for Windows.
Most users will want a fast start that they can copy and adapt. So quick
targets would be:
filesystems - html / text / pdf / office documents for windows.
xml - fairly simple example maybe against news items.
database - again s
Does anyone know of Lucene being packaged onto a CD to provide a search facility for
the data on that CD? If so, would it be possible to have a reference?
Thanks
Pete
Hi all
Does anyone know of any synonym and homonym lists for the different European
languages?
Sorry for the cross-posting but I'd like to use them for query expansion in different
languages.
Pete
Hi guys
Thanks, Jawin looks really nice :)
Pete
- Original Message -
From: "Andrzej Bialecki" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, May 29, 2003 9:45 AM
Subject: Re: RE : Parsers
> Victor Hadianto wrote:
> >>I'm using successfully a combinatio
Hi Victor
Thanks.
In the past I have used the Inso OutsideIn filters and found them very good;
however I'd like to come up with a pure Java solution, so if there is a Java
equivalent to the Inso filters I'd be grateful for any details. Failing that,
I thought that I'd go for individual parsers ini
one...
Adriano Labate
-Original Message-
From: Pete Lewis [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 28 May 2003 12:48
To: Lucene Users List
Subject: Parsers
Hi all,
I have a rather nice html parser that I got from SourceForge. Does
anyone know of any good parsers for pdf and Micr
Hi all,
I have a rather nice HTML parser that I got from SourceForge. Does anyone know of any
good parsers for PDF and Microsoft Office Suite (.doc, .ppt, .xls, etc.)? Any help
would be much appreciated.
Pete Lewis