Re: Is there way to get complete start end matches to be first in the list ?

2009-09-08 Thread Michael Barbarelli
What I do is run each entry in the hits collection through a home-rolled Levenshtein distance algorithm to obtain a score. Then I sort by score. On Sep 8, 2009 9:44 PM, "Paul Taylor" wrote: Is there a way to get complete start end matches to be first in the list We use Lucene to search song albums
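
The re-ranking approach described in this reply might look roughly like the sketch below. It is not the poster's actual code: the class name `LevenshteinRerank`, the method names, and the choice to compare lower-cased titles are all assumptions made for illustration, and plain strings stand in for the hits collection.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: re-orders already retrieved titles by edit distance to the query.
final class LevenshteinRerank {

    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns a new list with the closest titles first; ties keep their original order.
    static List<String> rerank(String query, List<String> titles) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(Comparator.comparingInt((String t) -> distance(q, t.toLowerCase(Locale.ROOT))));
        return sorted;
    }
}
```

Because an exact match has distance 0, a title identical to the query always sorts first. The cost is one distance computation per hit, so this is best applied to a bounded number of top results.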

Re: Parsing large xml files

2009-05-21 Thread Michael Barbarelli
Why not use an XML pull parser? I recommend against using an in-memory parser. On Thu, May 21, 2009 at 3:42 PM, Sudarsan, Sithu D. < sithu.sudar...@fda.hhs.gov> wrote: > > Hi, > > While trying to parse xml documents of about 50MB size, we run into > OutOfMemoryError due to java heap space. Incre
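
As an illustration of the pull-parser suggestion, here is a minimal sketch using the JDK's StAX API (`javax.xml.stream`). The class name `PullParseSketch`, the method `countElements`, and the element name passed to it are hypothetical; the original thread does not say which parser or document structure was involved.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: streams through a document one event at a time
// instead of building the whole tree in memory.
final class PullParseSketch {

    // Counts start tags with the given local name.
    static int countElements(InputStream in, String elementName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int count = 0;
            while (reader.hasNext()) {
                // next() advances to the following event; only the current event is held.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML stream", e);
        }
    }
}
```

In an indexing job, the body of the loop would typically collect the text of each record and hand it off before moving on, so memory use stays roughly proportional to one record rather than to the whole file.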

Re: Luke site is down?

2009-03-04 Thread Michael Barbarelli
I'm not having any problems with the following. http://www.getopt.org/luke/ On Wed, Mar 4, 2009 at 5:07 PM, Ruslan Sivak wrote: > I'm not getting anything when I go to http://www.getopt.org/luke/, or > http://www.getopt.org. > > Does anyone know how long the site is expected to be down and is t

Possible to expose similarity as a property in hits collection?

2007-08-16 Thread Michael Barbarelli
Hello all. I am trying to get at the raw difference that Lucene uses -- the result of the fail-fast Levenshtein distance algorithm. I believe that it is calculated in FuzzyTermEnum.java (FuzzyTermEnum.cs). In the application I have built upon Lucene, I would like to expose similarity as the score,

Re: Customizing Stop Word List?

2007-07-13 Thread Michael Barbarelli
Please disregard previous request for assistance. I've fixed the bug I was struggling with and it actually had nothing to do with the analyzer in question. Thanks very much. On 7/13/07, Michael Barbarelli <[EMAIL PROTECTED]> wrote: Here's the sample code. Incidentally, thi

Re: Customizing Stop Word List?

2007-07-13 Thread Michael Barbarelli
Here's the sample code. Incidentally, this is in C#. I am using Lucene.NET, but I am assuming this problem could be universal to all versions and that this is a question that is best exposed to the collective wisdom of the Java user group. default list of ISO country codes. * public string[] DEF

Re: Customizing Stop Word List?

2007-07-12 Thread Michael Barbarelli
Hello Hoss. Cheers for your response. Much appreciated. "typically the act of writing this sample code helps you spot where you may be doing something wrong in your application" Fair enough point. Unfortunately, I won't be able to post any sample code until I return to my home office. Will po

Customizing Stop Word List?

2007-07-12 Thread Michael Barbarelli
Hello to All, I'm having a problem with Lucene where certain words that I would like to be included in the query are actually being omitted from it. And I think that is because Lucene recognizes them as stop words. This is the case with roughly four terms in particular. They look like English

Re: custom stop word list for standard analyzer

2007-04-13 Thread Michael Barbarelli
Apologies and thanks all at the same time, everyone. Mike On 4/12/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Michael Barbarelli wrote: : > Can I instantiate a standard analyzer with an argument containing my own : > stop words? If so, how? Will they be appended to or

custom stop word list for standard analyzer

2007-04-12 Thread Michael Barbarelli
I know this is a relatively fundamental thing to arrange, but I'm having trouble. Can I instantiate a standard analyzer with an argument containing my own stop words? If so, how? Will they be appended to or override the built-in stop words? Or, do I have to modify the analyzer class itself and

Re: How to access Levenstein distance number?

2007-04-11 Thread Michael Barbarelli
Thank you Erick! Will give it a shot! On 4/11/07, Erick Erickson <[EMAIL PROTECTED]> wrote: Go for a HitCollector. In particular, TopDocs will give you the raw scores. Erick On 4/11/07, Michael Barbarelli <[EMAIL PROTECTED]> wrote: > > Hi Grant. > > Yes, I'm ge

Re: How to access Levenstein distance number?

2007-04-11 Thread Michael Barbarelli
that value? Thanks! Mike On 4/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: Have you looked at the explains to see what is coming out of the FuzzyQuery? Also, are you using Hits to get that score? Scores get normalized to 1 by that process. -Grant On Apr 11, 2007, at 2:06 AM, Michael

How to access Levenstein distance number?

2007-04-10 Thread Michael Barbarelli
Hello. I am using Lucene to submit fuzzy queries against an index. I have noticed that relevant matches are often retrieved, but the scoring is not at all what I expected. For example, if my query is "rightches~", a reference to a text file with the single word "righteous" is returned with a sco
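
For reference, the raw edit distance for the pair in this question can be computed outside Lucene. The sketch below assumes a similarity defined as one minus the distance divided by the length of the shorter string. That formula, the class name `FuzzySimilaritySketch`, and its methods are illustrative assumptions, not a quotation of Lucene's source, and the score a search reports may be further scaled or normalized.

```java
// Hypothetical stand-alone calculation, independent of any Lucene class.
final class FuzzySimilaritySketch {

    // Full-matrix Levenshtein distance: d[i][j] is the cost of turning a[0..i) into b[0..j).
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Assumed normalization: 1 - distance / length of the shorter string.
    // The result can drop below zero when the strings are very different.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) distance(a, b) / shorter;
    }
}
```

For "rightches" and "righteous" the distance is 3 (three substitutions after the shared prefix "right"), and both strings have nine characters, so this particular normalization gives 1 - 3/9.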

Fwd: Unable to retreive 2/13 field values

2007-02-27 Thread Michael Barbarelli
Hello. I'm using Lucene.NET, but would like to pose the question here in the Java group since I think the collective expertise here is still valid. Hope you don't mind. After indexing data from an Oracle DB using the standard analyzer, I am using Luke (standardanalyzer) to query at the moment.