RE: PorterStemmer / Levenshtein Distance

2004-11-05 Thread Tate Avery
Yousef, if you are interested in using the Levenshtein algorithm outside of Lucene, it is available in the Jakarta StringUtils class...
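The computation behind such a method is the standard dynamic-programming edit distance. The sketch below is a minimal, self-contained Java version. It is not the Commons code, and the class name `Levenshtein` is hypothetical. As far as I know, the Jakarta Commons Lang method itself is `StringUtils.getLevenshteinDistance(String, String)`.

```java
/**
 * Classic dynamic-programming Levenshtein (edit) distance: the minimum
 * number of single-character insertions, deletions, or substitutions
 * needed to turn one string into the other.
 */
final class Levenshtein {
    private Levenshtein() {}

    static int distance(String s, String t) {
        // prev[j] = distance between the first i-1 chars of s and the first j chars of t;
        // curr is the row currently being filled in.
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into t[0..j) takes j insertions
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning s[0..i) into "" takes i deletions
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;  // the finished row becomes "previous"
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

For example, `Levenshtein.distance("kitten", "sitting")` is 3: two substitutions and one insertion. Keeping only two rows limits memory to O(|t|) instead of a full matrix.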

WordListLoader's whereabouts

2004-09-27 Thread Tate Avery
Hello, I am trying to compile the analyzers from the Lucene sandbox contributions. Many of them seem to import org.apache.lucene.analysis.WordlistLoader, which is not currently on my classpath. Does anyone know where I can find this class? It does not appear to be in Lucene 1.4, so I am assum

RE: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread Tate Avery
I get a NullPointerException (shown via Apache) when I try to access http://www.searchmorph.com/kat/spell.jsp
T
-----Original Message-----
From: David Spencer
Sent: Tuesday, September 14, 2004 3:23 PM
To: Lucene Users List
Subject: NGramSpeller contribution -- Re: com

RE: AnalyZer HELP Please

2004-08-18 Thread Tate Avery
' in them". Look at Nutch for how it does something very similar.
Erik
On Aug 18, 2004, at 11:52 AM, Tate Avery wrote:
> That is interesting.
>
> I went to look up the cases for this (on Google).
> Here are my 4 queries and the results:
>
> a) of the from it

RE: AnalyZer HELP Please

2004-08-18 Thread Tate Avery
That is interesting. I went to look up the cases for this (on Google). Here are my 4 queries and the results:
a) of the from it - 25,500,000 matches containing 'of' and 'the' and 'from' and 'it' (i.e. the stop list is NOT used if the query is only stopwords)
b) "of the from it"
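The behaviour in (a) can be expressed as a simple query-side rule: drop stop words, unless doing so would leave nothing to search for. The sketch below is a guess at that rule from the outside, not Google's or Nutch's actual implementation. The class `StopWordFallback` and its tiny stop list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical query-side stop-word handling: stop words are removed,
 * but if the query consisted only of stop words, all terms are kept.
 */
final class StopWordFallback {
    private StopWordFallback() {}

    // A tiny illustrative stop list; a real one would be much larger.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "from", "it", "of", "the", "to"));

    static List<String> filter(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!STOP_WORDS.contains(term.toLowerCase())) {
                kept.add(term);
            }
        }
        // Fall back to the original terms when nothing else would remain.
        return kept.isEmpty() ? new ArrayList<>(terms) : kept;
    }
}
```

With this rule, `of the from it` keeps all four terms, while `the lucene index` is reduced to `lucene index`.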

RE: Finding All?

2004-08-13 Thread Tate Avery
I had to do this once, and I put a field called "all" with a value of "true" into every document:
_doc.add(Field.Keyword("all", "true"));
Then, if there was an empty query, I would substitute the query "all:true" for it. And, of course, every doc would match this. There might be a MUCH mo
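The substitution step amounts to a one-line rewrite before the query string reaches the parser. This is a minimal sketch that assumes the `all`/`true` field shown above is present in every document. The class and method names are hypothetical.

```java
/**
 * Sketch of the "all:true" trick: every document carries a constant
 * field, so a blank query can be replaced by a query on that field,
 * which every document matches.
 */
final class MatchAllRewrite {
    private MatchAllRewrite() {}

    // Hypothetical constants mirroring the field added at index time.
    static final String ALL_FIELD = "all";
    static final String ALL_VALUE = "true";

    static String rewrite(String userQuery) {
        if (userQuery == null || userQuery.trim().isEmpty()) {
            return ALL_FIELD + ":" + ALL_VALUE; // matches every document
        }
        return userQuery; // non-blank queries pass through unchanged
    }
}
```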

RE: boost keywords

2004-08-13 Thread Tate Avery
Well, as far as I know you can boost 3 different things:
- Field
- Document
- Query
So, I think you need to craft a solution using one of those. Here are some possibilities for each:
1) Field
- make a keyword field which is alongside your content field
- boost your keyword fiel
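As a rough mental model, the three boosts act as multipliers on the score a match would otherwise get. This is a simplified toy, not Lucene's actual scoring formula. Under this model, boosting a keyword field alongside the content field lifts documents that match on that field. The class `BoostToyModel` is hypothetical.

```java
/**
 * Toy model (not Lucene's Similarity): field, document and query
 * boosts each scale a match's base score multiplicatively.
 */
final class BoostToyModel {
    private BoostToyModel() {}

    static double score(double baseScore, double fieldBoost,
                        double documentBoost, double queryBoost) {
        return baseScore * fieldBoost * documentBoost * queryBoost;
    }
}
```

For example, with a base score of 0.5, a match in a keyword field boosted 3x scores 1.5, compared with 0.5 for the same match in an unboosted content field.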

RE: Understanding Boolean Queries

2004-04-29 Thread Tate Avery
-----Original Message-----
From: Tate Avery
Sent: Thursday, April 29, 2004 1:30 PM
To: 'Lucene Users List'
Subject: RE: Understanding Boolean Queries
Thank you for the response. I am not using the QueryParser directly... it was just part

RE: Understanding Boolean Queries

2004-04-29 Thread Tate Avery
ding? Thanks, Tate
-----Original Message-----
From: Stephane James Vaucher
Sent: Thursday, April 29, 2004 1:10 PM
To: Lucene Users List
Subject: Re: Understanding Boolean Queries
On Thu, 29 Apr 2004, Tate Avery wrote:
>

Understanding Boolean Queries

2004-04-29 Thread Tate Avery
A B" AND "C D" AND "D E" ... and... ("A B") AND ("C D") AND ("D E") ... could that be the crux of it? Thank you for your time, Tate Avery

RE: BooleanScorer - 32 required/prohibited clause limit

2004-04-27 Thread Tate Avery
Or, if I have overlooked a previous post or thread that covers this, please help me track it down. Thank you, Tate
-----Original Message-----
From: Tate Avery
Sent: Tuesday, April 27, 2004 10:20 AM
Subject: BooleanScorer - 32 required/prohibited

BooleanScorer - 32 required/prohibited clause limit

2004-04-27 Thread Tate Avery
Hello, I am using Lucene 1.3 and I ran into the following exception:
java.lang.IndexOutOfBoundsException: More than 32 required/prohibited clauses in query.
    at org.apache.lucene.search.BooleanScorer.add(BooleanScorer.java:98)
Is there any easy way to fix/adjust this (like the BooleanQuer
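The wording of the exception suggests that the scorer gives each required or prohibited clause its own bit in a 32-bit `int`. The sketch below assumes exactly that. It is not the Lucene 1.3 source, and `ClauseMaskAllocator` is a hypothetical name. It shows why the 33rd such clause fails while optional clauses do not count toward the limit.

```java
/**
 * Hypothetical bit-per-clause bookkeeping: each required or prohibited
 * clause is assigned one bit of a 32-bit int; optional clauses get none.
 */
final class ClauseMaskAllocator {
    private int nextMask = 1;       // bit to hand to the next required/prohibited clause
    private int requiredMask = 0;   // bits of required clauses
    private int prohibitedMask = 0; // bits of prohibited clauses

    /** Assigns a bit (or 0 for an optional clause) and returns it. */
    int add(boolean required, boolean prohibited) {
        int mask = 0;
        if (required || prohibited) {
            if (nextMask == 0) {
                // After 32 left shifts the single set bit has left the int.
                throw new IndexOutOfBoundsException(
                    "More than 32 required/prohibited clauses in query.");
            }
            mask = nextMask;
            nextMask = nextMask << 1;
        }
        if (required) requiredMask |= mask;
        if (prohibited) prohibitedMask |= mask;
        return mask;
    }
}
```

The 32nd required/prohibited clause receives bit 31. The next shift leaves `nextMask` at 0, which is the point where the sketch refuses further clauses.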

RE: Software for suggesting alternative words or sentences

2004-04-16 Thread Tate Avery
Also... http://jazzy.sourceforge.net/
-----Original Message-----
From: Felix Huber
Sent: Friday, April 16, 2004 1:17 PM
To: Lucene Users List
Subject: Re: Software for suggesting alternative words or sentences
Check http://www.iu.hio.no/~frodes/sprell/sprell.html - i

Numeric field data

2004-04-02 Thread Tate Avery
Hello, is there a way (direct or indirect) to support a field with numeric data? More specifically, I would like to do a range search on numeric data, so that something like number:[1 TO 2] does not return 11 or 103, etc., but does return 1.5, for example. Is ther
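One common workaround is to index numbers as fixed-width, zero-padded strings. This is assumed here rather than taken from the thread. Lucene range queries of this era compare terms as strings, so padding makes string order agree with numeric order. The widths (7 integer digits, 3 decimal places), the class name, and the restriction to non-negative values are all assumptions of this sketch.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/**
 * Encodes non-negative numbers as fixed-width strings so that
 * lexicographic order matches numeric order.
 */
final class PaddedNumber {
    private PaddedNumber() {}

    static final int INTEGER_DIGITS = 7;  // assumed maximum integer width
    static final int FRACTION_DIGITS = 3; // assumed decimal places

    static String encode(double value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String plain = BigDecimal.valueOf(value)
                .setScale(FRACTION_DIGITS, RoundingMode.HALF_UP)
                .toPlainString();                 // e.g. "1.500"
        int intLength = plain.indexOf('.');
        if (intLength > INTEGER_DIGITS) {
            throw new IllegalArgumentException("value too large for the chosen width");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = intLength; i < INTEGER_DIGITS; i++) {
            sb.append('0');                       // left-pad the integer part
        }
        return sb.append(plain).toString();       // e.g. "0000001.500"
    }

    /** Inclusive range check using plain string comparison, as a term range would. */
    static boolean inRange(String encoded, String lower, String upper) {
        return encoded.compareTo(lower) >= 0 && encoded.compareTo(upper) <= 0;
    }
}
```

Both range bounds must be encoded the same way, e.g. `number:[0000001.000 TO 0000002.000]`. That range includes `0000001.500` but not `0000011.000` or `0000103.000`.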

RE: Nested category strategy

2004-04-01 Thread Tate Avery
Could you put them all into a tab-delimited string and store that as a single field, then use a TabTokenizer on the field to search? And, if you need to, do a .split("\t") on the field value in order to break them back up into individual categories.
-----Original Message-----
From: David Blac
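The round trip described above can be sketched in plain Java. The `TabTokenizer` mentioned is presumably a small custom tokenizer rather than a stock Lucene class. Here, `tokens()` only imitates the idea of breaking on tabs alone, so multi-word categories survive as single terms. The class name and sample categories are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Round-trips a list of categories through one tab-delimited field value. */
final class TabCategories {
    private TabCategories() {}

    static String join(List<String> categories) {
        for (String category : categories) {
            if (category.indexOf('\t') >= 0) {
                // A tab inside a name would make the split ambiguous.
                throw new IllegalArgumentException("category contains a tab: " + category);
            }
        }
        return String.join("\t", categories);
    }

    /** Imitates a tokenizer that breaks only on tabs; empty pieces are skipped. */
    static List<String> tokens(String fieldValue) {
        List<String> out = new ArrayList<>();
        for (String piece : fieldValue.split("\t")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }
}
```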

Searching in "all"

2004-04-01 Thread Tate Avery
Hello, if I have, for example, 3 fields in my document (title, body, notes)... is there some easy way to search 'all' of them? Below are the only 2 ideas I currently have/use: 1) If I want to search for 'x' in all, I do something like: title:x OR body:x OR notes:x ... but this does not reall
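Idea 1 can at least be automated by generating the OR-expansion from a list of field names. This is a minimal sketch that assumes a single bare term with no query-syntax escaping. The class name is hypothetical. Lucene also ships a `MultiFieldQueryParser` class that is worth checking for the same purpose.

```java
import java.util.List;
import java.util.stream.Collectors;

final class AllFieldsQuery {
    private AllFieldsQuery() {}

    /** Builds e.g. "title:x OR body:x OR notes:x" for a single bare term. */
    static String expand(String term, List<String> fields) {
        return fields.stream()
                     .map(field -> field + ":" + term)
                     .collect(Collectors.joining(" OR "));
    }
}
```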

Natural Language Queries

2004-01-26 Thread Tate Avery
Hello, has anyone come across a good (preferably open-source) module for parsing natural language queries into Lucene queries? That is, identifying concepts (single vs. multi-word), concept expansion (via thesauri), filtering extraneous words, etc. Any information would be appreciated. Thank you

RE: Displaying Query

2003-12-17 Thread Tate Avery
Try:
String larequet = query.toString("default field name here");
Example:
String larequet = query.toString("texte");
That should give you a string version of the query.
-----Original Message-----
From: Gayo Diallo
Sent: Wednesday, December 17, 2003 10:46 AM

RE: SearchBlox J2EE Search Component Version 1.1 released

2003-12-02 Thread Tate Avery
If you buy it, apparently: http://www.searchblox.com/buy.html
-----Original Message-----
From: Tun Lin
Sent: Tuesday, December 02, 2003 10:43 AM
To: 'Lucene Users List'
Subject: RE: SearchBlox J2EE Search Component Version 1.1 released
Hi,

RE: New Lucene-powered Website

2003-12-02 Thread Tate Avery
Hello, this is the first time I have noticed this. Is the 'powered by Lucene' notice a legal requirement, or just a suggestion? Does it apply to any system embedding Lucene (web pages, applications, etc.)? That is not covered in the Apache Software License, I believe. Just curious... Tate

RE: Ask something about lucene

2003-11-19 Thread Tate Avery
Have a look at the API: http://jakarta.apache.org/lucene/docs/api/ For example, the Hits object has a score (see org.apache.lucene.search.Hits, score), and the IndexReader lets you get the number of docs in the index, term data, etc.; see org.apache.lucene.index.IndexReader (numDo

Which operations change document ids?

2003-11-17 Thread Tate Avery
Hello, I am considering using the document id to implement a fast 'join' during relational search. My first question is: should I steer clear of this altogether, and why? If not, I need to know which Lucene operations can cause document ids to change. I am assuming that the follo

RE: Document Clustering

2003-11-11 Thread Tate Avery
Categorization typically assigns documents to a node in a pre-defined taxonomy. For clustering, however, the categorization 'structure' is emergent... i.e. the clusters (which are analogous to taxonomy nodes) are created dynamically based on the content of the documents at hand.
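To make the distinction concrete, here is a toy single-pass ("leader") clustering sketch. No taxonomy is supplied, and groups form only from word overlap between documents. It is an illustrative assumption, not a description of any particular clustering engine. The class name, the Jaccard measure, and the threshold are choices made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy leader clustering: each document joins the first cluster whose
 * leader's word set has Jaccard similarity >= threshold with it;
 * otherwise it starts a new cluster (and becomes its leader).
 */
final class LeaderClustering {
    private LeaderClustering() {}

    static Set<String> words(String text) {
        Set<String> words = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        words.remove(""); // a leading separator produces one empty token
        return words;
    }

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    /** Returns clusters as lists of document indexes. */
    static List<List<Integer>> cluster(List<String> docs, double threshold) {
        List<Set<String>> leaders = new ArrayList<>();
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            Set<String> w = words(docs.get(i));
            int target = -1;
            for (int c = 0; c < leaders.size(); c++) {
                if (jaccard(w, leaders.get(c)) >= threshold) {
                    target = c;
                    break;
                }
            }
            if (target < 0) {             // no similar cluster: a new one emerges
                leaders.add(w);
                clusters.add(new ArrayList<>());
                target = clusters.size() - 1;
            }
            clusters.get(target).add(i);
        }
        return clusters;
    }
}
```

With a threshold of 0.4, two short documents about Lucene searching and two about fruit fall into two clusters, even though no category names were ever defined.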

Relational Search

2003-11-04 Thread Tate Avery
Hello, I want to perform a 'relational search', meaning that I want to search 2 indexes and perform an intersection between the 2. It would be very much like a table join in an SQL statement in terms of overall result. So, I might have an index of documents of type A that would allow me to re
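At the level of result sets, such an intersection is a hash join on a shared key. The sketch below assumes each hit from either index exposes a common key (for example a stored id field) as a string. The class and method names are hypothetical, and fetching the hits from Lucene is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ResultJoin {
    private ResultJoin() {}

    /** Keys from A (in A's ranked order, de-duplicated) that also occur in B. */
    static List<String> join(List<String> keysFromA, List<String> keysFromB) {
        Set<String> inB = new HashSet<>(keysFromB);   // build side
        Set<String> out = new LinkedHashSet<>();      // keeps A's order, drops duplicates
        for (String key : keysFromA) {                // probe side
            if (inB.contains(key)) {
                out.add(key);
            }
        }
        return new ArrayList<>(out);
    }
}
```

Keeping index A's order preserves its ranking. Building the hash set from the smaller result list is usually the cheaper choice.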

RE: large index query time

2003-10-24 Thread Tate Avery
Below are some posts from Doug (circa 2001) that I found very helpful with regard to understanding Lucene scalability. I am assuming that they are still generally applicable. You might also find them useful. Tate --- Performance for

RE: Exact Match

2003-10-22 Thread Tate Avery
To ensure I understand... If you have:
1) A B C
2) B C
3) B C D
4) C
You want "B C" to match #2 only, but "C" to match #1, #2, #3, and #4.
If so, you can have a tokenized field and an untokenized one...
- Use the untokenized one for matching 'exact' strings
- Use the tokenized one for finding a single
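The two-field approach can be checked against the four example documents with a plain-Java simulation (not Lucene code). The untokenized field compares the whole value, and the tokenized field compares individual whitespace-separated terms. In Lucene 1.x terms, the former would be something like a `Field.Keyword` field and the latter a `Field.Text` field. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExactVsTokenized {
    private ExactVsTokenized() {}

    // Untokenized field: the whole value is a single term.
    static boolean exactMatch(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    // Tokenized field: the value is split on whitespace into terms.
    static boolean termMatch(String fieldValue, String term) {
        return Arrays.asList(fieldValue.split("\\s+")).contains(term);
    }

    /** Returns the 1-based numbers of the documents that match. */
    static List<Integer> search(List<String> docs, String query, boolean exact) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            String value = docs.get(i);
            if (exact ? exactMatch(value, query) : termMatch(value, query)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

Searching the untokenized field for "B C" returns only #2. Searching the tokenized field for "C" returns #1 through #4.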

RE: Lucene on Windows

2003-10-21 Thread Tate Avery
Tate Avery wrote:
> You might have trouble with "too many open files" if you set your mergeFactor too
> high. For example, on my Win2k, I can go up to mergeFactor=300 (or so). At 400 I
> get a too many open files error. Note: the default mergeFactor of 10 shoul

RE: Lucene on Windows

2003-10-20 Thread Tate Avery
You might have trouble with "too many open files" if you set your mergeFactor too high. For example, on my Win2k box, I can go up to mergeFactor=300 (or so); at 400 I get a "too many open files" error. Note: the default mergeFactor of 10 should give no trouble. FYI, on my Linux box, I got the 't