Re: Should I use span query?

2005-09-01 Thread Andrew Boyd
essage- From: Erik Hatcher <[EMAIL PROTECTED]> Sent: Aug 26, 2005 3:23 PM To: java-user@lucene.apache.org Subject: Re: Should I use span query? On Aug 26, 2005, at 4:11 PM, Andrew Boyd wrote: > Hi All, > I'm trying to find all the terms that are within x number of > term

Should I use span query?

2005-08-26 Thread Andrew Boyd
Hi All, I'm trying to find all the terms that are within x number of terms of given query terms. Should I be using a span query or something else? If you have any code samples, I would greatly appreciate it. Thanks, Andrew
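The positional idea behind a span query can be sketched without Lucene. This is a minimal sketch, assuming a document is available as an ordered token list. `NearbyTerms` and `within` are hypothetical names, not Lucene API. It collects every term that appears within `window` positions of any query-term occurrence.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: terms within `window` positions of any query-term occurrence. */
final class NearbyTerms {
    static Set<String> within(List<String> tokens, Set<String> queryTerms, int window) {
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!queryTerms.contains(tokens.get(i))) {
                continue; // windows are anchored only on query-term occurrences
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    result.add(tokens.get(j)); // neighbour inside the window
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog");
        System.out.println(within(doc, Set.of("fox"), 2));
    }
}
```

Inside Lucene, the same neighbourhood information comes from term positions, which is what span queries work from. The plain list here only shows the windowing logic.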

RE: DEFAULT_OPERATOR_AND

2005-08-18 Thread Andrew Boyd
What about trying something like: BooleanQuery booQuery = new BooleanQuery(); Query titleQuery = null; QueryParser.Operator operator = contentParser.getDefaultOperator(); if(QueryParser.Operator.AND == operator){ //logger.debug("Content Ope

Re: Integrating lucene search with adobe search

2005-08-16 Thread Andrew Boyd
Thanks Ben! -Original Message- From: Ben Litchfield <[EMAIL PROTECTED]> Sent: Aug 15, 2005 6:33 PM To: java-user@lucene.apache.org, Andrew Boyd <[EMAIL PROTECTED]> Subject: Re: Integrating lucene search with adobe search Andrew, There are a couple different open parameters

RE: QueryParser Exceptions only under load?

2005-08-15 Thread Andrew Boyd
Thanks for the reply. I believe your initial thought is probably the correct one! Thanks, Andrew -Original Message- From: "Palmer, Andrew MMI Woking" <[EMAIL PROTECTED]> Sent: Aug 15, 2005 12:03 PM To: java-user@lucene.apache.org, Andrew Boyd <[EMAIL PROT

QueryParser Exceptions only under load?

2005-08-15 Thread Andrew Boyd
Hi all, I'm running Lucene 1.9-rc with JDK 1.5/5.0 on JBoss 3.6 with Tomcat 5.0. I'm using JMeter to do my load testing. I'm getting several different exceptions (NullPointer, ArrayIndexOutOfBounds and ParseException) from QueryParser when I simulate 5 users (threads in JMeter) with no pausing

QueryParser Exceptions only under load?

2005-08-15 Thread Andrew Boyd
ine a Thinkpad G41 with a P4 3.33GHz with 1.5 GB of RAM. The queries are the same whether I'm running one user or 5 users. I expect that these exceptions are happening just because of the load, but I thought I'd post them to get comments or recommendations. Thanks, Andr
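One common cause of exceptions that appear only under concurrent load is a single QueryParser instance shared across request threads. QueryParser keeps parse state in instance fields and is not thread-safe. The thread does not say which cause was confirmed, so treat this as an assumption. A simple remedy is to construct a new QueryParser per request, which is cheap. Another is to keep one instance per thread. The sketch below shows the per-thread pattern with `ThreadLocal`. `StatefulParser` is a hypothetical stand-in for QueryParser.

```java
/** Hypothetical stand-in for a parser that keeps state in instance fields (not thread-safe). */
final class StatefulParser {
    private final StringBuilder buffer = new StringBuilder();

    String parse(String query) {
        buffer.setLength(0); // mutable state: two threads sharing this instance would corrupt it
        buffer.append(query.trim().toLowerCase());
        return buffer.toString();
    }
}

/** Gives each thread its own parser instead of sharing one across request threads. */
final class PerThreadParser {
    private static final ThreadLocal<StatefulParser> PARSER =
            ThreadLocal.withInitial(StatefulParser::new);

    static String parse(String query) {
        return PARSER.get().parse(query);
    }

    static StatefulParser current() {
        return PARSER.get();
    }
}
```

With pooled servlet threads, a per-thread instance is reused across requests on that thread. Creating a fresh parser per request avoids even that sharing.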

Integrating lucene search with adobe search

2005-08-15 Thread Andrew Boyd
Hello all, After I do my search and display the hits I get back, I would like to pass the search string that I used with Lucene to Acrobat Reader when it opens. Has anyone done this, or has anyone seen any documents on how to do it? Thanks, Andrew Andrew Boyd Software Architect Sun Certified
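Adobe's PDF open parameters include `search`. Appended to the URL fragment as `#search="word1 word2"`, it asks Acrobat/Reader to search for those words when the file opens in a browser. This is a minimal sketch under these assumptions: only plain words are passed, and Lucene syntax is stripped by a naive regex. `PdfSearchLink` is a hypothetical name. Depending on where the link is emitted, the quotes and spaces may also need URL-encoding.

```java
/** Builds a PDF link that asks Acrobat/Reader to search for the query words on open. */
final class PdfSearchLink {
    static String build(String pdfUrl, String luceneQuery) {
        String words = luceneQuery
                .replaceAll("\\b\\w+:", " ")           // drop field prefixes such as "body:"
                .replaceAll("\\b(AND|OR|NOT)\\b", " ") // drop boolean operators
                .replaceAll("[^\\p{L}\\p{N}\\s]", " ") // drop quotes, wildcards, brackets
                .trim()
                .replaceAll("\\s+", " ");
        return pdfUrl + "#search=\"" + words + "\"";
    }

    public static void main(String[] args) {
        System.out.println(build("http://host/doc.pdf", "body:lucene AND \"span query\""));
    }
}
```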

Re: DOM or XML representation of a query?

2005-08-10 Thread Andrew Boyd

Re: de pluralization

2005-08-05 Thread Andrew Boyd
You might want to look at stemming for "de pluralization"; it boils words down to their "root". For example, bombs and bombing both get stemmed to bomb. I'm using the Snowball stemmer, which handles several languages as well as English. It is in the sandbox. org.apache.lucene.analysis.snowball.SnowballFilt
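To show what stemming does, here is a deliberately tiny suffix stripper. It is not the Porter or Snowball algorithm, which apply many ordered rules per language, and `ToyStemmer` is a hypothetical name. For real use, the Snowball filter mentioned above is the tool.

```java
/** A toy suffix stripper that illustrates stemming; NOT Porter/Snowball. */
final class ToyStemmer {
    private static final String[] SUFFIXES = {"ing", "es", "s"};

    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : SUFFIXES) {
            // keep at least three characters so short words such as "is" survive
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }
}
```

With this sketch, "bombs" and "bombing" both reduce to "bomb". Index terms and query terms must go through the same stemmer for matches to line up.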

Re: Lucene vs Derby (vs MySQL) for spatial indexing

2005-07-28 Thread Andrew Boyd
cene? Or is coercing Lucene into doing range-based numeric queries a bad idea? (In case anyone's interested, I'm writing a zoomable/pannable world map, so finding the biggest cities in a given area quickly is important) -----

Re: Quick newbie question

2005-07-27 Thread Andrew Boyd

Index with more than one analyzer?

2005-07-25 Thread Andrew Boyd
Hi All, When I first started my project I was creating 3 indexes: Standard, Synonym and SoundsLike. Now that the QueryParser has the ability to put multiple tokens in one position, I no longer have to inject the synonyms at index creation time. So I really don't have to have a separate index

Re: Lucene and numerical fields search

2005-07-19 Thread Andrew Boyd
I second the motion. It sounds like a good solution to TooManyClauses exception. -Original Message- From: Otis Gospodnetic <[EMAIL PROTECTED]> Sent: Jul 16, 2005 5:59 PM To: java-user@lucene.apache.org, Ray Tsang <[EMAIL PROTECTED]> Subject: Re: Lucene and numerical fields search Hi Ray

Re: How to get the un-stemed word

2005-07-12 Thread Andrew Boyd
TermVectors? Yes, but you would need a scheme for identifying "original, unstemmed" terms vs stems. For example, you could use another field and analyzer for the unstemmed forms. Andrew Boyd wrote: >What about storing the unstemmed word with the same position as the stemmed >

Re: How to get the un-stemed word

2005-07-11 Thread Andrew Boyd
What about storing the unstemmed word with the same position as the stemmed word? Would that show up in the TermVectors? -Original Message- From: mark harwood <[EMAIL PROTECTED]> Sent: Jul 8, 2005 10:44 AM To: java-user@lucene.apache.org, Andrew Boyd <[EMAIL PROTECTED]> Subj

Re: How to get the un-stemed word

2005-07-08 Thread Andrew Boyd
AIL PROTECTED]> Sent: Jul 8, 2005 11:01 AM To: java-user@lucene.apache.org Subject: Re: How to get the un-stemed word On Jul 8, 2005, at 9:08 AM, Andrew Boyd wrote: > Hi all, > I am using the snowball stemmer and for all my searches that > works fine. > However, I have a need

How to get the un-stemed word

2005-07-08 Thread Andrew Boyd
Hi all, I am using the snowball stemmer, and for all my searches that works fine. However, I need to display the un-stemmed word after doing some term vector analysis. I was thinking that I might insert the real word at the same position as the stemmed word but give the real word a type
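The idea of keeping the original word at the stem's position can be modelled with position increments. Following the stem with a token whose increment is 0 puts both at the same position. This is a minimal sketch: `Tok` is a hypothetical class loosely modelled on an analyzer token (text, position increment, type), not Lucene's `Token`. It also assumes the "stem"/"original" tag survives to wherever you read the tokens back, which may not hold for what the index stores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical token: text, position increment and a type tag. */
final class Tok {
    final String text;
    final int posIncr;
    final String type;

    Tok(String text, int posIncr, String type) {
        this.text = text;
        this.posIncr = posIncr;
        this.type = type;
    }
}

final class KeepOriginal {
    /** Emits each stem, then the original word at the same position (increment 0) when they differ. */
    static List<Tok> tokens(List<String> words, UnaryOperator<String> stemmer) {
        List<Tok> out = new ArrayList<>();
        for (String word : words) {
            String stem = stemmer.apply(word);
            out.add(new Tok(stem, 1, "stem"));
            if (!stem.equals(word)) {
                out.add(new Tok(word, 0, "original")); // shares the stem's position
            }
        }
        return out;
    }

    /** Recovers absolute positions by summing increments, and reports the originals as "word@pos". */
    static List<String> originalsWithPositions(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posIncr;
            if (t.type.equals("original")) {
                out.add(t.text + "@" + pos);
            }
        }
        return out;
    }
}
```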

Phrase/Range Query Bug?

2005-06-28 Thread Andrew Boyd
Hi All, When I try to do a Range Query with Phrase as one of the end points I'm not getting the results I would expect. Here is a JUnit that shows what I'm trying to do. It fails on the last assertEquals public void testRangeBug(){ try{ RAMDirectory ramDir = new RAMDir

Highlighter problem with SpanNearQuery w/Fix

2005-06-14 Thread Andrew Boyd
erator(); iterator.hasNext();){ // break it out for debugging. Term term = (Term) iterator.next(); String text = term.text(); terms.add(new WeightedTerm(query.getBoost(), text)); } } And that seemed to fix it. Hope this helps, Andrew Andr

Re: Fastest way to fetch N documents with unique keys within large numbers of indexes..

2005-06-09 Thread Andrew Boyd
Kevin, Those results are awesome. Could you please give those of us who were following but not quite understanding everything some pseudocode or some more explanation? Thanks, andrew -Original Message- From: Kevin Burton <[EMAIL PROTECTED]> Sent: Jun 7, 2005 7:18 PM To: java-user@

Free IR testbed

2005-06-06 Thread Andrew Boyd
Can someone point me to a free IR testbed? I was hoping for a testbed that has at least 500k+ documents. I did see TREC, but it looks like a pay-for testbed. Thanks, Andrew

RE: calculate wi = tfi * IDFi for each document.

2005-06-03 Thread Andrew Boyd
M To: java-user@lucene.apache.org Subject: RE: calculate wi = tfi * IDFi for each document. Hi, DefaultSimilarity uses exactly this weighting scheme. Makes sense since it's a pretty standard relevance measure... Bye! max -Original Message- From: Andrew Boyd [mailto:[EMAIL PROTECTED] Se

RE: calculate wi = tfi * IDFi for each document.

2005-06-03 Thread Andrew Boyd
lso shown in the demo in org.apache.lucene.demo.SearchFiles from SVN. (see http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/demo/org/apache/lucene/demo/SearchFiles.java?rev=150739&view=markup). Hope that helps. max -Original Message- From: Andrew Boyd [mailto:[EMAIL PR

RE: calculate wi = tfi * IDFi for each document.

2005-06-02 Thread Andrew Boyd
ultSimilarity uses exactly this weighting scheme. Makes sense since it's a pretty standard relevance measure... Bye! max -----Original Message- From: Andrew Boyd [mailto:[EMAIL PROTECTED] Sent: Thursday, June 02, 2005 11:39 To: java-user@lucene.apache.org Subject: calculate wi = tfi * IDFi

RE: Lucene and Documentum

2005-06-02 Thread Andrew Boyd
a-user@lucene.apache.org, Andrew Boyd <[EMAIL PROTECTED]> Subject: RE: Lucene and Documentum Hi Andrew I have experience using lucene for Content Management System (Content Repository). We are using different file types and different locale(fe,de,us,ru) Santanu -Original Message-

Lucene and Documentum

2005-06-02 Thread Andrew Boyd
Hi All, Has anyone had any experience using Lucene to search a Documentum repository? Thanks Andrew

calculate wi = tfi * IDFi for each document.

2005-06-02 Thread Andrew Boyd
If I have search results how can I calculate, using lucene's API, wi = tfi * IDFi for each document. wi= term weight tfi= term frequency in a document IDFi = inverse document frequency = log(D/dfi) dfi = document frequency or number of documents containing term i D= number of docum
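The formula in the question, w_i = tf_i * log(D/df_i), can be computed directly once term counts are available. In Lucene those counts would come from term vectors, `IndexReader.docFreq(Term)` and `IndexReader.numDocs()`. This minimal sketch uses plain token lists in place of an index, and `TfIdf` is a hypothetical name. Lucene's DefaultSimilarity uses a variant rather than this exact formula: sqrt(tf) and log(numDocs/(docFreq+1)) + 1.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Computes w_i = tf_i * log(D / df_i) for each term of one document, per the question's definitions. */
final class TfIdf {
    /** Assumes `doc` is one of the documents in `corpus`, so every df_i is at least 1. */
    static Map<String, Double> weights(List<String> doc, List<List<String>> corpus) {
        int d = corpus.size(); // D: number of documents
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum); // tf_i: raw count of term i in this document
        }
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            long df = corpus.stream().filter(c -> c.contains(e.getKey())).count(); // df_i
            w.put(e.getKey(), e.getValue() * Math.log((double) d / df));
        }
        return w;
    }
}
```

A term present in every document gets weight 0, because log(D/D) = 0.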

Highlighter ArrayIndexOutOfBoundsException - W/Fix

2005-06-01 Thread Andrew Boyd
Hi, I'm getting an ArrayIndexOutOfBoundsException within the highlighter: java.lang.ArrayIndexOutOfBoundsException: 50 at org.apache.lucene.search.highlight.TokenGroup.addToken(TokenGroup.java:47) at org.apache.lucene.search.highlight.Highlighter.getBestDocFragments(Highlighter.java:
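The index 50 in the trace suggests that a fixed-capacity array of 50 tokens per group is being overrun. This sketch shows one generic guard: refuse tokens once the group is full instead of writing past the end. `BoundedTokenGroup` is hypothetical and is not necessarily the fix posted in this thread.

```java
/** Hypothetical fixed-capacity token group with a bounds check. */
final class BoundedTokenGroup {
    static final int MAX_TOKENS = 50;
    private final String[] tokens = new String[MAX_TOKENS];
    private int numTokens;

    /** Adds a token if there is room; returns false instead of throwing when the group is full. */
    boolean add(String token) {
        if (numTokens >= MAX_TOKENS) {
            return false; // without this check, tokens[50] throws ArrayIndexOutOfBoundsException
        }
        tokens[numTokens++] = token;
        return true;
    }

    int size() {
        return numTokens;
    }
}
```

Dropping overflow tokens keeps highlighting alive but may lose some highlight coverage in very long token groups.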

Re: Clustering Carrot2 vs TermVector Analysis

2005-06-01 Thread Andrew Boyd
Responses inline prefixed with -Original Message- From: Dawid Weiss <[EMAIL PROTECTED]> Sent: Jun 1, 2005 3:24 AM To: java-user@lucene.apache.org Subject: Re: Clustering Carrot2 vs TermVector Analysis Hi Andrew, Coming up with an answer... sorry for the delay. > By using the carro

Re: Finding minimum and maximum value of a field?

2005-05-31 Thread Andrew Boyd
How about using a range query? private Term begin, end; begin = new Term("dateField", DateTools.dateToString(Date.valueOf(<"backInTimeStringDate">))); end = new Term("dateField", DateTools.dateToString(Date.valueOf(<"farFutureStringDate">))); RangeQuery query = new RangeQuery(begin, end, true)
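A related observation: Lucene keeps each field's terms in sorted string order, and fixed-width date strings such as yyyyMMdd sort chronologically. So the smallest and largest dates are the first and last terms of the field. In this minimal sketch, a `TreeSet` stands in for the term dictionary. `DateRange` is a hypothetical name, and the input is assumed to be non-empty and uniformly formatted.

```java
import java.util.List;
import java.util.TreeSet;

/** Min/max of fixed-width date strings via sorted (lexicographic) order. */
final class DateRange {
    static String[] minMax(List<String> yyyymmddTerms) {
        TreeSet<String> sorted = new TreeSet<>(yyyymmddTerms); // lexicographic order == chronological for yyyyMMdd
        return new String[] {sorted.first(), sorted.last()};
    }
}
```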

How to build 1.9-rc1 sandbox?

2005-05-30 Thread Andrew Boyd
In trunk/java I've built with ant compile-core, but after changing to trunk/java/contrib the only target is build-tree, and it gives me a lot of compiler errors such as "package org.apache.lucene.analysis does not exist". Anyone know what I'm doing wrong? Thanks, Andrew

Stemming at Query time

2005-05-30 Thread Andrew Boyd
Hi All, Now that the QueryParser knows about position increments, has anyone used this to do stemming at query time and not at indexing time? I suppose one would need a reverse stemmer: given the query breath, it would need to inject breathe, breathes, breathing etc. One benefit is that if you
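This sketch swaps in a simpler alternative to injecting tokens at the same position from an analyzer: rewriting each query word into an OR group of its known surface forms. It assumes a hypothetical "reverse stemmer" table from stem to variants, built offline from the corpus. `QueryExpander` and the table are illustrative, not Lucene API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Query-time expansion: replace a word with an OR group of the variants sharing its stem. */
final class QueryExpander {
    private final Map<String, List<String>> variantsByStem; // hypothetical table: stem -> surface forms
    private final UnaryOperator<String> stemmer;

    QueryExpander(Map<String, List<String>> variantsByStem, UnaryOperator<String> stemmer) {
        this.variantsByStem = variantsByStem;
        this.stemmer = stemmer;
    }

    String expand(String word) {
        List<String> forms = variantsByStem.get(stemmer.apply(word));
        if (forms == null || forms.isEmpty()) {
            return word; // unknown stem: leave the word unchanged
        }
        return "(" + String.join(" OR ", forms) + ")";
    }
}
```

The output is ordinary QueryParser syntax. One trade-off of this approach is that the expansion table must be kept in sync with the index vocabulary.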

Clustering Carrot2 vs TermVector Analysis

2005-05-30 Thread Andrew Boyd
Hi All, By using the carrot demo: http://www.newsarch.com/archive/mailinglist/jakarta/lucene/user/msg03928.html I was able to easily cluster search results based on the fields used by carrot (url, title, and summary). However I was wondering if there was a way to do something similar using t

Re: Ability to load a document with ONLY a few fields for performance?

2005-05-29 Thread Andrew Boyd
The numbers look impressive. If I build from the 1.9 trunk, will I get the patch? Andrew -Original Message- From: Otis Gospodnetic <[EMAIL PROTECTED]> Sent: May 28, 2005 9:05 AM To: java-user@lucene.apache.org Subject: Re: Ability to load a document with ONLY a few fields for performanc

URL search causes BooleanQuery TooManyClauses Excp

2005-05-23 Thread Andrew Boyd
Hi All, I have an index with 4811 documents, each of which has a field called url. When I try a search such as: url:http*C02MS00800* I get a BooleanQuery$TooManyClauses. I've seen other postings with this exception, but they are normally caused by doing a range query. Any ideas? I would ulti
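A wildcard query is rewritten into one clause per distinct indexed term that matches the whole pattern. Lucene's default BooleanQuery clause limit is 1024, adjustable with `BooleanQuery.setMaxClauseCount(int)`. So the useful question is how many url terms match the pattern. This minimal sketch counts them over a plain term list. `WildcardExpansion` is a hypothetical name, and the term list you pass in is an assumption standing in for the field's term dictionary.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Counts how many terms a Lucene-style wildcard pattern (* and ?) would expand to. */
final class WildcardExpansion {
    static final int DEFAULT_MAX_CLAUSES = 1024; // Lucene's default BooleanQuery clause limit

    static long countMatches(String wildcard, List<String> terms) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");      // any run of characters
            } else if (c == '?') {
                regex.append('.');       // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        Pattern p = Pattern.compile(regex.toString());
        return terms.stream().filter(t -> p.matcher(t).matches()).count();
    }
}
```

If the count exceeds the limit, the choices are to raise the limit, narrow the pattern, or index the identifier in its own untokenized field and query it directly.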

No HighLights for Phrase Query

2005-05-13 Thread Andrew Boyd
Hi, When I do a Phrase Query I do not get any highlights. Here is my call: highlighter = new Highlighter(new QueryScorer(query.rewrite(indexReader))); highlighter.getBestFragments(tokenStream, body, numPreviews, ELIPSE); I tried it without the rewrite, but that didn't help. Thanks, Andrew

Re: Latitude/Longitude and Lexigraphical search

2005-05-08 Thread Andrew Boyd
AIL PROTECTED]> Sent: May 8, 2005 1:29 PM To: Andrew Boyd Subject: Re: Latitude/Longitude and Lexigraphical search Hello Andrew, There already is a plugin available for Nutch : http://wiki.apache.org/nutch/GeoPosition I think that one can easily integrate it into a lucene app (i'll make

Latitude/Longitude and Lexigraphical search

2005-05-08 Thread Andrew Boyd
Hi All, I want to do some range queries using latitude and longitude. I have numbers like so:
long        lat
-84.65532   32.74212
What would be the best way to store this in Lucene so I can do a range query? Also for all you smart people out there do you know the distance betwee
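RangeQuery compares terms as strings. Raw decimals therefore do not sort numerically: the minus sign and varying widths break lexicographic order. A common trick is to shift every value to be non-negative, scale it to an integer and zero-pad it to a fixed width. This minimal sketch assumes 5 decimal places, as in the coordinates above, and a shared [-180, 180] range for both latitude and longitude. `GeoKey` is a hypothetical name. Query bounds must be encoded the same way.

```java
/** Encodes signed decimal degrees as fixed-width strings whose string order matches numeric order. */
final class GeoKey {
    private static final long SCALE = 100_000L; // assumption: 5 decimal places of precision

    static String encode(double degrees) {
        long shifted = Math.round((degrees + 180.0) * SCALE); // 0 .. 36,000,000 for [-180, 180]
        return String.format("%08d", shifted);                 // zero-pad to a fixed width of 8
    }

    public static void main(String[] args) {
        System.out.println(encode(-84.65532) + " " + encode(32.74212));
    }
}
```

With this encoding, -84.65532 becomes 09534468 and 32.74212 becomes 21274212, so a string range over encoded bounds selects the right numeric interval.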

Re: indexing synonyms / reducing the index size

2005-05-05 Thread Andrew Boyd
I have done the same as Luke but I needed lucene 1.9rc1 to accomplish it. I tried it with 1.4.3 but the queryparser could not handle it. Andrew -Original Message- From: Luke Shannon <[EMAIL PROTECTED]> Sent: May 5, 2005 8:54 AM To: java-user@lucene.apache.org, Pablo Gomes Ludermir <[EMAIL