Re: auto-generate uid?

2004-11-22 Thread Terry Steichen
Not exactly sure what you're trying to do. You can easily generate a number when you index each Document and insert it in a uid field (which is, BTW, what I do), and if you base it on a timestamp plus some characteristic of the document (which is also what I do), it should always be unique. As
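A minimal sketch of the approach described above, assuming the "characteristic of the document" is its file path. The `makeUid` helper and the example path are hypothetical, not part of Lucene. The resulting string would then be stored in the document's uid field at index time.

```java
// Sketch only: makeUid and its arguments are illustrative, not Lucene API.
class UidSketch {
    // Combine the indexing timestamp with a per-document characteristic
    // (assumed here to be the file path) to form the uid string.
    static String makeUid(long timestampMillis, String docPath) {
        return timestampMillis + "_" + docPath;
    }

    public static void main(String[] args) {
        // Hypothetical path, used only for illustration.
        String uid = makeUid(System.currentTimeMillis(), "news/2004/story-001.xml");
        System.out.println(uid);
    }
}
```

The uid is only as unique as the chosen characteristic: two documents indexed in the same millisecond need different paths (or whatever characteristic is used).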

Re: disadvantages

2004-11-21 Thread Terry Steichen
Compared to what? - Original Message - From: Miguel Angel To: [EMAIL PROTECTED] Sent: Sunday, November 21, 2004 12:00 PM Subject: disadvantages What are disadvantages the Lucene?? -- Miguel Angel Angeles R. Asesoria en Conectividad y Servidores Telf. 97451277 -

Re: Lucene external field storage contribution.

2004-11-09 Thread Terry Steichen
Kevin, Sorry for the delay in replying. I think your idea for an external field storage mechanism is excellent. I'd love to see it, and if I can, will be willing to help make that happen. Regards, Terry - Original Message - From: Kevin A. Burton To: Lucene Users List Sent:

Re: BooleanQuery - TooManyClauses

2004-10-26 Thread Terry Steichen
I think what Erik's asking is whether you can live with expressing your indexed date in the form of MMDD, without the hour and minute extension. That will sharply reduce the number of range query expansion terms. If you're using the timestamp as a unique identifier, you might consider creat

Re: Multisearcher question

2004-10-12 Thread Terry Steichen
I think what Sreedhar is asking for is the capability to form a "join" across multiple indices - and if so, I could sure use that capability myself. However, I think Lucene's logic focuses only on a single query, so I doubt if that's easily done. - Original Message - From: Otis Gos

Nutch vs Lucene

2004-09-16 Thread Terry Steichen
ting is the padre of both Nutch and Lucene. Otis --- Terry Steichen <[EMAIL PROTECTED]> wrote: > Otis, > > What's the relationship between Nutch and Lucene? > > Terry > - Original Message - > From: Otis Gospodnetic > To: Lucen

Re: Term highlighting and Term vector patch

2004-09-16 Thread Terry Steichen
Christoph, Just curious - how are you currently using Term Vectors? They seem to be a neat feature with lots of future promise, but I'm not sure how to best use them now. Regards, Terry - Original Message - From: Christoph Goller To: Lucene Developers List Sent: Thursday, Se

Re: Concurent operations with Lucene

2004-09-15 Thread Terry Steichen
Otis, What's the relationship between Nutch and Lucene? Terry - Original Message - From: Otis Gospodnetic To: Lucene Users List Sent: Wednesday, September 15, 2004 7:29 AM Subject: Re: Concurent operations with Lucene Hello Only 1 process can modify (add/delete) an ind

Re: Lucene Book

2004-09-07 Thread Terry Steichen
Jeez, Erik! Where's your sense of public spirit ;-) Terry PS: Glad to hear you're (finally!) nearing publication. - Original Message - From: Erik Hatcher To: Lucene Users List Sent: Tuesday, September 07, 2004 6:43 AM Subject: Re: Lucene Book On Sep 7, 2004, at 3:00 A

Re: Lucene Search Applet

2004-08-18 Thread Terry Steichen
I suspect it has to do with the security restrictions of the applet, 'cause it doesn't appear to be finding your Lucene jar file. Also, regarding the lock files, I believe you can disable the locking stuff just for purposes like yours (read-only index). Regards, Terry - Original Message

Re: Negative Boost

2004-08-04 Thread Terry Steichen
Aug 4, 2004, at 7:19 AM, Terry Steichen wrote: > I can't get negative boosts to work with QueryParser. Is it possible > to do so? Closer inspection on the parsing: TOKEN : { )+ ( "." (<_NUM_CHAR>)+ )? > : DEFAULT } where <#_NUM_CHAR: [&q

Re: Negative Boost

2004-08-04 Thread Terry Steichen
< 1.0, no? Otis --- Terry Steichen <[EMAIL PROTECTED]> wrote: > I can't get negative boosts to work with QueryParser. Is it possible > to do so? > > TIA, > > Terry > > >

Negative Boost

2004-08-04 Thread Terry Steichen
I can't get negative boosts to work with QueryParser. Is it possible to do so? TIA, Terry

Re: Underscore character and case issue

2004-07-05 Thread Terry Steichen
Luke runs just fine with 1.3.1. If you're using Windows, try highlighting it with Windows Explorer, right-clicking on it, choosing the "Open with.." menu option and selecting "javaw". Regards, Terry - Original Message - From: "Andrzej Bialecki" <[EMAIL PROTECTED]> To: "Lucene Users Lis

Re: NullAnalyzer

2004-06-11 Thread Terry Steichen
+1 - Original Message - From: "Eric Jain" <[EMAIL PROTECTED]> To: "lucene-user" <[EMAIL PROTECTED]> Sent: Friday, June 11, 2004 4:24 AM Subject: NullAnalyzer > There doesn't seem to be an Analyzer that doesn't do anything included > with Lucene, is there? This would seem useful to prev

Re: Open-ended range queries

2004-06-10 Thread Terry Steichen
Speaking for myself, only a small number of my code modules currently treat "null" as the open-ended range query term parameter. If the syntax change from 'null' --> '*' was deemed otherwise desirable and the syntax transition made very clearly, I could personally adjust to it without too much dif

Re: Open-ended range queries

2004-06-10 Thread Terry Steichen
- Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, June 10, 2004 2:24 PM Subject: Re: Open-ended range queries > On Jun 10, 2004, at 2:13 PM, Terry Steichen wrote: > > Ac

Re: Open-ended range queries

2004-06-10 Thread Terry Steichen
Actually, QueryParser does support open-ended ranges like : [term TO null]. Doesn't work for the lower end of the range (though that's usually less of a problem). Regards, Terry - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Se

Re: extensible query parser - Re: Proximity Searches behavior

2004-06-10 Thread Terry Steichen
Erik, When is "Lucene in Action" scheduled to be out? Regards, Terry - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, June 10, 2004 5:04 AM Subject: Re: extensible query parser - Re: Proximity Searches behavior

Re: Proximity Searches behavior

2004-06-09 Thread Terry Steichen
This poses a couple of additional questions: 1) If you set the default slop factor in QueryProcessor to something greater than 1, can you also use wildcards? (I ask that question because, to my understanding, you can't combine the explicit proximity query syntax with wildcards. That is, somethin

Re: why the score is not always 1.0 when comparing two identical strings?

2004-06-04 Thread Terry Steichen
Nothing is wrong. When the maximum relevance score is greater than one, all hit scores are normalized (making the highest score 1.0). When the maximum score is less than 1, normalization does not occur. The more complex the query, the more likely that the raw (non-normalized) score will be less
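The normalization rule described above can be sketched as follows. This is an illustration of the rule as stated in the message, not Lucene's actual source, and the `normalize` helper is a hypothetical name.

```java
// Sketch only: mirrors the rule as described, not Lucene's implementation.
class ScoreNormSketch {
    // If the best raw score exceeds 1.0, scale every score so the best
    // becomes 1.0; otherwise return the raw scores unchanged.
    static float[] normalize(float[] raw) {
        float max = 0.0f;
        for (float s : raw) {
            if (s > max) {
                max = s;
            }
        }
        float norm = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Best raw score above 1.0: scores are scaled.
        System.out.println(java.util.Arrays.toString(normalize(new float[] {2.0f, 1.0f})));
        // Best raw score below 1.0: scores are left as they are.
        System.out.println(java.util.Arrays.toString(normalize(new float[] {0.5f, 0.25f})));
    }
}
```

Under this rule, a top hit scoring below 1.0 is expected whenever the best raw score is itself below 1.0, even for an exact match.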

Re: FileNotFoundException when trying to indexing.

2004-06-03 Thread Terry Steichen
Prasad, I think you'll have to provide more code so we can see what's actually going on. BTW, I don't see you calling the UseCompoundFile method (unless you do it inside indexFile/Directory) - I wonder if that could be an issue? Regards, Terry PS: I run on XP/Pro just fine, so there's nothing

Re: similarity of two texts

2004-06-02 Thread Terry Steichen
Erik, Could you expand on this just a wee bit, perhaps with an example of how to compute this vector angle? TIA, Terry - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Tuesday, June 01, 2004 9:39 AM Subject: Re: similarity

Lockfile Problem Solved

2004-05-31 Thread Terry Steichen
Just thought I'd pass on some info I just discovered. I've been successfully using the CVS head version of Lucene as of about 2 months ago. I then got the formal release (1.4-rc3) and tried it with my application, but it failed. I tried it with some commandline test routines and they worked f

Re: Internal full content store within Lucene

2004-05-18 Thread Terry Steichen
+1 - Original Message - From: "Kevin Burton" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Tuesday, May 18, 2004 2:43 PM Subject: Internal full content store within Lucene > Per the discussion the other day about storing content external to > Lucene I think we h

Re: Java 1.4 (was: new Lucene release: 1.4 RC3)

2004-05-13 Thread Terry Steichen
> 1.3" work around, or to convert the anonymous inner classes > to named inner classes? > > this is the only 1.4 dependency that I know of. > > > > -Original Message- > > From: Terry Steichen [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, May 12,

Re: new Lucene release: 1.4 RC3

2004-05-12 Thread Terry Steichen
Sent: Wednesday, May 12, 2004 8:04 AM Subject: Re: new Lucene release: 1.4 RC3 > I don't recall any JDK 1.4 methods/classes being used, and I just saw > Doug replacing one AssertException (1.4) with RuntimeException. > > Are there some 1.4 dependencies I'm not aware of

Re: new Lucene release: 1.4 RC3

2004-05-12 Thread Terry Steichen
I presume this still requires Java 1.4 to build, but will run with Java 1.3? Regards, Terry - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Tuesday, May 11, 2004 4:51 PM Subject: new Lucene release: 1.4 RC3 > Version 1.4

Re: Problems From the Word Go

2004-04-30 Thread Terry Steichen
Erik, Maybe you could donate some of those demo modules (and the accompanying article/text) to Lucene, so they'd be incorporated officially in the website? Regards, Terry - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Fri

Re: searching only part of an index

2004-04-27 Thread Terry Steichen
I think that if you include the indexing timestamp in the Document you create when indexing, you could sort on this and only pick the first 100. Regards, Terry - Original Message - From: "Alan Smith" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Tuesday, April 27, 2004 8:02 AM Subjec

Re: Stemmer Benefits/Costs

2004-04-22 Thread Terry Steichen
Andrzej, Sorry for misspelling your name. My Polish sucks. Terry - Original Message - From: "Terry Steichen" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, April 22, 2004 7:56 PM Subject: Re: Stemmer Benefits/Costs

Re: Stemmer Benefits/Costs

2004-04-22 Thread Terry Steichen
Sent: Thursday, April 22, 2004 5:37 PM Subject: Re: Stemmer Benefits/Costs > Terry Steichen wrote: > > I've been experimenting with the Porter and Snowball stemmers. It > > seems to me that one of the most valuable benefits these provide is > > the capability t

Stemmer Benefits/Costs

2004-04-22 Thread Terry Steichen
I've been experimenting with the Porter and Snowball stemmers. It seems to me that one of the most valuable benefits these provide is the capability to generalize phrase terms. As a very simple example, without the stemmer, I might need to include three phrase terms in my query: "north korea",

Re: Wierd Search Behavior

2004-04-01 Thread Terry Steichen
"\n with message: " + e.getMessage()); } } } ----- Original Message - From: "Terry Steichen" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, March 31, 2004 11:47 AM Subject: Re: Wierd Search Behavior > No, they're typos in the e-mail

Re: Wierd Search Behavior

2004-03-31 Thread Terry Steichen
MAIL PROTECTED]> Sent: Wednesday, March 31, 2004 9:55 AM Subject: Re: Wierd Search Behavior > On Mar 31, 2004, at 9:49 AM, Terry Steichen wrote: > > I'm experiencing some very puzzling search behavior. I am using the > > CVS head I pulled about a week ago. I use the Sta

Wierd Search Behavior

2004-03-31 Thread Terry Steichen
I'm experiencing some very puzzling search behavior. I am using the CVS head I pulled about a week ago. I use the StandardAnalyzer and QueryParser. I have a collection of XML documents indexed. One field is "subhead", and here's what I find with different queries: subhead:(missile defense)

Re: Similarity - position in Field[] effects scoring - how to change?

2004-03-23 Thread Terry Steichen
Joachim, I believe you'll have to replace the default Similarity class with one of your own. Not sure exactly what the settings should be - maybe some other list members can give you specifics. Otherwise, you'll probably have to experiment with it. Regards, Terry - Original Message -

Re: Final Hits

2004-03-22 Thread Terry Steichen
inal Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, March 22, 2004 7:06 AM Subject: Re: Final Hits > How exactly would you take advantage of a subclassable Hits class? > > > On Mar 21, 2

Re: SpanXXQuery Usage

2004-03-22 Thread Terry Steichen
AIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, March 22, 2004 2:46 AM Subject: Re: SpanXXQuery Usage > Only in unit tests, so far. > > Otis > > --- Terry Steichen <[EMAIL PROTECTED]> wrote: > > Is there any documentation (othe

Final Hits

2004-03-21 Thread Terry Steichen
Does anyone know why the Hits class is final (thus preventing it from being subclassed)? Regards, Terry

SpanXXQuery Usage

2004-03-19 Thread Terry Steichen
Is there any documentation (other than that in the source) on how to use the new SpanxxQuery features? Specifically: SpanNearQuery, SpanNotQuery, SpanFirstQuery and SpanOrQuery? Regards, Terry

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread Terry Steichen
I tend to agree (but with the same uncertainty as to why I feel that way). Regards, Terry - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, March 08, 2004 2:34 PM Subject: Re: Sys properties Was: java.io.tmpdir as

Re: SubstringQuery -- Re: Leading Wild Card Search

2004-02-17 Thread Terry Steichen
Doug, What you say makes a good deal of sense to me. Could you give us a relative sense of the "slowness" of different operators? Regards Terry - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Tuesday, February 17, 2004 1:1

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
nesday, January 21, 2004 2:04 PM Subject: Re: Query Term Questions > On Jan 21, 2004, at 1:07 PM, Terry Steichen wrote: > > Unfortunately, using positive boost factors less than 1 causes the > > parser to > > barf the same as do negative boost factors. > > Are you sur

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
Morus, Unfortunately, using positive boost factors less than 1 causes the parser to barf the same as do negative boost factors. Regards, Terry - Original Message - From: "Morus Walter" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, January 21, 2004 10:5

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
OTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, January 21, 2004 9:31 AM Subject: Re: Query Term Questions > On Jan 20, 2004, at 10:22 AM, Terry Steichen wrote: > > 1) Is there a way to set the query boost factor depending not on the > >

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
By the silence, I gather that the answers to my questions are "no", "no" and "no". Regards, Terry - Original Message - From: "Terry Steichen" <[EMAIL PROTECTED]> To: "Lucene Users Group" <[EMAIL PROTECTED]> Sent: Tuesday, J

Query Term Questions

2004-01-20 Thread Terry Steichen
1) Is there a way to set the query boost factor depending not on the presence of a term, but on the presence of two specific terms? For example, I may want to boost the relevance of a document that contains both "iraq" and "clerics", but not boost the relevance of documents that contain only on

Re: setMaxClauseCount ??

2004-01-18 Thread Terry Steichen
Maybe you're using wildcards (which cause the query to get expanded). Just go in and set the varb to something very large (provided that doing so doesn't give you an OutOfMemory error - which is why that limit was set). HTH, Terry - Original Message - From: "Karl Koch" <[EMAIL PROTECTED

Peculiar (?) Indexing Performance

2004-01-13 Thread Terry Steichen
I just aborted a re-indexing operation (because it was taking too much time - will run it overnight instead). But I was surprised by what I found in the index directory, which contained a total of 1,402 index files! It started out with 36 files with the name of "_I9a.*", followed by groups of

Re: Performance question

2004-01-07 Thread Terry Steichen
ster than xerces, you might want to look at these. You might > want to look at http://dom4j.org/. > > Dror > > > > > > Regards > > > > Scott > > > > -Original Message- > > From: Terry Steichen [mailto:[EMAIL PROTECTED] > > Sent: Tuesday

Re: Performance question

2004-01-06 Thread Terry Steichen
Scott, Here are some figures to use for comparison. Using the latest Lucene release, I index about 200 similar-sized XML files at a time, on a Windows XP machine (2Ghz). First I create a new index, which adds the documents at a rate of about 8 per second (I don't recall what the cpu % is during

Re: Error deleting a document when using the compound file index

2003-12-30 Thread Terry Steichen
Paul, I just started using 1.3 final (labeled 1.4 RC1) and ran into a similar problem (though I'm not using the compound file option). My code ran just fine all the way through 1.3RC3, but with the latest release, the reader.delete() threw a "Lock obtain timed out" IOException. What I finally di

Re: Term out of order.

2003-10-29 Thread Terry Steichen
What kind of response is this? (e.g. "apparently so.") Is this a problem or not? Regards, Terry Steichen - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, October

Re: Confusion over wildcard search logic

2003-09-23 Thread Terry Steichen
Erik's analysis is comprehensive and useful. I think this example reflects a common (and understandable) oversight - that wildcards do *not* work with a phrase. Got caught on that many times myself. Also there may be confusion about the format -> field:(term1 term2), in that the examples provide

Re: Is it possible in lucene for numeric search

2003-09-22 Thread Terry Steichen
You can also use a RangeQuery. If you index the field of numeric data, say 'score', as a string, then you can do things like: score:[75 TO 80]. Only extra work is that you need to pad the actual score with enough 0's (such that 9 becomes 09, etc.) to cover the expected range. Regards, Terry --
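A small sketch of the zero-padding step mentioned above, assuming non-negative integer scores and a fixed field width. The `pad` helper is a hypothetical name, not part of Lucene. Padding matters because terms are compared as strings, so an unpadded "9" sorts after "75".

```java
// Sketch only: pad is an illustrative helper, not Lucene API.
class PadSketch {
    // Left-pad a non-negative number with zeros so that string order
    // matches numeric order for values of up to `width` digits.
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        System.out.println(pad(9, 2));                        // padded form of 9
        System.out.println("9".compareTo("75") > 0);          // unpadded 9 sorts after 75
        System.out.println(pad(9, 2).compareTo("75") < 0);    // padded 9 sorts before 75
    }
}
```

The same padded form would be needed both when indexing the field and when writing the bounds of a query such as score:[75 TO 80].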

Re: Lucene Scoring Behavior

2003-09-21 Thread Terry Steichen
highest one is set to 1, and the others are proportionately lower? Regards, Terry - Original Message ----- From: "Terry Steichen" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, September 18, 2003 10:10 AM Subject: Re: Lucene Sc

Broken by Lock

2003-09-21 Thread Terry Steichen
About a month ago, timeouts were added to Lock (and they seem to make a lot of good sense). However, because of this enhancement, in using the latest CVS my application broke - I keep getting the message "Lock obtain timed out". I looked through the source in an attempt to figure out a quick w

Re: Lucene Scoring Behavior

2003-09-18 Thread Terry Steichen
nesday, September 17, 2003 11:15 PM Subject: Re: Lucene Scoring Behavior > Hmm. This makes no sense to me. Can you supply a reproducible > standalone test case? > > Doug > > Terry Steichen wrote: > > Doug, > > > > (1) No, I did *not* boost the pub_date field, eith

Re: Lucene Scoring Behavior

2003-09-17 Thread Terry Steichen
[EMAIL PROTECTED]> Sent: Wednesday, September 17, 2003 5:51 PM Subject: Re: Lucene Scoring Behavior > Terry Steichen wrote: > > 0.03125 = fieldNorm(field=pub_date, doc=90992) > > 1.0 = fieldNorm(field=pub_date, doc=90970) > > It looks like the fieldNorm's are what dif

Re: Lucene Scoring Behavior

2003-09-17 Thread Terry Steichen
tion { > if (term.field() == "date") { >return 1.0f; > } else { >return super.idf(term, searcher); > } >} > } > > Or you could just give date clauses of your query a very small boost > (e.g., .0001) so that other clauses domina

Lucene Scoring Behavior

2003-09-17 Thread Terry Steichen
I've run across some puzzling behavior regarding scoring. I have a set of documents which contain, among others, a date field (whose content is a string in the YYYYMMDD format). When I query on the date 20030917 (that is, today), I get 157 hits, all of which have a score of .23000652. If I u

Negative boosting?

2003-09-11 Thread Terry Steichen
I've often found the use of query-based boosting to be very beneficial. This is particularly so when it's easy to identify the term that I want to stand out as a primary selector. However, I've come across quite a few other cases where it would be easier (and more logical) to apply a negativ

Lucene documentation

2003-08-30 Thread Terry Steichen
sers List" <[EMAIL PROTECTED]> Sent: Saturday, August 30, 2003 12:09 AM Subject: Re: Keyword search with space and wildcard > On Friday 29 August 2003 10:02, Terry Steichen wrote: > > I agree. One problem, however, that new (and not-so-new) Lucene users face > > is a

Re: Keyword search with space and wildcard

2003-08-29 Thread Terry Steichen
Tatu, I agree. One problem, however, that new (and not-so-new) Lucene users face is a learning curve when they want to get past the simplest and most obvious uses of Lucene. For example, I don't think any of the docs mention the fact that you can't combine a phrase and a wildcard query. Other t

Re: RC2 requires reindexing?

2003-08-29 Thread Terry Steichen
load area hold it) ? > > > > Jan > > > > - Original Message - > > From: "Lukas Zapletal" <[EMAIL PROTECTED]> > > To: "Lucene Users List" <[EMAIL PROTECTED]> > > Sent: Friday, August 29, 2003 12:14 PM > > Subje

Re: Keyword search with space and wildcard

2003-08-29 Thread Terry Steichen
If I understand your issue correctly, I think what you're experiencing is the fact that you can have a phrase query "hello world", or a wildcard query +hell* +wor*, but you can't mix the two together. As far as I've found, that's a basic limitation you just have to live with. (Of course, if someo

RC2 requires reindexing?

2003-08-28 Thread Terry Steichen
I just switched to RC2 and found that a number of queries now don't work. (When I switch back to RC1 they work fine.) Can't seem to figure out a pattern regarding those that don't work versus those (the vast majority) that still work fine. I looked in the RC2 source and noticed that the dates

Re: Similar Document Search

2003-08-21 Thread Terry Steichen
usual question of what is > >actually interesting: high frequency, low frequency or the mid range). > > > >Indexing would probably be quite expensive since Lucene doesn't seem to > >support changes in the index, and the index for the terms would change > >all the time

Re: Similar Document Search

2003-08-18 Thread Terry Steichen
hange > all the time. We haven't implemented it yet, but it shouldn't be hard to > code. I just wouldn't expect good performance when indexing large > collections. > > Peter > > > Terry Steichen wrote: > > >Is it possible without extensive additional co

Similar Document Search

2003-08-18 Thread Terry Steichen
Is it possible without extensive additional coding to use Lucene to conduct a search based on a document rather than a query? (One use of this would be to refine a search by selecting one of the hits returned from the initial query and subsequently retrieving other documents "like" the selected

Re: searching data indexed from database??

2003-06-01 Thread Terry Steichen
--- Original Message - From: "Venkatraman, Shiv" <[EMAIL PROTECTED]> To: "'Terry Steichen '" <[EMAIL PROTECTED]>; "'Lucene Users List '" <[EMAIL PROTECTED]> Sent: Saturday, May 31, 2003 11:31 AM Subject: RE: searching data indexed

Re: searching data indexed from database??

2003-06-01 Thread Terry Steichen
Shiv, Searching in Lucene is field-based. Thus you must specify the field to be searched - the only 'exception' is that one field is defined as default. If you want to search across multiple fields, I believe you must create a concatenation of the individual fields into a single one during the i

Re: Problem while indexing

2003-04-03 Thread Terry Steichen
Amit, I don't exactly know what your problem is, but I'm using a configuration not too different from yours with no problems - so at least you know it's possible. I have an index of about 125MB which I use on various machines, including an old Windows98/SE 400MHz notebook. I used the default Mer

Re: Tokenize negative number

2003-03-25 Thread Terry Steichen
Probably tokenized 1234 as a string and treated '-' as a separator. See previous discussion on "query". Regards, Terry - Original Message - From: "Lixin Meng" <[EMAIL PROTECTED]> To: "'Lucene Users List'" <[EMAIL PROTECTED]> Sent: Tuesday, March 25, 2003 9:16 PM Subject: Tokenize negati

Re: query

2003-03-25 Thread Terry Steichen
Arsineh, There was some discussion on this list about this topic earlier. As I recall, the escaping a '-' doesn't work (for reasons I don't recall - something about interaction of analyzer and tokenizer, I think). To handle this for my own purposes, I believe I modified the QueryParser.jj source

Re: Indexing and searching database data

2003-03-10 Thread Terry Steichen
+1 - Original Message - From: <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Monday, March 10, 2003 10:38 AM Subject: Indexing and searching database data > Hello, > > Would anyone be interested in ability to use Lucene search on the data fr

Re: Lucene Turbine Service

2003-03-04 Thread Terry Steichen
Samuel, Not exactly sure of your question. But, if the path is known at the time of indexing, you just insert it in the Document that is created as part of the indexing. If you don't know the path till later, you might insert a partial path at index time and add the exact location when you use i

Re: Computing Relevancy Differently

2003-02-28 Thread Terry Steichen
bug, could you please provide a complete, > self-contained test case? You could, for example, model this after the > TestSimilarity class in the test code hierarchy. > > The lengthNorm(String,int) method is called when you index the document. > > Doug > > Terry Steichen

Re: Computing Relevancy Differently

2003-02-28 Thread Terry Steichen
r that the new lengthNorm() method is being called. It's probably some silly goof, but I can't figure out where it is. If you (or anyone else, of course) have any ideas/suggestions, I'd appreciate them. Regards, Terry - Original Message ----- From: "Terry Steichen"

Re: MAX Index Size POLL

2003-02-27 Thread Terry Steichen
Samir, The size of the index depends on (a) the size of the documents, (b) the number of fields per document, (c) the fields that are kept in the index. The time taken to index depends on the same plus the characteristics of the processor and storage i/o. With so many variables, I don't think the

Re: Indexing Tips and Hints

2003-02-24 Thread Terry Steichen
are trying this anyway, and looking for ways to improve > indexing times... Could you perhaps try to replace use of > java.io.RandomAccessFile in FSDirectory implementation, with the > attached implementation? It supposedly increases I/O throughput by > orders of magnitude, by using part

Re: Indexing Tips and Hints

2003-02-24 Thread Terry Steichen
Mike, By way of comparison, I've got a collection of about 50,000 XML files, each of which averages about 8K. It takes about 1.25 hours to index (on a 1.8Ghz machine). I use basically the standard configuration (mergeFactor, etc.) and I've got about 30 fields per document. I add about 200 new o

Re: Syntax Problem - Maybe solved

2003-02-16 Thread Terry Steichen
ing it as an encoded space). That would explain the behavior. I will confirm this later on today. Regards, Terry - Original Message - From: "Terry Steichen" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Saturday, February 15, 20

Re: Syntax Problem

2003-02-15 Thread Terry Steichen
ED]> Sent: Saturday, February 15, 2003 7:41 PM Subject: Re: Syntax Problem > Terry Steichen wrote: > > I have an index which, when searched with this query ("cloning clone > > animal") produces 1103 hits. A different, more narrow query > > ("(cloning clone) AND a

Syntax Problem

2003-02-15 Thread Terry Steichen
I have an index which, when searched with this query ("cloning clone animal") produces 1103 hits. A different, more narrow query ("(cloning clone) AND animal") produces only 19 hits. What's puzzling to me is that if I try a different (but supposedly identical) form of the more narrow query ("+

Re: Computing Relevancy Differently

2003-02-10 Thread Terry Steichen
, 2003 1:57 PM Subject: Re: Computing Relevancy Differently > Terry Steichen wrote: > > Can you give me an idea of what to replace the lengthNorm() method with to, > > for example, remove any special weight given to shorter matching documents? > > The goal of the default imp

Re: Computing Relevancy Differently

2003-02-08 Thread Terry Steichen
iginal Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Friday, February 07, 2003 2:37 PM Subject: Re: Computing Relevancy Differently > Terry Steichen wrote: > > I read all the relevant references I co

Score-Limited Hits?

2003-02-03 Thread Terry Steichen
Is there an existing API that allows you to conduct a search such that only hits with a score greater than X are returned? Regards, Terry

Re: regarding Query parser for relational operators

2003-02-03 Thread Terry Steichen
in advance > Nellai... > - Original Message - > From: "Terry Steichen" <[EMAIL PROTECTED]> > To: "Lucene Users List" <[EMAIL PROTECTED]> > Sent: Monday, February 03, 2003 7:50 PM > Subject: Re: regarding Query parser for relation

Re: regarding Query parser for relational operators

2003-02-03 Thread Terry Steichen
Nellai, Sounds like you want to use a range query. Regards, Terry - Original Message - From: "Nellai" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, February 03, 2003 5:10 AM Subject: regarding Query parser for relational operators Hi, Is there any way

Re: '-' character not interpreted correctly in field names

2003-02-03 Thread Terry Steichen
I believe that the tokenizer treats a dash as a token separator. Hence, the only way, as I recall, to eliminate this behavior is to modify QueryParser.jj so it doesn't do this. However, doing this can cause some other problems, like hyphenated words at a line break and the like. (Of course, if y

Re: Wildchars in phrase

2003-02-02 Thread Terry Steichen
Lukas, I believe that "this" is a stop word, so it is stripped out. Regards, Terry - Original Message - From: "Lukas Zapletal" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Sunday, February 02, 2003 11:47 AM Subject: Wildchars in phrase > Hello all! > > Why a

Re: Wildchar based search?? |

2003-02-02 Thread Terry Steichen
Leo, From my experience, as I update the index (without optimizing), the number of physical index files grows. I typically use the number of files as an indicator as to when optimization is required. While I don't think Lucene itself has any API to check this, a shell script or the application

Re: Computing Relevancy Differently

2003-01-26 Thread Terry Steichen
I admit to a bit of frustration. With the past several messages, I simply asked (or, more accurately, tried to ask) how to alter the way that Lucene ranks relevancy, and I asked whether the selective boost mechanism might do the trick. I admitted that I don't know (nor care to know) the theory be

Re: Computing Relevancy Differently

2003-01-26 Thread Terry Steichen
rom: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Saturday, January 25, 2003 2:09 AM Subject: Re: Computing Relevancy Differently > Check the lucene-user archives, search for subject "custom scoring api > questio

Computing Relevancy Differently

2003-01-24 Thread Terry Steichen
How would one go about altering the formula for relevancy? (That is, which modules and which code?) I'm certain that the current algorithm is well founded in logic and probably works well in many environments. However, I find that, as I index news stories, the current algorithm frequently d

Re: Interpreting the score asociated with the Term? |

2003-01-23 Thread Terry Steichen
asociated with the Term? | > Yes, I believe so. > > --- Terry Steichen <[EMAIL PROTECTED]> wrote: > > Otis, > > > > Didn't somebody (Doug?) also mention that a keyword in a shorter > > document is > > deemed more significant than in a longer one (be

Re: Interpreting the score asociated with the Term? |

2003-01-23 Thread Terry Steichen
Otis, Didn't somebody (Doug?) also mention that a keyword in a shorter document is deemed more significant than in a longer one (because, I guess, it represents a larger percentage of the document)? Regards, Terry - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Luc

Re: Range queries

2003-01-23 Thread Terry Steichen
Erik, That's good. Now I don't have to keep proving what is, is. Glad it finally made sense. Regards, Terry - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, January 22, 2003 11:43 PM Subject: Re: Range queries
