Re: de-boosting fields

2006-12-11 Thread Antony Bowesman
Daniel Naber wrote: On Saturday 09 December 2006 02:25, Scott Smith wrote: What is the best way to do this? Is changing the boost the right answer? Can a field's boost be zero? Yes, just use: term1 term2 category1^0 category2^0. Erick's Filter idea is also useful. Isn't it also true
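
A minimal sketch of assembling the query string Daniel suggests from two term lists. The class and method names are hypothetical and not part of Lucene; the output is only a string in QueryParser syntax, where `term^0` gives that clause a boost of zero.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): builds a QueryParser-syntax
// string in which the "category" terms get a boost of 0 via term^0.
class DeboostedQuery {
    static String build(List<String> scoredTerms, List<String> zeroBoostTerms) {
        List<String> parts = new ArrayList<>(scoredTerms);
        for (String term : zeroBoostTerms) {
            parts.add(term + "^0"); // QueryParser boost syntax: term^boost
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String q = build(List.of("term1", "term2"), List.of("category1", "category2"));
        System.out.println(q); // prints: term1 term2 category1^0 category2^0
    }
}
```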

Using Lucene to search log files

2006-12-11 Thread abdul aleem
Hi All, I'm a Lucene newbie. Requirements: a) Build a log viewer tool that searches log files for keywords and timestamps. b) In production there are approx. 200 log files per day, and each log file may range from 1MB to 5MB. Lucene: We wanted to utilize Lucene's search capabilities

Re: Using Lucene to search log files

2006-12-11 Thread Grant Ingersoll
See below. On Dec 11, 2006, at 7:04 AM, abdul aleem wrote: Hi All, I'm a Lucene newbie. Requirements: a) Build a log viewer tool that searches log files for keywords and timestamps. b) In production there are approx. 200 log files per day, and each log file may range from 1MB to 5MB. Lucene

Re: Using Lucene to search log files

2006-12-11 Thread Erick Erickson
As for the appropriateness of Lucene, it's an open question, but I think it'd be fine. If it isn't, you have an interesting problem <G>. About timestamps: this has been discussed a LOT on the list, since they're not as straightforward as you might assume. See the thread *Date ranges -

Re: Using Lucene to search log files

2006-12-11 Thread abdul aleem
Many thanks, Grant. I will now get my hands dirty with Lucene to meet our requirements. Regards, Abdul --- Grant Ingersoll [EMAIL PROTECTED] wrote: See below. On Dec 11, 2006, at 7:04 AM, abdul aleem wrote: Hi All, I'm a Lucene newbie. Requirements: a) Build a log

Re: Using Lucene to search log files

2006-12-11 Thread mark harwood
Extend QueryParser to sort this out. The latest version in SVN has changed the default QueryParser behaviour to use RangeFilters instead of RangeQuerys - Original Message From: Mike Streeton [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Monday, 11 December, 2006 1:35:47 PM
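
A stdlib-only simulation (not Lucene code) of why the switch matters for fine-grained dates. Lucene's RangeQuery rewrites into a BooleanQuery with one clause per indexed term in the range, subject to BooleanQuery's default limit of 1024 clauses, whereas a RangeFilter walks the same terms and only sets bits. The toy inverted index, the class name, and the timestamp values below are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

// Stdlib-only simulation, not Lucene code. Terms are kept sorted, as in
// Lucene's term dictionary, so a range is a contiguous slice of terms.
class RangeExpansionDemo {
    // Lucene's BooleanQuery allows 1024 clauses by default.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // term -> documents containing that term (a toy inverted index)
    final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    void add(int doc, String term) {
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(doc);
    }

    // Query-style: one boolean clause per distinct term in [lower, upper].
    int clausesNeeded(String lower, String upper) {
        return postings.subMap(lower, true, upper, true).size();
    }

    // Filter-style: walk the same terms but only set one bit per matching doc.
    BitSet filter(String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (List<Integer> docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        RangeExpansionDemo index = new RangeExpansionDemo();
        long base = 20061211120000000L; // yyyyMMddHHmmssSSS
        for (int doc = 0; doc < 5000; doc++) {
            // five seconds of log entries, one distinct term per millisecond
            index.add(doc, Long.toString(base + doc));
        }
        String lo = "20061211120000000", hi = "20061211120004999";
        int clauses = index.clausesNeeded(lo, hi);
        System.out.println("clauses a RangeQuery would need: " + clauses
            + (clauses > DEFAULT_MAX_CLAUSES ? " (over the default limit)" : ""));
        System.out.println("docs matched by the filter: " + index.filter(lo, hi, 5000).cardinality());
    }
}
```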

Re: Using Lucene to search log files

2006-12-11 Thread abdul aleem
Many thanks to all. Well, I'm kind of puzzled, because ours is a fast-moving log down to milliseconds :( as we deal with forex on a financial system. I'm sure there will be workarounds; actually, most of the time it is enough to search within 2 log files of 1-5MB each, because we are more interested in

Re: Using Lucene to search log files

2006-12-11 Thread abdul aleem
Apologies, I forgot to mention: there are great people in this group, and they are a great help as well :):) --- abdul aleem [EMAIL PROTECTED] wrote: Many thanks to all. Well, I'm kind of puzzled, because ours is a fast-moving log down to milliseconds :( as we deal with forex on a financial

Lucene id generation

2006-12-11 Thread Waheed Mohammed
Hello, is there a way to influence Lucene's generation of IDs while indexing? My requirement is this: I want to have different indexes where no index has IDs that were already assigned to an earlier index. For instance, IDX1: {0...100}, IDX2: {101...200}, IDX3: {201...300}, but not

Re: Using Lucene to search log files

2006-12-11 Thread Erick Erickson
Then, if you require milliseconds, you really want to look at the classes that do the work with filters. You should be just fine. On 12/11/06, abdul aleem [EMAIL PROTECTED] wrote: Many thanks to all. Well, I'm kind of puzzled, because ours is a fast-moving log down to milliseconds :( as we deal
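
Range filtering compares indexed terms as strings, so millisecond timestamps have to be fixed-width and ordered most-significant-field-first (and in one time zone) for string order to match time order. Lucene's `DateTools` class (with `DateTools.Resolution.MILLISECOND`) exists for this purpose; the sketch below only reproduces the idea with `java.time`, and its pattern is an assumption, not a guarantee of byte-identical output.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Encodes epoch milliseconds as a fixed-width UTC string whose
// lexicographic order equals chronological order.
class SortableTimestamp {
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    static String encode(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long t = Instant.parse("2006-12-11T07:04:00.123Z").toEpochMilli();
        String a = encode(t);
        String b = encode(t + 1); // one millisecond later
        System.out.println(a + " sorts before " + b + ": " + (a.compareTo(b) < 0));
    }
}
```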

Re: Lucene id generation

2006-12-11 Thread karl wettin
On 11 Dec 2006, at 16:15, Waheed Mohammed wrote: Is there a way to influence Lucene's generation of IDs while indexing? If you speak of the Lucene document number, then no. And are you aware of the fact that document numbers are eligible for change at any time (optimization) without giving
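
A toy model (stdlib only, not Lucene's actual merge code) of the renumbering described here: if a document number is simply the document's position in the index, compacting away deleted documents shifts every later document down, while an application-level ID stored in a field does not change. The class name and the `doc-N` IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model: a document number is the document's position, so removing
// deleted documents renumbers every document that came after them.
class DocNumberDemo {
    final List<String> storedIds = new ArrayList<>(); // app-level "id" field values
    final BitSet deleted = new BitSet();

    int add(String appId) {
        storedIds.add(appId);
        return storedIds.size() - 1; // document number = position
    }

    void delete(int docNum) {
        deleted.set(docNum);
    }

    // Roughly what optimizing does to deleted documents: drop them, renumber the rest.
    void optimize() {
        List<String> kept = new ArrayList<>();
        for (int doc = 0; doc < storedIds.size(); doc++) {
            if (!deleted.get(doc)) kept.add(storedIds.get(doc));
        }
        storedIds.clear();
        storedIds.addAll(kept);
        deleted.clear();
    }

    int docNumberOf(String appId) {
        return storedIds.indexOf(appId);
    }

    public static void main(String[] args) {
        DocNumberDemo index = new DocNumberDemo();
        for (int i = 1; i <= 5; i++) index.add("doc-" + i);
        System.out.println("before: doc-4 is #" + index.docNumberOf("doc-4"));
        index.delete(index.docNumberOf("doc-2"));
        index.optimize();
        System.out.println("after:  doc-4 is #" + index.docNumberOf("doc-4"));
    }
}
```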

Re: Lucene id generation

2006-12-11 Thread Erick Erickson
I don't believe that this is possible. Or desirable. Lucene IDs are mutable, even within an index. That is, if you index docs that get, say, IDs 1, 2, 3, 4, 5, then delete doc 2 and optimize, docs 4 and 5 get reassigned IDs 3 and 4 (or something similar). You're far better off controlling this
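
A minimal sketch of controlling IDs yourself, under the assumption that each index should draw from its own contiguous block (here blocks of 100, so IDX1 gets 0-99 and IDX2 gets 100-199) and that the value is stored in an ordinary untokenized field rather than taken from Lucene's document number. The allocator is hypothetical and in-memory only; a real service would need to persist `nextBlockStart` across restarts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-side allocator: each index gets its own
// contiguous block of IDs, and no block is ever handed out twice.
class IdBlockAllocator {
    private final long blockSize;
    private long nextBlockStart = 0;
    // index name -> {next ID to hand out, last ID in the current block}
    private final Map<String, long[]> cursors = new HashMap<>();

    IdBlockAllocator(long blockSize) {
        this.blockSize = blockSize;
    }

    synchronized long nextId(String indexName) {
        long[] c = cursors.get(indexName);
        if (c == null || c[0] > c[1]) { // no block yet, or current block used up
            c = new long[] {nextBlockStart, nextBlockStart + blockSize - 1};
            nextBlockStart += blockSize;
            cursors.put(indexName, c);
        }
        return c[0]++;
    }

    public static void main(String[] args) {
        IdBlockAllocator ids = new IdBlockAllocator(100);
        System.out.println("IDX1 -> " + ids.nextId("IDX1"));
        System.out.println("IDX2 -> " + ids.nextId("IDX2"));
        System.out.println("IDX1 -> " + ids.nextId("IDX1"));
    }
}
```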

Re: Lucene id generation

2006-12-11 Thread Find Me
On 12/11/06, Waheed Mohammed [EMAIL PROTECTED] wrote: Hello, is there a way to influence Lucene's generation of IDs while indexing? My requirement is this: I want to have different indexes where no index has IDs that were already assigned to an earlier index. For instance, IDX1:

Re: Using Lucene to search log files

2006-12-11 Thread abdul aleem
Thanks, Erick, I will take a look. Apologies, but a basic question: how do I actually retrieve the content of a search? Most of the examples in Lucene in Action show the Searcher returning results as a number of matching documents, but I couldn't find an API to retrieve the line or paragraph where the search is

RE: Lucene id generation

2006-12-11 Thread Ramana Jelda
I really miss this feature in Lucene too. Whatever Mohammed's requirements are, I can certainly see some room for search performance improvements there. My argument here is: why doesn't Lucene provide a mechanism for supplying custom document IDs? -----Original Message----- From: Find Me

Re: Using Lucene to search log files

2006-12-11 Thread Steven Rowe
abdul aleem wrote: How do I actually retrieve the content of a search? Most of the examples in Lucene in Action show the Searcher returning results as a number of matching documents, but I couldn't find an API to retrieve the line or paragraph where the search is matched. Hi Abdul, I don't know what

Re: Using Lucene to search log files

2006-12-11 Thread Grant Ingersoll
Span queries also return positional information. On Dec 11, 2006, at 12:12 PM, Steven Rowe wrote: abdul aleem wrote: How do I actually retrieve the content of a search? Most of the examples in Lucene in Action show the Searcher returning results as a number of matching documents, but I couldn't find an API to
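
A Searcher hands back matching documents, not matched lines. If the log text is kept in a stored field (or the application can re-read the file), one simple post-processing step is to scan the hit's text for the keyword. This is plain string matching, not Lucene's highlighter or the span API mentioned above; it ignores analysis (tokenizing, lowercasing rules, stemming), so it can disagree with what Lucene actually matched. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java post-processing (not a Lucene API): given the text of a hit
// document, return the lines containing the keyword, prefixed with 1-based line numbers.
class MatchingLines {
    static List<String> find(String text, String keyword) {
        List<String> out = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        String[] lines = text.split("\\R"); // any line terminator
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].toLowerCase(Locale.ROOT).contains(needle)) {
                out.add((i + 1) + ": " + lines[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String stored = "10:00:01 INFO start\n10:00:02 ERROR timeout\n10:00:03 INFO done";
        for (String line : find(stored, "error")) {
            System.out.println(line);
        }
    }
}
```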

Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception

2006-12-11 Thread Andreas Kohn
Hi, while playing with the various stemmers of Lucene(-1.9.1), I got an index out of bounds exception: lucene-1.9.1> java -cp build/contrib/snowball/lucene-snowball-1.9.2-dev.jar net.sf.snowball.TestApp Kp bla.txt Exception in thread "main" java.lang.reflect.InvocationTargetException at

Re: de-boosting fields

2006-12-11 Thread Chris Hostetter
: Isn't it also true that using Field.Index.NO_NORMS when creating the field will remove it from the scoring formula? I thought I read that somewhere, but now can't find where. Queries on fields with NO_NORMS will still contribute to the score, but the field *length* and/or field boosts

RE: Lucene id generation

2006-12-11 Thread Chris Hostetter
If you are trying to think of Lucene's docid as a meaningful number, you are doing something wrong. A lot of people want to view Lucene docids the same way they look at auto-incremented unique keys in a database -- don't do that. Instead, think of them as memory addresses in C or C++ ... they

Re: Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception

2006-12-11 Thread Doron Cohen
Andreas, I could reproduce the error as you describe. You can report this bug at http://issues.apache.org/jira/browse/LUCENE. There seem to be a few updates at http://snowball.tartarus.org not currently reflected in Lucene: - SnowballProgram.java has this bug fix, as you describe - The

Re: Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception

2006-12-11 Thread Andreas Kohn
On 12/11/06, Daniel Naber [EMAIL PROTECTED] wrote: On Monday 11 December 2006 19:18, Andreas Kohn wrote: After some debugging, and some tests with the original snowball distribution from snowball.tartarus.org, it seems that the attached change is needed to avoid the exception. The

VTD-XML 1.9 released

2006-12-11 Thread Jimmy Zhang
Version 1.9 of VTD-XML, available in C, C#, and Java, is now released. This version contains XPath-related performance enhancements and bug fixes. To download the latest release, please visit http://sourceforge.net/project/showfiles.php?group_id=110612. For the latest performance report, please

SegmentReader using too much memory?

2006-12-11 Thread Eric Jain
I've noticed that after stress-testing my application (uses Lucene 2.0) for a while, I have almost 200MB of byte[]s hanging around, the top two culprits being: 24 x SegmentReader.Norm.bytes = 112MB; 2 x SegmentReader.ones = 16MB. The second one isn't a big deal, but I wonder what's the

dump/backup fs index catalog without stop it

2006-12-11 Thread Nuno Alexandre Carvalho
Hi, I have a Java service that uses Lucene as its text search engine. This is working perfectly, but I don't know how to dump/back up its filesystem index catalog. Can I simply do a hot copy, without stopping the service and with the index open? Thanks in advance. -- Nuno Alexandre Carvalho

How to cut certain index

2006-12-11 Thread spinergywmy
Hi guys, I'm wondering how I can cut certain index entries out of one index file and paste them into another index file. For instance, I have indexed a particular file, with its contents and other necessary info, into a particular index folder; then I would like to move the index info that I have indexed to

Re: SegmentReader using too much memory?

2006-12-11 Thread Yonik Seeley
On 12/11/06, Eric Jain [EMAIL PROTECTED] wrote: I've noticed that after stress-testing my application (uses Lucene 2.0) for a while, I have almost 200MB of byte[]s hanging around, the top two culprits being: 24 x SegmentReader.Norm.bytes = 112MB; 2 x SegmentReader.ones = 16MB. Each

Re: SegmentReader using too much memory?

2006-12-11 Thread Eric Jain
Yonik Seeley wrote: On 12/11/06, Eric Jain [EMAIL PROTECTED] wrote: I've noticed that after stress-testing my application (uses Lucene 2.0) for a while, I have almost 200MB of byte[]s hanging around, the top two culprits being: 24 x SegmentReader.Norm.bytes = 112MB; 2 x SegmentReader.ones

Re: SegmentReader using too much memory?

2006-12-11 Thread Yonik Seeley
On 12/11/06, Eric Jain [EMAIL PROTECTED] wrote: I do want to use document boosting... Is that independent from field boosting? The length normalization on the other hand may not be necessary. There is no real document boost at the index level... it is simply multiplied into the boost for every
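
A sketch of how an index-time norm is composed, assuming DefaultSimilarity-style length normalization of 1/sqrt(numTerms): the document boost has no storage of its own and is multiplied, together with the field's boost and length norm, into the norm of each field. Lucene then encodes that product into a single (lossy) byte, which this sketch does not model. It also fits Doron's point that document boosting and length normalization go together: a field indexed without norms has nowhere to keep either. The class name is hypothetical.

```java
// Sketch of index-time norm composition under DefaultSimilarity-style rules.
// The single-byte encoding Lucene applies afterwards is not modeled here.
class IndexTimeNorm {
    // Assumed length normalization: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The document boost is folded into the norm of every field of the document.
    static float norm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Same document boost (2.0) applied to two fields of different length.
        System.out.println("title (4 terms):  " + norm(2.0f, 1.0f, 4));
        System.out.println("body (100 terms): " + norm(2.0f, 1.0f, 100));
    }
}
```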

Re: SegmentReader using too much memory?

2006-12-11 Thread Doron Cohen
I do want to use document boosting... Is that independent from field boosting? The length normalization on the other hand may not be necessary. They go together - see Score Boosting in http://lucene.apache.org/java/docs/scoring.html

Re: SegmentReader using too much memory?

2006-12-11 Thread Otis Gospodnetic
Eric, you said you aren't using any Field.Index.NO_NORMS fields, but SegmentReader.ones should only be used if you do use NO_NORMS, so things don't add up here. Otis - Original Message From: Yonik Seeley [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Monday, December 11, 2006

Re: dump/backup fs index catalog without stop it

2006-12-11 Thread Otis Gospodnetic
Nuno, if you stop or block all operations that can change the index (e.g. deletes and additions), you can safely copy the whole index directory. If you do it from Java, you can use Lucene's own Lock class to lock the index against modifications, copy the index directory, and unlock the index. Otis
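
A stdlib-only sketch of the block, copy, unblock recipe. Instead of Lucene's Lock class it assumes (hypothetically) that the service routes every add and delete through one application-level `java.util.concurrent.locks.Lock`, so holding that lock during the copy keeps the directory from changing; searches can continue. It also assumes a flat index directory and that pending changes have already been written to disk, since documents still buffered in an open writer would not be in the copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IndexBackup {
    // Hypothetical: modificationLock is the same lock the service holds
    // around every add/delete, so no index files change while we copy.
    static void backup(Path indexDir, Path backupDir, Lock modificationLock) throws IOException {
        modificationLock.lock();
        try {
            Files.createDirectories(backupDir);
            List<Path> files;
            try (Stream<Path> s = Files.list(indexDir)) {
                files = s.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path f : files) {
                Files.copy(f, backupDir.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            modificationLock.unlock(); // always resume modifications
        }
    }

    public static void main(String[] args) throws IOException {
        Path index = Files.createTempDirectory("index");
        Files.write(index.resolve("segments"), new byte[] {1, 2, 3});
        Files.write(index.resolve("_0.cfs"), new byte[] {4, 5});
        Path copy = Files.createTempDirectory("backup");
        backup(index, copy, new ReentrantLock());
        try (Stream<Path> s = Files.list(copy)) {
            System.out.println("copied files: " + s.count());
        }
    }
}
```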

Re: SegmentReader using too much memory?

2006-12-11 Thread Yonik Seeley
On 12/11/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: Eric, you said you aren't using any Field.Index.NO_NORMS fields, but SegmentReader.ones should only be used if you do use NO_NORMS, so things don't add up here. norms(fieldThatDoesntExist) will also return fakeNorms (ones) -Yonik

Re: SegmentReader using too much memory?

2006-12-11 Thread Yonik Seeley
On 12/11/06, Eric Jain [EMAIL PROTECTED] wrote: Yonik Seeley wrote: There is no real document boost at the index level... it is simply multiplied into the boost for every field of that document. So it comes down to what fields you want that index-time boost to take effect on (as well as

Re: SegmentReader using too much memory?

2006-12-11 Thread Eric Jain
Yonik Seeley wrote: It's read on demand, per indexed field. So assuming your index is optimized (a single segment), then it increases by one byte[] each time you search on a new field. OK, makes sense then. Thanks!
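
A back-of-the-envelope estimate following Yonik's description: in an optimized (single-segment) index, each indexed field that has been searched at least once holds one norm byte per document. The document and field counts in `main` are illustrative assumptions, not figures from Eric's index.

```java
// Rough norms-memory estimate for a single-segment index:
// one byte per document for each indexed field searched so far.
class NormsMemory {
    static long bytes(long maxDoc, int fieldsSearched) {
        return maxDoc * fieldsSearched;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L; // illustrative document count
        int fields = 10;          // illustrative number of distinct fields searched
        long b = bytes(maxDoc, fields);
        System.out.printf("%,d bytes (~%d MB)%n", b, b / (1024 * 1024));
    }
}
```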

Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-11 Thread Karl Koch
Well, it doesn't, since there is no justification of why it is the way it is. It's like saying, here is that car with 5 wheels... enjoy driving. Karl Original Message Date: Sun, 10 Dec 2006 13:12:29 -0800 From: Doron Cohen [EMAIL PROTECTED] To: java-user@lucene.apache.org

Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-11 Thread Doron Cohen
Well, it doesn't, since there is no justification of why it is the way it is. It's like saying, here is that car with 5 wheels... enjoy driving. - I think the explanations there would also answer at least some of your questions. I hoped it would answer *some* of the questions... (not all)