Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-11 Thread Doron Cohen
> Well it doesn't since there is no justification of why it is the > way it is. It's like saying, here is that car with 5 wheels... enjoy driving. > > - I think the explanations there would also answer at least some of your > > questions. I hoped it would answer *some* of the questions... (not al

Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-11 Thread Karl Koch
Well it doesn't since there is no justification of why it is the way it is. It's like saying, here is that car with 5 wheels... enjoy driving. Karl Original Message Date: Sun, 10 Dec 2006 13:12:29 -0800 From: Doron Cohen <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Be

Re: SegmentReader using too much memory?

2006-12-11 Thread Eric Jain
Yonik Seeley wrote: It's read on demand, per indexed field. So assuming your index is optimized (a single segment), then it increases by one byte[] each time you search on a new field. OK, makes sense then. Thanks! - To unsubs
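Yonik's description suggests a simple back-of-the-envelope estimate: in Lucene 2.0 the norms for a field are held as one byte per document, and a field's array is only loaded the first time that field is searched. A minimal sketch of that arithmetic follows, assuming an optimized (single-segment) index and a single open reader. The class name, method name and the 8,000,000-document figure are made up for illustration and do not come from the thread.

```java
/** Back-of-the-envelope estimate only; this is not Lucene code. */
final class NormsMemoryEstimate {

    /**
     * Bytes held by norms in one reader over a single segment:
     * one byte per document for every field that has been searched so far.
     */
    static long normsBytes(long maxDoc, int fieldsSearched) {
        return maxDoc * fieldsSearched;
    }

    public static void main(String[] args) {
        // Hypothetical index size, chosen only to make the numbers easy to read.
        long bytes = normsBytes(8_000_000L, 3);
        System.out.println(bytes + " bytes of norms after searching 3 fields");
    }
}
```

Under these assumptions, every additional field that is searched adds another `maxDoc` bytes, which is the growth pattern described above.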

Re: SegmentReader using too much memory?

2006-12-11 Thread Yonik Seeley
On 12/11/06, Eric Jain <[EMAIL PROTECTED]> wrote: Yonik Seeley wrote: > There is no real document boost at the index level... it is simply > multiplied into the boost for every field of that document. So it > comes down to what fields you want that index-time boost to take > effect on (as well a

Re: SegmentReader using too much memory?

2006-12-11 Thread Eric Jain
Yonik Seeley wrote: There is no real document boost at the index level... it is simply multiplied into the boost for every field of that document. So it comes down to what fields you want that index-time boost to take effect on (as well as length normalization). Come to think of it, I do have

Re: SegmentReader using too much memory?

2006-12-11 Thread Yonik Seeley
On 12/11/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Eric, you said you aren't using any Field.Index.NO_NORMS fields, but SegmentReader.ones should only be used if you do use NO_NORMS, so things don't add up here. norms(fieldThatDoesntExist) will also return fakeNorms (ones) -Yonik http:

Re: dump/backup fs index catalog without stop it

2006-12-11 Thread Otis Gospodnetic
Nuno, If you stop or block all operations that can change the index (e.g. deletes and additions), you can safely copy the whole index directory. If you do it from Java, you can use Lucene's own Lock class to lock the index for modifications, copy the index directory, and unlock the index. Otis -
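The copy step Otis describes can be sketched with the JDK alone. This is only an illustration under stated assumptions: the caller has already stopped or blocked every writer (the Lucene Lock usage is deliberately not shown here), and the index is a flat directory of files. `IndexDirCopy` and `copyFlat` are hypothetical names, not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Plain-JDK sketch; it does not lock anything itself. */
final class IndexDirCopy {

    /**
     * Copies every regular file in src into dst (created if missing) and
     * returns how many files were copied. Only safe while nothing is
     * modifying the index, which the caller must guarantee.
     */
    static int copyFlat(Path src, Path dst) throws IOException {
        Files.createDirectories(dst);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, dst.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }
}
```

A caller would acquire whatever lock or pause mechanism it uses, call `copyFlat(indexDir, backupDir)`, and release the lock in a `finally` block so a failed copy does not leave the index blocked.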

Re: SegmentReader using too much memory?

2006-12-11 Thread Otis Gospodnetic
Eric, you said you aren't using any Field.Index.NO_NORMS fields, but SegmentReader.ones should only be used if you do use NO_NORMS, so things don't add up here. Otis - Original Message From: Yonik Seeley <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, December 11, 200

Re: SegmentReader using too much memory?

2006-12-11 Thread Doron Cohen
> I do want to use document boosting... Is that independent from field > boosting? The length normalization on the other hand may not be necessary. They "go together" - see "Score Boosting" in http://lucene.apache.org/java/docs/scoring.html ---

Re: SegmentReader using too much memory?

2006-12-11 Thread Yonik Seeley
On 12/11/06, Eric Jain <[EMAIL PROTECTED]> wrote: I do want to use document boosting... Is that independent from field boosting? The length normalization on the other hand may not be necessary. There is no real document boost at the index level... it is simply multiplied into the boost for ever
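Yonik's point, that the document boost is simply multiplied into every field's boost, can be written out as arithmetic. The sketch below is not Lucene code; it assumes the default length normalization of 1/sqrt(number of tokens in the field) and ignores the lossy single-byte encoding that Lucene applies when it stores the result. `NormSketch` and its methods are hypothetical names.

```java
/** Arithmetic sketch of an index-time norm; not Lucene code. */
final class NormSketch {

    /** Assumed default length normalization: 1 / sqrt(token count). */
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    /** Document boost and field boost collapse into one per-field factor. */
    static float indexTimeNorm(float docBoost, float fieldBoost, int numTokens) {
        return docBoost * fieldBoost * lengthNorm(numTokens);
    }

    public static void main(String[] args) {
        // Hypothetical values: document boost 2.0, field boost 1.5, 4 tokens.
        System.out.println(indexTimeNorm(2.0f, 1.5f, 4));
    }
}
```

Because the three factors end up in one stored value per field, a document boost cannot be kept while dropping norms for every field: it only takes effect on fields that still carry norms.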

Re: SegmentReader using too much memory?

2006-12-11 Thread Eric Jain
Yonik Seeley wrote: On 12/11/06, Eric Jain <[EMAIL PROTECTED]> wrote: I've noticed that after stress-testing my application (uses Lucene 2.0) for a while, I have almost 200mb of byte[]s hanging around, the top two culprits being: 24 x SegmentReader.Norm.bytes = 112mb 2 x SegmentReader.ones

Re: SegmentReader using too much memory?

2006-12-11 Thread Yonik Seeley
On 12/11/06, Eric Jain <[EMAIL PROTECTED]> wrote: I've noticed that after stress-testing my application (uses Lucene 2.0) for a while, I have almost 200mb of byte[]s hanging around, the top two culprits being: 24 x SegmentReader.Norm.bytes = 112mb 2 x SegmentReader.ones = 16mb Each in

How to cut certain index

2006-12-11 Thread spinergywmy
Hi guys, I'm wondering how I can cut a certain entry out of one index and paste it into another index? For instance, I have indexed a particular file with its contents and other necessary info into a particular index folder, then I would like to move the index info that I have indexed to othe

dump/backup fs index catalog without stop it

2006-12-11 Thread Nuno Alexandre Carvalho
Hi, I have one Java service that uses Lucene as its text search engine. This is working perfectly, but I don't know how to dump/backup its filesystem index catalog. Can I simply do a hot copy, without stopping the service and with the index open? Thanks in advance. -- Nuno Alexandre Carvalho ---

SegmentReader using too much memory?

2006-12-11 Thread Eric Jain
I've noticed that after stress-testing my application (uses Lucene 2.0) for a while, I have almost 200mb of byte[]s hanging around, the top two culprits being: 24 x SegmentReader.Norm.bytes = 112mb 2 x SegmentReader.ones = 16mb The second one isn't a big deal, but I wonder what's the e

VTD-XML 1.9 released

2006-12-11 Thread Jimmy Zhang
Version 1.9 of VTD-XML, available in C, C#, and Java, is now released. This version contains XPath-related performance enhancements and bug fixes. To download the latest release, please visit http://sourceforge.net/project/showfiles.php?group_id=110612. For latest performance report, please vis

Re: Bugs in contrib/snowball/.../SnowballProgram.java -> Kraaij-Pohlmann gives Index-OOB Exception

2006-12-11 Thread Andreas Kohn
On 12/11/06, Daniel Naber <[EMAIL PROTECTED]> wrote: On Monday 11 December 2006 19:18, Andreas Kohn wrote: > After some debugging, and some tests with the original snowball > distribution from snowball.tartarus.org, it seems that the attached > change is needed to avoid the exception. The attac

Re: Bugs in contrib/snowball/.../SnowballProgram.java -> Kraaij-Pohlmann gives Index-OOB Exception

2006-12-11 Thread Doron Cohen
Andreas, I could generate the error as you describe. You can report this bug in http://issues.apache.org/jira/browse/LUCENE There seem to be a few updates in http://snowball.tartarus.org not reflected currently in Lucene - - SnowballProgram.java has this bug fix as you describe The algorithms

RE: Lucene id generation

2006-12-11 Thread Chris Hostetter
if you are trying to think of Lucene's docid as a meaningful number, you are doing something wrong. A lot of people want to view Lucene docids the same way they look at auto-incremented unique keys in a database -- don't do that. Instead think of them as memory addresses in C or C++ ... they are

Re: de-boosting fields

2006-12-11 Thread Chris Hostetter
: Isn't it also true that using Field.Index.NO_NORMS when creating the field will : remove it from the scoring formula? I thought I read that somewhere, but now : can't find where. queries on fields with NO_NORMS will still contribute to the score, but the field *length* and/or field boosts won'

Bugs in contrib/snowball/.../SnowballProgram.java -> Kraaij-Pohlmann gives Index-OOB Exception

2006-12-11 Thread Andreas Kohn
Hi, while playing with the various stemmers of Lucene(-1.9.1), I got an index out of bounds exception: lucene-1.9.1>java -cp build/contrib/snowball/lucene-snowball-1.9.2-dev.jar net.sf.snowball.TestApp Kp bla.txt Exception in thread "main" java.lang.reflect.InvocationTargetException at su

Re: Using Lucene to search log files

2006-12-11 Thread Grant Ingersoll
Span Queries also return positional information On Dec 11, 2006, at 12:12 PM, Steven Rowe wrote: abdul aleem wrote: How to actually retrieve the content of search, Most of the examples in Lucene in Action Searcher gives the results found in number of documents but I couldn't find an API to r

Re: Using Lucene to search log files

2006-12-11 Thread Steven Rowe
abdul aleem wrote: > How to actually retrieve the content of search, > > Most of the examples in Lucene in Action > Searcher gives the results found in number of > documents > > but I couldn't find an API to retrieve the line or > paragraph where the search is matched Hi Abdul, I don't know w

RE: Lucene id generation

2006-12-11 Thread Ramana Jelda
I miss this feature in Lucene too. Whatever Mohammed's requirements are, I can certainly see some improvements in search performance. My argument here is: why doesn't Lucene provide a mechanism for supplying custom document ids? > -Original Message- > From: Find Me [mailt

Re: Using Lucene to search log files

2006-12-11 Thread abdul aleem
Thanks Erick, I will take a look, Apologies but a basic question, How to actually retrieve the content of search, Most of the examples in Lucene in Action Searcher gives the results found in number of documents but I couldn't find an API to retrieve the line or paragraph where the search is ma

Re: Lucene id generation

2006-12-11 Thread Find Me
On 12/11/06, Waheed Mohammed <[EMAIL PROTECTED]> wrote: Hello, Is there a way to influence lucene's generation of ids while indexing. my requirement is. I want to have different indexes where no index should have ids that have been assigned to an index earlier. for instance IDX1 : {0.1

Re: Lucene id generation

2006-12-11 Thread Erick Erickson
I don't believe that this is possible. Or desirable. Lucene IDs are mutable, even within an index. That is, if you index docs that get, say, IDs 1, 2, 3, 4, 5 and delete doc 2 and optimize, Docs 4 and 5 get reassigned IDs 3 and 4 (or something similar). You're far better off controlling this your
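Erick's renumbering example can be reproduced with a toy model. The sketch below is not Lucene and uses no Lucene API: a document's position in a list stands in for its document number, and `optimize()` squeezes out deleted slots. All names are hypothetical. It shows why an application-level key stored with each document is the stable identifier, while the position is not.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: list position plays the role of the document number. */
final class DocNumberSketch {
    private final List<String> keys = new ArrayList<>();      // application key per slot
    private final List<Boolean> deleted = new ArrayList<>();  // deletion mark per slot

    /** Adds a document and returns the number it currently has. */
    int add(String key) {
        keys.add(key);
        deleted.add(Boolean.FALSE);
        return keys.size() - 1;
    }

    /** Marks a slot as deleted; numbers of other documents do not move yet. */
    void delete(int docNumber) {
        deleted.set(docNumber, Boolean.TRUE);
    }

    /** Drops deleted slots, so later documents shift down to fill the gaps. */
    void optimize() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i)) {
                live.add(keys.get(i));
            }
        }
        keys.clear();
        keys.addAll(live);
        deleted.clear();
        for (int i = 0; i < keys.size(); i++) {
            deleted.add(Boolean.FALSE);
        }
    }

    /** Current number of the live document with this key, or -1 if absent. */
    int docNumberOf(String key) {
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i) && keys.get(i).equals(key)) {
                return i;
            }
        }
        return -1;
    }
}
```

In this model, a document looked up by key is always found, but the number returned for it changes once an earlier document has been deleted and the index compacted.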

Re: Lucene id generation

2006-12-11 Thread karl wettin
On 11 Dec 2006 at 16:15, Waheed Mohammed wrote: Is there a way to influence lucene's generation of ids while indexing. If you speak of the Lucene "document number", then no. And are you aware of the fact that document numbers are eligible for change at any time (optimization) without giving

Re: Using Lucene to search log files

2006-12-11 Thread Erick Erickson
Then you really want to look at the classes that do the work with filters if you require milliseconds. You should be just fine On 12/11/06, abdul aleem <[EMAIL PROTECTED]> wrote: Many thanks to All, well kind of puzzled because ours is a fast moving log down to Milliseconds :( as we deal w

Lucene id generation

2006-12-11 Thread Waheed Mohammed
Hello, Is there a way to influence lucene's generation of ids while indexing. my requirement is. I want to have different indexes where no index should have ids that have been assigned to an index earlier. for instance IDX1 : {0...100} IDX2: {101...200} IDX3: {201...300} but not

Re: Using Lucene to search log files

2006-12-11 Thread abdul aleem
Apologies, forgot to mention there are great people around in this group, they are of great help as well :):) --- abdul aleem <[EMAIL PROTECTED]> wrote: > Many thanks to All, > > well kind of puzzled because ours is a fast moving > log > down to Milliseconds :( as we deal with forex on a > finan

Re: Using Lucene to search log files

2006-12-11 Thread abdul aleem
Many thanks to All, well kind of puzzled because ours is a fast moving log down to Milliseconds :( as we deal with forex on a financial system. I'm sure there will be workarounds, actually most of the time it is enough to search within 2 log files of 1-5MB size, coz we are more interested in second

Re: Using Lucene to search log files

2006-12-11 Thread mark harwood
>>Extend QueryParser to sort this out. The latest version in SVN has changed the default QueryParser behaviour to use RangeFilters instead of RangeQuerys - Original Message From: Mike Streeton <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, 11 December, 2006 1:35:47

RE: Using Lucene to search log files

2006-12-11 Thread Mike Streeton
I would use a RangeFilter instead of using the default Boolean query as this will always break at some point with Too many Boolean clauses. Extend QueryParser to sort this out. As far as extracting information from log files I would look at creating yourself a LogAnalyzer that can interpret the co
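The failure Mike describes comes from a range being expanded into one Boolean clause per distinct term in that range, which trips Lucene's clause limit (1024 by default) once timestamps are fine-grained. A filter instead marks matching documents in a bit set, whose size depends on the number of documents rather than on the number of terms. The sketch below models both on a toy sorted term map; it is not Lucene code, and `RangeSketch` and its methods are hypothetical.

```java
import java.util.BitSet;
import java.util.SortedMap;

/** Toy inverted index: term -> document numbers. Not Lucene code. */
final class RangeSketch {
    /** Lucene's default limit on clauses in a BooleanQuery. */
    static final int MAX_CLAUSES = 1024;

    /** One clause per distinct term in [lo, hi). */
    static int clausesNeeded(SortedMap<String, int[]> index, String lo, String hi) {
        return index.subMap(lo, hi).size();
    }

    /** True when expanding the range into clauses would exceed the limit. */
    static boolean exceedsClauseLimit(SortedMap<String, int[]> index, String lo, String hi) {
        return clausesNeeded(index, lo, hi) > MAX_CLAUSES;
    }

    /** Filter-style evaluation: set one bit per matching document. */
    static BitSet filterBits(SortedMap<String, int[]> index, String lo, String hi, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : index.subMap(lo, hi).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

With millisecond timestamps indexed as zero-padded strings, even a short time window can contain thousands of distinct terms, so the clause-based expansion fails where the bit-set approach keeps working.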

Re: Using Lucene to search log files

2006-12-11 Thread abdul aleem
Many thanks Grant, I will now dirty my hands with Lucene to get our requirements regards, Abdul --- Grant Ingersoll <[EMAIL PROTECTED]> wrote: > See below > > On Dec 11, 2006, at 7:04 AM, abdul aleem wrote: > > > Hi All, > > > > I'm a Lucene newbie, > > > > > > Requirement : > > ==

Re: Using Lucene to search log files

2006-12-11 Thread Erick Erickson
As far as the appropriateness of Lucene, it's an open question, but I think it'd be fine. If it isn't, you have an "interesting" problem . About timestamps. This has been discussed a LOT on the thread, since they're not as straight-forward as you might assume. See the thread *"Date ranges - getti

Re: Using Lucene to search log files

2006-12-11 Thread Grant Ingersoll
See below On Dec 11, 2006, at 7:04 AM, abdul aleem wrote: Hi All, Im a Lucene newbie, Requirement : == a) Build a log viewer tool, search log files for keywords and time stamp b) files in production approx 200 logs per day and each log file may range from 1MB - 5MB Lucene

Using Lucene to search log files

2006-12-11 Thread abdul aleem
Hi All, Im a Lucene newbie, Requirement : == a) Build a log viewer tool, search log files for keywords and time stamp b) files in production approx 200 logs per day and each log file may range from 1MB - 5MB Lucene We wanted to utilize Lucene's search capabilities espec

Re: de-boosting fields

2006-12-11 Thread Antony Bowesman
Daniel Naber wrote: On Saturday 09 December 2006 02:25, Scott Smith wrote: What is the best way to do this? Is changing the boost the right answer? Can a field's boost be zero? Yes, just use: term1 term2 category1^0 category2^0. Erick's Filter idea is also useful. Isn't it also true that