Clustering with Lucene?

2011-04-26 Thread vivek sar
Hi, I've been researching clustering with Lucene. Here is what I've found so far, 1) Lucene clustering with Carrot2 - http://download.carrot2.org/head/manual/#section.getting-started.lucene - but, this seems suitable only for smaller indexes (a few hundred documents) -

Re: Clustering with Lucene?

2011-04-26 Thread vivek sar
.)? With the sizes you report Carrot2 won't work for you, I'm afraid, but Mahout may. Still, there's plenty of algorithms and preprocessing options to consider, so if you provide more background somebody may push you in the right direction. Dawid On Tue, Apr 26, 2011 at 1:49 PM, vivek sar

Re: Clustering with Lucene?

2011-04-26 Thread vivek sar
Thanks Dawid. I was trying to give an example, but this is not exactly our text. Our fields include things like user name, IP Address, Application Name, Port 3, Byte Count - all network-related stuff. So, if a user searches on a certain IP address then we would need to group the results by user,
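Grouping hits by a stored field, as described above, doesn't require a clustering engine at all; a minimal stdlib-only sketch (the `Hit` record and the `user`/`ip` field names are illustrative stand-ins, not the poster's actual schema):

```java
import java.util.*;
import java.util.stream.*;

public class GroupByField {
    // Hypothetical minimal stand-in for a search hit; real code would
    // read these values from Lucene Documents returned by a search.
    record Hit(String user, String ip) {}

    // Bucket hits by the "user" field, preserving first-seen order.
    static Map<String, List<Hit>> groupByUser(List<Hit> hits) {
        return hits.stream()
                   .collect(Collectors.groupingBy(Hit::user,
                            LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("alice", "10.0.0.1"),
            new Hit("bob",   "10.0.0.1"),
            new Hit("alice", "10.0.0.2"));
        System.out.println(groupByUser(hits).get("alice").size()); // 2
    }
}
```

For result sets too large to hold in memory, the same bucketing would have to happen incrementally while iterating hits, which is where Mahout-style approaches come in.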

Re: background merge hit exception

2009-02-25 Thread vivek sar
Hi, We ran into the same issue (corrupted index) using Lucene 2.4.0. There was no outage or system reboot - not sure how it could get corrupted. Here is the exception, Caused by: java.io.IOException: background merge hit exception: _io5:c66777491 _nh9:c10656736 _taq:c2021563 _s8m:c1421051

Re: Background merge hit exception

2008-09-18 Thread vivek sar
(java.lang.Thread.UncaughtExceptionHandler) Mike On Sep 17, 2008, at 4:24 PM, vivek sar wrote: Hi, We have been running Lucene 2.3 for the last few months with our application and all of a sudden we have hit the following exception, java.lang.RuntimeException: java.io.IOException: background

Background merge hit exception

2008-09-17 Thread vivek sar
Hi, We have been running Lucene 2.3 for the last few months with our application and all of a sudden we have hit the following exception, java.lang.RuntimeException: java.io.IOException: background merge hit exception: _2uxy:c11345949 _2uxz:c150 _2uy0:c150 _2uy1:c150 _2uy2:c150 _2uy3:c150

java.lang.IllegalArgumentException: Segment is too large

2008-03-30 Thread vivek sar
Hi, I'm using the 2.3.0 Lucene build and have the following merge parameters, mergeFactor = 100 maxMergeDocs = 9 maxBufferedDocs = 1 maxRAMBufferSizeMB = 200 After running with this setting for a month without problems, all of a sudden I'm getting the following exception,
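The parameters quoted above map onto setters on the Lucene 2.3 `IndexWriter`; a sketch of that configuration (the snippet's values appear truncated, so the numbers here are illustrative, and `directory` is assumed to be an open `Directory`):

```java
// Lucene 2.3-era writer tuning; concrete values are placeholders.
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);
writer.setMergeFactor(100);         // how many same-level segments trigger a merge
writer.setMaxMergeDocs(90000);      // cap on docs in any merged segment
writer.setMaxBufferedDocs(10000);   // flush after this many buffered docs...
writer.setRAMBufferSizeMB(200);     // ...or after this much buffered RAM
```

Note that a low `maxMergeDocs` combined with a high `mergeFactor` changes which merges are even attempted, so these two settings interact; the "Segment is too large" error suggests a merge selected segments exceeding a configured limit.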

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
. This is running your latest IndexAccessor-021508 code. Any ideas (it's kind of urgent for us)? Thanks, -vivek On Fri, Feb 15, 2008 at 6:50 PM, vivek sar [EMAIL PROTECTED] wrote: Mark, Thanks for the quick fix. Actually, it is possible that there might have been simultaneous queries using

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
can check? Thanks, -vivek On Thu, Feb 28, 2008 at 1:26 PM, vivek sar [EMAIL PROTECTED] wrote: Mark, We deployed our indexer (using defaultIndexAccessor) on one of the production site and getting this error, Caused by: java.util.concurrent.RejectedExecutionException

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
this to the test cases. Just as a personal interest question, what has led you to set up your index this way? Adding partitions as it grows, that is. - Mark vivek sar wrote: Mark, Yes, I think that's precisely what's happening. I call accessor.close, which shuts down all

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
[EMAIL PROTECTED] wrote: vivek sar wrote: Mark, Just for my clarification, 1) Would you have indexStop and indexStart methods? If that's the case then I don't have to call close() at all. These new methods would serve as just cleaning up the caches and not closing

Re: DefaultIndexAccessor

2008-02-15 Thread vivek sar
Mark, There seems to be some issue with DefaultMultiIndexAccessor.java. I got following NPE exception, 2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] ReportServiceImpl - java.lang.NullPointerException at

Re: DefaultIndexAccessor

2008-02-15 Thread vivek sar
a foreign MultiSearcher somehow. I will keep looking and keep you posted. In the meantime, do you have any other data or code snippets to share? vivek sar wrote: Mark, There seems to be some issue with DefaultMultiIndexAccessor.java. I got following NPE exception, 2008-02

Re: DefaultIndexAccessor

2008-02-15 Thread vivek sar
PROTECTED] wrote: Here is the fix: https://issues.apache.org/jira/browse/LUCENE-1026 vivek sar wrote: Mark, There seems to be some issue with DefaultMultiIndexAccessor.java. I got following NPE exception, 2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] ReportServiceImpl

Luke for Lucene 2.3?

2008-01-29 Thread vivek sar
Hi, Has anyone tried Luke v0.7.1 with the latest Lucene build, v2.3? I'm getting an Unknown format version: -4 error when opening a Lucene 2.3 index with Luke 0.7.1. Is there an upgraded version of Luke anywhere? I also read something about a web-based Luke, but can't find it in the contrib in 2.3,

Re: Using RangeFilter

2008-01-24 Thread vivek sar
I have a field set as NO_NORM; does it have to be untokenized to be able to sort on it? On Jan 21, 2008 12:47 PM, Antony Bowesman [EMAIL PROTECTED] wrote: vivek sar wrote: I need to be able to sort on optime as well, thus need to store it . Lucene's default sorting does not need the field

Re: Archiving Index using partitions

2008-01-24 Thread vivek sar
a single Document, if that is what you are asking. But you can create multiple smaller indices (e.g. weekly) instead of one large one, and then every 2 weeks archive the one that is 2 weeks old. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar

Archiving Index using partitions

2008-01-21 Thread vivek sar
Hi, As a requirement I need to be able to archive any indexes older than 2 weeks (due to space and performance reasons). That means I would need to maintain weekly indexes. Here are my questions, 1) What's the best way to partition indexes using Lucene? 2) Is there a way I can partition

Re: Optimize for large index size

2008-01-20 Thread vivek sar
really mean maxMergeDocs and not maxBufferedDocs? Larg(er) maxBufferedDocs will speed up indexing. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Friday, January 18, 2008

Using RangeFilter

2008-01-19 Thread vivek sar
Hi, I have a requirement to filter out documents by date range. I'm using RangeFilter (in combination to FilteredQuery) to do this. I was under the impression the filtering is done on documents, thus I'm just storing the date values, but not indexing them. As every new document would have a new
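A term-based range filter compares terms lexicographically, so date values must be indexed as fixed-width strings for string order to match chronological order (Lucene ships a helper for this, DateTools; the stdlib sketch below just illustrates the encoding idea):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DateTerms {
    // Fixed-width yyyyMMddHHmmss encoding: String ordering of the
    // encoded terms matches time ordering, which is what a term-based
    // range filter relies on.
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String encode(LocalDateTime t) {
        return FMT.format(t);
    }

    public static void main(String[] args) {
        String a = encode(LocalDateTime.of(2008, 1, 19, 20, 6, 25));
        String b = encode(LocalDateTime.of(2008, 1, 24, 0, 0, 0));
        System.out.println(a);                  // 20080119200625
        System.out.println(a.compareTo(b) < 0); // true
    }
}
```

This also bears on the question in the snippet: the field must be indexed (not merely stored) for a range filter to see it, since filters walk the term dictionary, not stored values.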

Re: Using RangeFilter

2008-01-19 Thread vivek sar
and not store them if index size is a concern? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Saturday, January 19, 2008 8:06:25 PM Subject: Using RangeFilter Hi, I have

Optimize for large index size

2008-01-18 Thread vivek sar
Hi, We are using Lucene 2.2. We have an index of size 70G (within 3-4 days) and growing. We run optimize pretty frequently (once every hour - due to a large number of index updates every min - can be up to 100K new documents every min). I have seen every now and then the optimize takes 3-4 hours

Re: Optimize for large index size

2008-01-18 Thread vivek sar
, -vivek On Jan 18, 2008 2:37 AM, Michael McCandless [EMAIL PROTECTED] wrote: vivek sar wrote: Hi, We are using Lucene 2.2. We have an index of size 70G (within 3-4 days) and growing. We run optimize pretty frequently (once every hour - due to large number of index updates every min

Re: restoring a corrupt index?

2007-11-13 Thread vivek sar
We have seen similar exceptions (with Lucene 2.2) when we were making the following mistakes, 1) Not closing the old searchers and re-creating a new one for every new search (fixed it by closing the searcher every time; if you want, you could use only one searcher instance as well) 2) Not having any jvm
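Mistake #1 above (opening searchers without closing them) can be avoided with a strict open/use/close pattern. A self-contained sketch using a stand-in `Searcher` class (the real one in the thread would be Lucene's IndexSearcher):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SearcherLifecycle {
    // Minimal stand-in for an index searcher; counts open instances so
    // the leak-free pattern below is observable.
    static class Searcher {
        static final AtomicInteger open = new AtomicInteger();
        Searcher() { open.incrementAndGet(); }
        String search(String q) { return "hits for " + q; }
        void close() { open.decrementAndGet(); }
    }

    // Pair every open with a close in finally, so the searcher is
    // released even when search() throws.
    static String searchOnce(String q) {
        Searcher s = new Searcher();
        try {
            return s.search(q);
        } finally {
            s.close();
        }
    }

    public static void main(String[] args) {
        searchOnce("ip:10.0.0.1");
        System.out.println(Searcher.open.get()); // 0: nothing leaked
    }
}
```

In practice the other option mentioned in the post, sharing one long-lived searcher and reopening it only when the index changes, avoids both the leak and the cost of per-query opens.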

Re: Help with Lucene Indexer crash recovery

2007-10-05 Thread vivek sar
McCandless [EMAIL PROTECTED] wrote: vivek sar [EMAIL PROTECTED] wrote: We are using Lucene 2.3. Do you mean Lucene 2.2? Your stack trace seems to line up with 2.2, and 2.3 isn't quite released yet. The problem we are facing is quite a few times if our application is stopped (killed

Help with Lucene Indexer crash recovery

2007-10-04 Thread vivek sar
Hi, We are using Lucene 2.3. The problem we are facing is quite a few times if our application is stopped (killed or crash) while Indexer is doing its job, the next time when we bring up the application the Indexer fails to run with the following exception, 2007-10-04 12:29:53,089 ERROR [PS