Re: How to index large set data

2009-05-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
set the JVM memory when I use DIH through web command full-import? Thanks! JB --- On Fri, 5/22/09, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com wrote: From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com Subject: Re: How to index large set data To: Jianbin Dai djian
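DIH runs inside the Solr webapp, so a web-triggered `full-import` uses whatever heap the servlet container's JVM was started with. The HTTP request itself cannot raise it. Below is a minimal Python sketch. It assumes the Jetty example shipped with Solr (`start.jar`) listening on localhost:8983 and DIH registered at `/dataimport` in solrconfig.xml. The `1024m` heap value is only illustrative.

```python
# Sketch: heap is fixed when the container JVM starts (-Xmx); DIH is then
# driven over HTTP. Assumptions: Solr's example Jetty (start.jar) on
# localhost:8983, DIH mapped at /dataimport. The heap size is illustrative.
from typing import List
from urllib.parse import urlencode


def jetty_launch_command(max_heap: str) -> List[str]:
    """Command line that starts the example Jetty with a given max heap."""
    return ["java", f"-Xmx{max_heap}", "-jar", "start.jar"]


def dih_url(command: str, base: str = "http://localhost:8983/solr/dataimport") -> str:
    """URL for a DIH command such as 'full-import' or 'status'."""
    return f"{base}?{urlencode({'command': command})}"


if __name__ == "__main__":
    # Start Solr from its example directory with this command first...
    print(" ".join(jetty_launch_command("1024m")))
    # ...then trigger and monitor the import over HTTP.
    print(dih_url("full-import"))
    print(dih_url("status"))
```

The heap has to be chosen before the import starts. To give the import more memory, restart the container with a larger `-Xmx`.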

Re: How to index large set data

2009-05-24 Thread Jianbin Dai
embedded client to do the push, would it be more efficient than DIH? --- On Fri, 5/22/09, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Friday, May 22, 2009, 5

Re: How to index large set data

2009-05-24 Thread nk 11
can I set the JVM memory when I use DIH through web command full-import? Thanks! JB --- On Fri, 5/22/09, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com wrote: From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com Subject: Re: How to index large set data To: Jianbin Dai

Re: How to index large set data

2009-05-22 Thread Jianbin Dai
--- On Thu, 5/21/09, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com wrote: From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Thursday, May 21, 2009, 10:39 PM What is the total number of docs created? I guess it may

Re: How to index large set data

2009-05-22 Thread Grant Ingersoll
--- On Thu, 5/21/09, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com wrote: From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Thursday, May 21, 2009, 10:39 PM What is the total number of docs created? I

Re: How to index large set data

2009-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
: Re: How to index large set data To: solr-user@lucene.apache.org Date: Thursday, May 21, 2009, 10:39 PM What is the total number of docs created? I guess it may not be memory-bound. Indexing is mostly an IO-bound operation. You may be able to get better performance if an SSD is used (solid state

Re: How to index large set data

2009-05-22 Thread Otis Gospodnetic
To: solr-user@lucene.apache.org; noble.p...@gmail.com Sent: Friday, May 22, 2009 3:42:04 AM Subject: Re: How to index large set data About 2.8M total docs were created. Only the first run finishes. In my 2nd try, it hangs there forever at the end of indexing (I guess right before commit
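Combined with the 45GB figure from the original question, the 2.8M document count gives an average document size. That answers the earlier "one huge doc or billions of tiny ones?" question. A quick back-of-the-envelope check in Python, assuming "GB" means 10^9 bytes:

```python
# Back-of-the-envelope: average raw XML size per document.
# Assumption: "45GB" is decimal (10**9 bytes); with 2**30 the result is
# roughly 7% larger.
total_bytes = 45 * 10**9   # ~45GB of XML input (from the original question)
total_docs = 2.8 * 10**6   # ~2.8M docs created (from this message)

avg_bytes_per_doc = total_bytes / total_docs
print(f"{avg_bytes_per_doc / 1000:.1f} KB per document")  # 16.1 KB per document
```

Roughly 16 KB per document is an ordinary size, so the document count alone does not explain a slow import.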

Re: How to index large set data

2009-05-22 Thread Jianbin Dai
: Re: How to index large set data To: solr-user@lucene.apache.org Date: Friday, May 22, 2009, 7:26 AM Hi, Those settings are a little crazy.  Are you sure you want to give Solr/Lucene 3G to buffer documents before flushing them to disk?  Are you sure you want to use the mergeFactor of 1000
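A high mergeFactor lets segments pile up before they are merged. The Python sketch below makes that concrete with a simplified counting model of a log merge policy, in the spirit of Lucene's LogMergePolicy. The real policy groups segments by size rather than by flush count, so treat the numbers as an approximation.

```python
# Simplified log-merge model: each RAM-buffer flush writes a level-0 segment;
# when a level holds `merge_factor` segments, they merge into one segment at
# the next level. (Approximation: real Lucene merges by segment size.)


def live_segments(num_flushes: int, merge_factor: int) -> int:
    """Number of segments left on disk after `num_flushes` flushes."""
    levels = []  # levels[i] = segments currently sitting at level i
    for _ in range(num_flushes):
        level = 0
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            # merge_factor segments collapse into one segment one level up
            levels[level] = 0
            level += 1
    return sum(levels)


if __name__ == "__main__":
    for mf in (10, 1000):
        print(mf, live_segments(500, mf))  # 10 -> 5 segments, 1000 -> 500
```

After 500 flushes, mergeFactor 10 leaves 5 segments while mergeFactor 1000 leaves 500. Each segment consists of several files, which is how very high merge factors can run into "too many open files" errors.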

Re: How to index large set data

2009-05-22 Thread Jianbin Dai
If I do the XML parsing myself and use an embedded client to do the push, would it be more efficient than DIH? --- On Fri, 5/22/09, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: Re: How to index large set data To: solr-user@lucene.apache.org
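If you parse the XML yourself, the main thing is to stream it rather than load 45GB into a tree. Below is a minimal Python sketch using `xml.etree.ElementTree.iterparse`. It makes two hypothetical assumptions: each record is a `<doc>` element directly under the root, and each record's children are flat field/value pairs. The batches would then go to whatever client does the push, such as an embedded SolrJ client on the Java side; that step is not shown.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List


def stream_docs(source: IO[bytes], record_tag: str = "doc") -> Iterator[dict]:
    """Yield one dict per record without building the whole tree.

    Hypothetical layout: records are direct children of the root element and
    each child of a record is a simple text field.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield {child.tag: (child.text or "") for child in elem}
            root.clear()  # drop finished records so memory stays bounded


def batched(docs: Iterator[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size`, one push per list."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<records><doc><id>1</id><title>a</title></doc>"
        b"<doc><id>2</id><title>b</title></doc><doc><id>3</id></doc></records>"
    )
    for batch in batched(stream_docs(sample), size=2):
        print(batch)  # a real indexer would send each batch to Solr here
```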

Re: How to index large set data

2009-05-22 Thread Otis Gospodnetic
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jianbin Dai djian...@yahoo.com To: solr-user@lucene.apache.org Sent: Friday, May 22, 2009 11:05:27 AM Subject: Re: How to index large set data I dont know exactly what is this 3G Ram buffer

Re: How to index large set data

2009-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Friday, May 22, 2009, 5:38 AM Can you parallelize this?  I don't know that the DIH can handle it, but having multiple threads
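One way to parallelize is to split the input into independent parts, such as separate XML files, and index them from several threads. A minimal Python sketch follows. `index_part` is a hypothetical callable that parses one part, sends its documents to Solr, and returns how many it sent. Here it is replaced by an offline stand-in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def index_in_parallel(
    parts: Sequence[str],
    index_part: Callable[[str], int],
    workers: int = 4,
) -> int:
    """Run index_part on each input part concurrently; return total docs sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; if a worker raised, the
        # exception is re-raised here when its result is reached
        return sum(pool.map(index_part, parts))


if __name__ == "__main__":
    # Offline stand-in: pretend each file holds a known number of documents.
    fake_counts = {"part-001.xml": 1200, "part-002.xml": 800, "part-003.xml": 500}
    print(index_in_parallel(list(fake_counts), fake_counts.__getitem__))  # 2500
```

Python threads pay off when each part mostly waits on I/O, such as HTTP update requests to Solr. CPU-heavy parsing in CPython would need processes instead.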

Re: How to index large set data

2009-05-22 Thread Jianbin Dai
/22/09, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Friday, May 22, 2009, 5:38 AM Can you parallelize this?  I don't know that the DIH can handle it, but having

How to index large set data

2009-05-21 Thread Jianbin Dai
Hi, I have about 45GB of XML files to be indexed. I am using DataImportHandler. I started the full import 4 hours ago, and it's still running. My computer has 4GB of memory. Any suggestions for a solution? Thanks! JB

Re: How to index large set data

2009-05-21 Thread Erick Erickson
This isn't much data to go on. Do you have any idea what your throughput is? How many documents are you indexing? One 45G doc or 4.5 billion 10-character docs? Have you looked at any profiling data to see how much memory is being consumed? Are you IO-bound or CPU-bound? Best Erick On Thu, May 21,
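A rough way to answer the IO-bound vs. CPU-bound question for a step you control, such as your own parser, is to compare the CPU time a process consumes against the wall-clock time of the same work. Below is a Python sketch of that check. The 0.8 threshold is an arbitrary assumption. For Solr itself you would instead watch the JVM's CPU usage with OS tools.

```python
import time
from typing import Callable


def cpu_wall_ratio(work: Callable[[], None]) -> float:
    """CPU seconds used by this process divided by wall-clock seconds.

    process_time() sums CPU time across all threads of the process, so a
    multi-threaded workload can score above 1.0.
    """
    wall0, cpu0 = time.perf_counter(), time.process_time()
    work()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return cpu / wall if wall > 0 else 0.0


def classify(ratio: float, threshold: float = 0.8) -> str:
    """Crude label; the threshold is an arbitrary assumption."""
    return "likely CPU-bound" if ratio >= threshold else "likely waiting on IO"


if __name__ == "__main__":
    ratio = cpu_wall_ratio(lambda: time.sleep(0.2))  # pure waiting
    print(f"{ratio:.2f} -> {classify(ratio)}")
```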

Re: How to index large set data

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Check the status page of DIH and see if it is working properly and, if yes, what is the rate of indexing. On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai djian...@yahoo.com wrote: Hi, I have about 45GB xml files to be indexed. I am using DataImportHandler. I started the full import 4 hours
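The indexing rate comes from two readings of a cumulative document counter taken some time apart, for example the processed-document count shown on the DIH status page (`/dataimport?command=status`). A minimal Python sketch follows. Field names on the status page vary between versions, so reading them is left out, and the counts below are made up for illustration.

```python
def docs_per_second(count_then: int, count_now: int, seconds_between: float) -> float:
    """Rate from two samples of a cumulative processed-document counter."""
    if seconds_between <= 0:
        raise ValueError("samples must be taken at different times")
    return (count_now - count_then) / seconds_between


def hours_remaining(docs_left: int, rate: float) -> float:
    """Naive ETA, assuming the current rate holds."""
    if rate <= 0:
        raise ValueError("no progress between samples")
    return docs_left / rate / 3600


if __name__ == "__main__":
    # Hypothetical readings taken one minute apart.
    rate = docs_per_second(100_000, 160_000, 60)
    print(rate)                              # 1000.0 docs/sec
    print(hours_remaining(3_600_000, rate))  # 1.0 hour
```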

Re: How to index large set data

2009-05-21 Thread Jianbin Dai
noble.p...@corp.aol.com wrote: From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Thursday, May 21, 2009, 9:57 PM Check the status page of DIH and see if it is working properly and, if yes, what is the rate

Re: How to index large set data

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
wrote: From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Thursday, May 21, 2009, 9:57 PM Check the status page of DIH and see if it is working properly and, if yes, what is the rate of indexing. On Thu, May