Hi Paul,

Thank you so much for answering my questions. It really helped. After some adjustment, basically raising mergeFactor from the default value of 10 to 1000, I can finish the whole job in 2.5 hours.

I checked that while the import is running, only around 18% of memory is being used, and VIRT is always 1418m. I think it may be restricted by the JVM memory setting. But I run the data import command through the web, i.e., http://<host>:<port>/solr/dataimport?command=full-import. How can I set the memory allocation for the JVM?

Thanks again!
JB

--- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com> wrote:

> From: Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Thursday, May 21, 2009, 9:57 PM
>
> check the status page of DIH and see if it is working properly. and
> if, yes what is the rate of indexing
>
> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai <djian...@yahoo.com> wrote:
> >
> > Hi,
> >
> > I have about 45GB xml files to be indexed. I am using
> > DataImportHandler. I started the full import 4 hours ago,
> > and it's still running....
> > My computer has 4GB memory. Any suggestion on the solutions?
> > Thanks!
> >
> > JB
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com
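
For reference, the two DataImportHandler requests mentioned in this thread (starting the full import and checking the DIH status page) could be built along these lines. This is only a minimal sketch: the helper name dataimport_url and the localhost:8983 host and port are assumptions for illustration, not values from the thread, and it presumes the handler is mounted at /solr/dataimport as in the URL above.

```python
from urllib.parse import urlencode


def dataimport_url(host, port, command):
    """Build a DataImportHandler request URL for the given command.

    Hypothetical helper; assumes the handler lives at /solr/dataimport.
    """
    query = urlencode({"command": command})
    return f"http://{host}:{port}/solr/dataimport?{query}"


if __name__ == "__main__":
    # Placeholder host and port; substitute the real Solr instance.
    print(dataimport_url("localhost", 8983, "full-import"))
    print(dataimport_url("localhost", 8983, "status"))
```

Opening the second URL in a browser while the import is running should show the same status page as the one referred to in the quoted message.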