At this time we are not leveraging the NRT functionality. This is the
initial data load process, where the idea is to add all 200 million
records first and then do a single commit at the end to make them
searchable. We have actually disabled auto commit for this phase.
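
For context, the load loop looks roughly like the following (a minimal
SolrJ sketch rather than our actual code; the ZooKeeper address,
collection name, field names, and batch size are placeholders):

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class BulkLoader {
      public static void main(String[] args)
          throws IOException, SolrServerException {
        // Point SolrJ at the ZooKeeper ensemble behind the SolrCloud cluster.
        CloudSolrServer server = new CloudSolrServer("zkhost:2181");
        server.setDefaultCollection("collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (long i = 0; i < 200000000L; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Long.toString(i));
          // ... the other ~40 small fields go here ...
          batch.add(doc);

          // Send documents in batches; with auto commit disabled in
          // solrconfig.xml, nothing becomes searchable yet.
          if (batch.size() == 1000) {
            server.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          server.add(batch);
        }

        // The single hard commit at the end makes everything searchable.
        server.commit();
        server.shutdown();
      }
    }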

We have tried leaving auto commit enabled during the initial data load
and ran into multiple issues that led to a botched loading process.

On Thu, Mar 22, 2012 at 2:15 PM, Mark Miller <markrmil...@gmail.com> wrote:

>
> On Mar 21, 2012, at 9:37 PM, I-Chiang Chen wrote:
>
> > We are currently experimenting with SolrCloud functionality in Solr 4.0.
> > The goal is to see if Solr 4.0 trunk, in its current state, is able to
> > handle roughly 200 million documents. The documents are not big: around
> > 40 fields, no more than a KB each, and most fields are empty the
> > majority of the time.
> >
> > The setup we have is 4 servers w/ 2 shards w/ 2 servers per shard. We are
> > running in Tomcat.
> >
> > The questions are: given the approximate data volume, is it realistic
> > to expect the above setup to handle it?
>
> So 100 million docs per machine essentially? Totally depends on the
> hardware and what features you are using - but def in the realm of
> possibility.
>
> > And given the number of documents, should we
> > commit every x documents or rely on auto commits?
>
> The number of docs shouldn't really matter here. Do you need near real
> time search?
>
> You should be able to commit about as frequently as you'd like with NRT
> (eg every 1 second if you'd like) - either using soft auto commit or
> commitWithin.
>
> Then you want to do a hard commit less frequently - every minute (or more
> or less) with openSearcher=false.
>
> eg
>
>     <autoCommit>
>       <maxTime>15000</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>
> >
> > --
> > -IC
>
> - Mark Miller
> lucidimagination.com
>
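
Once we move past the initial load and revisit NRT, my understanding is
that commitWithin can also be passed per add call from SolrJ, e.g. (a
sketch assuming SolrJ 4.x; the 1000 ms value is just the one-second
interval mentioned above):

    // Ask Solr to make this document searchable within ~1 second,
    // without issuing an explicit commit from the client.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "example-doc");
    server.add(doc, 1000); // commitWithin, in milliseconds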


-- 
-IC
