On Mon, Oct 18, 2010 at 5:26 PM, Jason, Kim <hialo...@gmail.com> wrote:
>
> Hi, Gora
> I haven't yet tried indexing a huge amount of XML files through curl or pure
> Java (like post.jar).
> Is indexing through XML really fast?
> How many files did you index? And how did you do it (using curl or pure Java)? [...]
We did it through curl. There were some 3.5 million XML files and some 60 fields in the Solr schema, with minimal tokenising but some facets: about 40GB of data in total. We used five Solr instances, with five cores on each instance. From what I recall, indexing took 6h, though we might well have been limited by the read speed of the slow network drive that held the data.

If you index this way, you may need to merge the data from the various cores afterwards, a task which took us about 1.5h.

Regards,
Gora
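For anyone wanting to try a similar setup without curl, here is a minimal Python sketch of the idea: spread the XML files round-robin over 5 instances x 5 cores and POST each file to that core's update handler, which is roughly what `curl <url> -H 'Content-Type: text/xml' --data-binary @file` does. The host names, port 8983 and core names (`core0`...`core4`) are hypothetical placeholders, and each file is assumed to already be a Solr XML update message (`<add><doc>...</doc></add>`). This is an illustration, not the script we actually used.

```python
import itertools
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def core_update_urls(hosts, cores_per_host, port=8983):
    """Update-handler URLs for every core on every host.
    Host names and core names (core0, core1, ...) are hypothetical."""
    return [
        f"http://{host}:{port}/solr/core{i}/update"
        for host in hosts
        for i in range(cores_per_host)
    ]


def assign_files(files, urls):
    """Pair each XML file with a core URL, cycling round-robin over cores."""
    return list(zip(files, itertools.cycle(urls)))


def build_update_request(xml_bytes, url):
    """Build (but do not send) a POST of a Solr XML update message."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def commit_requests(urls):
    """One <commit/> per core, to be sent once all files are posted."""
    return [build_update_request(b"<commit/>", url) for url in urls]


def post_all(pairs, max_workers=25):
    """Send every (file, url) pair in parallel; returns HTTP status codes.
    Needs running Solr cores at the (hypothetical) URLs."""
    def send(pair):
        path, url = pair
        req = build_update_request(Path(path).read_bytes(), url)
        with urllib.request.urlopen(req) as resp:
            return resp.status

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, pairs))
```

With `urls = core_update_urls(["solr1", "solr2", "solr3", "solr4", "solr5"], 5)` you get 25 targets; `post_all(assign_files(paths, urls))` then sends the files, followed by `urllib.request.urlopen` on each request from `commit_requests(urls)`. As noted above, if the reads come from a slow network drive, that read speed rather than the number of workers may end up being the bottleneck.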