I would use the Unix "split" command. You can give it a line count.

% split -l 14000000 myfile.csv

You can use "wc -l" to count the lines.

wunder

On Nov 4, 2012, at 10:23 PM, Gora Mohanty wrote:

> On 5 November 2012 11:11, mitra <mitra.re...@ornext.com> wrote:
>
>> Hello all
>>
>> i have a csv file of size 10 gb which i have to index using solr
>>
>> my question is how to index the csv in such a way so that
>> i can get two separate index files of which one of the index is the index
>> for the first half of the csv and the second index is the index for the
>> second half of the csv
>>
>
> I do not think that there is any automatic way to do that in Solr.
> Could you not split the CSV file yourself, and index different
> halves of it to different Solr indices?
>
>>
>> also coming to index settings what should be the optimal value of auto
>> commit maxdocs and maxtime for the 10gb csv file it has around 28 million
>> records
>>
>
> That would depend on various local factors like how much RAM
> you have to give to Solr, network speed, etc. The best way would
> be to experiment with these settings. Usually, your goal should
> be to minimise auto-commits, so you can try setting these numbers
> to high values. You could also disable auto-commit altogether, and
> do manual commits.
>
> Given your data size, I think that the indexing should be quite fast
> on reasonable hardware.
>
> Regards,
> Gora
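
For what it's worth, here is a minimal sketch of the "wc -l" plus "split" approach that works out the halfway point instead of hard-coding 14000000. The file name sample.csv, the output prefix part_ and the temporary directory are placeholders for illustration, not anything from the original question; a tiny stand-in file is generated so the commands can be tried safely before running them on the real 10 GB file.

```shell
# Work in a scratch directory so nothing real is overwritten (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real CSV: ten lines, one number per line.
seq 1 10 > sample.csv

# Count the lines, then round up so an odd count still yields two pieces.
total=$(wc -l < sample.csv)
half=$(( (total + 1) / 2 ))

# Write the pieces as part_aa and part_ab, each with at most $half lines.
split -l "$half" sample.csv part_

ls part_*
```

One thing to watch for: if the CSV starts with a header row, only the first piece will contain it, so the second piece would probably need the field names supplied some other way when it is indexed.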