I would use the Unix "split" command. You can give it a line count.

% split -l 14000000 myfile.csv
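That will name the pieces xaa, xab, and so on, 14 million lines each. One caveat, assuming your CSV has a header row: only the first piece (xaa) will contain it, so you may need to prepend it to the second piece yourself:

% head -1 myfile.csv > header.csv
% cat header.csv xab > second-half.csv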

You can use "wc -l" to count the lines.
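For example, to verify the record count before picking the split size:

% wc -l myfile.csv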

wunder

On Nov 4, 2012, at 10:23 PM, Gora Mohanty wrote:

> On 5 November 2012 11:11, mitra <mitra.re...@ornext.com> wrote:
> 
>> Hello all
>> 
>> I have a CSV file of size 10 GB which I have to index using Solr.
>> 
>> My question is: how can I index the CSV so that I end up with two
>> separate indices, one for the first half of the file and one for the
>> second half?
>> 
> 
> I do not think that there is any automatic way to do that in Solr.
> Could you not split the CSV file yourself, and index different
> halves of it to different Solr indices?
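> 
> For example, once the file is split you could post each half to its
> own core through the CSV update handler. A rough sketch (the host,
> core names, and file names are placeholders for your setup):
> 
> curl 'http://localhost:8983/solr/core1/update/csv?commit=true' \
>   --data-binary @first-half.csv -H 'Content-type: text/csv; charset=utf-8'
> curl 'http://localhost:8983/solr/core2/update/csv?commit=true' \
>   --data-binary @second-half.csv -H 'Content-type: text/csv; charset=utf-8'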
> 
> 
>> 
>> 
>> Also, regarding index settings: what would be optimal values for the
>> auto-commit maxDocs and maxTime for this 10 GB CSV file? It has around
>> 28 million records.
>> 
> 
> That would depend on various local factors like how much RAM
> you have to give to Solr, network speed, etc. The best way would
> be to experiment with these settings. Usually, your goal should be
> to minimise auto-commits, so you can try setting these numbers
> to high values. You could also disable auto-commit altogether, and
> do manual commits.
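> 
> For reference, auto-commit is set in solrconfig.xml along these lines
> (the numbers below are placeholders to experiment with, not
> recommendations):
> 
>   <autoCommit>
>     <maxDocs>1000000</maxDocs>  <!-- commit after this many documents -->
>     <maxTime>600000</maxTime>   <!-- or after this many milliseconds -->
>   </autoCommit>
> 
> Dropping the element entirely disables auto-commit, in which case you
> issue commits yourself.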
> 
> Given your data size, I think that the indexing should be quite fast
> on reasonable hardware.
> 
> Regards,
> Gora
