Re: Parallelism for small input data

2013-01-15 Thread Dipesh Kumar Singh
Thanks, Dmitriy and Vitalii! I am able to control the number of mappers by setting the split size. And yes, there isn't any reason to re-read the dictionary, except that I was porting existing code. I will re-implement it to read the dictionary once and check the performance.

Regards,
Dipesh

2013-01-14 Thread Vitalii Tymchyshyn
Well, if you set the split size to 1, you should get a per-line split.

2013/1/13 Dipesh Kumar Singh
> Hello users,
>
> I have an input file (1.2 MB) which contains a list of words/phrases, one per
> line. I am reading each phrase per line and passing it to a UDF to
> correct/check that phrase.
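To show why the split size controls the number of mappers, here is a minimal Java sketch of the split arithmetic in Hadoop's FileInputFormat (new `mapreduce` API): the effective split size is max(minSize, min(maxSize, blockSize)), and the file is cut into chunks of that size, with the last chunk allowed to be up to 10% larger. The class name `SplitSketch`, the assumed 64 MB block size and the 1,200,000-byte file length are illustrative assumptions; Pig's own combining of small splits is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (assumption: mirrors Hadoop's FileInputFormat logic for one
 * uncompressed, splittable file). Each split normally becomes one map task.
 * Pig's split combination, which can merge small splits, is NOT modeled.
 */
public class SplitSketch {

    // The last split may be up to 10% larger instead of leaving a tiny tail split.
    static final double SPLIT_SLOP = 1.1;

    // Same formula as FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns the byte length of each split for a file of fileLength bytes. */
    public static List<Long> splitLengths(long fileLength, long blockSize,
                                          long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> lengths = new ArrayList<>();
        long bytesRemaining = fileLength;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            lengths.add(bytesRemaining); // the (possibly slightly larger) tail split
        }
        return lengths;
    }

    public static void main(String[] args) {
        long fileLength = 1_200_000L;       // roughly the 1.2 MB input in the thread
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block size
        long minSize = 1L;

        // No cap on split size: the whole file fits in one split, so one mapper.
        System.out.println(splitLengths(fileLength, blockSize, minSize, Long.MAX_VALUE).size()); // 1
        // Capping the split size at 100,000 bytes yields 12 splits.
        System.out.println(splitLengths(fileLength, blockSize, minSize, 100_000L).size());       // 12
    }
}
```

With maxSize set to 1 this sketch produces one split per byte (about 1.2 million for this file). Hadoop's line reader hands each line to the split in which that line begins, so most of those splits read nothing and each non-empty one reads exactly one line, which is the per-line behaviour described above.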

2013-01-13 Thread Dmitriy Ryaboy
> The UDF (a simple UDF extending EvalFunc) refers to and reads a dictionary file of 6 MB for each input phrase.

Any reason to keep re-reading the dictionary instead of just reading it once?

D

On Sun, Jan 13, 2013 at 4:47 AM, Dipesh Kumar Singh wrote:
> The UDF (a simple UDF extending EvalFunc) refers to and reads …
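Reading the dictionary once can be sketched as a lazily initialised field: the file is loaded on the first call and the in-memory set is reused for every later phrase. This is a minimal plain-Java sketch, not the poster's code: the class `PhraseChecker`, its `loadCount` counter and the "every word must be in the dictionary" check are hypothetical. In an actual Pig UDF the class would extend `org.apache.pig.EvalFunc` and `exec` would take a `Tuple`; that wiring is omitted so the sketch compiles without Pig on the classpath. Since Pig typically creates UDF instances per task, the file would then be read roughly once per map task instead of once per phrase.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the thread's UDF: loads its dictionary at most once. */
public class PhraseChecker {

    private final Path dictionaryPath;
    private Set<String> dictionary; // null until the first exec() call
    private int loadCount = 0;      // only here to demonstrate a single read

    public PhraseChecker(Path dictionaryPath) {
        this.dictionaryPath = dictionaryPath;
    }

    // Lazily reads the dictionary file on first use, then returns the cached set.
    private Set<String> dictionary() throws IOException {
        if (dictionary == null) {
            List<String> lines = Files.readAllLines(dictionaryPath, StandardCharsets.UTF_8);
            Set<String> words = new HashSet<>();
            for (String line : lines) {
                String word = line.trim().toLowerCase();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            dictionary = words;
            loadCount++;
        }
        return dictionary;
    }

    /** Hypothetical check: true if every word of the phrase is in the dictionary. */
    public boolean exec(String phrase) throws IOException {
        if (phrase == null) {
            return false;
        }
        Set<String> dict = dictionary();
        for (String word : phrase.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty() && !dict.contains(word)) {
                return false;
            }
        }
        return true;
    }

    public int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) throws IOException {
        Path dict = Files.createTempFile("dictionary", ".txt");
        Files.write(dict, Arrays.asList("parallel", "input", "data"), StandardCharsets.UTF_8);
        PhraseChecker checker = new PhraseChecker(dict);
        System.out.println(checker.exec("parallel input")); // true
        System.out.println(checker.exec("small data"));     // false: "small" is not in the dictionary
        System.out.println(checker.loadCount());            // 1: the file was read only once
        Files.delete(dict);
    }
}
```

The design choice is simply to move the file read out of the per-record path: the first call pays the cost of loading the 6 MB file, and every later call is a set lookup.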