Re: Best practice for batch file conversions

2011-02-07 Thread Sonal Goyal
Hi, You can use FileStreamInputFormat, which returns the file stream as the value. https://github.com/sonalgoyal/hiho/tree/hihoApache0.20/src/co/nubetech/hiho/mapreduce/lib/input You need to remember that you lose data locality by manipulating the file as a whole, but in your case, the re…
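
If it helps, here is a minimal job-setup sketch assuming hiho's FileStreamInputFormat from the package linked above; the job name, the identity Mapper stand-in, and the argument-based paths are placeholders of mine, not part of hiho's API:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  import co.nubetech.hiho.mapreduce.lib.input.FileStreamInputFormat;

  public class ConvertJob {
    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "binary-file-conversion");
      job.setJarByClass(ConvertJob.class);
      // hiho's FileStreamInputFormat (see the link above) hands each
      // mapper a stream over one whole file as the record value
      job.setInputFormatClass(FileStreamInputFormat.class);
      // identity Mapper as a stand-in; a real job would plug in a mapper
      // typed to FileStreamInputFormat's key/value classes that performs
      // the actual format conversion
      job.setMapperClass(Mapper.class);
      job.setNumReduceTasks(0); // map-only: a pure conversion needs no shuffle
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }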

Re: Best practice for batch file conversions

2011-02-07 Thread Harsh J
Extend FileInputFormat and write your own binary-format-based implementation of it, making it non-splittable (isSplitable should return false). This way, a Mapper gets a whole file, and you shouldn't have block-splitting issues.
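
A sketch of what this looks like with the 0.20 mapreduce API (the class names are my own; only the FileInputFormat/isSplitable hooks come from Hadoop): a non-splittable input format paired with a record reader that emits the whole file as a single BytesWritable record.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.mapreduce.InputSplit;
  import org.apache.hadoop.mapreduce.JobContext;
  import org.apache.hadoop.mapreduce.RecordReader;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.FileSplit;

  public class WholeFileInputFormat
      extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
      return false; // never split: one file = one split = one mapper
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
        InputSplit split, TaskAttemptContext context) {
      return new WholeFileRecordReader();
    }
  }

  class WholeFileRecordReader
      extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit split;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
      this.split = (FileSplit) split;
      this.conf = context.getConfiguration();
    }

    // Emits exactly one record: the entire file as a byte array.
    // Note the whole file must fit in the mapper's heap.
    @Override
    public boolean nextKeyValue() throws IOException {
      if (processed) {
        return false;
      }
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream in = null;
      try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      processed = true;
      return true;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { }
  }

Each file then becomes exactly one split, and hence one map task; the tradeoff is the loss of data locality for files spanning multiple blocks, as Sonal notes above.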

Best practice for batch file conversions

2011-02-07 Thread felix gao
Hello users of Hadoop, I have a task to convert large binary files from one format to another. I am wondering what the best practice is for doing this. Basically, I am trying to get one mapper to work on each binary file, and I am not sure how to do that properly in Hadoop. Thanks, Felix

Re: Multiple queues question

2011-02-07 Thread Sonal Goyal
I think the CapacityScheduler is the one to use with multiple queues; see http://hadoop.apache.org/common/docs/r0.19.2/capacity_scheduler.html Thanks and Regards, Sonal
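
For concreteness, a minimal configuration sketch, assuming the property names from the linked r0.19.2 docs apply to your version (the production/research queue names and capacity numbers are made up):

  <!-- mapred-site.xml: switch the JobTracker to the capacity scheduler
       and declare the queues -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>
  <property>
    <name>mapred.queue.names</name>
    <value>default,production,research</value>
  </property>

  <!-- conf/capacity-scheduler.xml: guaranteed share (percent of cluster
       capacity) for each declared queue -->
  <property>
    <name>mapred.capacity-scheduler.queue.production.guaranteed-capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.research.guaranteed-capacity</name>
    <value>30</value>
  </property>

Jobs then choose a queue at submission time, e.g. with -Dmapred.job.queue.name=production on the command line.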