Re: Hadoop file uploads

2011-10-04 Thread visioner sadak
Thanks a lot Wellington and Bejoy for your inputs; I will try out this API and SequenceFile. On Wed, Oct 5, 2011 at 1:17 AM, Wellington Chevreuil <wellington.chevre...@gmail.com> wrote: > Yes, Sadak, > > Within this API, you'll copy your files into Hadoop HDFS as you do > when writing to an OutputStream...

Re: Hadoop file uploads

2011-10-04 Thread Wellington Chevreuil
Yes, Sadak. With this API you copy your files into Hadoop HDFS just as you would when writing to an OutputStream; the data will then be replicated across your cluster's HDFS. Cheers. 2011/10/4 visioner sadak: > Hey, thanks Wellington. Just a thought: will my data be replicated as well? I thought that the mapper...
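
A minimal sketch of that approach, assuming hypothetical local and HDFS paths (FileSystem, Path, and IOUtils are the standard Hadoop Java API classes):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsStreamUpload {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);       // the cluster's default (HDFS) file system

            // Hypothetical paths, for illustration only.
            InputStream in = new BufferedInputStream(new FileInputStream("/tmp/local.dat"));
            FSDataOutputStream out = fs.create(new Path("/user/sadak/upload.dat"));

            // Stream the bytes into HDFS; blocks are replicated automatically
            // according to the configured replication factor.
            IOUtils.copyBytes(in, out, 4096, true);     // 'true' closes both streams
        }
    }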

Re: Hadoop file uploads

2011-10-04 Thread Bejoy KS
Yes, Sadak. The API does the splitting for you; there is no need for MR for that. It is better to keep file sizes at least as large as an HDFS block. A SequenceFile is definitely a good choice. If you are looking to process and then archive the input, look into HAR (Hadoop Archives) as well...
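
A minimal sketch of packing a directory of small local files into a single SequenceFile, keyed by file name (the local directory and target path are hypothetical):

    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilesToSequenceFile {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path seqPath = new Path("/user/sadak/smallfiles.seq");  // hypothetical target
            SequenceFile.Writer writer =
                SequenceFile.createWriter(fs, conf, seqPath, Text.class, BytesWritable.class);
            try {
                for (File f : new File("/tmp/input").listFiles()) { // hypothetical local dir
                    byte[] data = new byte[(int) f.length()];
                    DataInputStream in = new DataInputStream(new FileInputStream(f));
                    try {
                        in.readFully(data);
                    } finally {
                        in.close();
                    }
                    // One record per small file: key = file name, value = raw bytes.
                    writer.append(new Text(f.getName()), new BytesWritable(data));
                }
            } finally {
                writer.close();
            }
        }
    }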

Re: Hadoop file uploads

2011-10-04 Thread Bejoy KS
Hi Sadak, You really don't need to fire a MapReduce job to copy files from a local file system to HDFS. You can do it in two easy ways. *Using the Linux CLI* - the most convenient and handy option if you are going with a shell script: hadoop fs -copyFromLocal <localsrc> <hdfs-dst>. *Using the Java API* - //load t...
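
The Java API snippet is cut off in the digest; a minimal sketch of that route, with hypothetical paths:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyFromLocal {
        public static void main(String[] args) throws IOException {
            // Load the configuration (expects core-site.xml etc. on the classpath).
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Programmatic equivalent of 'hadoop fs -copyFromLocal'.
            fs.copyFromLocalFile(new Path("/tmp/local.dat"),    // hypothetical source
                                 new Path("/user/sadak/"));     // hypothetical destination
        }
    }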

Re: Hadoop file uploads

2011-10-04 Thread visioner sadak
Hey, thanks Wellington. Just a thought: will my data be replicated as well? I thought that the mapper does the job of breaking the data into pieces and distributing it, and the reducer does the joining and combining while fetching the data back; that's why I was confused about using MR. Can I use this API for uploading a large...

Re: Hadoop file uploads

2011-10-04 Thread Wellington Chevreuil
Hey Sadak, you don't need to write an MR job for that. You can make your Java program use the Hadoop Java API instead. You would need to use FileSystem (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html) and Path (http://hadoop.apache.org/common/docs/current/api/in...

Hadoop file uploads

2011-10-04 Thread visioner sadak
Hello guys, I would like to know how to do file uploads to HDFS using Java. Is it to be done using MapReduce? What if I have a large number of small files; should I use a SequenceFile along with MapReduce? It would be great if you could provide some information...

Can we use an inheritance hierarchy to specify the output value class for the mapper, which is also the input value class for the reducer?

2011-10-04 Thread Anuja Kulkarni
Hi, We have a class hierarchy for the output value of both the mapper and the reducer: a parent (abstract class) with subclasses child1, child2, ... We have a mapper class whose output value class is specified as the parent class; the map function will emit either a child1 or a child2 depending on the logic...
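
A minimal sketch of the setup being described, with all class and field names hypothetical (the crux of the question is whether the framework's Writable serialization will round-trip the concrete subclass when only the parent class is declared):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Mapper;

    // Abstract parent, declared as the mapper's output value class
    // (and therefore the reducer's input value class).
    abstract class Parent implements Writable {
    }

    class Child1 extends Parent {
        private long count;
        public void write(DataOutput out) throws IOException { out.writeLong(count); }
        public void readFields(DataInput in) throws IOException { count = in.readLong(); }
    }

    class Child2 extends Parent {
        private String label = "";
        public void write(DataOutput out) throws IOException { out.writeUTF(label); }
        public void readFields(DataInput in) throws IOException { label = in.readUTF(); }
    }

    class HierarchyMapper extends Mapper<LongWritable, Text, Text, Parent> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit one subclass or the other depending on the record.
            if (value.toString().startsWith("#")) {
                context.write(new Text("k"), new Child1());
            } else {
                context.write(new Text("k"), new Child2());
            }
        }
    }

The driver would then declare job.setMapOutputValueClass(Parent.class).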

Re: Submitting a Hadoop task from within a reducer

2011-10-04 Thread Joey Echeverria
Yes. The reason I pointed him to it was that it seems like he's trying to do something with Hadoop for which MapReduce may not be the right execution model. YARN/MRv2 gives you the ability to try other execution models. As you pointed out, it may require some extra development, but it is more flexible...

RE: Submitting a Hadoop task from within a reducer

2011-10-04 Thread GOEKE, MATTHEW (AG/1000)
Joey, Is YARN just a synonym for MRv2? And if so, he would still have to create a custom ApplicationMaster for his job type, right? Matt -----Original Message----- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Tuesday, October 04, 2011 11:06 AM To: mapreduce-user@hadoop.apache.org Subj...

RE: Submitting a Hadoop task from within a reducer

2011-10-04 Thread GOEKE, MATTHEW (AG/1000)
As long as your reduce task can kick off the MR job asynchronously, it shouldn't be too much of an issue, but it could very quickly result in a deadlock otherwise. If you set this up as two stages, 1) kick off the recursive MR and 2) analyze the final result set, then it should work, but off...
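
A minimal sketch of the asynchronous kick-off Matt describes, with the job setup elided and all names hypothetical: Job.submit() returns as soon as the job is handed off, whereas waitForCompletion() would block the reduce task in the slot-deadlock scenario above.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class AsyncKickoff {
        // Could be called from a reducer's cleanup(), for example.
        public static void launchNextStage(Configuration conf)
                throws IOException, InterruptedException, ClassNotFoundException {
            Job next = new Job(conf, "recursive-step");  // hypothetical job name
            // ... set mapper/reducer classes and input/output paths here ...

            // Returns immediately after submission, so the current reduce
            // task can finish and free its slot for the child job.
            next.submit();

            // next.waitForCompletion(true) would instead block this reduce
            // task while it holds a slot the child job may need.
        }
    }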

Re: Submitting a Hadoop task from within a reducer

2011-10-04 Thread Joey Echeverria
You may want to check out YARN, coming in Hadoop 0.23: https://issues.apache.org/jira/browse/MAPREDUCE-279 -Joey On Tue, Oct 4, 2011 at 11:45 AM, Yaron Gonen wrote: > Hi, > Hadoop tasks are always stacked to form a linear, user-managed workflow (a > reduce step cannot start before all previous mappers...

Submitting a Hadoop task from within a reducer

2011-10-04 Thread Yaron Gonen
Hi, Hadoop tasks are always stacked to form a linear, user-managed workflow (a reduce step cannot start before all previous mappers have stopped, etc.). This may be problematic for recursive tasks: for example, in a BFS we will not get any output until the longest branch has been reached. In order to so...