Multiple files as input to a mapreduce job

2011-09-08 Thread Shreya.Pal
Hi, The following is the scenario I have: I have a Java program that reads multiple files from the disk. * There are 3 files (A, B, C) that are read and populated into 3 collections (ArrayList). * There are 2 files, input1 and input2, that act as input to my program. *

Re: Setting permissions for slave nodes running mapper

2011-09-08 Thread Harsh J
Hello Joris! What types of files are you trying to execute or modify? Distributed cache files? Your own files? Files already present on the OS? If it's distributed cache stuff, or your own creations/etc., one thing you can try is to set "keep.failed.task.files" to "true" for your job and fail a ta

Setting permissions for slave nodes running mapper

2011-09-08 Thread Joris Poort
Hi, I'm trying to set permissions for the tasktracker and/or mapred user. Basically I'm trying to execute and modify files from within the mapper, but the code errors out stating that the mapred user on the slave node doesn't have the right permissions to modify/execute files. Any help or tips on

RE: No Mapper but Reducer

2011-09-08 Thread GOEKE, MATTHEW (AG/1000)
Your last question is not as straightforward and would be better answered by running it on your own cluster and looking at the job tracker history. Data skew and partitioning, map and reduce slots available, mapred.reduce.slowstart.completed.maps, and several other things have the potential to

Re: Hive and Hbase not working with cloudera VM

2011-09-08 Thread Harsh J
Hello Bejoy, Moving this discussion to the cdh-u...@cloudera.org lists (Subscribe-able at https://groups.google.com/a/cloudera.org/group/cdh-user/topics) since it may be Cloudera-VM specific. (bcc'd common-user and mapreduce-user. Please avoid cross posting in future! :)) My comments inline. On

Hive and Hbase not working with cloudera VM

2011-09-08 Thread Bejoy KS
Hi, I was using the Cloudera training VM to test my MapReduce code, which was working really well. Now I have some requirements to run Hive, HBase, and Sqoop as well on this VM for testing purposes. For Hive and HBase I'm able to log in to the CLI client, but none of the commands are get

Re: How to Create an effective chained MapReduce program.

2011-09-08 Thread ilyal levin
* open a SequenceFile.Reader on the sequence file * in a loop, call next(key,val) on the reader to read the next key/val pair in the file (see: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html#next(org.apache.hadoop.io.Writable,%20org.apache

How to count records in FileInputFormat (MapFile, SequenceFile ?)

2011-09-08 Thread MONTMORY Alain
Hi everybody, In my application the processing of the whole dataset (which we call CycleWorkflow) may take several weeks, and we must split the CycleWorkflow into multiple DayWorkflows. The current system uses a traditional RDBMS approach and uses SQL OFFSET LIMIT to sp