Hi,
The following is the scenario I have:
I have a Java program that reads multiple files from disk.
* There are three files (A, B, C) that are read and populated into three
collections (ArrayLists) (see the sketch after this list).
* There are two files, input1 and input2, that act as input to my
program.
* …
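A minimal sketch of that loading step, assuming plain-text files with one record per line (the Loader class name and file paths are just placeholders for illustration):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class Loader {
        // Read one file into an ArrayList, one element per line.
        static List<String> load(String path) throws IOException {
            List<String> records = new ArrayList<String>();
            BufferedReader in = new BufferedReader(new FileReader(path));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    records.add(line);
                }
            } finally {
                in.close();
            }
            return records;
        }

        public static void main(String[] args) throws IOException {
            List<String> a = load("A");
            List<String> b = load("B");
            List<String> c = load("C");
            // input1 and input2 would then be processed against a/b/c.
        }
    }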
Hello Joris!
What types of files are you trying to execute or modify? Distributed
cache files? Your own files? Files already present on the OS?
If it's distributed cache stuff, or your own creations, etc., one thing
you can try is to set "keep.failed.task.files" to "true" for your job
and fail a task…
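For reference, a minimal sketch of setting that property via the classic org.apache.hadoop.mapred API (the wrapper class here is hypothetical; the string form conf.set("keep.failed.task.files", "true") works too):

    import org.apache.hadoop.mapred.JobConf;

    public class KeepFailedFiles {
        public static JobConf configure(JobConf conf) {
            // Keep the work directories of failed task attempts around
            // so they can be inspected on the slave node afterwards.
            conf.setKeepFailedTaskFiles(true);
            return conf;
        }
    }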
Hi,
I'm trying to set permissions for the tasktracker and/or mapred user.
Basically I'm trying to execute and modify files from within the
mapper, but the code errors out stating that the mapred user on the
slave node doesn't have the right permissions to modify/execute files.
Any help or tips on…
Your last question is not as straightforward and would be better answered by
running it on your own cluster and looking at the job tracker history. Data
skew and partitioning, map and reduce slots available,
mapred.reduce.slowstart.completed.maps, and several other things have the
potential to…
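As an aside, a minimal sketch of tuning that slowstart property (the 0.80 value is only an illustrative choice, not a recommendation):

    import org.apache.hadoop.conf.Configuration;

    public class SlowstartTuning {
        public static void apply(Configuration conf) {
            // Delay reducer launch until 80% of the maps have completed
            // (the classic MapReduce default is 0.05).
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
        }
    }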
Hello Bejoy,
Moving this discussion to the cdh-u...@cloudera.org list
(subscribable at
https://groups.google.com/a/cloudera.org/group/cdh-user/topics) since
it may be Cloudera-VM specific.
(bcc'd common-user and mapreduce-user. Please avoid cross-posting in the future! :))
My comments inline.
On…
Hi,
I was using the Cloudera training VM to test my MapReduce code,
which was working really well. Now I have some requirements to run
Hive, HBase, and Sqoop on this VM as well, for testing purposes. For Hive and
HBase I'm able to log in to the CLI client, but none of the commands are
get…
* open a SequenceFile.Reader on the sequence file
* in a loop, call next(key, val) on the reader to read the next key/val pair
in the file (see: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html#next(org.apache.hadoop.io.Writable,%20org.apache.hadoop.io.Writable)); a complete read loop is sketched below
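Putting those two steps together, a minimal sketch using the classic API from the javadoc above (the SeqFileDump class name is just for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SeqFileDump {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(args[0]);

            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            try {
                // Instantiate key/value objects of the types recorded
                // in the file header.
                Writable key =
                    (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable val =
                    (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                // next() returns false once the end of the file is reached.
                while (reader.next(key, val)) {
                    System.out.println(key + "\t" + val);
                }
            } finally {
                reader.close();
            }
        }
    }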
Hi everybody,
In my application, processing the whole dataset (which we call a
CycleWorkflow) may take several weeks, and we are required to
split the CycleWorkflow into multiple DayWorkflows.
The current system uses a traditional RDBMS approach with SQL OFFSET/LIMIT to
split…
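For illustration, a minimal sketch of that OFFSET/LIMIT chunking over JDBC (the connection URL, table, columns, and page size are all hypothetical, not taken from the original system):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class DayWorkflowPager {
        static final int PAGE_SIZE = 10000;  // illustrative chunk size

        public static void main(String[] args) throws SQLException {
            Connection conn =
                DriverManager.getConnection("jdbc:postgresql://host/db");  // hypothetical URL
            // ORDER BY is required for stable pages across repeated queries.
            String sql = "SELECT id, payload FROM records ORDER BY id LIMIT ? OFFSET ?";
            PreparedStatement ps = conn.prepareStatement(sql);
            int offset = 0;
            while (true) {
                ps.setInt(1, PAGE_SIZE);
                ps.setInt(2, offset);
                ResultSet rs = ps.executeQuery();
                int rows = 0;
                while (rs.next()) {
                    rows++;
                    // process one record of the current DayWorkflow here
                }
                rs.close();
                if (rows < PAGE_SIZE) break;  // short page means we hit the end
                offset += PAGE_SIZE;
            }
            ps.close();
            conn.close();
        }
    }

Note that OFFSET grows more expensive as it increases, which is one reason this kind of chunking is often revisited when moving off an RDBMS.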