On 04/22/2011 09:09 PM, W.P. McNeill wrote:
I want to create a sequence file on my local hard drive. I want to write
something like this:
LocalFileSystem fs = new LocalFileSystem();
Configuration configuration = new Configuration();
Try doing this instead:
Configuration
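Presumably the truncated suggestion continues along these lines. A hedged, minimal sketch against the 0.20-era API, assuming the goal is just a local SequenceFile of Text pairs (the output path is made up for illustration): obtain the filesystem via FileSystem.getLocal() rather than constructing LocalFileSystem directly, so it is initialized with the configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LocalSeqFileWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The factory method initializes the local filesystem with the
        // conf; new LocalFileSystem() skips that initialization.
        FileSystem fs = FileSystem.getLocal(conf);
        Path path = new Path("/tmp/example.seq");  // hypothetical path
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, path, Text.class, Text.class);
        try {
            writer.append(new Text("key"), new Text("value"));
        } finally {
            writer.close();
        }
    }
}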
Is it possible to change the logging level for an individual job? (As
opposed to the cluster as a whole.) E.g., is there some key that I can
set on the job's configuration object that would allow me to bump up the
logging from info to debug just for that particular job?
Thanks,
DR
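For what it's worth, later 0.20-era releases added per-job task log level keys (MAPREDUCE-336). If your version carries them, something like the following should work; the key names are an assumption worth verifying against your release:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DebugLevelJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per-job task log levels (keys from MAPREDUCE-336); the
        // cluster-wide log4j settings are left untouched.
        conf.set("mapred.map.child.log.level", "DEBUG");
        conf.set("mapred.reduce.child.log.level", "DEBUG");
        Job job = new Job(conf, "debug-logging-job");
        // ... configure mapper/reducer/paths as usual, then submit.
    }
}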
On 04/06/2011 08:40 PM, Haruyasu Ueda wrote:
Hi all,
I'm writing M/R java program.
I want to abort a job itself in a map task, when the map task finds
irregular data.
I have two ideas for doing so:
1. execute bin/hadoop job -kill jobID in the map task, from the slave machine.
2. raise an IOException to
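A note on idea 2, with a hedged sketch: an exception thrown in map() only fails that one task attempt, and the framework retries it (mapred.map.max.attempts times, 4 by default) before failing the job, so a plain IOException aborts the job slowly at best. To kill the job immediately, the mapper can take down its own job through the old-API JobClient. This assumes the task's localized configuration carries the job ID under mapred.job.id, and isIrregular() is a placeholder for your own check:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.RunningJob;

public class AbortingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private JobConf conf;

    @Override
    public void configure(JobConf conf) {
        this.conf = conf;
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        if (isIrregular(value)) {
            // Kill the whole job, not just this task attempt.
            RunningJob running =
                new JobClient(conf).getJob(conf.get("mapred.job.id"));
            if (running != null) {
                running.killJob();
            }
            throw new IOException("Irregular record: " + value);
        }
        out.collect(new Text("ok"), value);
    }

    private boolean isIrregular(Text value) {
        return value.getLength() == 0;  // placeholder check
    }
}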
On 04/07/2011 03:39 AM, Guy Doulberg wrote:
Hey,
I have been developing Map/Red jars for a while now, and I am still not
comfortable with the development environment I have put together for myself (and the team).
I am curious how other Hadoop developers out there are developing their jobs...
What IDE
On 03/31/2011 05:13 PM, W.P. McNeill wrote:
I'm running a big job on my cluster and a handful of attempts are failing
with a "Too many fetch-failures" error message. They're all on the same
node, but that node doesn't appear to be down. Subsequent attempts succeed,
so this looks like a transient
They do, but IIRC, they recently announced that they're going to be
discontinuing it.
DR
On Thu, March 24, 2011 8:10 pm, Rita wrote:
Thanks everyone for your replies.
I knew Cloudera had their release but never knew Y! had one too...
On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins
I would try implementing this using an ArrayWritable, which contains an
array of IntWritables.
HTH,
DR
On 03/17/2011 05:04 PM, maha wrote:
Hello,
I've been stuck on this for two days now... I found a previous post
discussing this, but not with arrays.
I know how to write a Writable class with
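Expanding the ArrayWritable suggestion above into a sketch: the usual gotcha is that ArrayWritable itself records no element type through a zero-argument constructor, so deserialization needs a small subclass:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    // Required so Hadoop can instantiate the class reflectively
    // during deserialization.
    public IntArrayWritable() {
        super(IntWritable.class);
    }

    public IntArrayWritable(IntWritable[] values) {
        super(IntWritable.class, values);
    }
}

Usage is then e.g. new IntArrayWritable(new IntWritable[] { new IntWritable(1), new IntWritable(2) }).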
On 03/16/2011 01:35 PM, W.P. McNeill wrote:
On HDFS, anyone can run hadoop fs -rmr /* and delete everything.
Not sure how you have your installation set up, but on ours (we installed
Cloudera CDH), only user hadoop has full read/write access to HDFS.
Since we rarely either log in as user hadoop,
On 02/11/2011 05:43 AM, Nitin Khandelwal wrote:
Hi,
I want to give a folder as the input path to MapRed. Each task should read one
file out of that folder at a time. I was doing this before in 0.19 using
MultiFileSplit format and my own InputFormat extending it. Can you please tell me
how to do the same in
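For the record: in 0.20 the closest relative of MultiFileSplit is CombineFileSplit with CombineFileInputFormat. If the requirement is literally one file per task, though, a simpler (hedged) approach is to make the files unsplittable so each input file becomes exactly one map task; a sketch against the old mapred API:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        // Never split: each input file maps to exactly one task.
        return false;
    }
}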
On 02/08/2011 05:01 AM, Jun Young Kim wrote:
Hi,
MultipleOutputs supports having named outputs as the result of a Hadoop job,
but it has an inconvenient restriction:
only alphanumeric characters (A-Z, a-z, 0-9) are valid in a named output name.
I believe if I
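One hedged workaround, assuming arbitrary strings just need to be squeezed into the legal alphabet before being handed to addNamedOutput()/getCollector():

public class NamedOutputNames {
    /**
     * MultipleOutputs only accepts [A-Za-z0-9] in named-output names,
     * so strip everything else. Note this is lossy: distinct inputs
     * such as "a-b" and "ab" collide.
     */
    public static String legalizeName(String raw) {
        return raw.replaceAll("[^A-Za-z0-9]", "");
    }
}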
On 02/03/2011 12:16 PM, Keith Wiley wrote:
I've seen this asked before, but haven't seen a response yet.
If the input to a streaming job is not actual data splits but simply
HDFS file names which are then read by the mappers, then how can data
locality be achieved?
Likewise, is there any
On 12/10/2010 02:16 PM, Harsh J wrote:
Hi,
On Thu, Dec 2, 2010 at 10:40 PM, Matt Tanquary matt.tanqu...@gmail.com wrote:
I am using MultipleOutputs to split a mapper input into about 20
different files. Adding this split has had an extremely adverse effect
on performance. Is MultipleOutputs
On 11/11/2010 02:52 PM, Da Zheng wrote:
Hello,
I wrote a MapReduce program and ran it on a 3-node hadoop cluster, but
its running time varies a lot, from 2 minutes to 3 minutes. I want to
understand how time is spent in the map phase and the reduce phase, and
hope to find places to improve
On 09/21/2010 03:17 AM, Jing Tie wrote:
I am still suffering from this problem... Has anyone encountered it
before? Any suggestions?
Many thanks in advance!
Jing
On Fri, Sep 17, 2010 at 5:19 PM, Jing Tie tiej...@gmail.com wrote:
Dear all,
I am having this exception when starting jobtracker,
It certainly is! I wasted a few hours on that a couple of weeks back.
DR
On 09/16/2010 02:58 AM, Lance Norskog wrote:
After this, if you add anything to the conf object, it does not get
added to the job. This is a source of confusion.
Mark Kerzner wrote:
Thanks!
Mark
On Wed, Sep 15, 2010
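The mechanics behind Lance's warning, as a small sketch: the Job constructor takes a copy of the Configuration passed to it, so later sets on the original object never reach the job; set through job.getConfiguration() instead.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ConfCopyGotcha {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);          // the Job copies conf here

        conf.set("my.key", "ignored");    // too late: the job never sees this

        // Modify the job's own copy instead:
        job.getConfiguration().set("my.key", "seen");
    }
}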
On 09/15/2010 11:50 AM, Arv Mistry wrote:
Hi,
Is it possible to run multiple data nodes on a single machine? I
currently have a machine with multiple disks and enough disk capacity
for replication across them. I don't need redundancy at the machine
level but would like to be able to handle a
On 08/31/2010 12:58 PM, Mark wrote:
I have a question regarding outputting Writable objects. I thought all
Writables know how to serialize themselves to output.
For example I have an ArrayWritable of strings (or Texts) but when I
output it to a file it shows up as
On 08/31/2010 02:09 PM, Mark wrote:
On 8/31/10 10:07 AM, David Rosenstrauch wrote:
On 08/31/2010 12:58 PM, Mark wrote:
I have a question regarding outputting Writable objects. I thought all
Writables know how to serialize themselves to output.
For example I have an ArrayWritable of strings
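The likely culprit (hedged, since the reply is truncated here): ArrayWritable does not override toString(), so TextOutputFormat writes the default Object representation. A subclass overriding toString() fixes the file output; a sketch for Text elements:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }

    @Override
    public String toString() {
        // TextOutputFormat calls toString() on keys and values,
        // so join the elements ourselves.
        StringBuilder sb = new StringBuilder();
        for (Writable w : get()) {
            if (sb.length() > 0) sb.append('\t');
            sb.append(w.toString());
        }
        return sb.toString();
    }
}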
On 08/25/2010 12:40 PM, Mithila Nagendra wrote:
In order to avoid this I was thinking of
passing the range boundaries to the partitioner. How would I do that? Is
there an alternative? Any suggestions would be useful.
We use a custom partitioner, for which we pass in configuration data
that
If you define a Hadoop object as implementing Configurable, then its
setConf() method will be called once, right after it gets instantiated.
So each partitioner that gets instantiated will have its setConf()
method called right afterwards.
I'm taking advantage of that fact by calling my own
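Putting those two points together, a sketch of a range partitioner that reads its boundaries from the job configuration in setConf(). The property name range.partitioner.boundaries is made up for illustration; set it on the job before submission, e.g. conf.set("range.partitioner.boundaries", "10,20,30").

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class RangePartitioner extends Partitioner<IntWritable, Text>
        implements Configurable {
    private Configuration conf;
    private int[] boundaries;

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // Called once, right after instantiation, because the class
        // implements Configurable.
        String[] parts = conf.getStrings("range.partitioner.boundaries");
        if (parts == null) {
            throw new IllegalStateException(
                "range.partitioner.boundaries not set on the job");
        }
        boundaries = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            boundaries[i] = Integer.parseInt(parts[i].trim());
        }
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public int getPartition(IntWritable key, Text value, int numPartitions) {
        // Keys <= boundaries[i] go to partition i; anything larger
        // falls through to the last partition.
        for (int i = 0; i < boundaries.length && i < numPartitions - 1; i++) {
            if (key.get() <= boundaries[i]) {
                return i;
            }
        }
        return numPartitions - 1;
    }
}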
I had a job that I ran a few days ago that rolled over to the JobTracker
history. Now when I go to view it in the history viewer, although I
can see basic stats such as total # records in/out, I can no longer see
all the counter values (most notably my own custom counter values).
Is there
On 08/12/2010 01:42 PM, Rares Vernica wrote:
I forgot to mention that in my cluster the HDFS replication is set to
1. I know this is not recommended, but I only have 5 nodes in the
cluster and there are no failures
There will be! :-)
DR
Someone sent this email to the commons-user list a while back, but it
seems like it slipped through the cracks. We're starting to dig into
some hard-core Hadoop development and just came upon this same issue,
though.
Anyone know if there's any particular reason why the new Partitioner
class
On 08/04/2010 12:30 PM, Owen O'Malley wrote:
On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote:
Anyone know if there's any particular reason why the new Partitioner
class doesn't implement JobConfigurable? (And, if not, whether there are
any plans to fix this omission?) We're working
On 08/04/2010 01:55 PM, Wilkes, Chris wrote:
On Aug 4, 2010, at 10:50 AM, David Rosenstrauch wrote:
On 08/04/2010 12:30 PM, Owen O'Malley wrote:
On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote:
Anyone know if there's any particular reason why the new Partitioner
class doesn't
On 07/14/2010 06:58 AM, abc xyz wrote:
Hi everyone,
When Hadoop is running in fully-distributed mode and I am not the cluster
administrator (I can just execute my programs on the cluster), how can I
get access to the log files of the programs that I run on the cluster? I want to
see the
Thanks much for the helpful responses, everyone. This very much helped
clarify our thinking on the code design. Sounds like, all other things
being equal, sequence files are the way to go. Thanks again for
the advice, all.
DR
On 07/05/2010 03:47 AM, Aaron Kimball wrote:
David,
I
Our team is still new to Hadoop, and a colleague and I are trying to
make a decision on file formats. The arguments are:
* We should use a SequenceFile (binary) format as it's faster for the
machine to read than parsing text, and the files are smaller.
* We should use a text file format as
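On the SequenceFile side of that trade-off, a minimal read-back sketch (0.20-era API) makes the "no parsing" argument concrete; Text keys and values are assumed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqFileReader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[0]), conf);
        try {
            Text key = new Text();
            Text value = new Text();
            // Keys and values come back as Writables directly;
            // no text-parsing step is involved.
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}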
On 06/28/2010 10:09 AM, legolas wrote:
Hi,
I am wondering whether Hadoop has some dependency on ZooKeeper or not. I
mean, when I download
http://apache.thelorne.com/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
does it have ZooKeeper with it, or should I download ZooKeeper separately?
On 05/06/2010 11:09 AM, Alan Miller wrote:
Not sure if this is the right list for this question, but...
Is it possible to determine which host actually processed my MR job?
Regards,
Alan
I'm curious: why would you need to know?
DR
Having an issue with host names on my new Hadoop cluster.
The cluster is currently 1 name node and 2 data nodes, running in a
cloud vendor data center. All is well with general operations of the
cluster, i.e., the name node and data nodes can talk just fine, I can
read/write to/from the HDFS,