I could not find the
"yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs"
setting in Cloudera Manager (I know that is Cloudera specific), but I think
I can set it manually if anyone thinks that is worth trying.
Best Regards,
Ed Dorsey
On Fri, Oct 16, 2015 at 3:41 PM, ed <e
assigned? I searched through JIRA but did not see any open issues that
might relate to the error we're seeing. Are there any workarounds for this,
or has anyone seen this happen before? Please let me know if there is any
other information I can provide.
Best Regards,
Ed Dorsey
within what
the nodes have (6 nodes with 90GB each). No errors in YARN either.
Thank you!
Best,
Ed
don't want to add anything to
the /etc/hosts files - shouldn't have to since the long and short
names all resolve properly.
Seeing that the hostname field in the error message contains the IP, I tried
setting dfs.client.use.datanode.hostname = true, but no change.
Any help appreciated!
-Ed
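For reference, that switch is a client-side setting applied to the Configuration used to open the FileSystem; a minimal sketch (path illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HostnameClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask the client to connect to datanodes by hostname instead of the
    // IP address the namenode hands back.
    conf.setBoolean("dfs.client.use.datanode.hostname", true);
    FileSystem fs = FileSystem.get(conf);
    fs.open(new Path("/tmp/probe.txt")).close(); // illustrative read
  }
}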
have added to your hosts.
Please also check your exclude file, and the third point is to increase your
datanode heap size and start it.
Thanks
On Jul 27, 2014 1:01 AM, Ed Sweeney ed.swee...@falkonry.com wrote:
All,
New AWS cluster with Cloudera 4.3 RPMs.
dfs.hosts contains 3 host names, they all
has seen this, has diagnostic tips, or best of all, a solution,
please let me know!
Thanks,
Adam
--
-
*Ed Serrano*
Mobile: 972-897-5443
Sorry for the layman question.
What's the best way of writing data into HDFS from outside the cluster? My
customer is looking for wire-speed data ingest into HDFS. We are
considering Flume, but initial performance results from Flume are very
discouraging.
Thanks in advance.
Steve
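As a baseline to compare against Flume, the plain HDFS client API from an edge node is worth measuring first; a minimal sketch (class name and paths illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BulkPut {
  public static void main(String[] args) throws Exception {
    // Reads the cluster location from core-site.xml on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    // The client streams straight into the datanode write pipeline, so a
    // handful of parallel copies usually saturates the ingest link.
    fs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));
    fs.close();
  }
}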
Sorry for this novice question. I am trying to find the best way of moving
(copying) data in and out of HDFS. There are a bunch of tools available, and I
need to pick the one that offers the easiest way. I have seen a MapR
presentation; they claim to offer direct NFS mounts to feed data into HDFS.
:40 PM, Steve Ed sediso...@gmail.com wrote:
I am a newbie to Hadoop and trying to understand how to size a Hadoop
cluster.
What factors should I consider when deciding the number of datanodes?
Datanode configuration? CPU, memory?
How much memory is required for the namenode?
My client
Did you ever consider keeping a backup copy of the FSImage on an NFS share?
The best practice is to have reliable NFS storage mounted on the namenode
and configure hdfs-site.xml to keep a copy on the NFS mount. This will prevent
FSImage loss.
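The property takes a comma-separated list of directories, and the namenode keeps a full copy of the image in each; a minimal sketch of the equivalent setting (normally placed in hdfs-site.xml; paths illustrative):

import org.apache.hadoop.conf.Configuration;

public class NameDirExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Local disk first, NFS mount second; the namenode writes the FSImage
    // and edit log to every directory listed here.
    conf.set("dfs.name.dir", "/data/1/dfs/nn,/mnt/nfs/dfs/nn");
  }
}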
-Original Message-
From: Markus Jelsma
I am a newbie to Hadoop and trying to understand how to size a Hadoop
cluster.
What factors should I consider when deciding the number of datanodes?
Datanode configuration? CPU, memory?
How much memory is required for the namenode?
My client is looking at 1 PB of usable data and will be
Hello,
The MapRunner class looks promising. I noticed it is in the deprecated
mapred package, but I didn't see an equivalent class in the mapreduce
package. Is this going to be ported to mapreduce, or is it no longer being
supported? Thanks!
~Ed
On Thu, Oct 21, 2010 at 6:36 AM, Harsh J
Just checked the Hadoop 0.21.0 API docs (I was looking in the wrong docs
before) and it doesn't look like MapRunner is deprecated so I'll try
catching the error there and will report back if it's a good solution.
Thanks!
~Ed
On Thu, Oct 21, 2010 at 11:23 AM, ed hadoopn...@gmail.com wrote
class (org.apache.hadoop.mapreduce.Mapper&lt;KEYIN, VALUEIN, KEYOUT,
VALUEOUT&gt;) for their mappers.
~Ed
On Thu, Oct 21, 2010 at 12:14 PM, ed hadoopn...@gmail.com wrote:
Just checked the Hadoop 0.21.0 API docs (I was looking in the wrong docs
before) and it doesn't look like MapRunner is deprecated so I'll
Thanks Tom! Didn't see your post before posting =)
On Thu, Oct 21, 2010 at 1:28 PM, ed hadoopn...@gmail.com wrote:
Sorry to keep spamming this thread. It looks like the correct way to
implement MapRunnable using the new mapreduce classes (instead of the
deprecated mapred) is to override the run() method:
(context, "EOFException: Corrupt gzip file " + mFileName);
  }
}
On Thu, Oct 21, 2010 at 1:29 PM, ed hadoopn...@gmail.com wrote:
Thanks Tom! Didn't see your post before posting =)
On Thu, Oct 21, 2010 at 1:28 PM, ed hadoopn...@gmail.com wrote:
Sorry to keep spamming this thread. It looks
You could also try
job.setNumReduceTasks(yourNumber);
~Ed
On Thu, Oct 21, 2010 at 4:45 PM, Alex Kozlov ale...@cloudera.com wrote:
Hi Matt, it might be that the parameter does not end up in the final
configuration for a number of reasons. Can you check the job config xml in
jt:/var/log
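One common culprit (an assumption here, not something confirmed in the thread): if the driver does not go through ToolRunner, -D options on the command line never reach the job configuration. A minimal sketch of a Tool-based driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains any -D options ToolRunner parsed.
    Job job = new Job(getConf(), "my job");
    job.setJarByClass(MyDriver.class);
    // ... mapper/reducer/paths go here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // Handles generic options such as -Dmapred.reduce.tasks=10.
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}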
of using LZO compression in Hadoop.
Thank you!
~Ed
In our case,
overriding the run() method and catching the EOFException works beautifully
for processing files that might be corrupt or have errors. Thanks!
~Ed
On Thu, Oct 21, 2010 at 2:07 PM, ed hadoopn...@gmail.com wrote:
I overrode the run() method in the mapper with a run() method (below).
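A minimal sketch of such an override (new mapreduce API; the mFileName field and mapper types are illustrative, not the original code):

import java.io.EOFException;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SafeGzipMapper extends Mapper&lt;LongWritable, Text, Text, LongWritable&gt; {
  private String mFileName; // set in setup() from the input split (illustrative)

  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
      }
    } catch (EOFException e) {
      // A truncated gzip member surfaces here from the record reader;
      // note it and let the task finish instead of failing.
      context.setStatus("EOFException: Corrupt gzip file " + mFileName);
    } finally {
      cleanup(context);
    }
  }
}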
Hi Todd,
I don't have the code in front of me right now, but I was looking over the API
docs and it looks like I forgot to call close() on the MultipleOutput. I'll
post back if that fixes the problem. If not I'll put together a unit test.
Thanks!
~Ed
On Thu, Oct 21, 2010 at 6:31 PM, Todd Lipcon t
Is there a configuration property or some other mechanism for changing
how reduce tasks report progress? E.g., instead of the shuffle, merge,
and reduce each counting for 33%, can this be changed?
The Hadoop Online Prototype did this, but I can't figure out how.
Ed
I don't think there is a stable CDH3 yet, although we've been using CDH3B2
and it has been pretty stable for us. (At least I don't see it available on
their website, and they JUST announced CDH3B3 last week at Hadoop World.)
~Ed
On Wed, Oct 20, 2010 at 5:57 AM, Abhinay Mehta abhinay.me
public int getPartition(Key key, Value value, int numPartitions) {
  // split my key so that the bit flag is removed
  Key stripped = removeBitFlag(key); // removeBitFlag = whatever strips your flag bit
  // take the modified key and mod it by numPartitions (mask keeps it non-negative)
  return (stripped.hashCode() & Integer.MAX_VALUE) % numPartitions;
}
}
Of course Key and Value would be whatever Key and Value class you're using.
Hope that helps.
~Ed
On Mon, Oct 18, 2010 at 8:58
somewhere for the next reducer making the attempt so it might
be counterproductive to try and eliminate spills.
~Ed
On Tue, Oct 19, 2010 at 8:02 AM, Donovan Hide donovanh...@gmail.com wrote:
Hi,
is there a reason why the io.sort.mb setting is hard-coded to a
maximum of 2047MB?
(MapTask.java, line 789)
this exception
and tell hadoop to just ignore the file and move on? I think the exception
is being thrown by the class reading in the Gzip file and not my mapper
class. Is this correct? Is there a way to handle this type of error
gracefully?
Thank you!
~Ed
+);
Sorry that's probably not much help to you.
~Ed
On Wed, Oct 6, 2010 at 8:04 AM, Pramy Bhats pramybh...@googlemail.comwrote:
Hi Ed,
I was using the following file for mapreduce job.
Cloud9/src/dist/edu/umd/cloud9/example/cooccur/ComputeCooccurrenceMatrixStripes.java
thanks,
--Pramod
On Tue
Hello,
I'm using Cloudera's Hadoop CDH3B2, Hadoop-0.20.2+320 (based on Apache
Hadoop 0.20.2), with Pig 0.7 (from Cloudera's distro).
Thank you!
~Ed
On Wed, Sep 29, 2010 at 11:56 PM, Rohan Rai rohan@inmobi.com wrote:
Hi
Which Hadoop/Pig version are you using?
Regards
Rohan
ed
+HDFS
~Ed
On Thu, Sep 30, 2010 at 7:59 AM, Adarsh Sharma adarsh.sha...@orkash.comwrote:
Dear all,
I have set up a Hadoop cluster of 10 nodes.
I want to know how we can read/write a file from HDFS (simply).
Yes, I know there are commands; I read through all the HDFS commands.
bin/hadoop
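For the programmatic route, the FileSystem API covers simple reads and writes; a minimal sketch (paths illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    // Picks up fs.default.name from core-site.xml on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/adarsh/hello.txt");

    // Write.
    FSDataOutputStream out = fs.create(file);
    out.writeBytes("hello hdfs\n");
    out.close();

    // Read it back.
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)));
    System.out.println(in.readLine());
    in.close();
  }
}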
)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Do you think I'm forgetting some required library?
Thank you!
~Ed
On Tue, Sep 28, 2010 at 2:10 PM, ed hadoopn
Thank you Rohan, I really appreciate your help! I'll give it a shot and post
back if it works.
~Ed
On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai rohan@inmobi.com wrote:
Just corrected/tested and pushed LzoTokenizedLoader to the personal fork
Hopefully it works now
Regards
Rohan
to make sure hadoop sees your jar and native
library)
Hope that works!
~Ed
On Tue, Sep 28, 2010 at 3:06 PM, Steve Kuo kuosen...@gmail.com wrote:
We have TBs worth of XML data in .gz format, where each file is about 20 MB.
This dataset is not expected to change. My goal is to write a map-only
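A map-only job is just a job with the reduce count set to zero, so mapper output goes straight to HDFS; a minimal driver sketch (the mapper class is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "gz-xml-map-only");
    job.setJarByClass(MapOnlyDriver.class);
    job.setMapperClass(XmlExtractMapper.class); // hypothetical mapper
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(0); // map-only: no shuffle or sort
    FileInputFormat.addInputPath(job, new Path(args[0])); // .gz inputs decompress automatically
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}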
Should this be something that needs to be added?
Thank you for the help!
~Ed
On Mon, Sep 27, 2010 at 11:18 AM, Ted Yu yuzhih...@gmail.com wrote:
The setting should be fs.inmemory.size.mb
On Mon, Sep 27, 2010 at 7:15 AM, pig hadoopn...@gmail.com wrote:
Hi Sriguru,
Thank you for the tips
Hi,
Your question has an academic sound, so I'll give it an academic answer ;).
Unfortunately, there are not really any good generalized (i.e., cross-joining a
large matrix with a large matrix) methods for doing joins in map-reduce. The
fundamental reason for this is that in the general case you're
. This grouping is
achieved by sorting, which means you see keys in increasing order.
Ed
? With it enabled, I've observed that all
tasks associated with a particular JVM go to the same log.
Ed
merged
down to (at most) n files and a final merge goes directly into the
user reduce function.
Ed
On Fri, Mar 5, 2010 at 12:36 AM, prasenjit mukherjee
prasen@gmail.com wrote:
if I understand correctly, reduce has 3 stages: copy, sort, reduce. Copy
happens in parallel with mappers still
competition might be of interest to you.
Ed
[1] http://sortbenchmark.org/Yahoo2009.pdf
On Sun, Feb 28, 2010 at 1:53 PM, aa...@buffalo.edu wrote:
Hello,
I am trying to write a simple sorting application for Hadoop. This is what
I have thought of till now. Suppose I have 100 lines of data
intermediate spills.
To fix this, you can try tuning the per-job configurables io.sort.mb
and io.sort.record.percent. Look at the counters of a few map tasks to
get an idea of how much data (io.sort.mb) and how many records
(io.sort.record.percent) they produce.
Ed
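A minimal sketch of setting the two knobs per job (era-appropriate property names; the values are illustrative, derive yours from the counters):

import org.apache.hadoop.mapred.JobConf;

public class SpillTuning {
  public static void main(String[] args) {
    JobConf conf = new JobConf(SpillTuning.class);
    // Total buffer for serialized map output; bigger buffer, fewer spills.
    conf.setInt("io.sort.mb", 200);
    // Fraction of io.sort.mb reserved for per-record accounting metadata;
    // raise it when maps emit many small records.
    conf.setFloat("io.sort.record.percent", 0.15f);
  }
}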
On Wed, Feb 24, 2010 at 2:45 AM
on the reduce side too during the shuffle and multi-pass
merge.
Ed
2010/2/23 Tim Kiefer tim-kie...@gmx.de:
Hi Gang,
thanks for your reply.
To clarify: I look at the statistics through the job tracker. In the
web interface for my job I have columns for map, reduce, and total. What I
was referring
they will all be references to the last item from
the iterator.
Ed
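Because Hadoop reuses one Writable instance across the iterator, buffered values have to be copied; a minimal sketch (assuming the LongDoublePair writable from the original post):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapreduce.Reducer;

public class PairReducer extends Reducer&lt;Text, LongDoublePair, Text, LongDoublePair&gt; {
  @Override
  protected void reduce(Text key, Iterable&lt;LongDoublePair&gt; values, Context context)
      throws IOException, InterruptedException {
    List&lt;LongDoublePair&gt; buffered = new ArrayList&lt;LongDoublePair&gt;();
    for (LongDoublePair value : values) {
      // clone() round-trips the writable through serialization, yielding an
      // independent copy rather than another reference to the reused object.
      buffered.add(WritableUtils.clone(value, context.getConfiguration()));
    }
    // ... buffered now holds distinct values ...
  }
}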
On Mon, Feb 8, 2010 at 12:23 PM, James Hammerton
james.hammer...@mendeley.com wrote:
Hi,
For a particular project I created a writable for holding a long and a
double called LongDoublePair. My mapper outputs LongDoublePair values
org.apache.hadoop.examples.SecondarySort for a nice example. This lets
Hadoop internals do some of the heavy lifting and removes the
requirement that all values for a key fit in memory (though I guess if
you only care about the top 20, your space requirement is still O(1)).
Ed
On Mon, Feb 8, 2010 at 5:58 PM, James Hammerton
some light on why your first attempt failed.
/user/brian/input should be a directory with several xml files.
Ed
On Wed, Feb 3, 2010 at 5:17 PM, Brian Wolf brw...@gmail.com wrote:
Alex Kozlov wrote:
Live Nodes http://localhost:50070/dfshealth.jsp#LiveNodes : 0
Your datanode is dead
I tried running 0.20.0 on XP too a few weeks ago and stuck at the same
spot. No problems with standalone mode. Any insight would be
appreciated, thanks.
Ed
On Wed, Jan 27, 2010 at 11:41 AM, Yura Taras yura.ta...@gmail.com wrote:
Hi all
I'm trying to deploy pseudo-distributed cluster on my
between
parts 1 and 2 as the reduce memory buffer fills up, merges, and spills
to disk. There is also overlap between parts 2 and 3 because the final
merge is fed directly into the user reduce function to minimize the
amount of data written to disk.
Ed
On Tue, Jan 26, 2010 at 5:27 PM, adeelmahmood
back into key/value pairs,
unlike the (memory-consuming) ArrayWritable approach.
Ed
of something like &lt;Text, IntArrayWritable&gt;, is there a way
to build and output the id array without buffering values? The only
alternative I see is to instead use &lt;Text, IntWritable&gt; and repeat the
term for every doc id, but this seems wasteful.
Ed
that the underlying structure of an HDFS file is a
collection of large blocks (64MB default) and that it is these blocks
that are replicated.
Ed
Hi Huazhong,
Sounds like an interesting application. Here are a few tips.
1. If the frames are not independent, you should find a way to key them
according to their order before dumping them in Hadoop so that they can be
sorted as part of your map reduce task. BTW, the video won't appear split
Last time I checked EMR only runs 0.18.3. You can use EC2 though, which
winds up being cheaper anyways.
On Wed, Dec 16, 2009 at 8:51 PM, 松柳 lamfeeli...@gmail.com wrote:
Hi all, I'm wondering whether Amazon starts to support the newest stable
version of Hadoop, or we can still just use 0.18.3?
One important thing to note is that, with cross products, you'll almost
always get better performance if you can fit both files on a single node's
disk rather than distributing the files.
On Tue, Dec 8, 2009 at 9:18 AM, laser08150815 la...@laserxyz.de wrote:
pmg wrote:
I am evaluating
As far as replication goes, you should look at a project called Pastry.
Apparently some people have used Hadoop MapReduce on top of it. You will
need to be clever, however, in how you do your MapReduce, because you
probably won't want the job to eat all the users' CPU time.
On Dec 2, 2009 5:11 PM,
The tool looks interesting. You should consider providing the source for it.
Is it written in a language that can run on platforms besides Windows?
On Nov 17, 2009 10:40 AM, Cubic cubicdes...@gmail.com wrote:
Hi list.
This tool is a graphic interface for Hadoop.
It may improve your productivity
Hi,
What you can fit in distributed cache generally depends on the available
disk space on your nodes. With most clusters 300 MB will not be a problem,
but it depends on the cluster and the workload you're processing.
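For what it's worth, shipping such a file is one call on the old API; a minimal sketch (path illustrative):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

public class CacheSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Copies the HDFS file to every task's local disk before the job starts.
    DistributedCache.addCacheFile(new URI("/user/me/lookup.dat"), conf);
  }
}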
On Sat, Nov 14, 2009 at 10:34 PM, 于凤东 fengdon...@gmail.com wrote:
I have a