Thank you for your time and suggestions. I've already tried Starfish, but
not jmap. I'll check it out.
Thanks again,
Mark
On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl wrote:
> I assume you have also just tried running locally and using the jdk
> performance tools (e.g. jmap) to gain insight by c
Varun, sorry for my late response. Today I have deployed a new version and I
can confirm that the patches you provided work well. I've been running some
jobs on a 5-node cluster for an hour under full load without a core dump, so
now things work as expected.
Thank you again!
I have used just your first optio
Mohit,
I'm positive the real exception lies a few scrolls below that message
on the attempt page. Possibly a class not found issue.
The message you see on top appears when something throws an exception
while being configure()-ed. It is most likely a job config or
setup-time issue from your code or
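As a rough illustration (class and jar names made up), anything thrown inside
configure() gets wrapped in that generic top-level message, and the real
cause only shows up further down the attempt log:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class ConfigureFailureExample extends MapReduceBase {
      @Override
      public void configure(JobConf job) {
        try {
          // If this class lives in a jar that never made it onto the task
          // classpath, the ClassNotFoundException thrown here surfaces as
          // the generic "failed to configure" message on the attempt page.
          Class.forName("com.example.SomeHelper");  // hypothetical dependency
        } catch (ClassNotFoundException e) {
          throw new RuntimeException("helper jar missing on task classpath", e);
        }
      }
    }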
Thanks for the example. I did look at the logs and also at the admin page
and all I see is the exception that I posted initially.
I am not sure why adding an extra jar to the classpath via DistributedCache
causes that exception. I tried to look at the Configuration code in the
hadoop.util package but it do
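For reference, what I'm doing is roughly this (jar path and job class are
made up), inside the job's run() method:

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);        // MyJob is made up
    // the jar already sits on HDFS; addFileToClassPath only records the
    // path in the job conf, and the TaskTracker localizes it and prepends
    // it to the child's classpath
    DistributedCache.addFileToClassPath(new Path("/libs/extra-helpers.jar"), conf);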
I think I've found the problem. There was one line of code that caused this
issue :) and that was output.collect(key, value);
I had to add more logging to the code to get to it. For some reason kill
-QUIT didn't send the stack trace to the userLogs///syslog , I
searched all the logs and couldn't find
I assume you have also just tried running locally and using the JDK performance
tools (e.g. jmap) to gain insight, by configuring Hadoop to run the absolute
minimum number of tasks?
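For instance, something along these lines (untested sketch, 0.20-era property
names, MyJob is made up) forces everything into a single local JVM that jmap
can then be pointed at:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);   // MyJob is made up
    conf.set("mapred.job.tracker", "local");   // LocalJobRunner, no cluster
    conf.set("fs.default.name", "file:///");   // read input from local disk
    conf.setNumMapTasks(1);
    conf.setNumReduceTasks(0);
    // With the local runner the task executes inside the submitting JVM,
    // so find that pid with jps and then, e.g.:
    //   jmap -histo:live <pid>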
Perhaps the discussion
http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-t
Hi,
On Wed, Feb 29, 2012 at 19:13, Robert Evans wrote:
> What I really want to know is how well does this new CompressionCodec
> perform in comparison to the regular gzip codec in
> various different conditions and what type of impact does it have on
> network traffic and datanode load. My gut
I can't seem to find what's causing this slowness. Nothing in the logs.
It's just painfully slow. However, a Pig job with the same logic performs
great. Here is the mapper code and the Pig code:
public static class Map extends MapReduceBase
    implements Mapper {
  public vo
The documentation on Starfish http://www.cs.duke.edu/starfish/index.html
looks promising, but I have not used it. I wonder if others on the list have
found it more useful than setting mapred.task.profile.
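For comparison, turning the built-in profiler on is just a few JobConf calls
(a sketch with the old 0.20 API; MyJob and the hprof options are only
illustrative):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);  // MyJob is made up
    conf.setProfileEnabled(true);             // same as mapred.task.profile=true
    conf.setProfileTaskRange(true, "0-2");    // profile the first few map tasks
    conf.setProfileParams(
        "-agentlib:hprof=cpu=samples,heap=sites,force=n,"
        + "thread=y,verbose=n,file=%s");
    // the profile output is pulled back next to where the job was submitted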
C
On Feb 29, 2012, at 3:53 PM, Mark question wrote:
> I've used hadoop profiling (.prof) to sho
I've used Hadoop profiling (.prof) to show the stack trace but it was hard
to follow. I tried jConsole locally, since I couldn't find a way to set a port
number for the child processes when running them remotely. Linux commands
(top, /proc) showed me that the virtual memory is almost twice my
physical, which m
Mark,
So if I understand, it is more the memory management that you are interested
in, rather than a need to run an existing C or C++ application on the MapReduce
platform?
Have you done profiling of the application?
C
On Feb 29, 2012, at 2:19 PM, Mark question wrote:
> Thanks Charles .. I'm running
Thanks Charles .. I'm running Hadoop for research to perform duplicate
detection methods. To go deeper, I need to understand what's slowing my
program down, which usually starts with analyzing memory to predict the best
input size for a map task. So you're saying piping can help me control memory
even though
Mark,
Both streaming and pipes allow this, perhaps more so pipes at the level of the
mapreduce task. Can you provide more details on the application?
On Feb 29, 2012, at 1:56 PM, Mark question wrote:
> Hi guys, thought I should ask this before I use it ... will using C over
> Hadoop give me the u
Hi guys, thought I should ask this before I use it ... will using C over
Hadoop give me the usual C memory management? For example, malloc(),
sizeof()? My guess is no since this all will eventually be turned into
bytecode, but I need more control over memory, which obviously is hard for me
to do wit
If many people are going to use it then by all means put it in. If there is
only one person, or a very small handful of people that are going to use it
then I personally would prefer to see it a separate project. However, Edward,
you have convinced me that I am trying to make a logical judgmen
I can perform HDFS operations from the command line like "hadoop fs -ls
/". Doesn't that mean that the datanode is up?
I am going to try a few things today. I have a JAXBContext object that
marshals the xml; this is a static instance, but my guess at this point is
that since this is in a separate jar from the one where the job runs, and I
used DistributedCache.addClassPath, this context is being created on every
call for some re
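To rule that out I'm going to cache it explicitly in configure(); roughly
this (the record class name is made up):

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class XmlMapperBase extends MapReduceBase {
      // one JAXBContext per task JVM instead of one per map() call;
      // building the context is expensive, the (un)marshallers are cheap
      private static JAXBContext jaxbCtx;

      @Override
      public void configure(JobConf job) {
        try {
          if (jaxbCtx == null) {
            jaxbCtx = JAXBContext.newInstance(MyRecord.class);  // MyRecord is made up
          }
        } catch (JAXBException e) {
          throw new RuntimeException("could not build JAXBContext", e);
        }
      }
    }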
Too bad we can not up the replication on the first few blocks of the
file or distributed cache it.
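Per-block replication isn't exposed, but for a small file either of these
works today (path is made up, exceptions omitted):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // bump replication for the whole file (not just the first blocks)
    fs.setReplication(new Path("/data/lookup/table.gz"), (short) 10);
    // or ship it to every task via the distributed cache
    DistributedCache.addCacheFile(URI.create("/data/lookup/table.gz"), conf);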
The contrib statement is arguable. I could make a case that the
majority of stuff should not be in hadoop-core. NLineInputFormat, for
example, is nice to have. Took a long time to get ported to the new
I can see a use for it, but I have two concerns about it. My biggest concern
is maintainability. We have had lots of things get thrown into contrib in the
past, very few people use them, and inevitably they start to suffer from bit
rot. I am not saying that it will happen with this, but if yo
Hi,
On Wed, Feb 29, 2012 at 16:52, Edward Capriolo wrote:
...
> But being able to generate split info for them and process them
> would be good as well. I remember that was a hot thing to do with lzo
> back in the day. The pain of doing a once-over of the gz files to generate
> the split info is detra
Hi,
On Wed, Feb 29, 2012 at 13:10, Michel Segel wrote:
> Let's play devil's advocate for a second?
>
I always like that :)
> Why?
Because then datafiles from other systems (like the Apache HTTP webserver)
can be processed more efficiently, without preprocessing.
> Snappy exists.
>
Compared to
Mike,
Snappy is cool and all, but I was not overly impressed with it.
GZ zips much better than Snappy. Last time I checked, for our log files
gzip took them down from 100MB -> 40MB, while Snappy compressed them
from 100MB -> 55MB. That was only with sequence files. But still that is
pretty significan
Yes this is fine to do. TTs are not dependent on co-located DNs, but
only benefit if they are.
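On the compute-only node, just point mapred.job.tracker at your existing
JobTracker in mapred-site.xml and start only the TaskTracker daemon
(bin/hadoop-daemon.sh start tasktracker); skip the DataNode entirely.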
On Wed, Feb 29, 2012 at 8:14 PM, Daniel Baptista
wrote:
> Forgot to mention that I am using Hadoop 0.20.2
>
> From: Daniel Baptista
> Sent: 29 February 2012 14:44
> To: common-user@hadoop.apache.org
> S
Forgot to mention that I am using Hadoop 0.20.2
From: Daniel Baptista
Sent: 29 February 2012 14:44
To: common-user@hadoop.apache.org
Subject: TaskTracker without datanode
Hi All,
I was wondering (network traffic considerations aside) is it possible to run a
TaskTracker without a DataNode. I was
Hi All,
I was wondering (network traffic considerations aside): is it possible to run
a TaskTracker without a DataNode? I was hoping to test this as a means of
temporarily scaling processing power.
Are there better approaches? I don't (currently) need the additional storage
that a DataNo
How can I set the fair scheduler such that all jobs submitted from a
particular user group go to a pool with the group name?
I have set up the fair scheduler and I have two users: A and B (both belonging
to the user group hadoop).
When these users submit hadoop jobs, the jobs from A go to a pool named A
an
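(My guess from the fair scheduler docs is that the knob is
mapred.fairscheduler.poolnameproperty in the JobTracker's mapred-site.xml;
it defaults to user.name, and setting it to group.name is supposed to give
one pool per Unix group:

    mapred.fairscheduler.poolnameproperty = group.name

but I haven't verified that yet.)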
Let's play devil's advocate for a second?
Why? Snappy exists.
The only advantage is that you don't have to convert from gzip to snappy and
can process gzip files natively.
The next question is: how large are the gzip files in the first place?
I don't disagree, I just want to have a solid argument in
Cross-posting with common-user, since there's little activity on hdfs-user
these days.
Evert
> Hi list,
>
> I'm having trouble starting up a DN (0.20.2) with Kerberos
> authentication and SSL enabled - I'm getting a NullPointerException
> during startup and the daemon exits. It's a b