Hi Varun Kapoor,
I've been trying to debug, but unfortunately I get the "No stack."
message from gdb.
Here are the details:
* Hadoop: 1.0
* Ganglia: 3.1.7
# gdb /usr/sbin/gmetad
GNU gdb (GDB) CentOS (7.0.1-42.el5.centos)
Copyright (C) 2009 Free Software Foundation, Inc.
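(For reference: "No stack." just means gdb was started without a running process or core file loaded, so there is no stack to show. Pointing gdb at the core itself produces the backtrace; the core path below is a placeholder, and gmetad typically writes cores to its working directory.)

```shell
ulimit -c unlimited   # make sure cores are written at all
# Batch mode: load the binary plus the core and print the backtrace
gdb --batch -ex bt /usr/sbin/gmetad /var/lib/ganglia/core.16487
```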
Varun, sorry for my late response. Today I deployed a new version and I can
confirm that the patch you provided works well. I've been running jobs on a
5-node cluster for an hour at full load without a core, so things now work
as expected.
Thank you again!
I have used just your first
The warnings about underflow are totally expected (they come from strtod(),
and they will no longer occur with Hadoop-1.0.1, which applies my patch
from HADOOP-8052), so that's not worrisome.
As for the buffer overflow, do you think you could show me a backtrace of
this core? If you can't find
Well, rebuilding Ganglia seemed easier, and Merto was testing the other
one, so I thought I should give that one a chance. :)
Anyway, I will send you the gdb details or patch Hadoop and try it at my
earliest convenience.
Cheers
On Wed, Feb 15, 2012 at 6:59 PM, Varun Kapoor rez...@hortonworks.com wrote:
Hello Varun,
I have patched and recompiled Ganglia from source, but it still cores after
the patch.
Here are some logs:
Feb 15 09:39:14 master gmetad[16487]: RRD_update
(/var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd):
Hey Merto,
Any luck getting the patch running on your cluster?
In case you're interested, there's now a JIRA for this:
https://issues.apache.org/jira/browse/HADOOP-8052.
Varun
On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor rez...@hortonworks.com wrote:
Your general procedure sounds correct
Varun, unfortunately I have had some problems with deploying the new
version on the cluster. Hadoop is not picking up the new build in the lib
folder, despite the classpath being set to it. The new build is picked up
only if I put it in $HD_HOME/share/hadoop/, which is very strange. I've
done this on all nodes.
I will need your help. Please confirm if the following procedure is right.
I have a dev environment where I pimp my scheduler (no Hadoop running) and
a small cluster environment where the changes (jars) are deployed with some
scripts; however, I have never compiled the whole Hadoop from source, so I
Your general procedure sounds correct (i.e. dropping your newly built .jar
into $HD_HOME/lib/), but to make sure it's getting picked up, you should
explicitly add $HD_HOME/lib/ to your exported HADOOP_CLASSPATH environment
variable; here's mine, as an example:
export
I'm sorry to hear that gmetad cores continuously for you guys. Since I'm
not seeing that behavior, I'm going to just put out the 2 possible patches
you could apply and wait to hear back from you. :)
Option 1
* Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (
Varun, have I missed your link to the patches? I tried to search for them
on JIRA but did not find them. Can you repost the links to these two
patches?
Thank you.
On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com wrote:
I'm sorry to hear that gmetad cores continuously for you
I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my
reply, and of course the mailing list did not accept the attachment.
I plan on opening JIRAs for this tomorrow, but till then, here are links to
the 2 patches (from my Dropbox account):
-
I assume you have seen the following information on the Hadoop wiki:
http://wiki.apache.org/hadoop/GangliaMetrics
So do you use GangliaContext31 in hadoop-metrics2.properties?
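(For readers unfamiliar with the setting being asked about: GangliaContext31 is the metrics context class that speaks the Ganglia >= 3.1 wire format; in the metrics v1 file it looks roughly like the sketch below, with placeholder host/port. The metrics2 equivalent is the GangliaSink31 class in hadoop-metrics2.properties.)

```properties
# hadoop-metrics.properties (metrics v1); gmond-host:8649 is a placeholder
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=gmond-host:8649
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=gmond-host:8649
```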
We use Ganglia 3.2 with Hadoop 0.20.205 and it works fine (I remember
seeing gmetad sometimes go down due to buffer overflow
Hello,
I also face this issue when using GangliaContext31 with hadoop-1.0.0 and
Ganglia 3.1.7 (I also tried 3.1.2). I continuously get buffer overflows as
soon as I restart gmetad.
Regards
Mete
On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate
gog...@hortonworks.com wrote:
I assume you
Yes, I am encountering the same problems, and like Mete said, a few seconds
after restarting a segmentation fault appears. Here is my conf:
http://pastebin.com/VgBjp08d
And here is some info from /var/log/messages (Ubuntu Server 10.10):
kernel: [424447.140641] gmetad[26115] general protection
Hey Merto,
I've been digging into this problem since Sunday, and believe I may have
root-caused it.
I'm using ganglia-3.2.0, rrdtool-1.4.5 and
http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/ (which I
believe should be running essentially the identical relevant code as
0.20.205).
I have tried to run it, but it keeps crashing.
- When you start gmetad and Hadoop is not emitting metrics, everything
is peachy.
Right, running just Ganglia without Hadoop jobs seems stable for at least
a day.
- When you start Hadoop (and it thus starts emitting metrics),
Same as Merto's situation here: it always overflows a short time after the
restart. Without the Hadoop metrics enabled, everything is smooth.
Regards
Mete
On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek masmer...@gmail.com wrote:
I have tried to run it, but it keeps crashing.
- When you start
I spent a lot of time trying to figure it out, but I did not find a
solution. Problems in the logs pointed me to some bugs in the rrdupdate
tool; I tried different versions of Ganglia and RRDtool, but the error is
the same. The segmentation fault appears after the following lines,
I would be glad to hear that too. I've set up the following:
Hadoop 0.20.205
Ganglia Front 3.1.7
Ganglia Back *(gmetad)* 3.1.7
RRDTool http://www.rrdtool.org/ 1.4.5 - I had some trouble installing
1.4.4
Ganglia works only when Hadoop is not running, so that metrics are not
published to gmetad.