There is quite a bit of difference in the scope (no pun intended) of these
different interfaces. The SCOPE paper says rows are sets of typed columns
(and the paper's examples demonstrate that). Hive's SerDe/ObjectInspector
interfaces allow plugging in objects with arbitrary levels of nesting and
map/array types.
Steve, thanks for your information!!
I looked into Bayesian filtering, and I can easily test it on
the distributed system -- map/reduce is easy.
See http://blog.udanax.org/2008/10/parallel-bayesian-spam-filtering-using.html
/Edward
On Mon, Sep 22, 2008 at 7:21 PM, Steve Loughran [EMAIL
First of all, mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum are both set to 2 in the
hadoop-default.xml file; this file is read before hadoop-site.xml, so
any properties that aren't set in hadoop-site.xml fall back to the
values in hadoop-default.xml.
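For example, overriding those defaults would look like this in
hadoop-site.xml (the value 4 is just an illustration; pick what matches
your hardware):

```xml
<!-- hadoop-site.xml: values here take precedence over hadoop-default.xml -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```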
As for
If we have a group blog of the hadoop user/dev group such as a Y!
developer network, we can easily share/introduce our experience and
outcomes from our research. So, I thought about a group blog, I guess
there are plenty of contributors.
What do you think about it?
--
Best regards, Edward J.
The simple way would be to use nrpe and check_procs. I have never
tested it, but a command like 'ps -ef | grep java | grep NameNode'
would be a fairly decent check. That is not very robust, but it should
let you know if the process is alive.
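On the NRPE side, a sketch of such a check might look like this (the
command name and plugin path are assumptions; check_procs with an
argument match is a bit more robust than piping ps through grep, since
it won't match the grep process itself):

```cfg
# nrpe.cfg on the Hadoop node (illustrative)
command[check_namenode]=/usr/lib/nagios/plugins/check_procs -c 1:1 -C java -a 'NameNode'
```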
You could also monitor the web interfaces associated with
Hey Edward,
The JMX documentation for Hadoop is non-existent, but here's about
what you need to do:
1) download and install the check_jmx Nagios plugin
2) Open up the hadoop JMX install to the outside world. I added the
following lines to hadoop-env.sh
export HADOOP_OPTS=
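The flags above were cut off in the archive; a typical lab-only setup
(no authentication or SSL, and the port number is an assumption) might
look like:

```sh
# hadoop-env.sh -- expose JMX to remote monitoring (only safe on a trusted network)
export HADOOP_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8004 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```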
Edward J. Yoon wrote:
If we have a group blog of the hadoop user/dev group such as a Y!
developer network, we can easily share/introduce our experience and
outcomes from our research. So, I thought about a group blog, I guess
there are plenty of contributors.
What do you think about it?
Edward Capriolo wrote:
The simple way would be to use nrpe and check_procs. I have never
tested it, but a command like 'ps -ef | grep java | grep NameNode'
would be a fairly decent check. That is not very robust, but it should
let you know if the process is alive.
You could also monitor the web
Elia, perhaps you can try changing mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum to 4 in hadoop-site.xml in
hopes of getting better utilization. It's strange to me that having these
both set to 2 only utilizes a single core, because I would imagine that any
Hi,
Well, not a bad idea, I think. But isn't a wiki a better tool to catch
and shape collective knowledge?
Lukas
On Wed, Oct 8, 2008 at 5:39 PM, Steve Loughran [EMAIL PROTECTED] wrote:
Edward J. Yoon wrote:
If we have a group blog of the hadoop user/dev group such as a Y!
developer network, we
False alarm, guys -- thanks for the replies.
I do have 2 set as the task maximum, and it is utilizing 2 cores
according to top.
I must have caught it between tasks or during the reduce, since I had
only 1 reducer per node going at the time.
hadoop-default.xml:
property
Hi!
I've developed a Map/Reduce algorithm to analyze some logs from a web
application.
So basically, we are ready to start the QA test phase, and now I would
like to know how efficient my application is from a performance point
of view.
Is there any procedure I could use to do some profiling?
That all sounds good. By 'quick hack' I meant that 'check_tcp' was not
good enough, because an open TCP socket does not prove much. However,
if the page returns useful attributes that show the cluster is alive,
that is great and easy.
Come to think of it, you can navigate the dfshealth page and get useful
Just run your map/reduce job locally and connect your profiler. I use
YourKit.
Works great!
You can profile your map/reduce job by running it in local mode, like
any other Java app.
However, we have also profiled on a grid. You just need to install the
YourKit agent into the JVM of the node
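On a grid, the agent is usually passed to the task JVMs through the
child JVM options; a sketch, where the agent path is an assumption
about where YourKit is installed on each node:

```xml
<!-- per-job or cluster conf: load the profiler agent in each task JVM -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m -agentpath:/opt/yourkit/bin/libyjpagent.so</value>
</property>
```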
Are you interested in simply profiling your own code (in which case you
can clearly use whatever Java profiler you want), or your construction
of the MapReduce job, i.e. how much time is being spent in the Map vs.
the sort vs. the shuffle vs. the Reduce? I am not aware of a good
solution to the
Great, thanks for this info. Is there any chance that this information
can also be exposed for streaming jobs?
(All of the jobs that we run in our lab are run only via streaming...)
Thanks!
Ashish
On Wed, Oct 8, 2008 at 12:30 PM, George Porter [EMAIL PROTECTED]wrote:
Hi Ashish,
I
Glad we could help, Terrence. The second pivot might be tricky; you may
have to run a second iteration. I haven't thought the problem all the way
through, though.
Good luck.
Alex
On Wed, Oct 8, 2008 at 1:02 PM, Terrence A. Pietrondi [EMAIL PROTECTED]
wrote:
I think I can figure this out
Has anybody been able to ship a hadoop streaming library using
cacheArchive? I am able to see my unjarred archive from my mapper,
but I'm not able to import Python files within it.
As a test, I'm jarring up a test directory and putting it on the HDFS:
[EMAIL PROTECTED] ~]# ls jar_test
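If the archive does show up (Hadoop exposes it as a symlink in the
task's working directory, named by the part after '#' in the
-cacheArchive argument), the usual missing step is putting that
directory on sys.path before importing. A minimal sketch, where the
symlink name and module name are assumptions:

```python
import os
import sys

def add_archive_to_path(archive_link):
    """Prepend the unpacked cacheArchive directory to sys.path so that
    Python modules shipped inside the jar become importable."""
    archive_dir = os.path.abspath(archive_link)
    if archive_dir not in sys.path:
        sys.path.insert(0, archive_dir)
    return archive_dir

# In the mapper, before importing the shipped code:
# add_archive_to_path("jar_test.jar")  # symlink name from '#' in -cacheArchive
# import mymodule                      # hypothetical module inside the archive
```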
Oh, great!! I didn't know that. :)
On Thu, Oct 9, 2008 at 12:39 AM, Steve Loughran [EMAIL PROTECTED] wrote:
Edward J. Yoon wrote:
If we have a group blog of the hadoop user/dev group such as a Y!
developer network, we can easily share/introduce our experience and
outcomes from our research.
Well, not a bad idea, I think. But isn't a wiki a better tool to catch
and shape collective knowledge?
Yes, but I think some stuff (e.g. news-ticker information) isn't
publishable on a wiki.
On Thu, Oct 9, 2008 at 1:15 AM, Lukáš Vlček [EMAIL PROTECTED] wrote:
Hi,
Well, not a bad idea I think.
Hi,
I received the message below. Can anyone explain it?
08/10/09 11:53:33 INFO mapred.JobClient: Task Id :
task_200810081842_0004_m_00_0, Status : FAILED
java.io.IOException: Cannot run program bash: java.io.IOException:
error=12, Cannot allocate memory
at
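error=12 is ENOMEM: the fork() that launches the bash subprocess
momentarily needs as much address space as the parent JVM, and the
kernel refused to overcommit it. Common mitigations are a smaller task
heap (mapred.child.java.opts) or relaxing the kernel's overcommit
policy; a sketch, assuming a Linux worker node:

```cfg
# /etc/sysctl.conf on the worker node (illustrative; apply with sysctl -p)
vm.overcommit_memory = 1
```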