RE: [hive-users] Questions regarding Hive metadata schema

2008-10-08 Thread Joydeep Sen Sarma
There is quite a bit of difference in the scope (no pun) of these different interfaces. The SCOPE paper says rows are sets of typed columns (and the paper's examples demo that). Hive's SerDe/ObjectInspector interfaces allow plugging in objects with arbitrary levels of nesting and map/array

Re: The statistical spam filtering

2008-10-08 Thread Edward J. Yoon
Steve, thanks for your information! I looked into Bayesian filtering, and I can easily test it on a distributed system -- map/reduce is easy. See http://blog.udanax.org/2008/10/parallel-bayesian-spam-filtering-using.html /Edward On Mon, Sep 22, 2008 at 7:21 PM, Steve Loughran [EMAIL

Re: dual core configuration

2008-10-08 Thread Taeho Kang
First of all, mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum are both set to 2 in the hadoop-default.xml file; this file is read before hadoop-site.xml, so any properties that aren't set in hadoop-site.xml will follow the values set in hadoop-default.xml. As for
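The override order described above can be sketched in a few lines of Python. This is only an illustration of "defaults first, site file second", not Hadoop's own configuration loader; the function names `read_properties` and `effective_config`, the inline XML strings, and the override value 4 are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop-style <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def effective_config(default_xml, site_xml):
    """Apply defaults first, then let any property set in the site file win."""
    merged = read_properties(default_xml)
    merged.update(read_properties(site_xml))
    return merged


# Hypothetical file contents, reduced to the two properties under discussion.
DEFAULT_XML = """<configuration>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>2</value></property>
  <property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>2</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>4</value></property>
</configuration>"""

config = effective_config(DEFAULT_XML, SITE_XML)
for key in sorted(config):
    print(key, "=", config[key])
```

Here the map maximum comes from the site file, while the reduce maximum, which the site file does not set, keeps its default.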

Official group blog of the hadoop user/dev group?

2008-10-08 Thread Edward J. Yoon
If we have a group blog of the hadoop user/dev group such as a Y! developer network, we can easily share/introduce our experience and outcomes from our research. So, I thought about a group blog, I guess there are plenty of contributors. What do you think about it? -- Best regards, Edward J.

Re: nagios to monitor hadoop datanodes!

2008-10-08 Thread Edward Capriolo
The simple way would be to use nrpe and check_proc. I have never tested it, but a command like 'ps -ef | grep java | grep NameNode' would be a fairly decent check. That is not very robust but it should let you know if the process is alive. You could also monitor the web interfaces associated with
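A minimal sketch of the same process check as a small Python script, for anyone who prefers that to a shell pipeline. It assumes the usual Nagios plugin convention of exit code 0 for OK and 2 for CRITICAL; the function names `find_java_process` and `check_process` are made up for this example, and the check is exactly as non-robust as the grep version.

```python
import subprocess
import sys


def find_java_process(ps_output, class_name):
    """Return the ps lines that mention both 'java' and the given class name."""
    return [
        line
        for line in ps_output.splitlines()
        if "java" in line and class_name in line
    ]


def check_process(class_name="NameNode"):
    """Run 'ps -ef' and report in Nagios style: 0 = OK, 2 = CRITICAL."""
    ps_output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    matches = find_java_process(ps_output, class_name)
    if matches:
        print(f"OK - {class_name} process found ({len(matches)} match(es))")
        return 0
    print(f"CRITICAL - no java process matching {class_name}")
    return 2


if __name__ == "__main__":
    # Hypothetical usage: python check_hadoop_proc.py DataNode
    target = sys.argv[1] if len(sys.argv) > 1 else "NameNode"
    sys.exit(check_process(target))
```

Like the grep pipeline, this only shows that a matching process exists, not that it is answering requests.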

Re: nagios to monitor hadoop datanodes!

2008-10-08 Thread Brian Bockelman
Hey Edward, The JMX documentation for Hadoop is non-existent, but here's about what you need to do: 1) download and install the check_jmx Nagios plugin 2) Open up the hadoop JMX install to the outside world. I added the following lines to hadoop-env.sh export HADOOP_OPTS=

Re: Official group blog of the hadoop user/dev group?

2008-10-08 Thread Steve Loughran
Edward J. Yoon wrote: If we have a group blog of the hadoop user/dev group such as a Y! developer network, we can easily share/introduce our experience and outcomes from our research. So, I thought about a group blog, I guess there are plenty of contributors. What do you think about it?

Re: nagios to monitor hadoop datanodes!

2008-10-08 Thread Steve Loughran
Edward Capriolo wrote: The simple way would be to use nrpe and check_proc. I have never tested it, but a command like 'ps -ef | grep java | grep NameNode' would be a fairly decent check. That is not very robust but it should let you know if the process is alive. You could also monitor the web

Re: dual core configuration

2008-10-08 Thread Alex Loddengaard
Elia, perhaps you can try changing mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to 4 in hadoop-site.xml in hopes of getting better utilization. It's strange to me that having these both set to 2 only utilizes a single core, because I would imagine that any

Re: Official group blog of the hadoop user/dev group?

2008-10-08 Thread Lukáš Vlček
Hi, Well, not a bad idea I think. But isn't wiki a better tool to catch and shape collective knowledge? Lukas On Wed, Oct 8, 2008 at 5:39 PM, Steve Loughran [EMAIL PROTECTED] wrote: Edward J. Yoon wrote: If we have a group blog of the hadoop user/dev group such as a Y! developer network, we

Re: dual core configuration

2008-10-08 Thread Elia Mazzawi
false alarm guys, thanks for the replies, I do have 2 set as the task maximum, and it is utilizing 2 cores according to top. I must have caught it in between tasks or during the reduce, since i had only 1 reducer per node going on at the time. hadoop-default.xml: property

Hadoop Profiling!

2008-10-08 Thread Gerardo Velez
Hi! I've developed a Map/Reduce algorithm to analyze some logs from a web application. We are ready to start the QA test phase, so now I would like to know how efficient my application is from a performance point of view. So is there any procedure I could use to do some profiling?

Re: nagios to monitor hadoop datanodes!

2008-10-08 Thread Edward Capriolo
That all sounds good. By 'quick hack' I meant 'check_tcp' was not good enough because an open TCP socket does not prove much. However, if the page returns useful attributes that show the cluster is alive, that is great and easy. Come to think of it you can navigate the dfshealth page and get useful
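The difference between "the socket is open" and "the page says something useful" can be sketched as a content check. This is an illustration only: the marker strings and the URL passed in are assumptions to be replaced with whatever the status page on a given cluster actually shows, and the names `page_status` and `check_url` are invented for the example. Return codes follow the common Nagios convention (0 = OK, 2 = CRITICAL).

```python
import urllib.request


def page_status(body, required_markers):
    """Return (code, message): 0 if every marker appears in the page body, else 2."""
    missing = [marker for marker in required_markers if marker not in body]
    if missing:
        return 2, "CRITICAL - missing from page: " + ", ".join(missing)
    return 0, "OK - all expected markers present"


def check_url(url, required_markers, timeout=10):
    """Fetch the page and check its content, not just that the port answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return 2, f"CRITICAL - cannot fetch {url}: {exc}"
    return page_status(body, required_markers)
```

A wrapper would print the message and exit with the code, for example after calling `check_url` with the address of the dfshealth page and a marker string known to appear on it.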

Re: Hadoop Profiling!

2008-10-08 Thread Stefan Groschupf
Just run your map reduce job locally and connect your profiler. I use yourkit. Works great! You can profile your map reduce job by running it in local mode, like any other Java app. However, we also profiled in a grid. You just need to install the yourkit agent into the jvm of the node

Re: Hadoop Profiling!

2008-10-08 Thread Ashish Venugopal
Are you interested in simply profiling your own code (in which case you can clearly use whatever Java profiler you want), or your construction of the MapReduce job, i.e. how much time is being spent in the Map vs the sort vs the shuffle vs the Reduce? I am not aware of a good solution to the

Re: Hadoop Profiling!

2008-10-08 Thread Ashish Venugopal
Great, thanks for this info. Is there any chance that this information can also be exposed for streaming jobs? (All of the jobs that we run in our lab are only via streaming...) Thanks! Ashish On Wed, Oct 8, 2008 at 12:30 PM, George Porter [EMAIL PROTECTED] wrote: Hi Ashish, I

Re: architecture diagram

2008-10-08 Thread Alex Loddengaard
Glad we could help, Terrence. The second pivot might be tricky; you may have to run a second iteration. I haven't thought the problem all the way through, though. Good luck. Alex On Wed, Oct 8, 2008 at 1:02 PM, Terrence A. Pietrondi [EMAIL PROTECTED] wrote: I think I can figure this out

shipping streaming libraries with cacheArchive

2008-10-08 Thread Karl Anderson
Has anybody been able to ship a hadoop streaming library using cacheArchive? I am able to see my unjarred archive from my mapper, but I'm not able to import Python files within it. As a test, I'm jarring up a test directory and putting it on the HDFS: [EMAIL PROTECTED] ~]# ls jar_test
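One thing worth checking in a situation like this, offered as an assumption rather than a diagnosis of the truncated report above: Python only imports from directories listed in `sys.path`, so a mapper that can see an unpacked archive still has to add that directory before importing from it. A minimal sketch, where the helper name `add_archive_to_path` and the module name in the usage note are hypothetical:

```python
import os
import sys


def add_archive_to_path(archive_dir):
    """Put the unpacked archive directory at the front of sys.path.

    archive_dir is whatever name the archive appears under in the task's
    working directory; this sketch does not try to discover it.
    """
    abs_dir = os.path.abspath(archive_dir)
    if abs_dir not in sys.path:
        sys.path.insert(0, abs_dir)
    return abs_dir
```

In a mapper this would be called before the import, for example `add_archive_to_path("jar_test")` followed by `import mymodule`, where `mymodule` stands for a file shipped inside the archive.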

Re: Official group blog of the hadoop user/dev group?

2008-10-08 Thread Edward J. Yoon
Oh, Great!! Now I did know that. :) On Thu, Oct 9, 2008 at 12:39 AM, Steve Loughran [EMAIL PROTECTED] wrote: Edward J. Yoon wrote: If we have a group blog of the hadoop user/dev group such as a Y! developer network, we can easily share/introduce our experience and outcomes from our research.

Re: Official group blog of the hadoop user/dev group?

2008-10-08 Thread Edward J. Yoon
Well, not a bad idea I think. But isn't wiki a better tool to catch and shape collective knowledge? Yes, but I think some stuff (e.g. news-tic information) isn't publishable on a wiki. On Thu, Oct 9, 2008 at 1:15 AM, Lukáš Vlček [EMAIL PROTECTED] wrote: Hi, Well, not a bad idea I think.

Cannot run program bash: java.io.IOException: error=12, Cannot allocate memory

2008-10-08 Thread Edward J. Yoon
Hi, I received the message below. Can anyone explain this? 08/10/09 11:53:33 INFO mapred.JobClient: Task Id : task_200810081842_0004_m_00_0, Status : FAILED java.io.IOException: Cannot run program bash: java.io.IOException: error=12, Cannot allocate memory at