use the request column in apache access.log as the source of the Hadoop table

2010-11-23 Thread liad livnat
Hi All, I'm facing a problem and need your help. *I would like to use the request column in apache access.log as the source of the Hadoop table.* I was able to insert the entire log, but I would like to insert a *specific request to a specific table*. *The question is*: is it possible
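For reference, the "request" column is the first double-quoted field of a Common Log Format line (method, path, protocol). A minimal shell sketch of pulling it out of one made-up log line:

```shell
# A made-up access.log line in Common Log Format:
line='127.0.0.1 - - [23/Nov/2010:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 2326'

# The request column is the first double-quoted field; extract it with sed:
request=$(printf '%s' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p')
echo "$request"   # GET /index.html HTTP/1.1
```

The same quoted-field pattern is what a Hive regex-based serde would use to split the line into columns.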

Example of automatic insertion process from apache access.log to hadoop table using hive

2010-11-23 Thread liad livnat
Hi, 1. Can someone provide an example of an automatic insertion process from apache access.log to a hadoop table using hive? 2. Can someone explain if there is a way to directly point to a directory which will be the data source of a hadoop table (e.g. copying a file to the directory, and when i use select
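On question 2, this is exactly what a Hive EXTERNAL TABLE does: its LOCATION is an HDFS directory, and any file copied into that directory is visible to the next SELECT with no insert step. A hedged sketch (table names, the directory path, and the regex are illustrative; RegexSerDe ships in hive-contrib):

```sql
-- Hypothetical external table over raw access logs; the LOCATION directory
-- is the data source, so files dropped there appear in SELECT automatically.
ADD JAR hive-contrib.jar;  -- provides the contrib RegexSerDe

CREATE EXTERNAL TABLE access_log (
  host STRING, identity STRING, remote_user STRING, request_time STRING,
  request STRING, status STRING, body_size STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- pattern sketch for the Common Log Format; adjust to your log layout
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[([^\\]]*)\\] \"([^\"]*)\" ([^ ]*) ([^ ]*)"
)
LOCATION '/logs/apache';

-- Question 1 / liad's case: copy just the request column into its own table.
CREATE TABLE requests (request STRING);
INSERT OVERWRITE TABLE requests SELECT request FROM access_log;
```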

Re: Getting CheckSumException too often

2010-11-23 Thread Steve Loughran
On 22/11/10 11:02, Hari Sreekumar wrote: Hi, What could be the possible reasons for getting too many checksum exceptions? I am getting these kind of exceptions quite frequently, and the whole job fails in the end: org.apache.hadoop.fs.ChecksumException: Checksum error:

MapReduce program unable to find custom Mapper.

2010-11-23 Thread Patil Yogesh
I am trying to run a sample application, but I am getting the following error. 10/11/23 07:37:17 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30 10/11/23 07:37:17 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use

Re: MapReduce program unable to find custom Mapper.

2010-11-23 Thread Harsh J
The warning log (WARN) at the top of the output pretty much explains the answer :)

Is there a single command to start the whole cluster in CDH3 ?

2010-11-23 Thread Ricky Ho
I set up the cluster configuration in masters, slaves, core-site.xml, hdfs-site.xml, and mapred-site.xml, and copied them to all the machines. Then I log in to one of the machines and use the following to start the cluster: for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done I expect this

Re: Is there a single command to start the whole cluster in CDH3 ?

2010-11-23 Thread Hari Sreekumar
Hi Ricky, Which hadoop version are you using? I am using the hadoop-0.20.2 apache version, and I generally just run the $HADOOP_HOME/bin/start-dfs.sh and start-mapred.sh scripts on my master node. If passwordless ssh is configured, these scripts will start the required services on each node.

Re: Not a host:port pair: local

2010-11-23 Thread Skye Berghel
On 11/19/2010 10:07 PM, Harsh J wrote: How are you starting your JobTracker by the way? With bin/start-mapred.sh (from the Hadoop installation). --Skye

Re: Is there a single command to start the whole cluster in CDH3 ?

2010-11-23 Thread Ricky Ho
Thanks for pointing me to the right command. I am using the CDH3 distribution. I figured out that no matter what I put in the masters file, it always starts the NameNode on the machine where I issue the start-all.sh command, and always starts a SecondaryNameNode on all other machines. Any clue?

Config

2010-11-23 Thread William
We are currently modifying the configuration of our hadoop grid (250 machines). The machines are homogeneous: dual quad-core CPUs, 18 GB RAM, 8x1 TB drives. Currently we have set up 8 reduce slots at 800 MB and 8 map slots at 800 MB, and raised io.sort.mb to 256 MB. We see a lot of
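A quick back-of-envelope check on that layout (slot counts and heap size from the post; daemon and OS overheads are my rough guesses):

```shell
slots=$((8 + 8))          # map + reduce slots per node
task_heap_mb=800          # -Xmx per task JVM, from the post
echo $((slots * task_heap_mb))   # 12800 MB of task heap alone
# The io.sort.mb buffer (256 MB) lives inside each map task's heap, but the
# DataNode and TaskTracker daemon heaps plus OS page cache come on top of the
# 12.8 GB, so 18 GB is already tight; watch for swapping under full load.
```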

Starting Hadoop on OS X fails, nohup issue

2010-11-23 Thread Bryan Keller
I am trying to get Hadoop 0.21.0 running on OS X 10.6.5 in pseudo-distributed mode. I downloaded and extracted the tarball, and I followed the instructions on editing core-site.xml, hdfs-site.xml, and mapred-site.xml. I also set JAVA_HOME in hadoop-env.sh as well as in my .profile. When

How to debug (log4j.properties),

2010-11-23 Thread Tali K
I am trying to debug my map/reduce (Hadoop) app with the help of logging. When I do grep -r in $HADOOP_HOME/logs/*, no line with debug info is found. I need your help. What am I doing wrong? Thanks in advance, Tali. In my class I put :

Re: How to debug (log4j.properties),

2010-11-23 Thread Konstantin Boudnik
A line like log4j.logger.org.apache.hadoop=DEBUG works for 0.20.* and for 0.21+, so it should work for all others :) So, are you trying to see debug output from your program or from Hadoop? -- Cos On Tue, Nov 23, 2010 at 05:59PM, Tali K wrote: I am trying to debug my map/reduce
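As a config sketch, that line goes in conf/log4j.properties on the nodes running the tasks (the package name on the second logger is a placeholder for your own job's package, not a real one):

```properties
# Hadoop's own classes at DEBUG:
log4j.logger.org.apache.hadoop=DEBUG
# Your job's classes at DEBUG (placeholder package name):
log4j.logger.com.example.myjob=DEBUG
```

Task-side output then lands in the per-attempt logs under $HADOOP_HOME/logs/userlogs/, which is usually where Mapper logging ends up rather than in the daemon logs.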

RE: How to debug (log4j.properties),

2010-11-23 Thread Tali K
Thanks, It worked! So, are you trying to see your program's debug or from Hadoop ? I am printing some values from my Mapper. Date: Tue, 23 Nov 2010 18:26:28 -0800 From: c...@apache.org To: common-user@hadoop.apache.org Subject: Re: How to debug (log4j.properties), Line like this

Re: Is there a single command to start the whole cluster in CDH3 ?

2010-11-23 Thread Hari Sreekumar
Hi Ricky, Yes, that's how it is meant to be. The machine where you run start-dfs.sh becomes the namenode, and the machine which you specify in your masters file becomes the secondary namenode. Hari On Wed, Nov 24, 2010 at 2:13 AM, Ricky Ho rickyphyl...@yahoo.com wrote: Thanks for
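In other words, for the 0.20 start scripts the two files look like this (host names made up). Counterintuitively, conf/masters names the SecondaryNameNode host(s); the NameNode is simply wherever start-dfs.sh is run:

```
# conf/masters -- host(s) for the SecondaryNameNode, NOT the NameNode
snn-host

# conf/slaves -- hosts that get DataNode/TaskTracker daemons
slave01
slave02
```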

Re: Config

2010-11-23 Thread Yu Li
Hi William, I think the most relevant config parameter to try is io.sort.factor, which controls how many spill files are merged per pass on both the map and reduce sides. The default value of this parameter is 10; try enlarging it to 100 or more. If the spilling on the reduce side is still frequent, you could try tuning up
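As a config sketch (the value is illustrative, following the suggestion above):

```xml
<!-- mapred-site.xml: merge up to 100 spill files per pass instead of the
     default 10, reducing the number of re-merge rounds during sort/shuffle -->
<property>
  <name>io.sort.factor</name>
  <value>100</value>
</property>
```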

Re: Is there a single command to start the whole cluster in CDH3 ?

2010-11-23 Thread rahul patodi
Hi Hari, when I try to start the hadoop daemons with /usr/lib/hadoop# bin/start-dfs.sh on the name node, it gives this error: *May not run daemons as root. Please specify HADOOP_NAMENODE_USER* (same for the other daemons), but when I try to start it using /etc/init.d/hadoop-0.20-namenode start it gets
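That error is the start scripts' guard against running daemons as root; they check per-daemon user variables before launching. A sketch of the variables the message asks for (the user names here are illustrative; CDH's packaged init scripts typically use hdfs and mapred):

```shell
# Either run the start scripts as a non-root hadoop user, or declare which
# user each daemon should run as before starting (names illustrative):
export HADOOP_NAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
export HADOOP_SECONDARYNAMENODE_USER=hdfs
export HADOOP_JOBTRACKER_USER=mapred
export HADOOP_TASKTRACKER_USER=mapred
echo "$HADOOP_NAMENODE_USER"
```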

Re: Is there a single command to start the whole cluster in CDH3 ?

2010-11-23 Thread rahul patodi
Hi Ricky, for installing CDH3 you can refer to this tutorial: http://cloudera-tutorial.blogspot.com/2010/11/running-cloudera-in-distributed-mode.html All the steps in this tutorial are well tested. (*In case of any query, please leave a comment.*) On Wed, Nov 24, 2010 at 11:48 AM, rahul patodi

Re: Is there a single command to start the whole cluster in CDH3 ?

2010-11-23 Thread Hari Sreekumar
Hi Rahul, I am not sure about CDH, but I have created a separate hadoop user to run my ASF hadoop version, and it works fine. Maybe you can also try creating a new hadoop user and making hadoop the owner of the hadoop root directory. HTH, Hari On Wed, Nov 24, 2010 at 11:51 AM, rahul patodi

Re: Is there a single command to start the whole cluster in CDH3 ?

2010-11-23 Thread Todd Lipcon
Hi everyone, Since this question is CDH-specific, it's better to ask on the cdh-user mailing list: https://groups.google.com/a/cloudera.org/group/cdh-user/topics?pli=1 Thanks -Todd On Wed, Nov 24, 2010 at 1:26 AM, Hari Sreekumar hsreeku...@clickable.comwrote: Hi Raul, I am not sure

is HDFS-788 resolved?

2010-11-23 Thread Manhee Jo
Hi there, Is https://issues.apache.org/jira/browse/HDFS-788 resolved? What actually happens if the smaller partition on some datanodes gets full while writing a block? Is it possible that the datanodes are recognized as dead, triggering a replication storm among some hundreds of machines? Thanks,