Hi All,
I'm facing a problem and need your help.
*I would like to use the request column in the Apache access.log as the source
of a Hadoop table.*
I was able to load the entire log into a table, but I would like to insert a
*specific request into a specific table*. *The question is*: is this possible?
Hi,
1. Can someone provide an example of an automatic insertion process from the
Apache access.log into a Hadoop table using Hive?
2. Can someone explain whether there is a way to directly point to a directory
that will be the data source of a Hadoop table (e.g. copying a file into the
directory and, when I use select
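A rough sketch of one way to do this with Hive (the table names, the directory path, and the RegexSerDe pattern below are only placeholders and untested; the regex is a rough match for the common log format):

  -- external table over the raw access.log directory; new files copied into
  -- the LOCATION directory are visible to the next query
  CREATE EXTERNAL TABLE access_log_raw (
    host STRING, identity STRING, remote_user STRING, log_time STRING,
    request STRING, status STRING, size STRING
  )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (\\[[^\\]]*\\]) \"([^\"]*)\" ([^ ]*) ([^ ]*)",
    "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
  )
  LOCATION '/user/logs/apache';   -- placeholder directory

  -- target table that holds only the request column
  CREATE TABLE requests (request STRING);

  INSERT OVERWRITE TABLE requests
  SELECT request FROM access_log_raw;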
On 22/11/10 11:02, Hari Sreekumar wrote:
Hi,
What could be the possible reasons for getting too many checksum
exceptions? I am getting these kinds of exceptions quite frequently, and the
whole job fails in the end:
org.apache.hadoop.fs.ChecksumException: Checksum error:
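Checksum errors usually point at corrupt block data on disk (often a failing drive or bad memory on one node). One rough way to narrow it down (the log path is a guess, adjust for your install):

  hadoop fsck / -files -blocks -locations | grep -i corrupt
  # then check the datanode logs on the nodes that own the affected blocks
  grep -i ChecksumException /var/log/hadoop/*datanode*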
I am trying to run a sample application, but I am getting the following error.
10/11/23 07:37:17 INFO security.Groups: Group mapping
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
cacheTimeout=30
10/11/23 07:37:17 WARN conf.Configuration: mapred.task.id is deprecated.
Instead, use
The warning (WARN) line at the top of the output pretty much explains the
answer :)
I set up the cluster configuration in masters, slaves, core-site.xml,
hdfs-site.xml, and mapred-site.xml, and copied them to all the machines.
Then I log in to one of the machines and use the following to start the cluster.
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
I expect this
Hi Ricky,
Which Hadoop version are you using? I am using the Apache hadoop-0.20.2
release, and I generally just run the $HADOOP_HOME/bin/start-dfs.sh and
start-mapred.sh scripts on my master node. If passwordless ssh is configured,
these scripts will start the required services on each node.
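Roughly, the sequence on the master looks like this (assuming $HADOOP_HOME is set and conf/slaves lists all the worker nodes):

  $HADOOP_HOME/bin/hadoop namenode -format   # only once, on first setup
  $HADOOP_HOME/bin/start-dfs.sh              # NameNode locally, DataNodes on slaves, SecondaryNameNode on masters
  $HADOOP_HOME/bin/start-mapred.sh           # JobTracker locally, TaskTrackers on slaves
  # 'jps' on each node should then show the expected daemons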
On 11/19/2010 10:07 PM, Harsh J wrote:
How are you starting your JobTracker by the way?
With bin/start-mapred.sh (from the Hadoop installation).
--Skye
Thanks for pointing me to the right command. I am using the CDH3 distribution.
I figured out that no matter what I put in the masters file, it always starts the
NameNode on the machine where I issue the start-all.sh command, and always
starts a SecondaryNameNode on all the other machines. Any clue?
We are currently modifying the configuration of our Hadoop grid (250
machines). The machines are homogeneous, and the specs are:
dual quad-core CPUs, 18 GB RAM, 8x1 TB drives
Currently we have set this up:
8 reduce slots at 800 MB
8 map slots at 800 MB
io.sort.mb raised to 256 MB
We see a lot of
I am trying to get Hadoop 0.21.0 running on OS X 10.6.5 in pseudo-distributed
mode. I downloaded and extracted the tarball, and I followed the instructions
on editing core-site.xml, hdfs-site.xml, and mapred-site.xml. I also set
JAVA_HOME in hadoop-env.sh as well as in my .profile. When
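For pseudo-distributed mode, the minimal site configs usually boil down to something like this (conventional localhost ports; on 0.21 these older property names still work but may print deprecation warnings):

  <!-- core-site.xml -->
  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>
  <!-- hdfs-site.xml -->
  <property><name>dfs.replication</name><value>1</value></property>
  <!-- mapred-site.xml -->
  <property><name>mapred.job.tracker</name><value>localhost:9001</value></property>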
I am trying to debug my map/reduce (Hadoop) app with the help of logging.
When I do grep -r in $HADOOP_HOME/logs/*,
no line with debug info is found.
I need your help. What am I doing wrong?
Thanks in advance,
Tali
In my class I put :
A line like this
log4j.logger.org.apache.hadoop=DEBUG
works for 0.20.* and for 0.21+, so it should work for all the others :)
So, are you trying to see debug output from your program or from Hadoop?
--
Cos
On Tue, Nov 23, 2010 at 05:59PM, Tali K wrote:
I am trying to debug my map/reduce
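A slightly fuller log4j.properties sketch (com.example.mymapper is just a placeholder for your own package):

  # add to conf/log4j.properties
  log4j.logger.org.apache.hadoop=DEBUG
  log4j.logger.com.example.mymapper=DEBUG
  # task-side output typically ends up under logs/userlogs/<attempt-id>/ on the tasktracker nodes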
Thanks,
It worked!
So, are you trying to see debug output from your program or from Hadoop?
I am printing some values from my Mapper.
Hi Ricky,
Yes, that's how it is meant to be. The machine where you run
start-dfs.sh will become the namenode, and the machine which you specify in
your masters file becomes the secondary namenode.
Hari
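For example (hostnames made up):

  # conf/masters -- despite the name, this lists the SecondaryNameNode host(s)
  snn01.example.com
  # conf/slaves  -- hosts that run the DataNode/TaskTracker daemons
  slave01.example.com
  slave02.example.com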
On Wed, Nov 24, 2010 at 2:13 AM, Ricky Ho rickyphyl...@yahoo.com wrote:
Thanks for
Hi William,
I think the most appropriate config parameter to try is io.sort.factor, which
affects how many merge passes (and hence disk spills) are needed on both the map
and reduce side. The default value of this parameter is 10; try raising it to 100 or more.
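For example, in mapred-site.xml (illustrative values only; these can also be set per job):

  <property><name>io.sort.factor</name><value>100</value></property>
  <property><name>io.sort.mb</name><value>256</value></property>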
If spilling on the reduce side is still frequent, you could try tuning
up
Hi Hari,
When I try to start the Hadoop daemons with bin/start-dfs.sh (from /usr/lib/hadoop) on
the name node, it gives this error: *May not run daemons as root. Please
specify HADOOP_NAMENODE_USER* (same for the other daemons),
but when I try to start it using */etc/init.d/hadoop-0.20-namenode start*,
it gets
Hi Ricky,
For installing CDH3 you can refer to this tutorial:
http://cloudera-tutorial.blogspot.com/2010/11/running-cloudera-in-distributed-mode.html
All the steps in this tutorial are well tested. (*In case of any query, please
leave a comment.*)
On Wed, Nov 24, 2010 at 11:48 AM, rahul patodi
Hi Rahul,
I am not sure about CDH, but I have created a separate hadoop
user to run my ASF Hadoop version, and it works fine. Maybe you can also try
creating a new hadoop user and making it the owner of the Hadoop root directory.
HTH,
Hari
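A rough sketch of what that could look like (paths assume the install lives under /usr/lib/hadoop; adjust for your layout, and any dfs.name.dir/dfs.data.dir locations need the same ownership):

  sudo useradd -m hadoop
  sudo chown -R hadoop:hadoop /usr/lib/hadoop
  sudo -u hadoop /usr/lib/hadoop/bin/start-dfs.sh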
On Wed, Nov 24, 2010 at 11:51 AM, rahul patodi
Hi everyone,
Since this question is CDH-specific, it's better to ask on the cdh-user
mailing list:
https://groups.google.com/a/cloudera.org/group/cdh-user/topics?pli=1
Thanks
-Todd
On Wed, Nov 24, 2010 at 1:26 AM, Hari Sreekumar hsreeku...@clickable.com wrote:
Hi Raul,
I am not sure
Hi there,
Is https://issues.apache.org/jira/browse/HDFS-788 resolved?
What actually happens if the smaller partition on some datanodes gets full
while a block is being written?
Is it possible that those datanodes are recognized as dead, causing a replication
storm among some hundreds of machines?
Thanks,
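Not sure about the JIRA itself, but the knobs that usually matter for the full-partition case look something like this (hdfs-site.xml, values are placeholders):

  <!-- blocks are spread round-robin across the listed directories -->
  <property><name>dfs.data.dir</name><value>/data/1/dfs,/data/2/dfs</value></property>
  <!-- bytes kept free for non-DFS use on each volume (10 GB here) -->
  <property><name>dfs.datanode.du.reserved</name><value>10737418240</value></property>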