Re: Tasktracker fails

2012-02-22 Thread Adarsh Sharma
Any update on the issue below? Thanks. Adarsh Sharma wrote: Dear all, today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster. When I start the cluster, all daemons come up except the tasktracker; I don't know why the tasktracker fails with the following error logs. Cluster is in

Hadoop MR jobs failed: Owner 'uid' for path */jobcache/job_*/attempt_*/output/file.out.index did not match expected owner 'username'

2012-02-22 Thread Dirk Meister
Hello Hadoop mailing list, we have problems running a Hadoop M/R job on HDFS. It is a 2-node test system running 0.20.203 with a Pig script. The map tasks run through, but most job attempt outputs of one of the machines are rejected by the reducer and rescheduled. This is the stack trace/error

Re: HBase/HDFS very high iowait

2012-02-22 Thread Per Steffensen
We observe about 50% iowait before even starting clients - that is, when there is actually no load from clients on the system. So only internal activity in HBase/HDFS can cause this - HBase compaction? HDFS? Regards, Per Steffensen. Per Steffensen wrote: Hi, we have a system, among other things, with an HBase

Re: HBase/HDFS very high iowait

2012-02-22 Thread Per Steffensen
Per Steffensen wrote: We observe about 50% iowait before even starting clients - that is, when there is actually no load from clients on the system. So only internal activity in HBase/HDFS can cause this - HBase compaction? HDFS? Ahh ok, that was only for half a minute after restart. So basically down

Security at file level in Hadoop

2012-02-22 Thread Shreya.Pal
Hi, I want to implement security at the file level in Hadoop, essentially restricting certain data to certain users. E.g. File A can be accessed only by user X; File B can be accessed only by user X and user Y. Is this possible in Hadoop, and how do we do it? At what level are these permissions

Re: Security at file level in Hadoop

2012-02-22 Thread Ben Smithers
Hi Shreya, a permissions guide for HDFS is available at: http://hadoop.apache.org/common/docs/current/hdfs_permissions_guide.html The permissions system is much the same as on Unix-like systems, with users and groups. Though I have not worked with this, I think it is likely that all permissions will

Re: Security at file level in Hadoop

2012-02-22 Thread praveenesh kumar
You can probably use hadoop fs -chmod permissions filename as suggested above. You can set r/w permissions just as you would for regular Unix files. Can you please share your experience with this? Thanks, Praveenesh On Wed, Feb 22, 2012 at 4:37 PM, Ben Smithers

Re: Optimized Hadoop

2012-02-22 Thread Dieter Plaetinck
Great work folks! Very interesting. PS: did you notice that if you Google for hanborq or HDH it's very hard to find your website, hanborq.com? Dieter On Tue, 21 Feb 2012 02:17:31 +0800 Schubert Zhang zson...@gmail.com wrote: We just updated the slides of these improvements:

mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread sangroya
Hello, could someone please help me understand these configuration parameters in depth: mapred.map.tasks and mapred.reduce.tasks. It is mentioned that the default values of these parameters are 2 and 1. *What does that mean?* Does it mean 2 maps and 1 reduce per node? Does it mean 2 maps and 1

Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread Harsh J
Amit, On Wed, Feb 22, 2012 at 5:08 PM, sangroya sangroyaa...@gmail.com wrote: Hello, Could someone please help me to understand these configuration parameters in depth. mapred.map.tasks and mapred.reduce.tasks It is mentioned that default value of these parameters is 2 and 1. *What does

Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread praveenesh kumar
If I am correct: for setting mappers per node - mapred.tasktracker.map.tasks.maximum; for setting reducers per node - mapred.tasktracker.reduce.tasks.maximum; for setting mappers per job - mapred.map.tasks (total across the whole cluster); for setting reducers per job - mapred.reduce.tasks (same). You can
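As an illustration only (the values, jar name and class below are made-up examples, not recommendations): the per-node slot maximums live in mapred-site.xml on each tasktracker, while the per-job values can be passed when submitting a job.

    <!-- mapred-site.xml on each tasktracker: task slots per node -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>

    # per-job settings, e.g. for a job whose driver uses ToolRunner
    hadoop jar myjob.jar com.example.MyJob -D mapred.map.tasks=20 -D mapred.reduce.tasks=5 in out

Note that mapred.map.tasks is only a hint; the actual number of map tasks is ultimately driven by the number of input splits, whereas mapred.reduce.tasks is honored as given.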

Re: Changing into Replication factor

2012-02-22 Thread Harsh J
Hi, you need to use hadoop fs -setrep to change the replication of existing files. See the manual at http://hadoop.apache.org/common/docs/r0.20.2/hdfs_shell.html#setrep on how to use it. On Wed, Feb 22, 2012 at 1:03 PM, hadoop hive hadooph...@gmail.com wrote: Hi folks, right now I am having
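For example (the path and replication factor are just illustrative), to set existing files under a directory to replication 2, recursively, and wait for the re-replication to finish:

    hadoop fs -setrep -w 2 -R /user/data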

Re: Security at file level in Hadoop

2012-02-22 Thread Joey Echeverria
HDFS supports POSIX style file and directory permissions (read, write, execute) for the owner, group and world. You can change the permissions with hadoop fs -chmod permissions path -Joey On Feb 22, 2012, at 5:32, shreya@cognizant.com wrote: Hi I want to implement security at
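For instance (hypothetical paths, user and group names), File A readable only by user X, and File B readable by users X and Y via a shared group:

    hadoop fs -chown userX /data/fileA
    hadoop fs -chmod 600 /data/fileA
    hadoop fs -chown userX:groupXY /data/fileB
    hadoop fs -chmod 640 /data/fileB

Note that enforcement is only as strong as the cluster's user/group resolution; without Kerberos, identities are taken from the client side.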

Re: Tasktracker fails

2012-02-22 Thread Merto Mertek
Hm, I would first try to stop all the daemons with $HADOOP_HOME/bin/stop-all.sh. Afterwards check that no daemons are running on the master and on one of the slaves (jps). Maybe you could also check whether the tasktrackers' configuration for the jobtracker is pointing to the right place (mapred-site.xml).

ClassNotFoundException: -libjars not working?

2012-02-22 Thread Ioan Eugen Stan
Hello, I'm trying to run a map-reduce job and I get ClassNotFoundException, but I have the class submitted with -libjars. What's wrong with how I do things? Please help. I'm running hadoop-0.20.2-cdh3u1, and I have everything on the -libjars line. The job is submitted via a java app like:
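One common cause is that -libjars is only picked up when the driver runs through ToolRunner / GenericOptionsParser rather than building the JobConf by hand. A sketch of the equivalent command-line submission (jar name, class and paths are hypothetical):

    hadoop jar myjob.jar com.example.MyJob \
        -libjars /path/to/dep1.jar,/path/to/dep2.jar \
        input output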

Re: Security at file level in Hadoop

2012-02-22 Thread Praveen Sripati
According to this (http://goo.gl/rfwy4), prior to 0.22 Hadoop uses the 'whoami' and 'id' commands to determine the user and groups of the running process. How does this work now? Praveen On Wed, Feb 22, 2012 at 6:03 PM, Joey Echeverria j...@cloudera.com wrote: HDFS supports POSIX style file

Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen
It looks as if BackupNode isn't supported in 1.0.0? Any chance it's in 1.0.1? Thanks -jeremy

Runtime Comparison of Hadoop 0.21.0 and 1.0.1

2012-02-22 Thread Geoffry Roberts
All, I saw the announcement that the hadoop 1.0.1 micro release was available. I have been waiting for this because I need the MultipleOutputs capability, which 1.0.0 didn't support. I grabbed a copy of the release candidate. I was happy to see that the directory structure once again conforms

Re: Splitting files on new line using hadoop fs

2012-02-22 Thread bejoy . hadoop
Hi Mohit, AFAIK there is no default mechanism available for this in Hadoop. A file is split into blocks based only on the configured block size during the HDFS copy. While processing the file with MapReduce, the record reader takes care of newlines even if a line spans multiple

Re: Tasktracker fails

2012-02-22 Thread Merto Mertek
I do not know exactly how the distribution and splitting of deflate files works, if that is your question, but you will probably find something useful in the *Codec classes, where the implementations of a few compression formats are located. Deflate files are just one type of compressed file that you can use

Re: Splitting files on new line using hadoop fs

2012-02-22 Thread Mohit Anchlia
On Wed, Feb 22, 2012 at 12:23 PM, bejoy.had...@gmail.com wrote: Hi Mohit AFAIK there is no default mechanism available for the same in hadoop. File is split into blocks just based on the configured block size during hdfs copy. While processing the file using Mapreduce the record

Re: Splitting files on new line using hadoop fs

2012-02-22 Thread bejoy . hadoop
Hi Mohit, I'm not an expert in Pig and it'd be better to use the Pig user group for Pig-specific queries, but I'll try to help you with some basic troubleshooting. It sounds strange that Pig's XML loader can't load larger XML files that consist of multiple blocks. Or is it like,

Re: OSX starting hadoop error

2012-02-22 Thread Bryan Keller
For those interested, you can prevent this error by setting the following in hadoop-env.sh: export HADOOP_OPTS=-Djava.security.krb5.realm= -Djava.security.krb5.kdc= On Jul 28, 2011, at 11:51 AM, Bryan Keller wrote: FYI, I logged a bug for this:
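In case the shell mangles the unquoted line above, a quoted form (an assumption about the intended setting, not a verified fix) would be:

    # conf/hadoop-env.sh
    export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="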

Re: Backupnode in 1.0.0?

2012-02-22 Thread Joey Echeverria
Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode? -Joey Sent from my iPhone On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote: Any possibility of getting spec files to create packages for 0.22?

Re: Splitting files on new line using hadoop fs

2012-02-22 Thread Mohit Anchlia
Thanks, I did post this question to that group. All XML documents are separated by a newline, so that shouldn't be the issue, I think. On Wed, Feb 22, 2012 at 12:44 PM, bejoy.had...@gmail.com wrote: Hi Mohit I'm not an expert in pig and it'd be better using the pig user group for pig

RE: Dynamic changing of slaves

2012-02-22 Thread kaveh
Sounds like what you are looking for is a custom scheduler, along the lines of <!-- <property> <name>mapred.jobtracker.taskScheduler</name> <value>org.apache.hadoop.mapred.FairScheduler</value> </property> --> Obviously not the FairScheduler, but it could give you some idea. -Original Message- From:

Re: Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen
I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it. Thanks -jeremy On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote: Check out the Apache Bigtop

Re: Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen
By the way, I don't see anything 0.22 based in the bigtop repos. Thanks -jeremy On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote: I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a

Re: Backupnode in 1.0.0?

2012-02-22 Thread Joey Echeverria
Check out this branch for the 0.22 version of Bigtop: https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/ However, I don't think BackupNode is what you want. It sounds like you want HA which is coming in (hopefully) 0.23.2 and is also available today in CDH4b1. -Joey On

Streaming job hanging

2012-02-22 Thread Mohit Anchlia
The streaming job just seems to be hanging: 12/02/22 17:35:50 INFO streaming.StreamJob: map 0% reduce 0% - On the admin page I see that it created 551 input splits. Could someone suggest a way to find out what might be causing it to hang? I increased io.sort.mb to 200 MB. I am using 5 data

Re: Clickstream and video Analysis

2012-02-22 Thread Mark Kerzner
http://www.wibidata.com/ Only it's not open source :) You can research the story by looking at http://www.youtube.com/watch?v=pUogubA9CEA to start Mark On Wed, Feb 22, 2012 at 11:30 PM, shreya@cognizant.com wrote: Hi, Could someone provide some links on Clickstream and video Analysis

Re: Clickstream and video Analysis

2012-02-22 Thread Prashant Sharma
Tubemogul is one of them. On Thu, Feb 23, 2012 at 11:00 AM, shreya@cognizant.com wrote: Hi, Could someone provide some links on Clickstream and video Analysis in Hadoop. Thanks and Regards, Shreya Pal

Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

2012-02-22 Thread Guruprasad B
Hello everyone. I am facing a problem installing a Hadoop multi-node cluster when I connect all the Linux boxes through a router. I have succeeded in installing a single-node cluster and a multi-node cluster over the LAN. I want to test the multi-node cluster by establishing a private

Re: Backupnode in 1.0.0?

2012-02-22 Thread Suresh Srinivas
Joey, can you please answer the question in the context of Apache releases? Not sure CDH4b1 needs to be mentioned in the context of this mailing list. Regards, Suresh On Wed, Feb 22, 2012 at 5:24 PM, Joey Echeverria j...@cloudera.com wrote: Check out this branch for the 0.22 version

RE: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

2012-02-22 Thread Jun Ping Du
Hi Guruprasad, do you have valid IP-to-hostname mappings in /etc/hosts so that each node can be reached by hostname? I guess the configuration over the public network works because hostnames get resolved by DNS there. Thanks, Junping -Original Message- From: Guruprasad B
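A minimal sketch of such an /etc/hosts, with made-up private addresses and hostnames, replicated on every node:

    # /etc/hosts on each node of the cluster
    192.168.1.10   hadoop-master
    192.168.1.11   hadoop-slave1
    192.168.1.12   hadoop-slave2

The same hostnames would then be used in the masters/slaves files and in fs.default.name / mapred.job.tracker.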

Re: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

2012-02-22 Thread Harsh J
I'm able to make it work with a simple router without issues (works just as well as vmnet8). Beyond Junping's point, also check that you don't have a firewall acting between your nodes. On Thu, Feb 23, 2012 at 12:45 PM, Jun Ping Du j...@vmware.com wrote: Hi Guruprasad, Do you have the valid