reducers and data locality

2012-04-26 Thread mete
Hello folks, I have a lot of input splits (10k-50k, 128 MB blocks) which contain text files. I need to process those line by line, then copy the results into roughly equal-sized "shards". So I generate a random key (from a range of [0:numberOfShards]) which is used to route the map output to d
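
A minimal sketch of the random-key routing described above, assuming the new MapReduce API; the class name ShardMapper and the property name "num.shards" are illustrative, not from the original post. With the number of reduce tasks set to numberOfShards, each reducer then writes one roughly equal shard:

    import java.io.IOException;
    import java.util.Random;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ShardMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
      private final IntWritable shard = new IntWritable();
      private final Random random = new Random();
      private int numShards;

      @Override
      protected void setup(Context context) {
        // Hypothetical property carrying the shard count into the task.
        numShards = context.getConfiguration().getInt("num.shards", 10);
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // ... process the line here, then route it to a random shard.
        shard.set(random.nextInt(numShards));
        context.write(shard, value);
      }
    }

Calling job.setNumReduceTasks(numberOfShards) in the driver then yields one part-r-* output file per shard.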

Re: DFSClient error

2012-04-26 Thread Harsh J
Is only the same IP printed in all such messages? Can you check the DN log on that machine to see if it reports any form of issues? Also, did your jobs fail, or did they keep going despite these hiccups? I notice you're threading your clients though (?), but I can't tell if that may cause this without furth

Re: Changing the Java heap

2012-04-26 Thread Harsh J
Deepak is right here. The line-reading technique is explained in further detail at http://wiki.apache.org/hadoop/HadoopMapReduce. On Fri, Apr 27, 2012 at 2:37 AM, Deepak Nettem wrote: > HDFS doesn't care about the contents of the file. The file gets divided > into 64MB blocks. > > For example, if

DFSClient error

2012-04-26 Thread Mohit Anchlia
I had 20 mappers in parallel reading 20 gz files, each around 30-40 MB of data, over 5 Hadoop nodes, and then writing to the analytics database. Almost midway it started to get this error: 2012-04-26 16:13:53,723 [Thread-8] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputS

Re: Changing the Java heap

2012-04-26 Thread Deepak Nettem
HDFS doesn't care about the contents of the file. The file gets divided into 64MB blocks. For example, if your input file contains data in a custom format (like paragraphs) and you want the file to split on paragraph boundaries, HDFS isn't responsible - and rightly so. The application developer needs to
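
A minimal sketch of one way to get paragraph-shaped records without writing a full custom InputFormat, assuming a Hadoop version whose new-API LineRecordReader honors the textinputformat.record.delimiter property (older releases do need a custom InputFormat):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ParagraphJob {
      public static Job create() throws Exception {
        Configuration conf = new Configuration();
        // Treat a blank line as the record separator, so each map() call
        // receives a whole paragraph, wherever the block boundaries fall.
        conf.set("textinputformat.record.delimiter", "\n\n");
        return new Job(conf, "paragraph-input");
      }
    }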

RE: Changing the Java heap

2012-04-26 Thread Barry, Sean F
I guess what I meant to say was: how does Hadoop make 64 MB blocks without cutting off parts of words at the end of each block? Does it only make blocks at whitespace? -SB -Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Thursday, April 26, 2012 1:56 PM To:

Re: Changing the Java heap

2012-04-26 Thread Michael Segel
Not sure of your question. The Java child heap size is independent of how files are split on HDFS. I suggest you look at Tom White's book on HDFS and how files are split into blocks. Blocks are split at set sizes, 64 MB by default. Your record boundaries are not necessarily on file block bounda
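
An illustrative sketch (not Hadoop's actual code) of the split-boundary contract Michael describes: each reader skips the first partial line unless its split starts at byte 0, and reads past its end to finish the last line, so every record lands in exactly one split. Byte positions here assume single-byte characters and '\n' terminators, a simplification:

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class SplitLineReaderSketch {
      public static void readSplit(String file, long start, long end) throws IOException {
        FileInputStream in = new FileInputStream(file);
        in.skip(start); // sketch only; a real reader loops until 'start' bytes are skipped
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "US-ASCII"));
        long pos = start;
        if (start != 0) {
          // The previous split owns the line straddling our start offset.
          String partial = reader.readLine();
          if (partial != null) pos += partial.length() + 1;
        }
        String line;
        // Read every line that *starts* inside this split; the last one may
        // run past 'end' into the next block, and we still finish it.
        while (pos <= end && (line = reader.readLine()) != null) {
          System.out.println(line); // stand-in for record processing
          pos += line.length() + 1;
        }
        reader.close();
      }
    }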

Changing the Java heap

2012-04-26 Thread Barry, Sean F
Within my small 2-node cluster I set up my 4-core slave node to have 4 task trackers, and I also limited my Java heap size to -Xmx1024m. Is there a possibility that when the data gets broken up, it will break at a place in the file that is not whitespace? Or is that already handled when

Re: Passing a value from main() to map()

2012-04-26 Thread 王瑞军
Hi, thank you for replying. I don't know whether the Job configuration approach will work; for now I have solved it in another way. I wanted to pass a value from main to map because I want to make a judgement in map. For now I write several map classes and do the judgement in main(). On April 26, 2012 at 5:58 PM, Devaraj k wrote: > Hi Wang Ruijun, > >

Re: Design question

2012-04-26 Thread Mohit Anchlia
Any suggestions or pointers would be helpful. Are there any best practices? On Mon, Apr 23, 2012 at 3:27 PM, Mohit Anchlia wrote: > I just wanted to check how people design their storage directories for > data that is sent to the system continuously. For eg: for a given > functionality we get d

Re: HDFS mounting issue using Hadoop-Fuse on Fully Distributed Cluster?

2012-04-26 Thread alo alt
Manu, to clarify: root has no access to the mounted HDFS. Just follow the howto: 1. create the group and the users on ALL nodes: groupadd hdfs-user && adduser USERNAME -G hdfs-user 2. sudo into hdfs: su - hdfs 3. create a directory in hdfs and change the rights: hadoop fs -mkdir /someone

Re: No Space left on device

2012-04-26 Thread Chris Curtin
Look at how many job_ directories there are on your slave nodes. We're using Cloudera, so they are under the 'userlogs' directory; not sure where they are on 'pure' Apache. As we approach 30k we see this. (We run a monthly report that does tens of thousands of jobs in a few days.) We've tried tuning the #
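
One likely aggravator, an assumption suggested by the ~30k figure: ext3 caps a directory at roughly 32,000 subdirectories, so userlogs can hit that wall before the disk is actually full. In Hadoop 1.x the retention window for those task logs is configurable in mapred-site.xml; the value below is illustrative:

    <property>
      <name>mapred.userlog.retain.hours</name>
      <!-- Keep task logs for 6 hours instead of the default 24, so old
           job_ directories are cleaned up sooner. -->
      <value>6</value>
    </property>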

Re: HDFS mounting issue using Hadoop-Fuse on Fully Distributed Cluster?

2012-04-26 Thread alo alt
Yes, as I wrote: you can't use root as the user for writing; root (or the superuser) has another context in HDFS. Just change into hdfs (su - hdfs) and try again. For all users who should have access to the mounted fs, you should create a group and chown them in HDFS (maybe /tmp/group or similar). Best, A

Re: HDFS mounting issue using Hadoop-Fuse on Fully Distributed Cluster?

2012-04-26 Thread Manu S
Yeah Alex, I tried, but I am still not able to make it work. [root@namenode ~]# echo "hadoop-fuse-dfs#dfs://namenode:8020 /hdfs_mount fuse usetrash,rw 0 0" >> /etc/fstab [root@namenode ~]# mount -a INFO fuse_options.c:162 Adding FUSE arg /hdfs_mount [root@namenode ~]# mount | grep fuse fuse on /hd

Re: HDFS mounting issue using Hadoop-Fuse on Fully Distributed Cluster?

2012-04-26 Thread alo alt
Manu, did you mount hdfs via fstab with "hadoop-fuse-dfs#dfs://namenode.local: /hdfs-mount fuse usetrash,rw 0 0"? You could do that with: "mkdir -p /hdfs-mount && chmod 777 /hdfs-mount && echo "hadoop-fuse-dfs#dfs://NN.URI: /hdfs-mount fuse usetrash,rw 0 0" >> /etc/fstab && mount -a ; mount" -

Re: hadoop on fedora 15

2012-04-26 Thread Marcos Ortiz
On 04/26/2012 01:49 AM, john cohen wrote: I had the same issue. My problem was using a VPN connection to work while at the same time working with M/R jobs on my Mac. It occurred to me that maybe Hadoop was binding to the wrong IP (the IP given to you after connecting through the VPN); bottom lin

Re: HDFS mounting issue using Hadoop-Fuse on Fully Distributed Cluster?

2012-04-26 Thread Manu S
Thanks a lot Alex. Actually I didn't try the NFS option, as I am trying to sort out this hadoop-fuse mounting issue. I can't change the ownership of the mount directory after the hadoop-fuse mount. [root@namenode ~]# ls -ld /hdfs_mount/ drwxrwxr-x 11 nobody nobody 4096 Apr 9 12:34 /hdfs_mount/ [root@

Berlin Buzzwords program is online

2012-04-26 Thread Isabel Drost
This is to announce the Berlin Buzzwords program. The Program Committee has completed reviewing all submissions and set up the schedule containing a great lineup of speakers for this year's Berlin Buzzwords program. Among the speakers we have Leslie Hawthorn (Red Hat), Alex Lloyd (Google), Michae

Re: HDFS mounting issue using Hadoop-Fuse on Fully Distributed Cluster?

2012-04-26 Thread alo alt
Hi, I wrote a small writeup about this: http://mapredit.blogspot.de/2011/11/nfs-exported-hdfs-cdh3.html As you can see, the FS is mounted as nobody and you are trying as root. Change the permissions in your HDFS: hadoop dfs -chmod / -chown - Alex -- Alexander Lorenz http://mapredit.blogspot.com On Apr 26

HDFS mounting issue using Hadoop-Fuse on Fully Distributed Cluster?

2012-04-26 Thread Manu S
Dear All, I have installed Hadoop-fuse to mount the HDFS filesystem locally. I could mount the HDFS without any issues, but I am not able to do any file operations like delete, copy, or move directly. The directory ownership automatically changed to nobody:nobody while mounting. [root@nam

RE: Passing a value from main() to map()

2012-04-26 Thread Devaraj k
Hi Wang Ruijun, You can do it this way: 1. Set the value in the Job configuration under some property name before submitting the job. 2. Get the value in the map() function using that property name from the configuration, then perform the business logic. Thanks, Devaraj
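
A minimal sketch of those two steps, assuming the new MapReduce API; the property name "my.custom.value" and the filtering logic are illustrative:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class PassValueExample {
      public static class MyMapper
          extends Mapper<LongWritable, Text, Text, NullWritable> {
        private String needle;

        @Override
        protected void setup(Context context) {
          // Step 2: read the value back inside the task.
          needle = context.getConfiguration().get("my.custom.value", "");
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // Use the value in the per-record judgement.
          if (value.toString().contains(needle)) {
            context.write(value, NullWritable.get());
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Step 1: set the value before submitting the job.
        conf.set("my.custom.value", args.length > 0 ? args[0] : "default");
        Job job = new Job(conf, "pass-value");
        job.setJarByClass(PassValueExample.class);
        job.setMapperClass(MyMapper.class);
        // input/output paths and formats omitted for brevity
      }
    }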

Re: Passing a value from main() to map()

2012-04-26 Thread Ajay Srivastava
Hi, it seems that you are trying to code a Hadoop job which takes input from stdin? As far as I know that is not possible, as a Hadoop job works on input coming in batches, not on real-time streams. Please wait for others' responses as well, as I am a beginner and may be missing something here. Ajay Sr

Re: No Space left on device

2012-04-26 Thread JunYong Li
Maybe a file hole exists; are the -sh and du -sch /tmp results the same?

Re: Hadoop cluster development under Eclipse - how to handle multiple users?

2012-04-26 Thread JunYong Li
Add this configuration: dfs.permissions = false. 2012/4/26 Lukáš Kryške > > Hello, I'm using Ubuntu Linux and Eclipse IDE for my Hadoop cluster > development. I've two users in my OS: - hadoop (local cluster user)- user > (local user where I use the Eclipse IDE, SVN for my
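
For reference, a sketch of where that setting lives, in hdfs-site.xml on the namenode; note that it disables HDFS permission checking entirely, so it is only sensible for a local development cluster:

    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>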

AutoInputFormat

2012-04-26 Thread 黄 山
I use org.apache.hadoop.streaming.AutoInputFormat to handle sequence file input for streaming, but I found that it produces the format below (the key is a string, the value is binary): "keystring\tvalue\n". Since the value is binary, there are a lot of '\n' bytes within the value, and my mapper can't distinguish them. in

Passing a value from main() to map()

2012-04-26 Thread 王瑞军
Hello everyone: I have a problem. We know map() has its own parameters, such as map(Object key, Text value, Context context). The following is my structure: public class aaa { public static class bbb { public void map() { I want to use the variable nu

Re: Text Analysis

2012-04-26 Thread praveenesh kumar
RHive uses the Hive Thrift server to connect with Hive. You can execute Hive queries and get the results back into R data frames, and then play around with them using R libraries. It's a pretty interesting project, given that you have Hive set up on top of Hadoop. Regards, Praveenesh On Thu, Apr 26, 2012 at 1:

RE: Text Analysis

2012-04-26 Thread Guillaume Polaert
Hello, yesterday I discovered the RHive project. It uses an R server on each datanode. Has anybody tried it? Guillaume -Original Message- From: Devi Kumarappan [mailto:kpala...@att.net] Sent: Wednesday, April 25, 2012 22:56 To: common-user@hadoop.apache.org Subject: Re: Text Analysis RH

Hadoop cluster development under Eclipse - how to handle multiple users?

2012-04-26 Thread Lukáš Kryške
Hello, I'm using Ubuntu Linux and the Eclipse IDE for my Hadoop cluster development. I have two users in my OS: hadoop (the local cluster user) and user (the local user where I use the Eclipse IDE, SVN for my project, etc.). When I am developing my program under 'user' I am not able to debug my program directly

Re: No Space left on device

2012-04-26 Thread Sidney Simmons
Might be worth checking your inode usage, as a lack of inodes will result in the same error. HTH. On 26 Apr 2012, at 07:01, Harsh J wrote: > The transient Map->Reduce files do not go to the DFS, but rather onto > the local filesystem directories specified by the "mapred.local.dir" > parameter. I