HDFS-347 and HDFS-2246: how are the issues different?

2012-10-08 Thread jlei liu
Both issues implement the same function: letting the DFSClient directly open data blocks that happen to be on the same machine. What are the advantages of HDFS-347? Thanks, LiuLei

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Bertrand Dechoux
Have you looked at graph processing for Hadoop? Like Hama (http://hama.apache.org/) or Giraph (http://incubator.apache.org/giraph/). I can't say for sure it would help you, but it seems to be in the same problem domain. With regard to the chaining reducer issue, this is indeed a general implementat

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Fabio Pitzolu
Wouldn't Cascading (http://www.cascading.org/) also be of some help? *Fabio Pitzolu* Consultant - BI & Infrastructure Mob. +39 3356033776 Phone 02 87157239 Fax 02 93664786 *Gruppo Consulenza Innovazione - http://www.gr-ci.com* 2012/10/8 Bertrand Dechoux > Have you looked at graph proce

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Bertrand Dechoux
The question is not how to sequence it all; Cascading could indeed help in that case. It is how to skip the map phase and do the split/local sort directly at the end of the reduce, so that the next reduce needs only to do a merge on the sorted files obtained from the previous reduce. This is basically a

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Edward J. Yoon
> call context.write() in my mapper class)? If not, are there any other
> MR platforms that can do this? I've been searching around and couldn't

You can use Hama BSP[1] instead of Map/Reduce. No stable release yet, but I confirmed that a large graph with billions of nodes and edges can be crunched i

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Michael Segel
I don't believe that Hama would suffice. In terms of M/R where you want to chain reducers... Can you chain combiners? (I don't think so, but you never know) If not, you end up with a series of M/R jobs and the Mappers are just identity mappers. Or you could use HBase, with a small caveat...

Re: HDFS-347 and HDFS-2246: how are the issues different?

2012-10-08 Thread Harsh J
As HDFS-2246 itself states in one comment: """ HDFS-347 discusses ways to optimize reads for local clients. A clean design is fairly involved. A shortcut has been proposed where the client accesses the hdfs file blocks directly; this works if the client is the same user/group as the DN daemon. This
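
For reference, a minimal sketch of how this shortcut is typically switched on from the client side. The property name is an assumption based on the short-circuit read feature these JIRAs produced, and it would normally be set in hdfs-site.xml rather than in Java code, so check it against your release:

    import org.apache.hadoop.conf.Configuration;

    public class ShortCircuitConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Let the DFSClient read local block files directly instead of
            // streaming them through the DataNode (assumed property name).
            conf.setBoolean("dfs.client.read.shortcircuit", true);
            System.out.println(conf.get("dfs.client.read.shortcircuit"));
        }
    }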

Re: sym Links in hadoop

2012-10-08 Thread Visioner Sadak
I tried using the FileUtil class for creating a symlink within Hadoop. Actually I want to create a symlink for my har directory, so my code looks like FileUtil.symLink("/user2/","har:///user/5oct2012.har") but I am getting an error like this: org.apache.hadoop.fs.FileUtil - Command 'ln -s /user2/ har://user/5

How to change topology

2012-10-08 Thread Shinichi Yamashita
Hi, I know that the DataNode and TaskTracker must be restarted to change the topology. Is there a way to change the topology without restarting the DataNode and TaskTracker? In other words, can I change the topology with a command? Thanks in advance! Shinichi

Re: sym Links in hadoop

2012-10-08 Thread Dave Beech
Hi, The FileUtil.symlink command does nothing more than call the unix "ln" command, so it has no knowledge of how to work with Hadoop archive files, only plain files and directories. Is your archive on local disk, or in HDFS? Cheers, Dave On 8 October 2012 13:43, Visioner Sadak wrote: > I tried u

Collecting error messages from Mappers

2012-10-08 Thread Terry Healy
Hi- Is there a simple way for error output / debug information generated by a Mapper to be collected in one place for a given M/R job run? I guess what I'm hoping for is sort of the reverse of a Distributed Cache function. Can anyone suggest an approach? Thanks, Terry

Re: sym Links in hadoop

2012-10-08 Thread Visioner Sadak
Thanks Dave, it's in HDFS only... any other methods of creating a symlink? On Mon, Oct 8, 2012 at 7:00 PM, Dave Beech wrote: > Hi, > The FileUtil.symlink command does nothing more than call the unix "ln" > command, so it has no knowledge of how to work with Hadoop archive > files, only plain files

One file per mapper?

2012-10-08 Thread Terry Healy
Hello- I know that it is contrary to normal Hadoop operation, but how can I configure my M/R job to send one complete file to each mapper task? This is intended to be used on many files in the 1.5 MB range as the first step in a chain of processes. thanks.

Re: sym Links in hadoop

2012-10-08 Thread Visioner Sadak
Actually I have to access my archived Hadoop HAR files through HTTP (webhdfs). Normal files I am able to read through HTTP, but once my files are HAR archived I am not able to read them... that's why I am creating a symlink, so that the URL remains the same. On Mon, Oct 8, 2012 at 7:50 PM, Visioner Sadak wrote: > thanks dave

Re: One file per mapper?

2012-10-08 Thread Bejoy Ks
Hi Terry, If your files are smaller than the HDFS block size and you are using the default TextInputFormat with the default split-size properties, there would be just one file per mapper. If you have larger files, greater than the size of an HDFS block, please take a look at a sampl
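
For what it's worth, a minimal sketch (not from the thread, class name hypothetical) of guaranteeing one whole file per map task even when a file spans several blocks, by disabling splitting:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Returning false forces one InputSplit (and hence one map task)
    // per input file, regardless of the file's size.
    public class WholeFileTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }
    }

It would then be wired into the job with job.setInputFormatClass(WholeFileTextInputFormat.class).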

Re: Task Attempt Failed

2012-10-08 Thread David Rosenstrauch
I get those too. It's probably this issue: https://issues.apache.org/jira/browse/MAPREDUCE-2374 If you go look in the task tracker log on the node that had the failed task you'll probably see that "Text File Busy" message in there. Looks like this is fixed in newer releases of CDH. HTH, DR

Re: Task Attempt Failed

2012-10-08 Thread Bejoy Ks
Hi Dave, Can you post the task logs corresponding to this? You can browse the web UI down to the failed task's log; it'll contain more information to help you analyze the task failure reasons. On Mon, Oct 8, 2012 at 5:58 PM, Dave Shine < dave.sh...@channelintelligence.com> wrote: > I'm starting to see th

Re: Collecting error messages from Mappers

2012-10-08 Thread Bertrand Dechoux
If it is only to get a coarse overview of the run, you can use counters, but you shouldn't overuse them. You can for example count exceptions by type (counting by message is not a good approach unless you are sure that the message is constant). Regards, Bertrand On Mon, Oct 8, 2012 at 4:13 PM, Terry Hea
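
A minimal sketch of that idea with the new-API Mapper; the class name and the mapper body are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CountingMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            try {
                // ... real record processing would go here ...
                context.write(value, new LongWritable(1));
            } catch (RuntimeException e) {
                // Count by exception class: a small, constant set of names,
                // unlike free-form exception messages.
                context.getCounter("Exceptions",
                        e.getClass().getSimpleName()).increment(1);
            }
        }
    }

The counters are aggregated per job, so the totals end up in one place for the whole run, which is what the original question asked for.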

Re: One file per mapper?

2012-10-08 Thread Terry Healy
Thanks Bejoy. ...Feeling a bit foolish, as Tom White's book was 2 feet away. On 10/08/2012 10:28 AM, Bejoy Ks wrote: > Hi Terry > > If you are having files smaller than hdfs block size and if you are > using Default TextInputFormat with the default properties for split > sizes there would be j

Re: sym Links in hadoop

2012-10-08 Thread Terry Healy
Visioner- I hope it is not heresy to mention here, but I believe the MapR implementation of Hadoop supports symlinks via NFS. -Terry On 10/08/2012 10:20 AM, Visioner Sadak wrote: > thanks dave its in hdfs onlyany other methods of creating a symlink > > On Mon, Oct 8, 2012 at 7:00 PM, Dave Be

Re: sym Links in hadoop

2012-10-08 Thread Colin McCabe
You can create an HDFS symlink by using the FileContext#createSymlink function. I don't think this can be done through the "hadoop fs" command, so you're going to have to write some Java code to do this. We should consider adding this functionality to the "hadoop fs" command in the future. Colin
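
A minimal sketch of that call, reusing the archive path from earlier in the thread; the link path is hypothetical, and whether a har:// URI is accepted as a symlink target is an assumption to verify:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileContext;
    import org.apache.hadoop.fs.Path;

    public class MakeSymlink {
        public static void main(String[] args) throws Exception {
            FileContext fc = FileContext.getFileContext(new Configuration());
            // createSymlink(target, link, createParent)
            fc.createSymlink(new Path("har:///user/5oct2012.har"),
                             new Path("/user2/5oct2012.har"),
                             false);
        }
    }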

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Jim Twensky
Thank you for the comments. Some similar frameworks I looked at include HaLoop, Twister, Hama, Giraph and Cascading. I am also doing large-scale graph processing so I assumed one of them could serve the purpose. Here is a summary of what I found out about them that is relevant: 1) HaLoop and Twist

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Michael Segel
Well I was thinking ... Map -> Combiner -> Reducer -> Identity Mapper -> Combiner -> Reducer -> Identity Mapper -> Combiner -> Reducer... may make things easier. HTH, -Mike On Oct 8, 2012, at 2:09 PM, Jim Twensky wrote: > Thank you for the comments. Some similar frameworks I looked at > in
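
A minimal sketch of one hop of such a chain, assuming the new-API Job; class names and paths are hypothetical, and the combiner's output types deliberately match the reducer's input types (the point made later in this thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class ReduceHop {

        // Output types match input types, so the same class can serve as
        // both combiner and reducer of every hop in the chain.
        public static class SumReducer
                extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values,
                                  Context ctx)
                    throws java.io.IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) sum += v.get();
                ctx.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "reduce-hop");
            job.setJarByClass(ReduceHop.class);
            job.setMapperClass(Mapper.class);      // base Mapper = identity
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            // Sequence files keep the key/value types intact between hops.
            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // previous hop's output
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Each hop is still a full M/R job, so the identity-map pass is not eliminated; this only simplifies the wiring.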

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Jim Twensky
Hi Mike, I'm already doing that but the output of the reduce goes straight back to HDFS to be consumed by the next Identity Mapper. Combiners just reduce the amount of data between map and reduce whereas I'm looking for an optimization between reduce and map. Jim On Mon, Oct 8, 2012 at 2:19 PM,

Question regarding input path for a map reduce job.

2012-10-08 Thread Jane Chen
It seems that MapReduce allows the input path for a job to be a string containing regex patterns. Yet, Path.normalizePath() will replace \\ with /. So how should one escape the regex special characters if needed when setting a job's input paths? Jane Chen Lead Engineer MarkLogic Corporation
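
For what it's worth, input paths are expanded as file-system globs ([], *, ?, {a,b}) rather than full regular expressions; a minimal sketch with a hypothetical path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class GlobInput {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "glob-input");
            // Expands to every matching file under the dated directories.
            FileInputFormat.addInputPath(job,
                    new Path("/logs/2012-10-0[1-8]/part-*"));
        }
    }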

Secure hadoop and group permission on HDFS

2012-10-08 Thread Koert Kuipers
With secure Hadoop the user name is authenticated by the Kerberos server. But what about the groups that the user is a member of? Are these simply the groups that the user is a member of on the NameNode machine? Is it viable to manage access to files on HDFS using groups on a secure Hadoop cluster?

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Edward J. Yoon
> asking for. If anyone who used Hama can point to a few articles about how
> the framework actually works and handles the messages passed between
> vertices, I'd really appreciate that.

Hama Architecture: https://issues.apache.org/jira/secure/attachment/12528219/ApacheHamaDesign.pdf Hama BSP progra

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Edward J. Yoon
P.S., Giraph is different in the sense that it runs as a map-only job. On Tue, Oct 9, 2012 at 7:45 AM, Edward J. Yoon wrote: >> asking for. If anyone who used Hama can point a few articles about how >> the framework actually works and handles the messages passed between >> vertices, I'd really ap

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Michael Segel
Jim, You can use the combiner as a reducer, albeit you won't get down to a single reduce file output. But you don't need that. As long as the output from the combiner matches the input to the next reducer, you should be OK. Without knowing the specifics, all I can say is TANSTAAFL, that is to sa

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Edward J. Yoon
Mike, just FYI, it's my '08 approach[1]. 1. https://blogs.apache.org/hama/entry/how_will_hama_bsp_different On Tue, Oct 9, 2012 at 7:50 AM, Michael Segel wrote: > Jim, > > You can use the combiner as a reducer albeit you won't get down to a single > reduce file output. But you don't need that.

Re: Secure hadoop and group permission on HDFS

2012-10-08 Thread Harsh J
Koert, If you use the org.apache.hadoop.security.ShellBasedUnixGroupsMapping class (via hadoop.security.group.mapping), then yes the NameNode's view of the local unix groups (and the primary group) of the user is the final say on what groups the user belongs to. This can be relied on - but note th
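
A minimal sketch of selecting that mapping; the key and class names come from the message above, though this would normally live in core-site.xml rather than Java code:

    import org.apache.hadoop.conf.Configuration;

    public class GroupMappingConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Resolve a user's groups from the local unix account database
            // on the node where resolution happens (i.e. the NameNode).
            conf.set("hadoop.security.group.mapping",
                     "org.apache.hadoop.security.ShellBasedUnixGroupsMapping");
            System.out.println(conf.get("hadoop.security.group.mapping"));
        }
    }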

Re: Secure hadoop and group permission on HDFS

2012-10-08 Thread Ivan Frain
Hi Koert, Another option is to use the LdapGroupsMapping, which picks up group membership from an LDAP directory. You can find more details in the JIRA issue: https://issues.apache.org/jira/browse/HADOOP-8121 Up to now, it is available for Active Directory and released in hadoop-2.0.0-alpha and n
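
By analogy with the shell-based sketch above, switching to the LDAP mapping might look like this; the class name comes from HADOOP-8121, while the ldap.* connection key and its value are assumptions to check against that JIRA:

    import org.apache.hadoop.conf.Configuration;

    public class LdapGroupMappingConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Look up group membership in an LDAP directory instead of
            // the local unix groups.
            conf.set("hadoop.security.group.mapping",
                     "org.apache.hadoop.security.LdapGroupsMapping");
            // Hypothetical server URL; bind user, password and search base
            // are also required - see HADOOP-8121 for the key names.
            conf.set("hadoop.security.group.mapping.ldap.url",
                     "ldap://ldap.example.com:389");
        }
    }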

Re: HDFS-347 and HDFS-2246: how are the issues different?

2012-10-08 Thread jlei liu
Hi Harsh, thank you for your reply. In HDFS-2246, when the local DataNode is dead, the DFSClient can still read data from the local file. I think that may lead to the DFSClient reading wrong data. HDFS-347 uses a domain socket to implement local reads; when the local DataNode is dead, can the DFSClient read data from local

Re: sym Links in hadoop

2012-10-08 Thread Visioner Sadak
Thanks Colin, I tried using FileContext but the class is showing as deprecated. On Tue, Oct 9, 2012 at 12:02 AM, Colin McCabe wrote: > You can create an HDFS symlink by using the FileContext#createSymlink > function. I don't think this can be done through the "hadoop fs" > command, so you're going