It combines multiple InputSplits per mapper (into a CombineFileSplit), which are read serially. This reduces the number of mappers for inputs that consist of several (usually small) files/blocks.
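For illustration, a minimal driver-side sketch, assuming Hadoop 2.x where the concrete CombineTextInputFormat subclass is available; the job name, argument paths, and the 128 MB split cap are made-up examples, not anything from this thread:

// Sketch: pack many small files into fewer splits (assumes Hadoop 2.x;
// paths and sizes below are arbitrary).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombineSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "combine-small-files");
    job.setJarByClass(CombineSmallFiles.class);
    // Each mapper gets a CombineFileSplit covering several small files,
    // read one after another, instead of one mapper per file/block.
    job.setInputFormatClass(CombineTextInputFormat.class);
    CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024); // 128 MB cap (assumption)
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}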
On Fri, Sep 28, 2012 at 6:54 AM, Jay Vyas jayunit...@gmail.com wrote:
It's not clear to me what the CombineFileInputFormat really is? Can ...
Hello,
We have a 15-node cluster and right now we don't have Kerberos implemented,
but we urgently want to secure the cluster.
Right now anyone who knows the IP of the NameNode can just download the Hadoop jars,
configure the XML files, and say
hadoop fs -ls /
and see the data.
How to ...
What you are looking for is not related to Hadoop in the end; it is how to
restrict requests on a network.
'Firewall' is a broad term. iptables can let you do this quickly: you
drop everything and then accept connections only from a set of IPs.
You may receive answers using this mailing list but its ...
Hello Bertrand,
Thanks for your reply.
Apologies if this was confusing. Yes, iptables is one way to go, but my
question is more whether there is a configuration within the Hadoop XML files
to say that only certain users are allowed to see HDFS.
I can see that we can do something for MapReduce ...
You need a stronger authentication method (Kerberos), period. It isn't
just fs -ls / you should be scared
about. Read Natty's post here on what it means to run an insecure
cluster when you have security requirements:
http://www.cloudera.com/blog/2012/03/authorization-and-authentication-in-hadoop/.
ACLs are a good way to control the roles of users, but in insecure mode
users can easily be impersonated, rendering ACLs useless as a 'secure'
measure.
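To give a flavor of what Kerberos changes on the client side, here is a minimal sketch using Hadoop's UserGroupInformation API; the principal and keytab path are hypothetical placeholders, and this assumes the cluster itself has already been configured for Kerberos:

// Sketch only: a client authenticating to a Kerberos-secured cluster.
// The principal and keytab path below are made-up placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureLs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Normally set in core-site.xml; with this on, simple username
    // spoofing no longer works.
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    // The client must hold valid Kerberos credentials, e.g. from a keytab.
    UserGroupInformation.loginUserFromKeytab(
        "alice@EXAMPLE.COM", "/etc/security/keytabs/alice.keytab");
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus s : fs.listStatus(new Path("/"))) {
      System.out.println(s.getPath());
    }
  }
}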
On Fri, Sep 28, 2012 at 3:15 PM, Shin Chan had...@gmx.com wrote:
Hello Bertrand,
Thanks for your reply.
Apologies if this was confusing. Yes, iptables ...
Harsh is right. It is important to know the difference between
authorization and authentication.
However, if you do not want anybody to write to your cluster from outside,
then a firewall might be enough.
You block everything but allow access to the web interfaces (without
private actions ...
Yes - for the new-API MultipleOutputs, use LazyOutputFormat as the job's
output format, and for the old API, use NullOutputFormat as the JobConf's
output format.
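A minimal sketch of the new-API side; the named output "stats" and the TextOutputFormat/type choices are illustrative, not from this thread:

// Sketch: suppress empty part files when all output goes through MultipleOutputs.
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LazyOutputSetup {
  public static void configure(Job job) {
    // LazyOutputFormat defers creating the default output file until the
    // first record is actually written, so a job that only writes named
    // outputs leaves no empty part-* files behind.
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    // A hypothetical named output, written in tasks via
    // new MultipleOutputs<>(context).write("stats", key, value).
    MultipleOutputs.addNamedOutput(job, "stats", TextOutputFormat.class,
        Text.class, LongWritable.class);
  }
}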
On Fri, Sep 28, 2012 at 5:14 PM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
This came up recently on the forums, IIRC. The answer was to ...
That's definitely clearer (and it makes sense).
Thanks a lot.
Bertrand
On Fri, Sep 28, 2012 at 11:56 AM, Harsh J ha...@cloudera.com wrote:
I don't know how much of this is 1.x compatible:
- When a transaction is logged and sync'd, and a single edits storage
location fails during write, ...
Hi,
Modularity!
I've always had the same question before. However, Tom White put that
thought to rest:
It’s possible to make map and reduce functions even more composable
than we have done. A mapper commonly performs input format parsing,
projection (selecting the relevant fields), and filtering (removing
records that are not of interest). ...
On Fri 28 Sep 2012 09:39:13 AM EDT, Harsh J wrote:
Modularity!
Exactly! Write a mapper that operates as a filter on something about
your keys, then use it in whatever jobs you want. Does your job need to
operate on data subset A? Chain it with the filter mapper that picks
out A. Your next one ...
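As a minimal sketch of that reusable-filter idea (the key-prefix criterion, the config key, and the types are all made up; in the new API such a mapper can also be composed with others via ChainMapper):

// Sketch: a reusable filter mapper. Any job that only needs records whose
// key starts with a given prefix (a made-up criterion) can plug this in.
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PrefixFilterMapper extends Mapper<Text, Text, Text, Text> {
  public static final String PREFIX_KEY = "filter.key.prefix"; // hypothetical config knob
  private String prefix;

  @Override
  protected void setup(Context context) {
    prefix = context.getConfiguration().get(PREFIX_KEY, "A");
  }

  @Override
  protected void map(Text key, Text value, Context context)
      throws IOException, InterruptedException {
    // Pass through only the subset this job cares about; drop everything else.
    if (key.toString().startsWith(prefix)) {
      context.write(key, value);
    }
  }
}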
This document has a clear description, although I don't know if it
applies to Hadoop 2.0:
http://hadoop.apache.org/docs/r1.0.3/hdfs_permissions_guide.html
I quote some text from this document; hopefully it can help you.
Overview
The Hadoop Distributed File System (HDFS) implements a permissions
model for files and directories that shares much of the POSIX model. ...
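For completeness, a small sketch of inspecting and tightening permissions through the FileSystem API; the path and the 0750 mode are arbitrary examples:

// Sketch: reading and setting HDFS permissions programmatically.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/user/alice/private"); // arbitrary example path
    FileStatus status = fs.getFileStatus(dir);
    System.out.println(status.getOwner() + ":" + status.getGroup()
        + " " + status.getPermission());
    // rwxr-x---: owner full access, group read/execute, others nothing.
    fs.setPermission(dir, new FsPermission((short) 0750));
  }
}

Note this is authorization only; as discussed earlier in the thread, without Kerberos the identity these checks run against is easily spoofed.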
Hi,
We are using FSDataOutputStream.writeBytes() from map/reduce to write to
the Hive table path directly instead of context.write(), which is working fine,
and so far we have had no problems with this approach.
We make sure the file names are distinct by appending the taskAttemptId to them,
and we use speculative ...
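A minimal sketch of that pattern inside a mapper; the warehouse path is hypothetical, and note that with speculative execution enabled, the losing attempt's file still has to be cleaned up somewhere:

// Sketch: writing directly to a (hypothetical) Hive table directory from a
// mapper, with the task attempt ID baked into the file name for uniqueness.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DirectHiveWriter extends Mapper<LongWritable, Text, Text, Text> {
  private FSDataOutputStream out;

  @Override
  protected void setup(Context context) throws IOException {
    FileSystem fs = FileSystem.get(context.getConfiguration());
    // Distinct name per attempt, so concurrent speculative attempts never collide.
    Path file = new Path("/warehouse/mytable/part-" // hypothetical table path
        + context.getTaskAttemptID().toString());
    out = fs.create(file);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException {
    out.writeBytes(value.toString() + "\n");
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    out.close();
  }
}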