Re: Multiple cores vs multiple nodes

2012-07-02 Thread Michael Segel
Hi, First, you have to explain what you mean by 'equivalent'. The short answer is that it depends. The longer answer is that you have to consider cost in your design. The whole point of design is to maintain the correct ratio of cores to memory and cores to spindles while optimizing the box

Re: hadoop security API (repost)

2012-07-02 Thread Ivan Frain
Hi Tony, I am currently working on this to access HDFS securely and programmatically. What I have found so far may help even if I am not 100% sure this is the right way to proceed. If you have already obtained a TGT from the kinit command, the hadoop library will locate it automatically if the name
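Ivan's approach can be sketched roughly as below. This is a hedged illustration, not code from the thread: it assumes a Kerberos-enabled cluster, the Hadoop client jars on the classpath, and that `kinit` has already populated the default ticket cache for the current OS user; `KinitLoginExample` is a hypothetical class name.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class KinitLoginExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Normally this comes from core-site.xml; set here for clarity.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // With no explicit keytab login, getLoginUser() falls back to the
        // Kerberos ticket cache that kinit created, when one is available.
        UserGroupInformation ugi = UserGroupInformation.getLoginUser();
        System.out.println("Authenticated as: " + ugi.getUserName());

        // Subsequent HDFS calls use those credentials.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Home dir: " + fs.getHomeDirectory());
    }
}
```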

RE: hadoop security API (repost)

2012-07-02 Thread Tony Dean
Yes, but this will not work in a multi-tenant environment. I need to be able to create a Kerberos TGT per execution thread. I was hoping through JAAS that I could inject the name of the current principal and authenticate against it. I'm sure there is a best practice for hadoop/hbase client

Re: hadoop security API (repost)

2012-07-02 Thread Alejandro Abdelnur
Tony, If you are doing a server app that interacts with the cluster on behalf of different users (like Oozie, as you mentioned in your email), then you should use the proxyuser capabilities of Hadoop. * Configure user MYSERVERUSER as proxyuser in Hadoop core-site.xml (this requires 2 properties
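The two core-site.xml properties Alejandro alludes to follow the `hadoop.proxyuser.<user>.hosts` / `hadoop.proxyuser.<user>.groups` convention. A hedged sketch, where the host and group values are illustrative assumptions:

```xml
<!-- core-site.xml on the NameNode/JobTracker; assumes the server
     process runs as MYSERVERUSER. Values below are examples only. -->
<property>
  <name>hadoop.proxyuser.MYSERVERUSER.hosts</name>
  <!-- Hosts from which MYSERVERUSER may impersonate others; * = any -->
  <value>myserver.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.MYSERVERUSER.groups</name>
  <!-- Groups whose members MYSERVERUSER may impersonate -->
  <value>users</value>
</property>
```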

RE: hadoop security API (repost)

2012-07-02 Thread Tony Dean
Alejandro, Thanks for the reply. My intent is to also be able to scan/get/put hbase tables under a specified identity as well. What options do I have to perform the same multi-tenant authorization for these operations? I have posted this to hbase users distribution list as well, but

Re: hadoop security API (repost)

2012-07-02 Thread Alejandro Abdelnur
On Mon, Jul 2, 2012 at 9:15 AM, Tony Dean tony.d...@sas.com wrote: Alejandro, Thanks for the reply. My intent is to also be able to scan/get/put hbase tables under a specified identity as well. What options do I have to perform the same multi-tenant authorization for these operations? I
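The proxy-user pattern discussed in this thread can be sketched as follows. This is a hedged illustration: the principal name, keytab path, and target user are assumptions, and it presumes the proxyuser rules above are already configured in core-site.xml.

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The server logs in once with its own credentials.
        UserGroupInformation server =
            UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "myserveruser@EXAMPLE.COM", "/etc/security/myserver.keytab");

        // Impersonate the end user for this request only.
        UserGroupInformation proxy =
            UserGroupInformation.createProxyUser("tony", server);

        proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
            // Everything in here runs against HDFS as "tony".
            FileSystem fs = FileSystem.get(conf);
            System.out.println(
                fs.listStatus(new Path("/user/tony")).length + " entries");
            return null;
        });
    }
}
```

Because each `doAs` block carries its own `UserGroupInformation`, this also addresses Tony's multi-tenant concern: one server login, many impersonated identities per request or thread.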

Re: Multiple cores vs multiple nodes

2012-07-02 Thread Matt Foley
This is actually a very complex question. Without trying to answer completely, the high points, as I see it, are: a) [Most important] Different kinds of nodes require different Hadoop configurations. In particular, the number of simultaneous tasks per node should presumably be set higher for a
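Matt's point (a) is usually expressed through per-node TaskTracker slot counts. A hedged MRv1-era sketch of mapred-site.xml, with slot values as illustrative assumptions for a larger node:

```xml
<!-- Per-node mapred-site.xml; a 16-core node can run more simultaneous
     tasks than an 8-core node. Values are examples, not recommendations. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value> <!-- e.g. on a 16-core node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```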

Re: hadoop security API (repost)

2012-07-02 Thread Andrew Purtell
You could do that, but that means your app will have to have keytabs for all the users you want to act as. Proxyuser will be much easier to manage. Maybe getting proxyuser support into hbase if it is not there yet? I don't think proxy auth is what the OP is after. Do I have that right? Implies the

Dealing with changing file format

2012-07-02 Thread Mohit Anchlia
I am wondering what's the right way to go about designing reading input and output where the file format may change over time. For instance we might start with field1,field2,field3 but at some point we add new field4 in the input. What's the best way to deal with such scenarios? Keep a catalog of

Re: Dealing with changing file format

2012-07-02 Thread Robert Evans
There are several different ways. One of the ways is to use something like Hcatalog to track the format and location of the dataset. This may be overkill for your problem, but it will grow with you. Another is to store the schema with the data when it is written out. Your code may need to the
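Robert's second suggestion, storing the schema with the data, can be illustrated with a minimal self-contained sketch: the first line names the fields, and readers look values up by name rather than by fixed position, so an added field4 does not break an older reader. `SelfDescribingRecords` and its format are hypothetical, not anything from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelfDescribingRecords {
    // Parse lines where the first line names the fields, so readers
    // access fields by name instead of by column index.
    public static List<Map<String, String>> parse(List<String> lines) {
        String[] header = lines.get(0).split(",");
        List<Map<String, String>> records = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(",", -1);
            Map<String, String> rec = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < values.length; i++) {
                rec.put(header[i], values[i]);
            }
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) {
        // New files carry field4; a reader that asks for field1..field3
        // by name keeps working unchanged.
        List<String> newData = Arrays.asList(
            "field1,field2,field3,field4",
            "a,b,c,d");
        Map<String, String> rec = parse(newData).get(0);
        System.out.println(rec.get("field1"));
        System.out.println(rec.get("field4"));
    }
}
```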

Re: force minimum number of nodes to run a job?

2012-07-02 Thread Harsh J
If you're talking in per-machine-slot terms, it is possible to do if you use the Capacity Scheduler, and set a memory requirement worthy of 4 slots for your job. This way CS will reserve 4 slots for running a single task (on a single task tracker). If you are instead asking for a way to not run
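Harsh's slot-reservation trick can be sketched with the MRv1-era memory properties. A hedged example: if each map slot represents 1024 MB cluster-wide, requesting 4096 MB per map task makes the Capacity Scheduler reserve 4 slots on a single TaskTracker for that task. Values are illustrative assumptions.

```xml
<!-- Cluster-wide: memory represented by one map slot -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>1024</value>
</property>
<!-- In the job configuration: request 4 slots' worth of memory per task -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>4096</value>
</property>
```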

Re: Dealing with changing file format

2012-07-02 Thread Harsh J
In addition to what Robert says, using a schema-based approach such as Apache Avro can also help here. The schemas in Avro can evolve over time if done right, while not breaking old readers. On Tue, Jul 3, 2012 at 2:47 AM, Robert Evans ev...@yahoo-inc.com wrote: There are several different ways.
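Avro's schema evolution, as Harsh suggests, handles exactly the field1..field4 case from the original question. A hedged sketch of an evolved schema (record and field names taken from the thread's example; the types are assumptions): because field4 carries a default, a reader using this schema can still decode records written with the original three-field schema, with Avro supplying the default value.

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "field1", "type": "string"},
    {"name": "field2", "type": "string"},
    {"name": "field3", "type": "string"},
    {"name": "field4", "type": ["null", "string"], "default": null}
  ]
}
```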