Hi,
First, you have to explain what you mean by 'equivalent'.
The short answer is that it depends.
The longer answer is that you have to consider cost in your design.
The whole issue of design is to maintain the correct ratio of cores to memory
and cores to spindles while optimizing the box
Hi Tony,
I am currently working on this to access HDFS securely and programmatically.
What I have found so far may help even if I am not 100% sure this is the
right way to proceed.
If you have already obtained a TGT from the kinit command, hadoop library
will locate it automatically if the name
Yes, but this will not work in a multi-tenant environment. I need to be able
to create a Kerberos TGT per execution thread.
I was hoping through JAAS that I could inject the name of the current principal
and authenticate against it. I'm sure there is a best practice for
hadoop/hbase client
Tony,
If you are doing a server app that interacts with the cluster on
behalf of different users (like Oozie, as you mentioned in your
email), then you should use the proxyuser capabilities of Hadoop.
* Configure user MYSERVERUSER as proxyuser in Hadoop core-site.xml
(this requires 2 properties
Alejandro,
Thanks for the reply. My intent is to also be able to scan/get/put hbase
tables under a specified identity as well. What options do I have to perform
the same multi-tenant authorization for these operations? I have posted this
to hbase users distribution list as well, but
On Mon, Jul 2, 2012 at 9:15 AM, Tony Dean tony.d...@sas.com wrote:
Alejandro,
Thanks for the reply. My intent is to also be able to scan/get/put hbase
tables under a specified identity as well. What options do I have to perform
the same multi-tenant authorization for these operations? I
This is actually a very complex question. Without trying to answer
completely, the high points, as I see it, are:
a) [Most important] Different kinds of nodes require different Hadoop
configurations. In particular, the number of simultaneous tasks per node
should presumably be set higher for a
You could do that, but that means your app will have to have keytabs
for all the users it wants to act as. Proxyuser will be much easier to
manage. Maybe getting proxyuser support in hbase if it is not there
yet
I don't think proxy auth is what the OP is after. Do I have that
right? Implies the
I am wondering what the right way is to design the reading of input and writing
of output when the file format may change over time. For instance we might
start with field1,field2,field3 but at some point we add new field4 in
the input. What's the best way to deal with such scenarios? Keep a catalog
of
There are several different ways. One of the ways is to use something
like Hcatalog to track the format and location of the dataset. This may
be overkill for your problem, but it will grow with you. Another is to
store the schema with the data when it is written out. Your code may need
to the
If you're talking in per-machine-slot terms, it is possible to do if
you use the Capacity Scheduler, and set a memory requirement worthy of
4 slots for your job. This way CS will reserve 4 slots for running a
single task (on a single task tracker).
If you are instead asking for a way to not run
In addition to what Robert says, using a schema-based approach such as
Apache Avro can also help here. The schemas in Avro can evolve over
time if done right, while not breaking old readers.
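To make the idea concrete, here is a minimal plain-Python sketch of how a reader can tolerate an added field by supplying a default for it. This is not the Avro API: READER_SCHEMA, REQUIRED and resolve() are hypothetical names made up for illustration, and the empty-string default for field4 is an assumption. Avro's own schema resolution works on the same principle of reader-side defaults, but with its own schema format and library calls.

```python
# Plain-Python illustration of schema evolution via reader-side defaults.
# NOTE: hypothetical names throughout; this does not use the Avro library.

REQUIRED = object()  # sentinel marking a field that has no default

# The reader's view of the data: field4 was added later, so it carries
# a default (assumed here to be an empty string).
READER_SCHEMA = {
    "field1": REQUIRED,
    "field2": REQUIRED,
    "field3": REQUIRED,
    "field4": "",
}


def resolve(record, schema=READER_SCHEMA):
    """Return a record shaped like the reader schema.

    Fields missing from the record are filled from the schema default;
    a missing field without a default is an error.
    """
    resolved = {}
    for name, default in schema.items():
        if name in record:
            resolved[name] = record[name]
        elif default is REQUIRED:
            raise ValueError(f"record is missing required field {name!r}")
        else:
            resolved[name] = default
    return resolved


# A record written before field4 existed, and one written after.
old_record = {"field1": "a", "field2": "b", "field3": "c"}
new_record = {"field1": "a", "field2": "b", "field3": "c", "field4": "d"}

print(resolve(old_record))
print(resolve(new_record))
```

The point of the design is that new code reads old data without a migration step, as long as every added field comes with a default.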
On Tue, Jul 3, 2012 at 2:47 AM, Robert Evans ev...@yahoo-inc.com wrote:
There are several different ways.