Hi,
I am setting up a secured HDFS using Kerberos. I got the NN and 2NN working just
fine. However, the DN cannot talk to the NN and throws the following exception. I
disabled AES256 in the keytab, which in theory should make it fall back to
AES128, or whatever encryption type is next on the list, but it still fails with
the same error.
This is because Java only supports AES-128 by default. To support AES-256, you
will need to install the JCE Unlimited Strength Jurisdiction Policy files from
http://www.oracle.com/technetwork/java/javase/downloads/index.html
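As a quick sanity check on each node, here is a small sketch (the class is mine,
not part of Hadoop) that reports whether the unlimited-strength policy is
actually in effect:

import javax.crypto.Cipher;

public class JcePolicyCheck {
    public static void main(String[] args) throws Exception {
        // Prints 128 under the default (limited) policy; prints
        // 2147483647 once the unlimited JCE policy files are installed,
        // which is what decrypting AES-256 Kerberos tickets requires.
        System.out.println("Max AES key length: "
            + Cipher.getMaxAllowedKeyLength("AES"));
    }
}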
Also, there is another case of Kerberos having issues with hostnames that have
some/all letters in uppercase.
Vinod,
Thanks for your reply. I forgot to mention that I have already installed
the JCE policy files on each node, so that possibility can be ruled out. By
the same token, one of my attempts was removing AES-256 from the
keytab, but I saw the same error. I assume AES-128 should then be the fallback.
Hi Shumin,
Setting up Kerberos can be a pain, but debug output gets you a long way. I'm
not sure whether these semantics still work in 2.0, but something like this
used to give lots of output in 0.20.205:
$ export HADOOP_OPTS='-Dsun.security.krb5.debug=true -Djavax.net.debug=all'
Thanks for the tip, Evert! That's something new I learnt today. Will post
more info once I get there.
P.S. I am using jpwd to debug. I was hoping to fish for a quick answer or two
instead of painful debugging, but it looks like I just can't avoid that
route. :-(
- Shumin
On Wed, Sep 12, 2012 at 1:56 PM,
Thanks a ton, guys, for showing the right direction. I was so wrong with
hftp; I will try out WebHDFS. Is an HDFS FUSE mount a good approach? Using
that, I would just have to mount my existing local Java uploads into HDFS.
But can I access HAR files through it, or will I have to create a symlink
for each?
Thanks for your reply.
But I'm not sure that works, since the data volume is large, which makes the
cost of shuffling quite high if all the processing is applied in the Reducer.
I thought Hadoop transfers all of the Mapper output to the Reducer over
HTTP, right?
2012/9/11 Narasingu Ramesh
All right, I got it! Thank you very much.
2012/9/11 Harsh J ha...@cloudera.com
Hey Jason,
While I am not sure what's the best way to automatically evaluate this
during the execution of a job, the MultipleInputs class offers a way
to run different map implementations within a single job, one per input
path; see the sketch below.
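A minimal sketch, with hypothetical paths and placeholder mappers, of binding a
different Mapper to each input path within one job:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiInputJob {

    // Each input path gets its own Mapper; both must emit the same
    // intermediate key/value types so they can share one Reducer.
    public static class FormatAMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text("a"), value); // parse format A here
        }
    }

    public static class FormatBMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text("b"), value); // parse format B here
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multi-input");
        job.setJarByClass(MultiInputJob.class);
        // Bind a different Mapper to each path within the single job.
        MultipleInputs.addInputPath(job, new Path("/data/format-a"),
            TextInputFormat.class, FormatAMapper.class);
        MultipleInputs.addInputPath(job, new Path("/data/format-b"),
            TextInputFormat.class, FormatBMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/out/multi"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}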
Hi! I'm using JobControl (v. 1.0.3) to chain two MapReduce applications. It
works and creates output data, but it doesn't give me back informational
messages such as the number of mappers, the number of records in input or
output, etc.
It only returns messages like this:
12/09/12 09:56:38 WARN
Hi,
Thank you for re-running the experiments.
I launched a job through Hive with the default TextInputFormat.
The job is the TPC-H Q1 query, which is a simple selection query on the lineitem table.
Each data set (data01...data14) is about 300GB, so about
4.2TB (= 300GB * 14) in total.
I really
But as far as I know, there is no way to get a snapshot of the JobControl
state:
https://issues.apache.org/jira/browse/MAPREDUCE-3562
I was trying only to get the state of all jobs and it is not possible to
get a consistent view.
For Map/Reduce progress, I guess you could do the same by digging into
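To make that concrete, a sketch of polling the per-state lists (method names
from the o.a.h.mapreduce.lib.jobcontrol API). Each getter locks separately, so
a job can migrate between lists between two calls; the counts below are a
best-effort view, never an atomic snapshot, which is exactly what
MAPREDUCE-3562 is about:

import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class ChainMonitor {
    public static void main(String[] args) throws Exception {
        JobControl jc = new JobControl("my-chain");
        // ... jc.addJob(someControlledJob) calls would go here ...
        new Thread(jc).start(); // JobControl implements Runnable
        while (!jc.allFinished()) {
            System.out.printf("waiting=%d ready=%d running=%d ok=%d failed=%d%n",
                jc.getWaitingJobList().size(),
                jc.getReadyJobsList().size(),
                jc.getRunningJobList().size(),
                jc.getSuccessfulJobList().size(),
                jc.getFailedJobList().size());
            Thread.sleep(5000);
        }
        jc.stop();
    }
}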
Hi,
I am using UserGroupInformation.doAs(...) in order to launch a job
programmatically from a remote application.
I was wondering: what is the expected behavior of nested
UserGroupInformation.doAs() calls?
Is it the same as with JAAS? Which is, if I am not mistaken, that the
innermost 'subject' is used?
If
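To make the question concrete, a small sketch (user names invented) of nesting
two doAs calls; with plain JAAS Subject.doAs, the innermost subject is the
effective one:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class NestedDoAs {
    public static void main(String[] args) throws Exception {
        final UserGroupInformation outer =
            UserGroupInformation.createRemoteUser("alice");
        final UserGroupInformation inner =
            UserGroupInformation.createRemoteUser("bob");
        outer.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                // Here the current user is "alice".
                System.out.println(
                    UserGroupInformation.getCurrentUser().getUserName());
                inner.doAs(new PrivilegedExceptionAction<Void>() {
                    public Void run() throws Exception {
                        // Inside the nested doAs the current user is "bob":
                        // the innermost subject wins, as with plain JAAS.
                        System.out.println(
                            UserGroupInformation.getCurrentUser().getUserName());
                        return null;
                    }
                });
                return null;
            }
        });
    }
}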
Hi Rekha and Bertrand! Thanks for the answers! OK, I see that in the web
interface (_logs-history-job_.) there is info about job executions.
I hope this info will be enough for me.
As I said before, scanning the APIs, the only method I found was
ControlledJob.toString().
Bye! :)
Good that the web interface is sufficient for now, Piter!
The counters are part of o.a.h.mapreduce.Job, so you can get them via
job.getCounters(), etc., or via JobInProgress. They are not a JobControl
feature as such, so they will not be directly in the JobControl/ControlledJob
API.
However, Bertrand's point is an
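For example, a sketch (assuming a finished ControlledJob named cj) of
unwrapping the underlying Job and dumping every counter, including the record
in/out counts:

import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;

public class PrintCounters {
    static void dump(ControlledJob cj) throws Exception {
        Job job = cj.getJob();                 // unwrap the mapreduce Job
        Counters counters = job.getCounters();
        for (CounterGroup group : counters) {
            for (Counter c : group) {
                System.out.println(group.getDisplayName() + "\t"
                    + c.getDisplayName() + " = " + c.getValue());
            }
        }
    }
}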
Hi again Nick,
DBInputFormat does use Connection.TRANSACTION_SERIALIZABLE, but this is a
per-connection attribute. Since every mapper has its own connection, and every
connection is opened at a different time, every connection sees a different
snapshot of the DB, and that can cause, for example, two
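To illustrate the per-connection point, a sketch (JDBC URL and credentials
made up) of what effectively happens once per mapper:

import java.sql.Connection;
import java.sql.DriverManager;

public class PerConnectionIsolation {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://dbhost/test", "user", "secret");
        // Serializable isolation only serializes transactions on THIS
        // connection. Two mappers that open their connections at
        // different times can still see different DB snapshots.
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        conn.close();
    }
}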
I have a Hive external table created from an HDFS location. How do I make it
read the data from all the subdirectories as well?
Thanks.
Hi Natraj
Create a partitioned table and add the sub dirs as partitions. You need to have
some logic in place for determining the partitions. Say, if the sub dirs denote
data based on a date, then make the date the partition.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
Hi,
I'm using Hadoop 1.0.1. I tried to follow
https://ccp.cloudera.com/display/CDHDOC/Configuring+Hadoop+Security+in+CDH3+%28KSSL%29
to
configure Hadoop with Kerberos authentication. I configured the KDC and added
hdfs, mapred, and host principals for each node to Kerberos, and deployed the
keytabs to
With speculative execution enabled, Hadoop can run a task attempt on more
than one node. If a mapper is using MultipleOutputs, then the second attempt
(or sometimes even all of them) fails to create its output file because it is
already being created by another attempt:
attempt_1347286420691_0011_m_00_0
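For reference, a sketch (class and output names are mine) of the usual new-API
MultipleOutputs wiring; with the default FileOutputCommitter, these writes
should land in each attempt's temporary work directory rather than the final
output path, which is what normally keeps speculative attempts from colliding:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SideOutputMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        mos.write("side", new Text("k"), value); // named output "side"
    }

    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close(); // required, or side files may be incomplete
    }

    // In the driver, the named output must be declared first:
    //   MultipleOutputs.addNamedOutput(job, "side",
    //       TextOutputFormat.class, Text.class, Text.class);
}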
Hi,
Sorry, I sent this to the wrong ML.
Please ignore it.
Thank you.
Hi,
I'm trying to do some text analysis using Mahout k-means (clustering),
processing the data on Hadoop.
--numClusters = 160
--maxIter (-x) maxIter = 200
Well, my data is small, around 500MB.
I have 4 servers, each
Our Hadoop version is hadoop-0.20-append+4.
We have configured rack awareness on the namenode.
But when I add a new datanode, update the topology data file, and restart
the datanode, I just see this log on the namenode:
2012-09-13 10:35:25,074 INFO org.apache.hadoop.net.NetworkTopology:
Thanks for your response. Can someone check whether this is OK? I am not
getting any records when I query the Hive table when I use partitions.
This is how I am creating the table:
CREATE EXTERNAL TABLE Data (field1 STRING, field2) PARTITIONED BY (year
STRING, month STRING, dayofmonth STRING) ROW
Hey Radim,
Does your job use the FileOutputCommitter?
On Thu, Sep 13, 2012 at 4:21 AM, Radim Kolar h...@filez.com wrote:
with speculative execution enabled, Hadoop can run a task attempt on more than
one node. If a mapper is using MultipleOutputs, then the second attempt (or
sometimes even all of them) fails to
Jameson,
The right process to add a new node with the right mapping is:
1. Update topology file for the new DN.
2. Issue a dfsadmin -refreshNodes to get new topology mapping updated in NN.
3. Start the DN only after (2) so it picks up the right mapping and a
default mapping does not get cached.
Thanks Brahmareddy,
Do we need to create include and exclude files, and if so, with what file
extension? Please suggest.
Regards
Yogesh Kumar
From: Brahma Reddy Battula [brahmareddy.batt...@huawei.com]
Sent: Wednesday, September 12, 2012 10:16 AM
To: user@hadoop.apache.org
Hi Harsh,
I have followed your suggested steps:
1. Stop the new datanode. (I had already modified the topology file on the
namenode.)
2. Run 'hadoop dfsadmin -refreshNodes' on the namenode.
3. Start the new datanode.
But it really did not update to the new topology mapping.
It just shows the start
Hi Jameson,
If the NameNode has cached the wrong value earlier, it will not
refresh that until you restart it.
On Thu, Sep 13, 2012 at 11:21 AM, Jameson Li hovlj...@gmail.com wrote:
Hi Harsh,
I have followed your suggested steps:
1. Stop the new datanode. (I had already modified the topology