DN cannot talk to NN using Kerberos on secured hdfs

2012-09-12 Thread Shumin Wu
Hi, I am setting up a secured HDFS using Kerberos. I got the NN and 2NN working just fine. However, the DN cannot talk to the NN and throws the following exception. I disabled AES256 in the keytab, which in theory should make it fall back to AES128, or whichever encryption type is at the top of the list, but it still

Re: DN cannot talk to NN using Kerberos on secured hdfs

2012-09-12 Thread Vinod Kumar Vavilapalli
This is because Java only supports AES-128 by default. To support AES-256, you will need to install the unlimited-strength JCE policy jar from http://www.oracle.com/technetwork/java/javase/downloads/index.html Also, there is another case of Kerberos having issues with hostnames with some/all letters
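
A quick way to verify whether the unlimited-strength policy is actually in effect on a given node is to ask the JVM for the maximum allowed AES key length (the keytab's own enctypes can be listed with klist -e -k -t). A minimal sketch: 128 means the default limited policy, a very large value means the unlimited JCE jars were picked up.

    import javax.crypto.Cipher;

    public class JcePolicyCheck {
        public static void main(String[] args) throws Exception {
            // 128 => default (limited) policy in effect;
            // Integer.MAX_VALUE => unlimited-strength JCE policy jars installed.
            int maxAes = Cipher.getMaxAllowedKeyLength("AES");
            System.out.println("Max AES key length: " + maxAes);
        }
    }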

Re: DN cannot talk to NN using Kerberos on secured hdfs

2012-09-12 Thread Shumin Wu
Vinod, thanks for your reply. I forgot to mention that I have already installed the JCE policy jar on each node, so that possibility can be ruled out. By the same token, one of my attempts was removing AES-256 from the keytab, but I saw the same error info. I assume AES-128 should be the

RE: DN cannot talk to NN using Kerberos on secured hdfs

2012-09-12 Thread Evert Lammerts
Hi Shumin, Setting up Kerberos can be a pain, but debug output gets you a long way. I'm not sure whether these semantics still work in 2.0, but something like this used to give lots of output in 0.20.205: $ HADOOP_OPTS='-Dsun.security.krb5.debug=true -Djavax.net.debug=all'
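
The same debug switches can also be set programmatically in a client, as long as that happens before any Kerberos/SSL classes are loaded; a minimal sketch, with a hypothetical principal and keytab path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KrbDebugLogin {
        public static void main(String[] args) throws Exception {
            // Must be set before any Kerberos/SSL classes are initialized.
            System.setProperty("sun.security.krb5.debug", "true");
            System.setProperty("javax.net.debug", "all");

            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Hypothetical principal and keytab path -- substitute your own.
            UserGroupInformation.loginUserFromKeytab(
                "hdfs/host.example.com@EXAMPLE.COM", "/etc/hadoop/hdfs.keytab");
            System.out.println("Logged in as: "
                + UserGroupInformation.getCurrentUser());
        }
    }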

Re: DN cannot talk to NN using Kerberos on secured hdfs

2012-09-12 Thread Shumin Wu
Thanks for the tip, Evert! Something new that I learned today. Will post more info once I get there. P.S. I am using jpwd to debug. I wanted to fish for a quick answer or two instead of painful debugging, but it looks like I just cannot avoid this route. :-( - Shumin On Wed, Sep 12, 2012 at 1:56 PM,

Re: Accessing image files from hadoop to jsp

2012-09-12 Thread Visioner Sadak
Thanks a ton guys for showing the right direction. I was so wrong with HFTP and will try out WebHDFS. Is an HDFS FUSE mount a good approach? Using that, I would just have to mount my existing local Java uploads into HDFS. But can I access HAR files using this, or will I have to create a symlink for
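
If WebHDFS is enabled on the cluster (dfs.webhdfs.enabled in hdfs-site.xml), files can be read from plain Java, e.g. from a JSP backend, through the standard FileSystem API. A minimal sketch, assuming a 1.x-era NameNode HTTP port (50070) and a hypothetical host and image path:

    import java.io.InputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class WebHdfsRead {
        public static void main(String[] args) throws Exception {
            // Hypothetical NameNode HTTP address and image path.
            FileSystem fs = FileSystem.get(
                URI.create("webhdfs://namenode.example.com:50070/"),
                new Configuration());
            InputStream in = fs.open(new Path("/uploads/images/photo1.jpg"));
            try {
                // Stream the file's bytes to stdout (or a servlet response).
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                in.close();
            }
        }
    }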

Re: How to make different mappers execute different processing on the same data?

2012-09-12 Thread Jason Yang
Thanks for your reply. But I'm not sure that works, since the data volume is large, which makes the cost of shuffling quite high if all the processing is applied in the Reducer. I thought Hadoop transfers all the output of the Mapper to the Reducer over HTTP, right? 2012/9/11 Narasingu Ramesh

Re: How to make different mappers execute different processing on the same data?

2012-09-12 Thread Jason Yang
All right, I got it. Thank you very much. 2012/9/11 Harsh J ha...@cloudera.com Hey Jason, While I am not sure what's the best way to automatically evaluate during the execution of a job, the MultipleInputs class offers a way to run different map implementations within a single job for
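
For reference, the MultipleInputs approach Harsh describes looks roughly like this (new mapreduce API; the mapper stubs and paths are hypothetical). Note that each input path is bound to exactly one mapper class, so running two map implementations over the same data means pointing two path entries at it, for example the dataset and a second copy of it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MultiMapperJob {
        // Two different map implementations over the same record type (stubs).
        public static class MapperA extends Mapper<LongWritable, Text, LongWritable, Text> { /* ... */ }
        public static class MapperB extends Mapper<LongWritable, Text, LongWritable, Text> { /* ... */ }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "multi-mapper");
            job.setJarByClass(MultiMapperJob.class);
            // Each path gets its own mapper class; hypothetical paths.
            MultipleInputs.addInputPath(job, new Path("/data/inputA"),
                TextInputFormat.class, MapperA.class);
            MultipleInputs.addInputPath(job, new Path("/data/inputB"),
                TextInputFormat.class, MapperB.class);
            FileOutputFormat.setOutputPath(job, new Path("/data/out"));
            job.setNumReduceTasks(0);  // map-only; add a reducer if you need to combine
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }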

How to get information messages when a JobControl is used?

2012-09-12 Thread Piter85 Piter85
Hi! I'm using JobControl (v. 1.0.3) to chain two MapReduce applications. It works and creates output data, but it doesn't give me back information messages such as the number of mappers, the number of records in input or in output, etc... It only returns messages like this: 12/09/12 09:56:38 WARN

Re: Question about the task assignment strategy

2012-09-12 Thread Hiroyuki Yamada
Hi, thank you for replaying the experiments. I launched a job through Hive with the default TextInputFormat. The job is the TPC-H Q1 query, which is a simple selection query over the lineitem table. Each piece of data (data01...data14) is about 300GB, so about 4.2TB (=300GB*14) in total. I really

Re: How to get information messages when a JobControl is used?

2012-09-12 Thread Bertrand Dechoux
But as far as I know there is no way to get a snapshot of the JobControl state. https://issues.apache.org/jira/browse/MAPREDUCE-3562 I was trying only to get the state of all jobs, and it is not possible to get a consistent view. For Map/Reduce progress, I guess you could do the same by digging into

Expected behavior of nested UserGroupInformation

2012-09-12 Thread Bertrand Dechoux
Hi, I am using UserGroupInformation.doAs(...) in order to launch a job programmatically from a remote application. I was wondering: what is the expected behavior of nested UserGroupInformation? Is it the same as with JAAS? Which is, if I am not mistaken, that the last inner 'subject' is used? If
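
A small experiment makes the question concrete; a sketch with hypothetical user names that prints which user the innermost block runs as (with JAAS-style semantics the expectation would be "bob"):

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.security.UserGroupInformation;

    public class NestedDoAs {
        public static void main(String[] args) throws Exception {
            final UserGroupInformation outer = UserGroupInformation.createRemoteUser("alice");
            final UserGroupInformation inner = UserGroupInformation.createRemoteUser("bob");
            outer.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    inner.doAs(new PrivilegedExceptionAction<Void>() {
                        public Void run() throws Exception {
                            // If the innermost subject wins, this prints "bob".
                            System.out.println(
                                UserGroupInformation.getCurrentUser().getUserName());
                            return null;
                        }
                    });
                    return null;
                }
            });
        }
    }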

RE: How to get information messages when a JobControl is used?

2012-09-12 Thread Piter85 Piter85
Hi Rekha and Bertrand! Thanks for the answers! OK, I see that in the web interface (_logs-history-job_.) there is info about job executions. I hope this info will be enough for me. As I said before, scanning the APIs, the only method that I found was ControlledJob.toString(). Bye! :)

Re: How to get information messages when a JobControl is used?

2012-09-12 Thread Joshi, Rekha
Good that the web interface is sufficient for now, Piter! The counters are part of o.a.h.mapreduce.Job, so you can get them as job.getCounters() etc., or via JobInProgress. It is not a JobControl feature as such, so they will not be directly in the JobControl/ControlledJob API. However, Bertrand's point is an
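
A sketch of what Rekha describes: pull the counters out of the underlying Job objects once the JobControl run has finished. Method names are assumed from the new-API (o.a.h.mapreduce.lib.jobcontrol) JobControl; adjust if you are on the old mapred.jobcontrol API.

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    public class JobControlCounters {
        // Call after the JobControl thread has finished (allFinished() == true).
        public static void printCounters(JobControl control) throws Exception {
            for (ControlledJob cj : control.getSuccessfulJobList()) {
                // Includes map/reduce task counts and input/output record counters.
                Counters counters = cj.getJob().getCounters();
                System.out.println(cj.getJobName() + " counters:\n" + counters);
            }
        }
    }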

Re: Some general questions about DBInputFormat

2012-09-12 Thread Yaron Gonen
Hi again Nick, DBInputFormat does use Connection.TRANSACTION_SERIALIZABLE, but this is a per-connection attribute. Since every mapper has its own connection, and every connection is opened at a different time, every connection sees a different snapshot of the DB, and it can cause, for example, two

How to make the hive external table read from subdirectories

2012-09-12 Thread Nataraj Rashmi - rnatar
I have a Hive external table created from an HDFS location. How do I make it read the data from all the subdirectories also? Thanks.

Re: How to make the hive external table read from subdirectories

2012-09-12 Thread Bejoy KS
Hi Nataraj, create a partitioned table and add the subdirectories as partitions. You need some logic in place for determining the partitions. Say, if the subdirectories denote data based on a date, then make the date the partition. Regards Bejoy KS Sent from handheld, please excuse typos.
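
What Bejoy describes amounts to registering each date-named subdirectory as a partition of the external table. A sketch via the Hive JDBC driver (driver class and URL are the HiveServer1-era defaults; the table, partition columns, and directory layout are hypothetical); the same ALTER TABLE statement can equally be run from the hive CLI:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AddPartitions {
        public static void main(String[] args) throws Exception {
            // HiveServer1 JDBC driver and URL of this era (assumption).
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection con = DriverManager.getConnection(
                "jdbc:hive://localhost:10000/default", "", "");
            Statement stmt = con.createStatement();
            // Map an existing subdirectory onto a partition of the external table.
            stmt.execute(
                "ALTER TABLE data ADD PARTITION (year='2012', month='09', dayofmonth='12') "
                + "LOCATION '/user/rnatar/data/2012/09/12'");
            con.close();
        }
    }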

'Can't get service ticket for: host/0.0.0.0' when running hdfs with kerberos

2012-09-12 Thread jack chrispoo
Hi, I'm using Hadoop 1.0.1. I tried to follow https://ccp.cloudera.com/display/CDHDOC/Configuring+Hadoop+Security+in+CDH3+%28KSSL%29 to configure Hadoop with Kerberos authentication. I configured the KDC, added hdfs, mapred, and host principals for each node to Kerberos, and deployed the keytabs to

multipleoutputs does not like speculative execution in map-only job

2012-09-12 Thread Radim Kolar
With speculative execution enabled, Hadoop can run a task attempt on more than one node. If the mapper is using MultipleOutputs, then the second attempt (or sometimes even all attempts) fails to create its output file, because the file is being created by another attempt: attempt_1347286420691_0011_m_00_0

Re: Is mahout kmeans slow ?

2012-09-12 Thread Elaine Gan
Hi, sorry, I sent this to the wrong ML. Please ignore it. Thank you. Hi, I'm trying to do some text analysis using Mahout kmeans (clustering), processing the data on Hadoop: --numClusters = 160, --maxIter (-x) = 200. Well, my data is small, around 500MB. I have 4 servers, each

rack topology data update

2012-09-12 Thread Jameson Li
Our Hadoop version is hadoop-0.20-append+4. We have configured rack awareness on the namenode. But when I add a new datanode, update the topology data file, and restart the datanode, I just see this log on the namenode: 2012-09-13 10:35:25,074 INFO org.apache.hadoop.net.NetworkTopology:

RE: How to make the hive external table read from subdirectories

2012-09-12 Thread Nataraj Rashmi - rnatar
Thanks for your response. Can someone check if this is OK? I am not getting any records when I query the Hive table when I use partitions. This is how I am creating the table: CREATE EXTERNAL TABLE Data (field1 STRING,field2) PARTITIONED BY(year STRING, month STRING, dayofmonth STRING) ROW

Re: multipleoutputs does not like speculative execution in map-only job

2012-09-12 Thread Harsh J
Hey Radim, Does your job use the FileOutputCommitter? On Thu, Sep 13, 2012 at 4:21 AM, Radim Kolar h...@filez.com wrote: with speculative execution enabled Hadoop can run a task attempt on more than one node. If mapper is using multipleoutputs then second attempt (or sometimes even all) fails to
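
For context, a minimal map-only MultipleOutputs setup under the new API (names are hypothetical; the named outputs must be registered in the driver with MultipleOutputs.addNamedOutput). Harsh's question matters because with FileOutputCommitter each attempt writes under its own attempt directory and only the winning attempt's files are committed, so speculative attempts should not collide; writing to a fixed path, or using a committer that bypasses the attempt directory, reintroduces the collision.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class SplittingMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private MultipleOutputs<NullWritable, Text> mos;

        protected void setup(Context context) {
            // Writes go to the task attempt's work directory, so each
            // speculative attempt gets its own files under FileOutputCommitter.
            mos = new MultipleOutputs<NullWritable, Text>(context);
        }

        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // "even"/"odd" are hypothetical named outputs registered in the driver.
            String name = (key.get() % 2 == 0) ? "even" : "odd";
            mos.write(name, NullWritable.get(), value);
        }

        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            mos.close();  // must close, or the named output files may be empty
        }
    }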

Re: rack topology data update

2012-09-12 Thread Harsh J
Jameson, The right process to add a new node with the right mapping is: 1. Update topology file for the new DN. 2. Issue a dfsadmin -refreshNodes to get new topology mapping updated in NN. 3. Start the DN only after (2) so it picks up the right mapping and a default mapping does not get cached.

RE: removing datanodes from clusters.

2012-09-12 Thread yogesh.kumar13
Thanks Brahmareddy. Do we need to create include and exclude files, and with which extension? Please suggest. Regards Yogesh Kumar From: Brahma Reddy Battula [brahmareddy.batt...@huawei.com] Sent: Wednesday, September 12, 2012 10:16 AM To: user@hadoop.apache.org

Re: rack topology data update

2012-09-12 Thread Jameson Li
Hi Harsh, I have followed your suggested steps: 1) stopped the new datanode (I had already modified the topology file on the namenode); 2) ran 'hadoop dfsadmin -refreshNodes' on the namenode; 3) started the new datanode. But it really does not update the new topology mapping. It just shows the start

Re: rack topology data update

2012-09-12 Thread Viji R
Hi Jameson, if the NameNode cached the wrong value earlier, it will not refresh it until you restart it. On Thu, Sep 13, 2012 at 11:21 AM, Jameson Li hovlj...@gmail.com wrote: Hi harsh, I have followed your suggestion operation. 1, stop the new datanode.(I have modified the topology