Re: Unblacklist a blacklisted tracker (at job level)?

2013-02-15 Thread David Rosenstrauch
I'm once again finding myself in this same situation, but still have no solution. I have 4 task trackers that were blacklisted at the job level. I've since fixed the issue that got them blacklisted. But the job still isn't assigning them tasks. Is there any way to clear the blacklist at

Re: How to handle sensitive data

2013-02-15 Thread Michael Segel
Simple, have your app encrypt the field prior to writing to HDFS. Also consider HBase. On Feb 14, 2013, at 10:35 AM, abhishek abhishek.dod...@gmail.com wrote: Hi all, we have some sensitive data in some particular fields (columns). Can I know how to handle sensitive data in Hadoop.
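
A minimal sketch of the approach Michael describes, assuming AES via javax.crypto; the class name is invented and the hard-coded key handling is a placeholder, since key management is out of scope here:

  import javax.crypto.Cipher;
  import javax.crypto.spec.SecretKeySpec;
  import org.apache.commons.codec.binary.Base64;

  public class FieldEncryptor {
      private final SecretKeySpec key;

      public FieldEncryptor(byte[] rawKey) {   // 16 bytes = AES-128
          this.key = new SecretKeySpec(rawKey, "AES");
      }

      // Encrypt one sensitive field before the record is written to HDFS.
      // Plain "AES" defaults to ECB mode; a mode with an IV (e.g. CBC)
      // is preferable for real data.
      public String encrypt(String field) throws Exception {
          Cipher cipher = Cipher.getInstance("AES");
          cipher.init(Cipher.ENCRYPT_MODE, key);
          byte[] enc = cipher.doFinal(field.getBytes("UTF-8"));
          return Base64.encodeBase64String(enc);  // safe to store as text
      }
  }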

Re: How to handle sensitive data

2013-02-15 Thread Marcos Ortiz Valmaseda
Regards, abhishek. I agree with Michael. You can encrypt your incoming data from your application. I recommend using HBase too. - Original message - From: Michael Segel michael_se...@hotmail.com To: common-user@hadoop.apache.org CC: cdh-u...@cloudera.org Sent: Friday, February 15

Running hadoop on directory structure

2013-02-15 Thread Max Lebedev
Hi, I am a CS undergraduate working with Hadoop. I wrote a library to process logs; my input directory has the following structure:

logs_hourly
├── dt=2013-02-15
│   ├── ts=1360887451
│   │   └── syslog-2013-02-15-1360887451.gz
│   └── ts=1360891051
│       └── syslog-2013-02-15-1360891051.gz
├──

Re: Running hadoop on directory structure

2013-02-15 Thread Harsh J
You should be able to use http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html to achieve this. It supports subdirectory creation (under the main job output directory). However, the special chars may be an issue (e.g. -, =, etc.), for which you'll either need
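
A minimal sketch of Harsh's suggestion using the new-API MultipleOutputs (the dt=/ts= path is taken from the question; deriving it from each record is left out):

  import java.io.IOException;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

  public class HourlyReducer extends Reducer<Text, Text, Text, Text> {
      private MultipleOutputs<Text, Text> out;

      protected void setup(Context ctx) {
          out = new MultipleOutputs<Text, Text>(ctx);
      }

      protected void reduce(Text key, Iterable<Text> vals, Context ctx)
              throws IOException, InterruptedException {
          for (Text v : vals) {
              // A baseOutputPath containing '/' creates subdirectories
              // under the job output dir, e.g.
              // dt=2013-02-15/ts=1360887451/part-r-00000
              out.write(key, v, "dt=2013-02-15/ts=1360887451/part");
          }
      }

      protected void cleanup(Context ctx)
              throws IOException, InterruptedException {
          out.close();
      }
  }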

Re: Managing space in Master Node

2013-02-15 Thread Arko Provo Mukherjee
Hello Charles, Thanks a lot for your reply and help! Yes, the NN data (image, edit files) is kept separate from the data files: dfs.name.dir=/hadoop/hdfs/name whereas dfs.data.dir=/hadoop/hdfs/data. Their contents match the description you specified. Can I safely go ahead and delete all
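
For reference, the separation Arko describes corresponds to hdfs-site.xml entries like these (values as quoted in the thread):

  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/hdfs/data</value>
  </property>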

RE: Managing space in Master Node

2013-02-15 Thread Charles Baker
Hey Arko. It should be safe to delete then. -Chuck -Original Message- From: Arko Provo Mukherjee [mailto:arkoprovomukher...@gmail.com] Sent: Friday, February 15, 2013 11:56 AM To: hdfs-user@hadoop.apache.org Subject: Re: Managing space in Master Node Hello Charles, Thanks a lot for

Correct way to unzip locally an archive in Yarn

2013-02-15 Thread Sebastiano Vigna
I am trying to upload (using the -archives option of ToolRunner) a .zip archive so that it is unzipped locally. I have tried all possible combinations of command line options, getResource()/DistributedCache.getSmthng() with no luck. I have found a post suggesting that in Yarn this is all
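
For what it's worth, a minimal sketch of the MR route via the Job API (the path and the #dict link name are invented; this is the programmatic equivalent of passing -archives hdfs:///user/me/stuff.zip#dict on the command line):

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class ShipArchive {
      public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "uses-local-archive");
          // The #dict fragment names the symlink created in each task's
          // working directory; YARN unpacks the zip under that link.
          job.addCacheArchive(new URI("hdfs:///user/me/stuff.zip#dict"));
          // Inside a task, files are then reachable at e.g.
          // new File("dict/some-file-from-the-zip")
      }
  }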

setup hadoop 1.0.4 in debugging mode in ubuntu

2013-02-15 Thread Agarwal, Nikhil
Hello, I am new to Hadoop. I have installed Ubuntu i386 on my VM and want to debug Hadoop. Can you please guide me as to which stable release I should download and exactly how I should use Ant to build and debug Hadoop. Also, I would like to know how MapReduce interacts with other

How to install Oozie 3.3.1 on Hadoop 1.1.1

2013-02-15 Thread anand verma
Hi, I have been struggling for many days to install Oozie 3.3.1 on Hadoop 1.1.1. The Oozie documentation is very poorly written and I am not able to figure it out. While installing I got an error saying it doesn't support Hadoop v1.1.1. Please help me out. -- Regards Ananda Prakash Verma

getimage failed in Name Node Log

2013-02-15 Thread janesh mishra
Hi, I am new to Hadoop and I set up the Hadoop cluster with the help of Michael Noll's multi-node guide ( http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/). When I set up the single-node Hadoop, everything works fine. But in the multi-node setup I found that

RE: getimage failed in Name Node Log

2013-02-15 Thread Vijay Thakorlal
Hi Janesh, I think your SNN may be starting up with the wrong IP; surely the machine parameter should say 192.168.0.101? http://namenode:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845

Re: Correct way to unzip locally an archive in Yarn

2013-02-15 Thread Robert Evans
Are you trying to run a Map/Reduce job or are you writing a new YARN application? If it is a MR job, then it should work mostly the same as before (on 1.x). If you are writing a new YARN application then there is a separate Map in the ContainerLaunchContext that you need to fill in. --Bobby

Re: Hadoop Distcp [ Incomplete HDFS URI ]

2013-02-15 Thread Alejandro Abdelnur
It seems you have an extra ':' before the first '/' in your URIs. thx Alejandro (phone typing) On Feb 15, 2013, at 8:22 AM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote: Hi Guys, we have two clusters running CDH4.0.1 and I am trying to copy data from one cluster to another. It says
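
In other words, the distcp URIs should look like this (host names invented here):

  hadoop distcp hdfs://nn-a.example.com:8020/src hdfs://nn-b.example.com:8020/dst

rather than having a stray colon between the scheme and the authority, which produces the "Incomplete HDFS URI" error.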

RE: Regarding Hadoop

2013-02-15 Thread Shah, Rahul1
I would suggest you read Hadoop: The Definitive Guide, 2nd Edition, by Tom White. I too started a few weeks back and am still learning it. :) Hope you like it too. From: SrinivasaRao Kongar [mailto:ksrinu...@gmail.com] Sent: Thursday, February 14, 2013 11:38 PM To: user@hadoop.apache.org Subject:

Re: .deflate trouble

2013-02-15 Thread Keith Wiley
I might contact them but we are specifically avoiding EMR for this project. We have already successfully deployed EMR but we want more precise control over the cluster, namely the ability to persist and reawaken it on demand. We really want a direct Hadoop installation instead of an EMR-based

RE: .deflate trouble

2013-02-15 Thread Barr, Jeffrey
Hi Marcos and Keith, Thanks for bringing this to our attention. Saurabh is currently OOF, so I'll pass this along to the EMR team. Jeff. From: Marcos Ortiz Valmaseda [mailto:mlor...@uci.cu] Sent: Thursday, February 14, 2013 7:10 PM To: user@hadoop.apache.org Cc: Saurabh Baji; Barr, Jeffrey

Re: Sorting huge text files in Hadoop

2013-02-15 Thread Jay Vyas
I don't think you can do an embarrassingly parallel sort of a randomly ordered file without merging results. However, if you know that the file is pseudo-ordered: 1123 1232 1000 19991019 20200222 30111 3000 Then you can (maybe) sort the individual blocks in mappers using

Re: Regarding Hadoop

2013-02-15 Thread Mohammad Tariq
This book is actually a good way to start. But I would suggest you go for the 3rd edition; the 2nd edition covers the old API. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Fri, Feb 15, 2013 at 10:26 PM, Shah, Rahul1 rahul1.s...@intel.com wrote: I would suggest

Hadoop 2.0.3 namenode issue

2013-02-15 Thread Dheeren Bebortha
Hi, In one of our test clusters that has Namenode HA using QJM + YARN + HBase 0.94, the namenode came down with the following logs. I am trying to root cause the issue. Any help is appreciated. = 2013-02-13 10:18:27,521 INFO hdfs.StateChange - BLOCK* NameSystem.fsync: file

Re: Regarding Hadoop

2013-02-15 Thread Chaitanya Nanduri
This link might help: http://www.cloudera.com/content/cloudera/en/developer-community/developer-admin-resources/new-to-hadoop.html#books On Fri, Feb 15, 2013 at 1:17 PM, Mohammad Tariq donta...@gmail.com wrote: This book is actually a good way to start. But I would suggest you go

Re: .deflate trouble

2013-02-15 Thread Marcos Ortiz Valmaseda
Yes, I know, Keith. I know that you want more control over your Hadoop cluster, so I recommend three things:
- You can use Whirr to manage your Hadoop cluster installations on EC2 [1]
- You can create your own Hadoop-focused AMI based on your requirements (my favorite choice here)
- Or

Re: Hadoop 2.0.3 namenode issue

2013-02-15 Thread Harsh J
I don't see a crash log in your snippets. Mind pastebinning the NN crash log somewhere? Did both NNs go down? In any case, the log below is due to a client attempting to connect with an older HDFS library. This would log such warnings (and also indicate the client IP/port, as you noticed),

Re: Hadoop 2.0.3 namenode issue

2013-02-15 Thread Marcos Ortiz Valmaseda
Regards, Dheeren. It seems that you are using a version of HDFS that is incompatible with this version of HBase. Can you provide the exact version of your HBase package? - Original message - From: Dheeren Bebortha dbebor...@salesforce.com To: user@hadoop.apache.org Sent: Friday, February 15

Re: Sorting huge text files in Hadoop

2013-02-15 Thread Michael Segel
Why not? Who said you had to parallelize anything? On Feb 15, 2013, at 12:09 PM, Jay Vyas jayunit...@gmail.com wrote: I don't think you can do an embarrassingly parallel sort of a randomly ordered file without merging results. However, if you know that the file is pseudo-ordered:

Re: Sorting huge text files in Hadoop

2013-02-15 Thread Jay Vyas
Well... OK... I guess you could have a 1TB block, do an in-place sort on the file, write it to a tmp directory, and then spill the records in order or something. At that point you might as well not use Hadoop.

Re: Sorting huge text files in Hadoop

2013-02-15 Thread Michael Segel
Why do you need a 1TB block? On Feb 15, 2013, at 1:29 PM, Jay Vyas jayunit...@gmail.com wrote: Well... OK... I guess you could have a 1TB block, do an in-place sort on the file, write it to a tmp directory, and then spill the records in order or something. At that point you might as well not

Re: Sorting huge text files in Hadoop

2013-02-15 Thread Jay Vyas
Maybe I'm mistaken about what is meant by map-only. Does a map-only job still result in the standard shuffle-sort, or does that get cut short? Hmm, I think I see what you mean; I guess a map-only sort is possible as long as you use a custom partitioner and you let the shuffle/sort run to

Re: Sorting huge text files in Hadoop

2013-02-15 Thread Sandy Ryza
A map-only job does not result in the standard shuffle-sort. Map outputs are written directly to HDFS. -Sandy On Fri, Feb 15, 2013 at 12:23 PM, Jay Vyas jayunit...@gmail.com wrote: Maybe I'm mistaken about what is meant by map-only. Does a map-only job still result in the standard shuffle-sort?
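
For completeness, the switch Sandy is referring to (a fragment; the rest of the job setup is elided):

  job.setNumReduceTasks(0);  // zero reducers: no shuffle/sort happens,
                             // each mapper writes its output straight
                             // to HDFS as part-m-* files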

Can anyone point me to a good Map Reduce in memory Join implementation?

2013-02-15 Thread Yunming Zhang
Hi, I am trying to do some work with an in-memory join Map Reduce implementation. It can be summarized as a join between two data sets, R and S; one of them is too large to fit into memory, the other one can fit into memory reasonably well (size of R >> size of S). The typical implementation
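
For reference, a minimal sketch of the usual in-memory (map-side) join being asked about; the tab-separated key/value layout and the small.txt file name are assumptions, and the small set S would normally arrive in the task's working directory via the distributed cache:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class MapJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
      private final Map<String, String> small = new HashMap<String, String>();

      protected void setup(Context ctx) throws IOException {
          // Load the small set S into memory once per task.
          BufferedReader r = new BufferedReader(new FileReader("small.txt"));
          String line;
          while ((line = r.readLine()) != null) {
              String[] kv = line.split("\t", 2);   // key \t value
              if (kv.length == 2) small.put(kv[0], kv[1]);
          }
          r.close();
      }

      protected void map(LongWritable off, Text rec, Context ctx)
              throws IOException, InterruptedException {
          // Stream the large set R and probe the in-memory table.
          String[] kv = rec.toString().split("\t", 2);
          String match = (kv.length == 2) ? small.get(kv[0]) : null;
          if (match != null) {   // inner join; no reduce phase needed
              ctx.write(new Text(kv[0]), new Text(kv[1] + "\t" + match));
          }
      }
  }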

Re: Can anyone point me to a good Map Reduce in memory Join implementation?

2013-02-15 Thread Viral Bajaria
Why not look at Hive? It already implements the JOIN you are looking for and has features to do MAPJOIN, i.e. load the small file into memory. On Fri, Feb 15, 2013 at 1:25 PM, Yunming Zhang zhangyunming1...@gmail.com wrote: Hi, I am trying to do some work with an in-memory join Map Reduce

Re: Sorting huge text files in Hadoop

2013-02-15 Thread Azuryy Yu
This is a typical total sort using map/reduce; it can be done, with both map and reduce. On Fri, Feb 15, 2013 at 10:39 PM, Arun Vasu arun...@gmail.com wrote: Hi, Is it possible to sort a huge text file lexicographically using a mapreduce job which has only map tasks and zero reduce tasks?
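
A minimal sketch of that total sort, assuming the Hadoop 2 (new) API; paths, reducer count, and sampling rate are invented, and the identity map/reduce defaults suffice since the framework sorts by key:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
  import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

  public class TotalSort {
      public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "total-sort");
          job.setJarByClass(TotalSort.class);
          job.setInputFormatClass(KeyValueTextInputFormat.class); // Text keys
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);
          job.setNumReduceTasks(10);
          FileInputFormat.addInputPath(job, new Path("/logs/in"));
          FileOutputFormat.setOutputPath(job, new Path("/logs/sorted"));
          // Sample the input keys so reducer i receives only keys that
          // sort after everything routed to reducer i-1; the concatenated
          // part-r-* outputs are then globally ordered.
          job.setPartitionerClass(TotalOrderPartitioner.class);
          TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
              new Path("/tmp/_partitions"));
          InputSampler.writePartitionFile(job,
              new InputSampler.RandomSampler<Text, Text>(0.01, 1000));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }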

Re: Can anyone point me to a good Map Reduce in memory Join implementation?

2013-02-15 Thread David Boyd
Use Pig; it has specific directives for in-memory joins of small data sets. The whole thing might require half a dozen lines of code. On 2/15/2013 4:25 PM, Yunming Zhang wrote: Hi, I am trying to do some work with an in-memory join Map Reduce implementation; it can be summarized as a join

Re: Can anyone point me to a good Map Reduce in memory Join implementation?

2013-02-15 Thread Prashant Kommireddi
Specifically, a replicated join - http://pig.apache.org/docs/r0.10.0/perf.html#replicated-joins On Fri, Feb 15, 2013 at 6:22 PM, David Boyd db...@lorenzresearch.com wrote: Use Pig; it has specific directives for in-memory joins of small data sets. The whole thing might require half a dozen

Re: How to install Oozie 3.3.1 on Hadoop 1.1.1

2013-02-15 Thread Jagat Singh
Hi, I can see that in pom.xml the supported Hadoop version is <hadoop.version>1.0.1</hadoop.version>. You can try to build it yourself with the version you want to see if it works. Also try asking your question on the Oozie mailing list. Regards, Jagat Singh On Sat, Feb 16, 2013 at 12:45 PM, Hemanth
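
That is, a sketch of the pom.xml change Jagat suggests (whether Oozie 3.3.1 actually builds against 1.1.1 is untested here):

  <properties>
    <hadoop.version>1.1.1</hadoop.version>
  </properties>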

why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
Hi, I tried to use short circuit read to improve my HBase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, and dfs.block.local-path-access.user set to the MR job runner. The cluster is 1+4 node

Re: why my test result on dfs short circuit read is slower?

2013-02-15 Thread 谢良
Hi Raymond, did you enable the security feature in your cluster? There'll be no obvious benefit if so. Regards, Liang ___ From: Liu, Raymond [raymond@intel.com] Sent: 16 February 2013 11:10 To: user@hadoop.apache.org Subject: why my test result on dfs short

Re: Need Information on Hadoop Cluster Set up

2013-02-15 Thread pabbathi venki
HELLO GOOD MORNING HADOOP KINGS On Thu, Feb 14, 2013 at 1:25 AM, Yusaku Sako yus...@hortonworks.com wrote: Hello Seema, Yes, you can use Apache Ambari to set up and manage a single node cluster. Yusaku On Wed, Feb 13, 2013 at 11:48 AM, Hadoop seemami...@gmail.com wrote: Hi All, Good to

RE: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
Hi Liang, Did you mean setting dfs.permissions to false? Is that all I need to do to disable the security feature? Because it seems to me that without changing dfs.block.local-path-access.user, dfs.permissions alone doesn't work; HBase still falls back to going through the datanode to read data. Hi Raymond,

Re: why my test result on dfs short circuit read is slower?

2013-02-15 Thread 谢良
I'm not very clear about your scenario; just a kind reminder: if security is on, the feature can be used only by a user that has Kerberos credentials at the client, therefore MapReduce tasks cannot benefit from it in general; see HDFS-2246's release note for more info. If you didn't enable

Re: Hadoop-2.0.3 HA and federation configure

2013-02-15 Thread Azuryy Yu
Sorry all, maybe I've got the answer, because I didn't read the doc carefully. clustera is for n1, n2; clusterb is for n3, n4. So the following configuration would answer my question; is that okay?

<property>
  <name>dfs.nameservices</name>
  <value>clustera, clusterb</value>
</property>
<property>

Re: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Harsh J
If you want HBase to leverage the shortcircuit, the DN config dfs.block.local-path-access.user should be set to the user running HBase (i.e. hbase, for example), and the hbase-site.xml should have dfs.client.read.shortcircuit defined in all its RegionServers. Doing this wrong could result in
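
Concretely, Harsh's advice corresponds to something like the following (the "hbase" user name is an example):

  <!-- hdfs-site.xml on each DataNode -->
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>hbase</value>
  </property>

  <!-- hbase-site.xml on each RegionServer -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>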

RE: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
Hi Harsh, Yes, I did set both of these, though in hdfs-site.xml rather than hbase-site.xml. And I have double-confirmed that local reads are performed, since there are no errors in the datanode logs, and by watching lo network IO. If you want HBase to leverage the shortcircuit, the DN config

RE: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
Hi Arpit Gupta, Yes, this also confirms that short circuit read is enabled on my cluster.
13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true
13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file

RE: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
It seems to me that, with short circuit read enabled, BlockReaderLocal reads data in 512/4096-byte units (checksum check enabled/skipped), while when it goes through the datanode, BlockSender.sendChunks will read and send data in 64KB units? Is that true? And if so, won't it explain that