New Hadoop Eclipse functionality

2013-06-11 Thread Srimanth Gunturi
Hello, I wanted to invite all users and developers to try out the new Hadoop functionality in Eclipse. Initial features include an HDFS Explorer (built on top of the Eclipse File System) and a ZooKeeper explorer. The project can be accessed at http://people.apache.org/~srimanth/hadoop-eclipse/. Please

Re: New Hadoop Eclipse functionality

2013-06-11 Thread Harsh J
Hi Srimanth, This is great, I just went over the pages - many thanks for sharing! Does this support hadoop-2 as well? Note though that the Apache Incubator also hosts a project whose goal is similar to what you're targeting here: http://hdt.incubator.apache.org. I'd urge you to help improve the e

Re: New Hadoop Eclipse functionality

2013-06-11 Thread maisnam ns
Hi Srimanth, I will try it on my system; anyway, thanks for sharing it, much needed. Regards Niranjan Singh On Tue, Jun 11, 2013 at 1:06 PM, Harsh J wrote: > Hi Srimanth, > > This is great, I just went over the pages - many thanks for sharing! > Does this support hadoop-2 as well? > > Note thou

Application Master getting started very late

2013-06-11 Thread Krishna Kishore Bonagiri
Hi, I have been using YARN for quite some time, and recently moved to release 2.0.4. I recently started running a huge number of Application Masters (applications) one after another, and observed that sometimes in that sequence the Application Master takes around 1 minute or a little more than that

Re: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Mohammad Mustaqeem
You can set the "hadoop.tmp.dir" property in core-site.xml as: hadoop.tmp.dir = /app/software/app1. "/app/software/app1" will then be used as the location for storing the HDFS files. On Tue, Jun 11, 2013 at 7:52 PM, Raj Hadoop wrote: > Hi, > > I have a one node Hadoop cluster for my POC. The
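
A minimal core-site.xml sketch of that setting (the path is the one from the thread; adjust to your layout):

  <configuration>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/app/software/app1</value>
    </property>
  </configuration>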

HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Raj Hadoop
Hi, I have a one-node Hadoop cluster for my POC. HADOOP_HOME is under the directory /usr/home/hadoop. I don't have much space on /usr and want to use storage from other accounts/applications, located at /app/software/app1 etc. How can I use the /app/software/app1 location for my HDFS

Re: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Raj Hadoop
Thanks Mustaqeem. hadoop.tmp.dir - can this store HDFS files? I mean, is there any difference between these files and the ones we create under HADOOP_HOME? From: Mohammad Mustaqeem <3m.mustaq...@gmail.com> To: user ; Raj Hadoop Sent: Tuesday, June 11, 2013 10:28

replace separator in output.collect?

2013-06-11 Thread Pedro Sá da Costa
the "output.collect(key, value)" puts the key and the value separated by \t. Is there a way to replace it by ':'? -- Best regards,

Re: New Hadoop Eclipse functionality

2013-06-11 Thread Srimanth Gunturi
Hi Harsh, Thank you for pointing to HDT incubator project. I was not aware of its existence. I am working on a WebHDFS adapter which will begin supporting Hadoop-2 HDFS. I hope to provide that capability soon. Regards, Srimanth On Tue, Jun 11, 2013 at 12:36 AM, Harsh J wrote: > Hi Srimanth,

Two Datanodes - Incompatible Cluster IDs

2013-06-11 Thread Michael Namaiandeh
I am trying to set up a 4 node Cloudera Hadoop cluster. However, two of my data nodes are showing up as "dead" nodes. After looking at the logs, I found that the dead nodes have a different ClusterID than my working/"live" nodes. How do I configure the dead nodes with the correct ClusterID? I ca

Re: Two Datanodes - Incompatible Cluster IDs

2013-06-11 Thread Ian Wrigley
If you just blow away the contents of the dfs.data.dir directories, then start the DataNode daemon again, you should be fine. (That's where the ClusterID is stored. When they next connect to the NameNode, because they don't have one now that you've deleted it, they'll be given the one from the Nam
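
A rough command sketch of that fix, assuming a single data directory /data/1/dfs/dn and CDH-style service scripts (the ClusterID lives in current/VERSION under each data directory; adjust paths and service names to your install):

  # on each dead DataNode; change the path to match your dfs.data.dir value
  sudo service hadoop-hdfs-datanode stop
  sudo rm -rf /data/1/dfs/dn/*
  sudo service hadoop-hdfs-datanode start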

RE: Two Datanodes - Incompatible Cluster IDs

2013-06-11 Thread Michael Namaiandeh
Brilliant, Ian! That worked. Now, all my nodes are showing up as live. From: Ian Wrigley [mailto:i...@cloudera.com] Sent: Tuesday, June 11, 2013 11:43 AM To: user@hadoop.apache.org Subject: Re: Two Datanodes - Incompatible Cluster IDs If you just blow away the contents of the dfs.data.dir direc

Shuffle design: optimization tradeoffs

2013-06-11 Thread John Lilley
I am curious about the tradeoffs that drove the design of the partition/sort/shuffle (Elephant book p 208). Doubtless this has been tuned and measured and retuned, but I'd like to know what observations came about during the iterative optimization process to drive the final design. For example:

Slice MapWritable on Map

2013-06-11 Thread Darren Lee
Hello, I am working on a Hadoop-based Solr indexing system. The reason we are using Hadoop is that we need to prepare the data (compute values and add them to the Solr documents). For a full index I am reading in the records and outputting a MapWritable with all the fields I want to index.

Re: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Mohammad Tariq
Hello Raj, Although the way Mustaqeem described is correct, there is a more appropriate way to do this. Add the "dfs.data.dir" property in your hdfs-site.xml and give the path you want to use for HDFS data storage as its value. The directory pointed to by hadoop.tmp.dir also contains the met
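
A hedged hdfs-site.xml sketch of Tariq's suggestion (the dedicated subdirectory under the thread's /app/software/app1 path is an assumption):

  <configuration>
    <property>
      <name>dfs.data.dir</name>
      <value>/app/software/app1/dfs/data</value>
    </property>
  </configuration>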

read lucene index in mapper

2013-06-11 Thread parnab kumar
Hi, I need to read an existing Lucene index in a map. Can someone point me in the right direction? Thanks, Parnab

Re: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Raj Hadoop
Hi Tariq, What is the default value of dfs.data.dir? My hdfs-site.xml doesn't have this value defined, so what is the default? Thanks, Raj From: Mohammad Tariq To: "user@hadoop.apache.org" ; Raj Hadoop Sent: Tuesday, June 11, 2013 12:49 PM Subject: R

Re: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Pramod N
Hi Raj, An extract from the source suggests the following: dfs.data.dir defaults to ${hadoop.tmp.dir}/dfs/data. It determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically

Re: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Shahab Yunus
http://hadoop.apache.org/docs/stable/hdfs-default.html lists dfs.data.dir with default ${hadoop.tmp.dir}/dfs/data: Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different

Re: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Mohammad Tariq
Thank you Pramod and Shahab. Warm Regards, Tariq cloudfront.blogspot.com On Tue, Jun 11, 2013 at 11:03 PM, Shahab Yunus wrote: > http://hadoop.apache.org/docs/stable/hdfs-default.html > dfs.data.dir ${hadoop.tmp.dir}/dfs/data Determines where on the local > filesystem a DFS data node should sto

YARN Container's App ID

2013-06-11 Thread Brian C. Huffman
How can a YARN container get its own Application ID? I tried getting the ApplicationConstants.AM_APP_ATTEMPT_ID_ENV similar to how the Distributed Shell example does for the AppMaster, but that variable doesn't seem to exist in the environment for the container. Does the App Master have to se

Re: YARN Container's App ID

2013-06-11 Thread Hitesh Shah
Hello Brian, org.apache.hadoop.yarn.api.ApplicationConstants.Environment should have a list of all the information set in the environment. One of these is the container ID. The ApplicationAttemptId can be obtained from the container ID object, which in turn can be used to get the App Id. -- Hitesh
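
A hedged Java sketch of the chain Hitesh describes, assuming the trunk-era Environment.CONTAINER_ID key and the 2.x ConverterUtils helper:

  import org.apache.hadoop.yarn.api.ApplicationConstants;
  import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
  import org.apache.hadoop.yarn.api.records.ApplicationId;
  import org.apache.hadoop.yarn.api.records.ContainerId;
  import org.apache.hadoop.yarn.util.ConverterUtils;

  public class WhoAmI {
    public static void main(String[] args) {
      // the enum constant name matches the environment variable ("CONTAINER_ID")
      String cidStr = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name());
      // walk container ID -> application attempt ID -> application ID
      ContainerId containerId = ConverterUtils.toContainerId(cidStr);
      ApplicationAttemptId attemptId = containerId.getApplicationAttemptId();
      ApplicationId appId = attemptId.getApplicationId();
      System.out.println("Application ID: " + appId);
    }
  }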

Re: YARN Container's App ID

2013-06-11 Thread Brian C. Huffman
Hitesh, Is this only in trunk? I'm currently running 2.0.3-alpha and I don't see it there. I also don't see it in the latest 2.0.5. Thanks, Brian On 06/11/2013 02:54 PM, Hitesh Shah wrote: Hello Brian, org.apache.hadoop.yarn.api.ApplicationConstants.Environment should have a list of all

Re: SSD support in HDFS

2013-06-11 Thread Chris Nauroth
Hi Lucas, HDFS does not have this capability right now, but there has been some preliminary discussion around adding features to support it. You might want to follow jira issues HDFS-2832 and HDFS-4672 if you'd like to receive notifications about the discussion. https://issues.apache.org/jira/br

Re: YARN Container's App ID

2013-06-11 Thread Hitesh Shah
Yes - this is currently on trunk. There were some changes made to make the container ID available to all containers (and not only the AM). However, for 2.0.3-alpha, I believe you should have access to ApplicationConstants.AM_CONTAINER_ID_ENV in the AM's environment. -- Hitesh On Jun 11, 201

Reading multiple files of a directory using a Single LOAD Command in PIG

2013-06-11 Thread Mix Nin
I have a directory "Output2". It has file names as below: _SUCCESS, part-m-0, part-m-1, part-m-2, part-m-3, ..., part-m-00100. The above files are produced by the PIG output STORE command. I want to read the files starting with "part-m-" using PIG comm

Re: Shuffle design: optimization tradeoffs

2013-06-11 Thread Albert Chu
On Tue, 2013-06-11 at 16:00 +, John Lilley wrote: > I am curious about the tradeoffs that drove design of the > partition/sort/shuffle (Elephant book p 208). Doubtless this has been > tuned and measured and retuned, but I’d like to know what observations > came about during the iterative optim

Re: Reading multiple files of a directory using a Single LOAD Command in PIG

2013-06-11 Thread Prashant Kommireddi
What is the error? The LoadFunc should be ignoring any filenames that begin with "_" or a period ".". If you are trying to skip the _SUCCESS file, the loader you are using (PigStorage) already handles that. Also, can you double-check your path is "/Output/part-m*" with forward slashes, as opposed to backward slashes

Re: Reading multiple files of a directory using a Single LOAD Command in PIG

2013-06-11 Thread Mix Nin
Hi, My mistake, I gave backward slashes and so was getting the error. I gave forward slashes and it is working fine. Good to know that LOAD ignores filenames that begin with "_" or a period ".". So, in that case, can I directly give LOAD /Output/* instead of LOAD /Output/part-m*? Thanks On Tu

Re: read lucene index in mapper

2013-06-11 Thread Azuryy Yu
You need to add the Lucene index tar.gz to the distributed cache as an archive, then create the index reader in the mapper's setup. --Send from my Sony mobile. On Jun 12, 2013 12:50 AM, "parnab kumar" wrote: > Hi , > > I need to read an existing lucene index in a map.can someone point > me to the r
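
A hedged sketch of that pattern, assuming Lucene 3.x-era APIs and a hypothetical index tarball already in HDFS at /cache/index.tar.gz (job below is the configured Job instance):

  import java.io.File;
  import java.io.IOException;
  import java.net.URI;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.store.FSDirectory;

  // driver side: ship the archive; the '#index' fragment names the unpacked link
  // DistributedCache.addCacheArchive(new URI("/cache/index.tar.gz#index"), job.getConfiguration());

  public class IndexMapper extends Mapper<LongWritable, Text, Text, Text> {
    private IndexReader reader;

    @Override
    protected void setup(Context context) throws IOException {
      // the archive was unpacked under ./index in the task working directory
      reader = IndexReader.open(FSDirectory.open(new File("index")));
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      reader.close();
    }
  }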

Re: Reading multiple files of a directory using a Single LOAD Command in PIG

2013-06-11 Thread Harsh J
Yes, you can do that - it will still apply the filter to the globbed results. On Wed, Jun 12, 2013 at 3:45 AM, Mix Nin wrote: > Hi, > > My mistake, I gave backward slashes and so was getting error. I gave > forward slashes and it is working fine. > > Good to know that LOAD ignores filenames that
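
As a hedged Pig Latin illustration (hypothetical aliases; PigStorage filters out _SUCCESS and other files beginning with "_" or "."):

  a = LOAD '/Output2/part-m-*' USING PigStorage();
  -- equivalent here, since underscore- and dot-prefixed files are skipped anyway:
  b = LOAD '/Output2/*' USING PigStorage();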

Now give .gz file as input to the MAP

2013-06-11 Thread samir das mohapatra
Hi All, Has anyone worked on how to pass a .gz file as input to a MapReduce job? Regards, samir.

Re: Now give .gz file as input to the MAP

2013-06-11 Thread Sanjay Subramanian
hadoopConf.set("mapreduce.job.inputformat.class", "com.wizecommerce.utils.mapred.TextInputFormat"); hadoopConf.set("mapreduce.job.outputformat.class", "com.wizecommerce.utils.mapred.TextOutputFormat"); No special settings are required for reading Gzip except these above. If you want to output Gzip h

Re: Now give .gz file as input to the MAP

2013-06-11 Thread Rahul Bhattacharjee
Nothing special is required to process .gz files using MR. However, as Sanjay mentioned, verify the codecs configured in core-site; another thing to note is that these files are not splittable. You might want to use bz2; those are splittable. Thanks, Rahul On Wed, Jun 12, 2013 at 10:14
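
A minimal driver sketch reflecting both replies (Hadoop 2.x Job API assumed; the input path is hypothetical and may mix plain and .gz files):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

  public class GzInputDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "gz-input");
      job.setJarByClass(GzInputDriver.class);
      // the stock TextInputFormat decompresses .gz inputs automatically by file
      // extension; note each gzip file becomes one unsplittable split
      job.setInputFormatClass(TextInputFormat.class);
      FileInputFormat.addInputPath(job, new Path("/data/in"));  // hypothetical path
      // mapper/reducer/output settings elided
    }
  }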

Re: New Hadoop Eclipse functionality

2013-06-11 Thread Henry Junyoung Kim
Can anybody download the plugin from the update-site? The body from the URL is empty. (people.apache.org/~srimanth/hadoop-eclipse/update-site/) On Jun 11, 2013, at 4:27 PM, Srimanth Gunturi wrote: > Hello, > I wanted to invite all users and developers to try out the new Hadoop

RE: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Sandeep L
Instead of using ${hadoop.tmp.dir}/dfs/data you can give multiple directories with absolute paths. Just try as follows: hadoop.tmp.dir = /home1/user1/dir1,/home2/user2/dir2. This will allow HDFS to write data into all specified directories. From: donta...@gmail.com Date: Tue, 11 Jun 2013 23:14:08 +0530 Sub

Re: New Hadoop Eclipse functionality

2013-06-11 Thread Srimanth Gunturi
Hi Henry, The URL is supposed to be entered into Eclipse's dialogs for installing/updating new software, as mentioned in the Download section. Eclipse automatically finds the various files it needs relative to that URL. This is documented in the Eclipse help sections (http://help.eclipse.org/juno/index

RE: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Sandeep L
Instead of using ${hadoop.tmp.dir}/dfs/data you can give multiple directories with absolute paths. Just try as follows: hadoop.data.dir = /home1/user1/dir1,/home2/user2/dir2. This will allow HDFS to write data into all specified directories. From: sandeepvre...@outlook.com To: user@hadoop.apache.org Subjec
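
Reassembled as XML, and using the dfs.data.dir name recommended earlier in the thread for block storage (the directories are the thread's examples):

  <property>
    <name>dfs.data.dir</name>
    <value>/home1/user1/dir1,/home2/user2/dir2</value>
  </property>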

About hadoop-2.0.5 release

2013-06-11 Thread Ramya S
Hi, When will be the release of stable version of hadoop-2.0.5-alpha? Thanks & Regards, Ramya.S