hadoop queue -list

2013-05-22 Thread Pedro Sá da Costa
1 - I am looking at the queue list on my system, and I have several queues defined. One of the queues shows this info: Scheduling Info : Capacity: 1.0, MaximumCapacity: 1.0, CurrentCapacity: 77.534035 Why is the current capacity so much bigger than the maximum capacity? 2 - With the queue

Mapreduce queues

2013-05-22 Thread Pedro Sá da Costa
Hi, When a cluster has several queues, does the JobTracker have to manage all of them? -- Best regards,

AUTO: Prabhat Pandey is out of the office (returning 06/10/2013)

2013-05-22 Thread Prabhat Pandey
I am out of the office until 06/10/2013. For any issues please contact Dispatcher:dispatcherdb...@us.ibm.com Thanks. Prabhat Pandey Note: This is an automated response to your message Changing the maximum tasks per node on a per job basis sent on

Re: Hadoop Development on cloud in a secure and economical way.

2013-05-22 Thread Sai Sai
Is it possible to do Hadoop development on the cloud in a secure and economical way, without worrying about our source being taken away? We would like to have Hadoop and eclipse installed on a vm in the cloud, and our developers will log into the cloud on a daily basis and work there. Like this

Re: Hadoop Development on cloud in a secure and economical way.

2013-05-22 Thread Rahul Bhattacharjee
Amazon Elastic Compute Cloud (EC2). Pay per use. Thanks, Rahul On Wed, May 22, 2013 at 11:41 AM, Sai Sai saigr...@yahoo.in wrote: Is it possible to do Hadoop development on cloud in a secure and economical way without worrying about our source being taken away. We would like to have Hadoop and

Re: Hadoop Development on cloud in a secure and economical way.

2013-05-22 Thread Ellis Miller
Configure a private cloud: install VMware / VirtualBox / KVM on an internal server / cluster and leverage either Cloudera Hadoop (free version) or Hortonworks (Hortonworks was introduced by Yahoo; where Cloudera is exceptional but proprietary, Hortonworks requires some configuration and tuning of Hadoop in

Get Hadoop update

2013-05-22 Thread Vimal Jain
Hi, I would like to receive Hadoop notifications. -- Thanks and Regards, Vimal Jain

How is sharing done in HDFS ?

2013-05-22 Thread Agarwal, Nikhil
Hi, Can anyone guide me to some pointers or explain how HDFS shares the information put in the temporary directories (hadoop.tmp.dir, mapred.tmp.dir, etc.) to all other nodes? I suppose that during execution of a MapReduce job, the JobTracker prepares a file called jobtoken and puts it in the

Re: How is sharing done in HDFS ?

2013-05-22 Thread Kun Ling
Hi, Agarwal, Hadoop just puts the jobtoken, _partitionlst, and some other files that need to be shared in a directory located in hdfs://namenode:port/tmp//. And all the TaskTrackers will access these files from that shared tmp directory, just like the way they share the input file in the

Re: How is sharing done in HDFS ?

2013-05-22 Thread Harsh J
The job-specific files, placed by the client, are downloaded individually by every tasktracker from the HDFS (The process is called localization of the task before it starts up) and then used. On Wed, May 22, 2013 at 1:59 PM, Agarwal, Nikhil nikhil.agar...@netapp.com wrote: Hi,

Auto created 'target' folder?

2013-05-22 Thread Taco Jan Osinga
Hi all, Quite a newbie here. I'm creating an application for internal use, which deploys several demo sites of our HBase application. These demos should contain a blank state (fixtures) with some data. Therefore I have created export files which need to be imported (using the MapReduce way of

Re: ETL Tools

2013-05-22 Thread Lenin Raj
We have used Pentaho in our projects.. it meets all your conditions. It can connect to hadoop. Good community support too. -- Lenin. Sent from my Android. On May 22, 2013 2:19 AM, Aji Janis aji1...@gmail.com wrote: Thanks for the suggestion. What about Clover or Talend? Have any of you tried

Rack-awareness in Hadoop-2.0.3-alpha

2013-05-22 Thread Mohammad Mustaqeem
Does Hadoop-2.0.3-alpha not support rack-awareness? I have been trying to make my Hadoop cluster rack-aware for a week but haven't succeeded. What I am doing: I am adding the following property in etc/hadoop/core-site.xml : <property> <name>net.topology.script.file.name</name>
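For reference, a core-site.xml entry of this shape would look like the following (the script path here is an assumed example; point it at your own topology script):

```xml
<property>
  <name>net.topology.script.file.name</name>
  <!-- assumed path: the executable script that maps hosts/IPs to rack paths -->
  <value>/home/hadoop/etc/hadoop/topology.sh</value>
</property>
```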

RE: Shuffle phase replication factor

2013-05-22 Thread John Lilley
Oh I see. Does this mean there is another service and TCP listen port for this purpose? Thanks for your indulgence... I would really like to read more about this without bothering the group but not sure where to start to learn these internals other than the code. john From: Kai Voigt

YARN in 2.0 and 0.23

2013-05-22 Thread John Lilley
We intend to use the YARN APIs fairly soon. Are there notable differences in YARN's classes, interfaces, or semantics between 0.23 and 2.0? It seems to be supported on both versions. Thanks, John

RE: Shuffle phase replication factor

2013-05-22 Thread John Lilley
This brings up another nagging question I've had for some time. Between HDFS and shuffle, there seems to be the potential for every node connecting to every other node via TCP. Are there explicit mechanisms in place to manage or limit simultaneous connections? Is the protocol simply robust

Re: Shuffle phase replication factor

2013-05-22 Thread Rahul Bhattacharjee
There are properties/configurations to control the number of copier threads, e.g. tasktracker.http.threads=40 Thanks, Rahul On Wed, May 22, 2013 at 8:16 PM, John Lilley john.lil...@redpoint.net wrote: This brings up another nagging question I've had for some time. Between HDFS and shuffle,
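As a config fragment, that property lives in mapred-site.xml on the tasktrackers (40 shown here, matching the value mentioned above):

```xml
<property>
  <name>tasktracker.http.threads</name>
  <!-- number of worker threads the tasktracker uses to serve map output over HTTP -->
  <value>40</value>
</property>
```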

RE: Shuffle phase replication factor

2013-05-22 Thread John Lilley
Um, is that also the limit for the number of simultaneous connections? In general, one does not need a 1:1 map between threads and connections. If this is the connection limit, does it imply that the client or server side aggressively disconnects after a transfer? What happens to the

RE: Shuffle phase

2013-05-22 Thread John Lilley
I was reading the elephant book trying to understand which process actually serves up the HTTP transfer on the mapper side. Is it each map task? Or is there some persistent task on each worker that serves up mapper output for all map tasks? Thanks, John From: Kai Voigt

RE: Viewing snappy compressed files

2013-05-22 Thread Robert Rapplean
Thanks! This shortcuts my current process considerably, and should take the pressure off for the short term. I'd still like to be able to analyze the data in a python script without having to make a local copy, but that can wait. Best, Robert Rapplean Senior Software Engineer 303-872-2256

Re: Rack-awareness in Hadoop-2.0.3-alpha

2013-05-22 Thread Chris Nauroth
common-dev and hdfs-dev removed/bcc'd Hi Mohammad, Rack awareness is supported in 2.0.3-alpha. The only potential problem I see in your configuration is that topology.sh contains a definition for HADOOP_CONF that points back at your hadoop-0.22.0/conf directory. If that directory doesn't

Re: Auto created 'target' folder?

2013-05-22 Thread Chris Nauroth
Can you provide additional information about the exact commands that you are trying to run? target/test-dir is something that gets created during the Hadoop codebase's Maven build process. Are you running Maven commands? If so, are you running Maven commands as a user different from tomcat7?

Re: Get Hadoop update

2013-05-22 Thread Chris Nauroth
Hi Vimal, Full information on how to subscribe and unsubscribe from the various lists is here: http://hadoop.apache.org/mailing_lists.html Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, May 22, 2013 at 1:01 AM, Vimal Jain vkj...@gmail.com wrote: Hi, I would like to receive

Re: Please, Un-subscribe!

2013-05-22 Thread Chris Nauroth
Hi Simone, Please see this wiki page for full information on how to subscribe or unsubscribe from the various mailing lists: http://hadoop.apache.org/mailing_lists.html Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, May 22, 2013 at 7:49 AM, Simone Martinelli

Re: Rack-awareness in Hadoop-2.0.3-alpha

2013-05-22 Thread Patai Sangbutsarakum
I believe that his topology.sh and .data files are already correct. bash topology.sh 172.31.13.133 mustaqeem-1 mustaqeem-4 prints /default/rack /rack2 /rack3 -- the output looks exactly the same as mine. Mohammad, 1. did you restart the namenode after you modified the configuration? In 0.20, restarting the namenode is
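A script producing the output shown above might look like the following (a minimal sketch; the host-to-rack table is inferred from the sample invocation, and a real script would usually read its map from a data file):

```shell
# Hadoop invokes the topology script with one or more host/IP arguments
# and expects one rack path per line, in the same order.
rack_of() {
  for host in "$@"; do
    case "$host" in
      mustaqeem-1) echo "/rack2" ;;
      mustaqeem-4) echo "/rack3" ;;
      *)           echo "/default/rack" ;;  # fallback for unknown hosts
    esac
  done
}

rack_of 172.31.13.133 mustaqeem-1 mustaqeem-4
```

Unknown hosts (like the IP above) fall through to /default/rack, which matches the sample output.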

Hive tmp logs

2013-05-22 Thread Raj Hadoop
Hi, My hive job logs are being written to the /tmp/hadoop directory. I want to change it to a different location, i.e. a sub-directory somewhere under the 'hadoop' user home directory. How do I change it? Thanks, Ra

Eclipse plugin

2013-05-22 Thread Bharati
Hi, I am trying to get or build the eclipse plugin for 1.2.0. All the methods I found on the web did not work for me. Any tutorial or method to build the plugin will help. I need to build a hadoop map reduce project and be able to debug it in eclipse. Thanks, Bharati Sent from my iPad

Sqoop Import Oracle Error - Attempted to generate class with no columns!

2013-05-22 Thread Raj Hadoop
Hi,   I just finished setting up Apache sqoop 1.4.3. I am trying to test basic sqoop import on Oracle.   sqoop import --connect jdbc:oracle:thin:@//intelli.dmn.com:1521/DBT --table usr1.testonetwo --username usr123 --password passwd123     I am getting the error as 13/05/22 17:18:16 INFO

Re: Eclipse plugin

2013-05-22 Thread Jing Zhao
Hi Bharati, Usually you only need to run ant clean jar jar-test and ant eclipse on your code base, and then import the project into your eclipse. Can you provide some more detailed description about the problem you met? Thanks, -Jing On Wed, May 22, 2013 at 2:25 PM, Bharati

Re: YARN in 2.0 and 0.23

2013-05-22 Thread Arun C Murthy
I'd use the 2.0 APIs; they are days away from getting frozen and will be supported compatibly for the foreseeable future. Details to track here: https://issues.apache.org/jira/browse/YARN-386 hth, Arun On May 22, 2013, at 7:38 AM, John Lilley wrote: We intend to use the YARN APIs fairly soon.

Re: Eclipse plugin

2013-05-22 Thread Bharati
Hi Jing, I want to be able to open a project as map reduce project in eclipse instead of java project as per some of the videos on youtube. For now let us say I want to write a wordcount program and step through it with hadoop 1.2.0 How can I use eclipse to rewrite the code. The goal here

Re: Eclipse plugin

2013-05-22 Thread Sanjay Subramanian
Hi I don't need any special plugin to walk through the code. All my map reduce jobs have a JobMapper.java, JobReducer.java, and JobProcessor.java (set any configs u like). I create a new maven project in eclipse (easier to manage dependencies) ….the elements are in the order as they should
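For the maven route, a dependency fragment of this shape pulls in the classic MapReduce API (version 1.2.0 shown to match the release mentioned earlier in the thread):

```xml
<!-- pom.xml fragment: classic (pre-YARN) Hadoop MapReduce API -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.0</version>
</dependency>
```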

Re: Eclipse plugin

2013-05-22 Thread Sanjay Subramanian
Forgot to add, if u run Windows and Eclipse and want to do Hadoop u have to set up Cygwin and add $CYGWIN_PATH/bin to PATH Good Luck Sanjay From: Sanjay Subramanian sanjay.subraman...@wizecommerce.commailto:sanjay.subraman...@wizecommerce.com Reply-To:

Re: Eclipse plugin

2013-05-22 Thread Bharati Adkar
Hi, I am using a mac. I have not used maven before; I am new to hadoop and eclipse. Any directions to start a project as map reduce as per all the videos on youtube? Thanks, Bharati On May 22, 2013, at 4:23 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: Hi I don't

Re: Sqoop Import Oracle Error - Attempted to generate class with no columns!

2013-05-22 Thread Venkat Ranganathan
Resending, as the last one bounced. You need to specify username and tablename in uppercase, otherwise the job will fail Thanks Venkat On Wed, May 22, 2013 at 4:19 PM, Venkat venkat...@gmail.com wrote: You need to specify username and tablename in uppercase Venkat On Wed, May 22, 2013
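Applied to the command from the original post, the suggested fix would look like this (the connection string and credentials are the poster's own placeholders; Oracle stores unquoted identifiers in uppercase, so the schema, table, and username are uppercased):

```sh
sqoop import \
  --connect jdbc:oracle:thin:@//intelli.dmn.com:1521/DBT \
  --table USR1.TESTONETWO \
  --username USR123 \
  --password passwd123
```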

RE: YARN in 2.0 and 0.23

2013-05-22 Thread John Lilley
We don't necessarily have the freedom to choose; we are an application provider and we desire compatibility for as many versions as possible so as to fit into existing Hadoop installations. For example, we currently read and write HDFS for 0.23, 1.0, 1.1, and 2.0. Given that, am I trying to

Re: Shuffle phase replication factor

2013-05-22 Thread Kun Ling
Hi John, 1. For the limit on simultaneous connections: you can configure this using the mapred.reduce.parallel.copies flag; the default is 5. 2. For the aggressive-disconnect implication, I am afraid it is only a little. Normally, each reducer will connect to each mapper
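As a config fragment, that flag goes in mapred-site.xml (5 shown here, the default mentioned above):

```xml
<property>
  <name>mapred.reduce.parallel.copies</name>
  <!-- number of parallel transfers each reducer runs while fetching map output -->
  <value>5</value>
</property>
```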