How do I get started with hadoop

2014-04-24 Thread ????
Hi, I'm new to Hadoop. Can I get some useful links about Hadoop so I can get started with it step by step? Thank you very much!

How do I get started with hadoop on windows system

2014-04-24 Thread ????
Hi everyone, I subscribed to the Hadoop mailing list this morning. How do I get started with Hadoop on my Windows 7 PC? Thanks!

Re: Differences between HistoryServer and Yarn TimeLine server?

2014-04-24 Thread Ashwin Shankar
Thanks Zhijie! I had a few more questions: 1. I played around with the timeline server UI today, which showed the generic application history details, but I couldn't find any page for application-specific data. Is the expectation that every application needs to build its own UI using the exposed

Re: Client usage with multiple clusters

2014-04-24 Thread Stanley Shi
My guess is to put two sets of the following properties (one set per cluster) into the client settings: dfs.ha.namenodes.clusterA=nn1,nn2 dfs.namenode.rpc-address.clusterA.nn1= dfs.namenode.http-address.clusterA.nn1= dfs.namenode.rpc-address.clusterA.nn2= dfs.namenode.http-address.clusterA.nn2= and then access the cluster with a path like hdfs://clusterA/tmp ...
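The suggestion above can be sketched as a small generator for the client-side hdfs-site.xml. This is only an illustration under stated assumptions: the second nameservice name (clusterB), all host names, and the ports 8020 and 50070 are hypothetical placeholders, since the original message leaves the addresses blank. The dfs.nameservices and dfs.client.failover.proxy.provider.* entries are not in the message; they are added here because an HA client normally needs them as well.

```python
from xml.etree import ElementTree as ET

# Failover proxy provider class shipped with HDFS HA.
FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)

# Hypothetical clusters: nameservice -> {namenode id: (rpc address, http address)}.
CLUSTERS = {
    "clusterA": {
        "nn1": ("nn1.clustera.example.com:8020", "nn1.clustera.example.com:50070"),
        "nn2": ("nn2.clustera.example.com:8020", "nn2.clustera.example.com:50070"),
    },
    "clusterB": {
        "nn1": ("nn1.clusterb.example.com:8020", "nn1.clusterb.example.com:50070"),
        "nn2": ("nn2.clusterb.example.com:8020", "nn2.clusterb.example.com:50070"),
    },
}


def ha_client_properties(nameservice, namenodes):
    """Return the per-nameservice properties as a name -> value dict."""
    props = {f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes)}
    for nn_id, (rpc_address, http_address) in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = rpc_address
        props[f"dfs.namenode.http-address.{nameservice}.{nn_id}"] = http_address
    props[f"dfs.client.failover.proxy.provider.{nameservice}"] = FAILOVER_PROVIDER
    return props


def render_hdfs_site(clusters):
    """Render one <configuration> document covering every nameservice."""
    props = {"dfs.nameservices": ",".join(clusters)}
    for nameservice, namenodes in clusters.items():
        props.update(ha_client_properties(nameservice, namenodes))
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hdfs_site(CLUSTERS))
```

With both sets present in the client configuration, paths such as hdfs://clusterA/tmp and hdfs://clusterB/tmp should resolve through the logical nameservice names rather than a single NameNode host.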

Re: Sqoop import/export tool fails with PriviledgedActionException

2014-04-24 Thread Sergey Murylev
Hi Kuchekar, > I do have the mentioned jar (avro-mapred-1.5.3.jar) in the mentioned > location. Not sure, what I am missing. Make sure that you can read this file as the same user that you use to run sqoop. According to the logs, you run sqoop as root. I'm not sure that root has such privileges. You can try to

Re: configure HBase

2014-04-24 Thread Azuryy Yu
That's what I want, Thanks Harsh. On Thu, Apr 24, 2014 at 11:32 PM, Harsh J wrote: > A better JIRA to read would be > https://issues.apache.org/jira/browse/HBASE-4391 instead. Also read up > on the mlock call which this basically invokes: > http://linux.die.net/man/2/mlock > > On Thu, Apr 24, 2

Question about YARN security tokens

2014-04-24 Thread Robert Chu
Hi Everyone, I'm new to YARN and was trying to write a simple YARN application that starts up an ApplicationMaster, which in turn starts up an in-process Jetty server, just as a simple test (this application requests no further resource containers). An attempt is made to register as an application with the Reso

Re: running YARN in Production

2014-04-24 Thread Alexander Alten-Lorenz
Matt, Apache YARN is quite stable and works well in production so far. Additionally, YARN is backward compatible, so MRv1 jobs run too. For vendor-specific questions, contact your distribution vendor. - Alex Sent from my iPad > On 24 Apr 2014, at 22:17, Matt K wrote: > > We run a numb

Re: Using Eclipse for Hadoop code

2014-04-24 Thread Maisnam Ns
This link should help you get started with Eclipse: http://wiki.apache.org/hadoop/EclipseEnvironment On Fri, Apr 25, 2014 at 9:28 AM, All In A Days Work wrote: > How does one set up Eclipse to browse/compile Hadoop code? > > Cheers, >

Using Eclipse for Hadoop code

2014-04-24 Thread All In A Days Work
How does one set up Eclipse to browse/compile Hadoop code? Cheers,

Re: hadoop+python+text mining

2014-04-24 Thread Peyman Mohajerian
At a high level I think you have these choices and more: 1) Hadoop Streaming, which lets you leverage some of your Python code, but not all, because you have to deal with map/reduce. 2) Use Mahout. 3) Use a distro of R that works with Hadoop. On Thu, Apr 24, 2014 at 1:58 PM, qiaoresearcher wrote: > I have Hado
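The first option, Hadoop Streaming, could look roughly like the sketch below. The original question is cut off, so the actual task is unknown; as a stand-in, this counts how often each word appears under each label. It assumes the three columns (label, id, message) are tab-separated, which the thread does not confirm, and all function names are illustrative.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit ("label:word", 1) for every word of every message."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue  # skip lines that do not have the assumed three columns
        label, _record_id, message = parts
        for word in message.lower().split():
            yield f"{label}:{word}", 1


def reduce_pairs(pairs):
    """Reducer: sum counts per key; input must already be sorted by key."""
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


def main(mode, stdin=sys.stdin, stdout=sys.stdout):
    """Run as a streaming mapper ("map") or reducer ("reduce")."""
    if mode == "map":
        for key, count in map_lines(stdin):
            stdout.write(f"{key}\t{count}\n")
    elif mode == "reduce":
        split_lines = (line.rstrip("\n").split("\t", 1) for line in stdin)
        pairs = ((key, int(value)) for key, value in split_lines)
        for key, total in reduce_pairs(pairs):
            stdout.write(f"{key}\t{total}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The same script would be passed to Hadoop Streaming's -mapper option with the argument map and to its -reducer option with the argument reduce. Streaming sorts the mapper output by key before the reducer sees it, which is what the groupby call relies on. Any nltk processing would go inside map_lines in place of the plain split.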

Re: running YARN in Production

2014-04-24 Thread Matt K
Allow me to remove "Cloudera" and CDH5 from the question then, and replace it with "Hadoop 2.3.0". On Thu, Apr 24, 2014 at 5:49 PM, Marco Shaw wrote: > A bit too many mentions of "Cloudera" for this list... Please consider > going to a Cloudera list and asking this there asking for specific ex

Re: Yarn hangs @Scheduled

2014-04-24 Thread Jay Vyas
I fixed the issue by setting yarn.scheduler.minimum-allocation-mb=1024. I'm thinking this happens a lot in VMs where you run with low memory. If the memory is too low, I think other failures will occur at runtime when you start daemons or tasks. If it is too high, then the tasks will hang... > On Apr 24, 201
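The effect described above can be illustrated with a simplified model, which is not the scheduler's actual code. It assumes that container requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb and that a container can only be placed when the rounded size fits in the memory the NodeManager offers. The function names and the numbers in the usage example are hypothetical.

```python
import math


def normalize_request_mb(requested_mb, minimum_allocation_mb):
    """Round a memory request up to the next multiple of the minimum allocation."""
    if minimum_allocation_mb <= 0:
        raise ValueError("minimum_allocation_mb must be positive")
    multiples = max(1, math.ceil(requested_mb / minimum_allocation_mb))
    return multiples * minimum_allocation_mb


def fits_on_node(requested_mb, minimum_allocation_mb, node_memory_mb):
    """Return True if the rounded-up request fits in the node's memory."""
    granted_mb = normalize_request_mb(requested_mb, minimum_allocation_mb)
    return granted_mb <= node_memory_mb


if __name__ == "__main__":
    # Hypothetical small VM: the NodeManager offers 2048 MB in total.
    for minimum_mb in (1024, 4096):
        granted = normalize_request_mb(1536, minimum_mb)
        placed = fits_on_node(1536, minimum_mb, 2048)
        print(f"minimum={minimum_mb} MB -> granted={granted} MB, fits={placed}")
```

In this model a 1536 MB request on a 2048 MB node is placed when the minimum is 1024 MB, but never fits when the minimum is 4096 MB, which would match a job that waits indefinitely for resources.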

Re: Yarn hangs @Scheduled

2014-04-24 Thread Vinod Kumar Vavilapalli
How much memory do you see as available on the RM web page? What are the memory requirements for this app? And is this an MR job? +Vinod Hortonworks Inc. http://hortonworks.com/ On Thu, Apr 24, 2014 at 1:23 PM, Jay Vyas wrote: > Hi folks: My YARN jobs seem to be hanging in the "SCHEDULED"

Re: running YARN in Production

2014-04-24 Thread Marco Shaw
A bit too many mentions of "Cloudera" for this list... Please consider going to a Cloudera list and asking there for specific examples. Marco On Thu, Apr 24, 2014 at 5:17 PM, Matt K wrote: > We run a number of mission-critical MapReduce jobs daily in our production > cluster, most

Yarn hangs @Scheduled

2014-04-24 Thread Jay Vyas
Hi folks: My YARN jobs seem to be hanging in the "SCHEDULED" state. I've restarted my nodemanager a few times, but no luck. What are the possible reasons that YARN job submission hangs? I know one is resource availability, but this is a fresh cluster on a VM with only one job, one NM, and one

running YARN in Production

2014-04-24 Thread Matt K
We run a number of mission-critical MapReduce jobs daily in our production cluster, mostly on top of HBase. In the past, we've hit a number of Hadoop bugs, and found it difficult to maintain a solid SLA. We are now moving to CDH5 and evaluating if we should move to YARN or keep running Hadoop 1. Y

hadoop+python+text mining

2014-04-24 Thread qiaoresearcher
I have Hadoop and Python installed with nltk. Now I have a large input file which has three columns:
column 1 | column 2 | column 3
positive id1 some tweet message
negative id2 other tweet message
positive id3 tweet message
negative id4

Sqoop import/export tool fails with PriviledgedActionException

2014-04-24 Thread Kuchekar
Hi, I have Hadoop 2 installed in standalone mode on my local machine along with sqoop 1.4.4. When I try to run the sqoop import/export commands locally, I get the following error: 14/04/24 13:03:33 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.F

Predictive analytics using hadoop

2014-04-24 Thread Shashidhar Rao
Hi All, Can anyone provide a link to a full sample of predictive analytics using Hadoop in any domain: telecom, health, or retail? Links would be fine. Regards Shashi

Re: map execute twice

2014-04-24 Thread Vinod Kumar Vavilapalli
This can happen when maps are marked as failed *after* they have successfully completed the map operation. One common reason is reducers failing to fetch the map outputs because the node that ran the mapper went down, the machine froze up, etc. +Vinod Hortonworks Inc. http:/

Re: What codes to chmod 755 to "yarn.nodemanager.log-dirs"?

2014-04-24 Thread Vinod Kumar Vavilapalli
Which version of Hadoop are you using? I'm asking because this part of the code changed a little. Also, is this in secure or non-secure mode (DefaultContainerExecutor vs LinuxContainerExecutor)? Both of those classes do some more permission magic, and you may be running into that. +Vinod Hortonworks In

What codes to chmod 755 to "yarn.nodemanager.log-dirs"?

2014-04-24 Thread sam liu
Hi Experts, When the nodemanager log-dirs do not exist, I think LocalDirsHandlerService#serviceInit will invoke DirectoryCollection#createDir to create the log dirs and chmod them to 755. However, when the nodemanager log-dirs already exist with a non-755 permission (like 775), I found its permissio

Re: configure HBase

2014-04-24 Thread Harsh J
A better JIRA to read would be https://issues.apache.org/jira/browse/HBASE-4391 instead. Also read up on the mlock call which this basically invokes: http://linux.die.net/man/2/mlock On Thu, Apr 24, 2014 at 5:46 PM, Ted Yu wrote: > Please take a look at https://issues.apache.org/jira/browse/HBASE

Re: HBase checksum vs HDFS checksum

2014-04-24 Thread Ted Yu
Please take a look at the following: http://hbase.apache.org/book.html#perf.hdfs.configs.localread http://hbase.apache.org/book.html#hbase.regionserver.checksum.verify On Thu, Apr 24, 2014 at 5:55 AM, Krishna Rao wrote: > Hi all, > > I understand that there is a significant improvement gain wh

HBase checksum vs HDFS checksum

2014-04-24 Thread Krishna Rao
Hi all, I understand that there is a significant performance gain when turning on short-circuit reads, and additionally by setting HBase to do checksums rather than HDFS. However, I'm a little confused by this: do I need to turn off checksums within HDFS for the entire file system? We don't just us

Re: Reuse of YARN container

2014-04-24 Thread Tsuyoshi OZAWA
Hi, As Oleg mentioned, container-reuse is done at ApplicationMaster level currently. Tez's ApplicationMaster is one of them. Thanks, - Tsuyoshi On Thu, Apr 24, 2014 at 2:03 AM, Oleg Zhurakousky wrote: > While YARN-373 addresses a bit of a different problem the use case of reuse > of existing re

Re: configure HBase

2014-04-24 Thread Ted Yu
Please take a look at https://issues.apache.org/jira/browse/HBASE-6567 Cheers On Apr 24, 2014, at 3:33 AM, Azuryy Yu wrote: > Hi, > > What does HBASE_REGIONSERVER_MLOCK mean? I cannot find the documentation for it. > > There is only "Uncomment and adjust to keep all the Region Server pages > mapp

configure HBase

2014-04-24 Thread Azuryy Yu
Hi, what does HBASE_REGIONSERVER_MLOCK mean? I cannot find the documentation for it. There is only "Uncomment and adjust to keep all the Region Server pages mapped to be memory resident" in hbase-env.sh. Can you explain in detail? Thanks for any input.

Re: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed to run on a larger jobs

2014-04-24 Thread Silvina Caíno Lores
Hi! I've faced the same issue a couple of times and found nothing in the logs that led me to the source of the error. However, I've found that smart container and block configuration can prevent these issues. First of all, check the RM logs to find any problematic container since the same task

Re: Differences between HistoryServer and Yarn TimeLine server?

2014-04-24 Thread Zhijie Shen
Ashwin, YARN-321 focuses on issues within the scope of the generic application history service, while YARN-1530 covers the framework-specific data service. And yes, the timeline server is going to cover both. We've not had such a JIRA before, but it is described in YARN-321's design doc. Anyway, I open a