RE: I need help talking to HDFS over a firewall

2011-09-23 Thread Aaron Baff
Are you sure you have the right port number? As you say, if it's been reconfigured, could they have changed the port the NN runs on? Also, could they have changed the hostname of the NN? Or, instead of connecting to the NN, are you actually trying to connect to one of the datanodes? --Aaron

I need help talking to HDFS over a firewall

2011-09-23 Thread Steve Lewis
I have a small piece of code which opens HDFS. When I run the code on a machine running Windows 7 from work, it connects perfectly. String host = "myhost"; int port = 9000; String connectString = "hdfs://" + host + ":" + port + "/"; Configuration config = new Configuration(); co
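The truncated snippet above can be checked without any Hadoop dependency: parsing the connect string with plain `java.net.URI` shows exactly which host and port the client will dial, which is the first thing to verify when a firewall sits in between (host name and port here are the placeholders from the question, not real values):

```java
import java.net.URI;

public class HdfsConnectString {
    public static void main(String[] args) {
        // Placeholder values mirroring the snippet in the question.
        String host = "myhost";
        int port = 9000;
        String connectString = "hdfs://" + host + ":" + port + "/";

        // Parse the string back to confirm which host/port the client
        // will actually dial -- useful when debugging firewall rules.
        URI uri = URI.create(connectString);
        System.out.println(uri.getHost() + ":" + uri.getPort());
        // prints "myhost:9000"
    }
}
```

If the printed host/port is not the NameNode's RPC endpoint, the firewall rule is aimed at the wrong target, which is exactly the scenario Aaron raises in his reply.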

How to run Hadoop in standalone mode in Windows

2011-09-23 Thread Mark Kerzner
Hi, I have cygwin, and I have NetBeans, and I have a maven Hadoop project that works on Linux. How do I combine them to work in Windows? Thank you, Mark

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-23 Thread Meng Mao
Hmm, I must have really missed an important piece somewhere. This is from the MapRed tutorial text: "DistributedCache is a facility provided by the Map/Reduce framework to cache files (text, archives, jars and so on) needed by applications. Applications specify the files to be cached via urls (hd

Getting the cpu, memory usage of map/reduce tasks

2011-09-23 Thread bikash sharma
Hi -- Is it possible to get the cpu and memory usage of individual map/reduce tasks when any mapreduce job is run? I came across this jira issue, but was not sure about the exact way to access it in the current hadoop distribution: https://issues.apache.org/jira/browse/MAPREDUCE-220 Any help is high

Re: formatting hdfs without user interaction

2011-09-23 Thread Ivan.Novick
On 9/23/11 9:46 AM, "Harsh J" wrote: >Ivan, > >On Fri, Sep 23, 2011 at 9:22 PM, wrote: >[snip] >> Which parameter are you referring to? I am planning on using 2 >>directories >> in dfs.name.dir, one is local and the other is an NFS mount of a 2nd >> machine running the secondary namenode. > >I'

Re: formatting hdfs without user interaction

2011-09-23 Thread Harsh J
Ivan, On Fri, Sep 23, 2011 at 9:22 PM, wrote: [snip] > Which parameter are you referring to? I am planning on using 2 directories > in dfs.name.dir, one is local and the other is an NFS mount of a 2nd > machine running the secondary namenode. I'm slightly confused here. Do you mean to say a 'Se

Re: Maintaining map reduce job logs - The best practices

2011-09-23 Thread bejoy . hadoop
Great!.. Thanks Raj and Mathias. Just a clarification query on top of my question. I want to log some information about my processing/data into my log files. I'm planning to log it via LOG.debug(); if I do so in my mapper or reducer it'd be available under the HADOOP_HOME/logs/history dir, right? Sec

Re: formatting hdfs without user interaction

2011-09-23 Thread Ivan.Novick
On 9/23/11 9:01 AM, "Edward Capriolo" wrote: >On Fri, Sep 23, 2011 at 11:52 AM, wrote: > >> Hi Harsh, >> >> On 9/22/11 8:48 PM, "Harsh J" wrote: >> >> >Ivan, >> > >> >Writing your own program was overkill. >> > >> >The 'yes' coreutil is pretty silly, but nifty at the same time. It >> >accepts

Re: formatting hdfs without user interaction

2011-09-23 Thread Edward Capriolo
On Fri, Sep 23, 2011 at 11:52 AM, wrote: > Hi Harsh, > > On 9/22/11 8:48 PM, "Harsh J" wrote: > > >Ivan, > > > >Writing your own program was overkill. > > > >The 'yes' coreutil is pretty silly, but nifty at the same time. It > >accepts an argument, which it would repeat infinitely. > > > >So: >

Re: formatting hdfs without user interaction

2011-09-23 Thread Ivan.Novick
Hi Harsh, On 9/22/11 8:48 PM, "Harsh J" wrote: >Ivan, > >Writing your own program was overkill. > >The 'yes' coreutil is pretty silly, but nifty at the same time. It >accepts an argument, which it would repeat infinitely. > >So: > >$ yes Y | hadoop namenode -format > >Would do it for you. Nice!
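The yes-with-argument behavior Harsh describes is easy to check locally; `head` just caps the otherwise infinite stream:

```shell
# 'yes' repeats its argument forever; pipe through head to sample it.
yes Y | head -3
# prints:
# Y
# Y
# Y
```

Piped into `hadoop namenode -format`, that same endless stream of "Y" lines is what answers the confirmation prompt non-interactively.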

RE: Environment consideration for a research on scheduling

2011-09-23 Thread GOEKE, MATTHEW (AG/1000)
If you are starting from scratch with no prior Hadoop install experience I would configure stand-alone, migrate to pseudo distributed and then to fully distributed verifying functionality at each step by doing a simple word count run. Also, if you don't mind using the CDH distribution then SCM /
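As a sketch of the pseudo-distributed step in Matthew's progression (property name and default port as in the 0.20-era single-node setup docs; adjust the value to your own host), core-site.xml would carry something like:

```xml
<!-- core-site.xml: point the default filesystem at a local NameNode
     for pseudo-distributed mode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

A word count run against this configuration, then again after switching the value to the real cluster's NameNode, verifies each migration step as suggested above.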

Environment consideration for a research on scheduling

2011-09-23 Thread Merto Mertek
Hi, in the first phase we are planning to establish a small cluster with a few commodity computers (each 1GB, 200GB, ...). The cluster would run Ubuntu Server 10.10 and a Hadoop build from the branch 0.20.204 (I had some issues with version 0.20.203 with missing libraries

Re: Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
hehe :) you are right :) On 23 September 2011 16:21, Harsh J wrote: > Merto, > > Am sure your mail client has some form of filtering available in that case! > :-) > > On Fri, Sep 23, 2011 at 7:49 PM, Merto Mertek wrote: > > Probably there is not any option just to disable jira issues.. I will >

Re: Unsubscribe from jira issues

2011-09-23 Thread Harsh J
Merto, Am sure your mail client has some form of filtering available in that case! :-) On Fri, Sep 23, 2011 at 7:49 PM, Merto Mertek wrote: > Probably there is not any option just to disable jira issues.. I will > probably need the common-dev list so I will stay subscribed.. > > Thank you... > >

Re: Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
Probably there is not any option just to disable jira issues.. I will probably need the common-dev list so I will stay subscribed.. Thank you... On 23 September 2011 16:11, Harsh J wrote: > Merto, > > You need common-dev-unsubscribe@ > > The common-dev list receives just JIRA opened/resolved/re

Re: many killed tasks, long execution time

2011-09-23 Thread Robert Evans
Sofia, Speculative execution is great so long as you are not writing data off to HDFS on the side. If you use a normal output format it can handle putting your output in a temporary location with a unique name, and then in the cleanup method when all tasks have finished it moves the files to t

Re: Unsubscribe from jira issues

2011-09-23 Thread Harsh J
Merto, You need common-dev-unsubscribe@ The common-dev list receives just JIRA opened/resolved/reopened messages. The common-issues receives everything. On Fri, Sep 23, 2011 at 7:27 PM, Merto Mertek wrote: > Hi, > i am receiving messages from two mailing lists ("common-dev","common-user") > and

Re: many killed tasks, long execution time

2011-09-23 Thread Sofia Georgiakaki
Mr. Bobby, thank you for your reply. The IOException was related to the speculative execution. In my Reducers I create some files on HDFS, so on some occasions multiple task attempts tried to write the same file. I turned the speculative mode off for the reduce tasks, and the problem w
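The fix Sofia describes maps to a single property in the 0.20-era configuration (the analogous map-side key is mapred.map.tasks.speculative.execution):

```xml
<!-- mapred-site.xml (or per-job configuration): disable reduce-side
     speculation so only one attempt ever writes each side-effect
     file on HDFS -->
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

As Robert notes above, this is only necessary because the reducers write side-effect files directly; output routed through a normal OutputFormat is committed via a per-attempt temporary directory and tolerates speculation.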

Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
Hi, I am receiving messages from two mailing lists ("common-dev", "common-user") and I would like to disable receiving messages from JIRA. I am not a member of the "common-issues-unsubscribe" list. Can I disable this somehow? Thank you

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-23 Thread Robert Evans
Meng Mao, The way the distributed cache is currently written, it does not verify the integrity of the cache files at all after they are downloaded. It just assumes that if they were downloaded once they are still there and in the proper shape. It might be good to file a JIRA to add in some so

Re: Development enviroment problems - eclipse, hadoop 0.20.203

2011-09-23 Thread mertoz
Thank you Thomas, it worked.. On 22 July 2011 14:54, Thomas Graves [via Hadoop Common] < ml-node+3191276-178468340-416...@n3.nabble.com> wrote: > You can try the branch 0.20.204 (branch-0.20-security-204) as I fixed a > JIRA > to automatically update eclipse classpath file > (https://issues.apach

Re: many killed tasks, long execution time

2011-09-23 Thread Robert Evans
Can you include the complete stack trace of the IOException you are seeing? --Bobby Evans On 9/23/11 2:15 AM, "Sofia Georgiakaki" wrote: Good morning! I would be grateful if anyone could help me with a serious problem that I'm facing. I try to run a hadoop job on a 12-node cluster (has 48

Re: Maintaining map reduce job logs - The best practices

2011-09-23 Thread Mathias Herberts
> You can find the job specific logs in two places. The first one is in the > hdfs output directory. The second place is under $HADOOP_HOME/logs/history > ($HADOOP_HOME/logs/history/done) > > Both these places have the config file and the job logs for each submitted job. Those logs in 'history/done

Re: Maintaining map reduce job logs - The best practices

2011-09-23 Thread Raj Vishwanathan
Bejoy, You can find the job specific logs in two places. The first one is in the hdfs output directory. The second place is under $HADOOP_HOME/logs/history ($HADOOP_HOME/logs/history/done). Both these places have the config file and the job logs for each submitted job. Sent from my iPad Please ex
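The per-job copy in the HDFS output directory that Raj mentions is itself configurable; in the 0.20-era mapred-default this is governed by a single property (sketch below, hedged against that era's defaults):

```xml
<!-- mapred-site.xml: where the user-visible copy of job history goes.
     By default it lands under the job's output directory
     (_logs/history); the special value "none" disables that copy. -->
<property>
  <name>hadoop.job.history.user.location</name>
  <value>none</value>
</property>
```

The JobTracker-side copy under $HADOOP_HOME/logs/history is written regardless; this property only affects the duplicate placed next to the job output.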

How to run java code using Mahout from commandline ?

2011-09-23 Thread praveenesh kumar
Hey, I have this code written using Mahout. I am able to run the code from Eclipse. How can I run the code written in Mahout from the command line? My question is: do I have to make a jar file and run it as 'hadoop jar jarfilename.jar class', or shall I run it using a simple java command? Can anyone solve

Maintaining map reduce job logs - The best practices

2011-09-23 Thread Bejoy KS
Hi All, I do have a query here on maintaining Hadoop map-reduce logs. By default the logs appear in the respective task tracker nodes, which you can easily drill down to from the job tracker web UI at times of any failure (which I was following till now). Now I need to get into the next level

many killed tasks, long execution time

2011-09-23 Thread Sofia Georgiakaki
Good morning! I would be grateful if anyone could help me with a serious problem that I'm facing. I try to run a hadoop job on a 12-node cluster (has 48 task capacity), and I have problems when dealing with big input data (10-20GB), which get worse when I increase the number of reducers. Many