Re: which is better : form reducer or Driver

2013-11-06 Thread Chris Mawata
Try it both ways and look at the numbers. Measurement is the ultimate way to get the answer. My bet: the difference is so small that I would worry more about which choice makes your code more maintainable than about tuning before coding. Chris On 11/5/2013 9:59 PM, unmesha sreeveni wrote: i am dealing with

Re: which is better : form reducer or Driver

2013-11-06 Thread unmesha sreeveni
OK, I will check it both ways :) On Wed, Nov 6, 2013 at 2:10 PM, Chris Mawata chris.maw...@gmail.com wrote: Try it both ways and look at the numbers. Measurement is the ultimate way to get the answer. My bet: The difference is so small I would worry about where if makes your code more

Alternatives to Tuxedo suite (Bowtie, Tophat, and Cufflinks) based on Hadoop

2013-11-06 Thread Luiz Antonio Falaguasta Barbosa
Hi guys, does anybody know of Hadoop-based tools that could be used as an alternative to the Tuxedo suite (Bowtie, Tophat, and Cufflinks)? I would like to try tools that could reduce the running time of the Tuxedo suite tools, so I thought there might be some tools over Hadoop to

Re: only one map or reduce job per time on one node

2013-11-06 Thread DSuiter RDX
I suspect that the reason no one is responding with good answers is that, fundamentally, what you are trying to do seems to run against the reason Hadoop is designed the way it is. A parallel processing framework is defeated if you force it not to work concurrently. Maybe you should look into

Re: Error while running Hadoop Source Code

2013-11-06 Thread Basu,Indrashish
Hi Vinod, thanks for your help with this. I checked the task logs; this is what they show: 2013-11-06 06:40:05,541 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201311060636_0001_m_1100862588 2013-11-06 06:40:05,553 INFO

hadoop-2.2.0 windows installation

2013-11-06 Thread Palanivelu, Sridharan (SCR US)
Guys, has anyone tried to install Hadoop 2.2 on Windows? Are there any installation instructions available? Thanks, Sridharan

Sample code for testing hadoop 2.2.0

2013-11-06 Thread Ping Luo
I am completely new to Hadoop. I recently installed Hadoop 2.2.0 and need a sample application to test my installation, something similar to the word count example in Hadoop 1.2.1. I will need very detailed instructions on how to compile and run the code. Thanks, Ping

EclipsePlugin source in branch-2.x.x

2013-11-06 Thread Javi Roman
Hi! I'm trying to find the source code of EclipsePlugin [1] in the brand-new Hadoop 2.2.0 GA release; however, the last reference to it I've been able to find is in branch-1.2.1, in the folder src/contrib/eclipse-plugin. Is this plugin unmaintained, or has it been removed from the current branches? Any

Re: Error while running Hadoop Source Code

2013-11-06 Thread Basu,Indrashish
Can anyone please help with this? Thanks in advance. Regards, Indra On Wed, 06 Nov 2013 09:50:02 -0500, Basu,Indrashish wrote: Hi Vinod, Thanks for your help regarding this. I checked the task logs, this is what it is giving as output. 2013-11-06 06:40:05,541 INFO

access to hadoop cluster to post tasks remotely

2013-11-06 Thread Sergey Gerasimov
Hello, I have problems submitting a jar to my cluster remotely from a client machine located somewhere on the Web. I use the original hadoop-1.2.1. I installed Hadoop on the client machine (the same version as on the cluster) and configured fs.default.name and mapred.job.tracker. Access to DFS works fine

Re: access to hadoop cluster to post tasks remotely

2013-11-06 Thread Harsh J
Data in HDFS is read and written via the individual DNs' 50010 ports, which you would also need to open up to avoid these errors. Data isn't written or read through the NameNode. On Thu, Nov 7, 2013 at 4:50 AM, Sergey Gerasimov gerasi...@mlab.cs.msu.su wrote: Hello, I have problems with posting

Re: EclipsePlugin source in branch-2.x.x

2013-11-06 Thread Harsh J
The Eclipse and other IDE integration efforts are now being led by the Hadoop Developer Tools project at http://hdt.incubator.apache.org. On Thu, Nov 7, 2013 at 1:49 AM, Javi Roman javiro...@kernel-labs.org wrote: Hi! I'm trying to find the source code of EclipsePlugin [1] in the brand new

RE: access to hadoop cluster to post tasks remotely

2013-11-06 Thread Sergey Gerasimov
Oops. Not all hadoop fs commands work fine: -ls is OK, but -put/-get give a similar error. It looks like port 50010 of the data nodes must be accessible externally. Does anybody know of a config parameter to work around this? But I still don't understand why hadoop engine tries to connect to

Re: access to hadoop cluster to post tasks remotely

2013-11-06 Thread Roman Shaposhnik
On Wed, Nov 6, 2013 at 3:55 PM, Sergey Gerasimov gerasi...@mlab.cs.msu.su wrote: But I still don’t understand why hadoop engine tries to connect to DataNodes from client(!) machine during posting jar from client machine to the cluster. Only metadata traffic goes to the NN, once metadata

issue about add disk on DN

2013-11-06 Thread ch huang
Hi all: I have a DN with two disks mounted, one on /data/dataspace/1 and one on /data/dataspace/2. Both disks are almost full, so I added a new disk and modified the config file; disk 3 is now mounted on /data/dataspace/3. Is it possible to distribute the data evenly across the three disks?

Re: Error while running Hadoop Source Code

2013-11-06 Thread Vinod Kumar Vavilapalli
I don't see anything in the logs that you pasted. Can you paste the following somewhere like pastebin? - All of the TaskTracker log - The task logs. These are the syslog, stderr, and stdout files for a specific TaskAttempt. - The TaskAttemptID of the specific TaskAttempt that is failing. Thanks, +Vinod On Nov 6,

Re: issue about add disk on DN

2013-11-06 Thread 金杰
Yes, you can rebalance them: try hdfs rebalance. Or you can first increase the replication factor by 1, then decrease it by 1. Best Regards 金杰 (Jay Jin) On Thu, Nov 7, 2013 at 8:34 AM, ch huang justlo...@gmail.com wrote: hi,all: i have a DN,and i mount the two disk ,one disk for

Re: issue about add disk on DN

2013-11-06 Thread ch huang
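That doesn't seem reasonable. If I originally had 3 disks and went to 6, would I need to change from 3 replicas to 6 replicas? And if I change back to 3 replicas afterwards, wouldn't the original data still be stored as 6 copies? Isn't that a waste of space? On Thu, Nov 7, 2013 at 9:44 AM, 金杰 hellojin...@gmail.com wrote: yes, you can rebalance them. try hdfs rebalance. Or, you can first increment the replica by 1, then decrement by 1. Best Regards 金杰 (Jay Jin) On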

Volunteer

2013-11-06 Thread Mike
Hi guys, I would like to volunteer and help with Hadoop. Could you point me in the right direction? Best regards, Mike

Re: Volunteer

2013-11-06 Thread Mirko Kämpf
What field do you want to work in: core Hadoop development, scripting and testing, documentation, tool development, app development, benchmarking? What is your level of experience? What programming languages do you use? I think you can just start with building Hadoop and its related

Re: issue about add disk on DN

2013-11-06 Thread Andrew Wright
It is possible to rebalance across hosts, but I do not believe it is possible to rebalance within a data node. Your best option is to decommission the host, let all of its data be redistributed to other nodes, then add the node back into the cluster and rebalance. Also, the dn will identify when the

Re: issue about add disk on DN

2013-11-06 Thread 金杰
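You can set up a test environment and try it. When you decrease the replication factor by 1, the extra replica blocks will be removed. Best Regards 金杰 (Jie Jin) On Thu, Nov 7, 2013 at 10:39 AM, ch huang justlo...@gmail.com wrote: That doesn't seem reasonable. If I originally had 3 disks and went to 6, would I need to change from 3 replicas to 6 replicas? And if I change back to 3 replicas afterwards, wouldn't the original data still be stored as 6 copies? Isn't that a waste of space? On Thu, Nov 7,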

Mapper input as argument

2013-11-06 Thread unmesha sreeveni
My driver code is FileInputFormat.setInputPaths(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); and my mapper is public void map(Object key, Text value, Context context) throws IOException, InterruptedException { where value.toString()

Re: issue about add disk on DN

2013-11-06 Thread ch huang
Yes, but if one of the disks is full, the total I/O bandwidth will also be reduced. On Thu, Nov 7, 2013 at 11:31 AM, Andrew Wright agwli...@gmail.com wrote: It is possible to rebalance across hosts but I do not believe it is possible to rebalance within a data node. Best chance to decommission the host
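The bandwidth concern can be illustrated with a simplified model. This sketch assumes (it is not stated in the thread) that the DataNode places each new block on its volumes in round-robin order, skipping any volume without room; it is a toy model, not Hadoop code, and the class name and capacity figures are hypothetical. Existing blocks are never moved in this model, which matches the point that data is not rebalanced within a node.

```java
import java.util.Arrays;

public class RoundRobinVolumeSketch {

    // Toy model of a DataNode's volumes: capacity and usage counted in blocks.
    private final long[] capacity;
    private final long[] used;
    private int next = 0;

    RoundRobinVolumeSketch(long[] capacity, long[] used) {
        this.capacity = capacity.clone();
        this.used = used.clone();
    }

    // Tries volumes in turn, skipping any without room for one more block.
    // Returns the chosen volume index, or -1 if every volume is full.
    int placeBlock() {
        for (int tried = 0; tried < capacity.length; tried++) {
            int v = next;
            next = (next + 1) % capacity.length;
            if (used[v] < capacity[v]) {
                used[v]++;
                return v;
            }
        }
        return -1;
    }

    long[] usage() {
        return used.clone();
    }

    public static void main(String[] args) {
        // Hypothetical numbers: disks 1 and 2 are nearly full, disk 3 is new and empty.
        RoundRobinVolumeSketch dn = new RoundRobinVolumeSketch(
                new long[] {100, 100, 100}, new long[] {98, 99, 0});
        int[] writesPerVolume = new int[3];
        for (int i = 0; i < 30; i++) {
            int v = dn.placeBlock();
            if (v >= 0) {
                writesPerVolume[v]++;
            }
        }
        System.out.println("writes per volume: " + Arrays.toString(writesPerVolume));
        System.out.println("usage after writes: " + Arrays.toString(dn.usage()));
    }
}
```

With these numbers, disks 1 and 2 fill up within the first few writes and every later block lands on disk 3, so new writes are no longer spread across all three disks.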

Re: Mapper input as argument

2013-11-06 Thread unmesha sreeveni
One more question: how can I copy each input split that enters the mapper into a file for computation? On Thu, Nov 7, 2013 at 10:35 AM, unmesha sreeveni unmeshab...@gmail.com wrote: My driver code is FileInputFormat.setInputPaths(job, new Path(args[0]));

Re: Mapper input as argument

2013-11-06 Thread Sonal Goyal
Hi Unmesha, What is the computation you are trying to do? If you are interested in computing over multiple lines instead of a single line, have a look at NLineInputFormat. Best Regards, Sonal Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Thu, Nov 7, 2013

Re: Mapper input as argument

2013-11-06 Thread unmesha sreeveni
Am I able to get the entire split's data in the mapper? I don't need it line by line. My input is, say, 50 lines, so the file can be split across different mappers, right? How do I get each split's data? Are we able to get that data? On Thu, Nov 7, 2013 at 11:39 AM, Sonal Goyal sonalgoy...@gmail.com

Re: Mapper input as argument

2013-11-06 Thread Sonal Goyal
If you don't need it line by line but want to get a number of lines together, use NLineInputFormat. If you don't want to split at all, override isSplitable in FileInputFormat. Or you can use FileInputFormat, get each line as a key/value pair and compute over it, saving the results and emitting only as
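To illustrate the NLineInputFormat suggestion with the 50-line input from the question, here is a plain-Java sketch that needs no Hadoop dependency. It only simulates the grouping (each split gets N consecutive lines, the last one possibly fewer); NLineSplitSketch and groupIntoSplits are hypothetical names, and 10 lines per split is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

public class NLineSplitSketch {

    // Groups input lines into chunks of at most n lines each, mimicking how
    // NLineInputFormat gives each mapper n lines (the last chunk may be shorter).
    // This is a local simulation, not Hadoop's implementation.
    static List<List<String>> groupIntoSplits(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 50 input lines, as in the question.
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 50; i++) {
            lines.add("line " + i);
        }
        // Hypothetical choice of 10 lines per split.
        List<List<String>> splits = groupIntoSplits(lines, 10);
        for (int i = 0; i < splits.size(); i++) {
            List<String> s = splits.get(i);
            System.out.println("split " + i + ": " + s.get(0) + " .. " + s.get(s.size() - 1));
        }
    }
}
```

In a real Hadoop 2.x job using the mapreduce API, the corresponding settings would be job.setInputFormatClass(NLineInputFormat.class) and NLineInputFormat.setNumLinesPerSplit(job, 10); check the javadoc for your version. Note that each map() call still receives a single line; what changes is that each mapper processes N lines.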