Try it both ways and look at the numbers. Measurement is the ultimate
way to get the answer.
My bet: the difference is so small that I would worry about whether it makes
your code more maintainable rather than about tuning before coding.
Chris
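P.S. A minimal sketch of how to get those numbers, assuming a new-API Job object that is already configured; JobTimer is a hypothetical helper, not part of Hadoop:

import org.apache.hadoop.mapreduce.Job;

public final class JobTimer {
    // Runs an already-configured Job and reports wall-clock time, so the
    // two variants can be compared by measurement rather than guesswork.
    public static boolean runAndTime(Job job) throws Exception {
        long start = System.currentTimeMillis();
        boolean ok = job.waitForCompletion(true);
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.println(job.getJobName() + (ok ? " succeeded" : " failed")
                + " after " + elapsedMs + " ms");
        return ok;
    }
}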
On 11/5/2013 9:59 PM, unmesha sreeveni wrote:
I am dealing with
OK, I will check them both ways :)
On Wed, Nov 6, 2013 at 2:10 PM, Chris Mawata chris.maw...@gmail.com wrote:
Try it both ways and look at the numbers. Measurement is the ultimate
way to get the answer.
My bet: The difference is so small I would worry about whether it makes your
code more
Hi guys,
Please, does anybody know of Hadoop-based tools that could be used as an
alternative to the Tuxedo suite (Bowtie, TopHat, and Cufflinks)?
I would like to try tools that could reduce the running time of the
Tuxedo suite, so I thought there might be some tools over Hadoop to
I suspect that the reason no-one is responding with good answers is that
fundamentally, it seems like what you are trying to do runs against the
reason Hadoop is designed the way it is. A parallel process framework is
defeated if you force it to not work concurrently...
Maybe you should look into
Hi Vinod,
Thanks for your help regarding this. I checked the task logs; this is what it is giving as output.
2013-11-06 06:40:05,541 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201311060636_0001_m_1100862588
2013-11-06 06:40:05,553 INFO
Guys,
Has anyone tried to install Hadoop 2.2 on Windows? Are there any installation
instructions available?
Thanks,
Sridharan
I am completely new to Hadoop. I recently installed hadoop 2.2.0 and need to
find a sample application to test my installation, something similar to the
word count in Hadoop 1.2.1. I will need very detailed instructions on how to
compile and run the code.
thanks.
Ping
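For anyone else looking: Hadoop 2.2.0 ships an examples jar, so an install can be smoke-tested without compiling anything:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount <input> <output>

If you do want to compile your own, here is a minimal word-count sketch against the new mapreduce API (the class name and paths are illustrative):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Compile it against the jars under share/hadoop, package it as wordcount.jar, put a text file into HDFS, and run: hadoop jar wordcount.jar WordCount <input> <output>.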
Hi!
I'm trying to find the source code of EclipsePlugin [1] in the brand new
release Hadoop 2.2.0 GA; however, the last reference I've been able to find
to it is in branch-1.2.1, in the folder src/contrib/eclipse-plugin.
Is this plugin unmaintained or removed from the current branches?
Any
Can anyone please assist regarding this?
Thanks in advance
Regards,
Indra
On Wed, 06 Nov 2013 09:50:02 -0500, Basu,Indrashish wrote:
Hi Vinod,
Thanks for your help regarding this. I checked the task logs; this is what it is giving as output.
2013-11-06 06:40:05,541 INFO
Hello,
I have problems posting a jar to my cluster remotely from a client machine
located somewhere on the Web. I use stock hadoop-1.2.1.
I installed hadoop on the client machine (same version as in the cluster)
and configured fs.default.name and mapred.job.tracker.
Access to DFS works fine
Data in HDFS is read and written via the individual DN's 50010 ports,
which you would also need to open up to avoid these errors. Data isn't
written/read through the NameNode.
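For reference, the port Harsh mentions is set by dfs.datanode.address; a sketch of the relevant hdfs-site.xml entry, assuming the 1.x default:

<property>
  <name>dfs.datanode.address</name>
  <!-- Default data-transfer endpoint; clients must be able to reach
       this port on every DataNode to read or write block data. -->
  <value>0.0.0.0:50010</value>
</property>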
On Thu, Nov 7, 2013 at 4:50 AM, Sergey Gerasimov
gerasi...@mlab.cs.msu.su wrote:
Hello,
I have problems posting
The Eclipse/Other IDE integration efforts are now being led by the
Hadoop Developer Tools project at http://hdt.incubator.apache.org.
On Thu, Nov 7, 2013 at 1:49 AM, Javi Roman javiro...@kernel-labs.org wrote:
Hi!
I'm trying to find the source code of EclipsePlugin [1] in the brand new
Oooops.
Not all hadoop fs commands work fine:
-ls is OK
-put/-get give similar errors.
Looks like port 50010 of the data nodes should be accessible externally. Does
anybody know a config param to work around this?
But I still don't understand why the hadoop engine tries to connect to
On Wed, Nov 6, 2013 at 3:55 PM, Sergey Gerasimov
gerasi...@mlab.cs.msu.su wrote:
But I still don't understand why the hadoop engine tries to connect to
DataNodes from the client(!) machine while posting a jar from the client
machine to the cluster.
Only metadata traffic goes to the NN, once metadata
hi, all:
I have a DN, and I mounted two disks: one at /data/dataspace/1 and one at
/data/dataspace/2. The two disks are almost full, so I added a new disk and
modified the config file; disk3 is now mounted at /data/dataspace/3. Is an
even distribution of data across the three disks possible?
Don't see anything in the logs that you pasted.
Can you paste the following in, say, pastebin?
- All of the TaskTracker log
- The task-logs. These are the syslog, stderr, and stdout files for a specific
TaskAttempt.
- And the specific TaskAttempt's TaskAttemptID that is failing.
Thanks,
+Vinod
On Nov 6,
Yes, you can rebalance them. Try the HDFS balancer (start-balancer.sh). Or,
you can first increment the replication factor by 1, then decrement it by 1.
Best Regards
金杰 (Jay Jin)
On Thu, Nov 7, 2013 at 8:34 AM, ch huang justlo...@gmail.com wrote:
hi, all:
I have a DN, and I mounted two disks, one for
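A sketch of the bump-and-drop idea above using the FileSystem.setReplication API; the path is illustrative, and since replication is per-file a real run would recurse over the directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationBump {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/some-file"); // illustrative path
        // Raise replication so extra replicas are written, some of which
        // should land on the emptier disk/node...
        fs.setReplication(file, (short) 4);
        // ...then drop it back; the NameNode deletes the excess replicas
        // (not necessarily the old ones, so results can vary).
        fs.setReplication(file, (short) 3);
    }
}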
That doesn't seem very reasonable. If I originally had 3 disks, does going to
6 disks mean changing from 3 replicas to 6 replicas?
And if I change back to 3 replicas afterwards, won't the original data still
be stored as 6 replicas? Isn't that a waste of space?
On Thu, Nov 7, 2013 at 9:44 AM, 金杰 hellojin...@gmail.com wrote:
Yes, you can rebalance them. Try the HDFS balancer (start-balancer.sh). Or,
you can first increment the replication factor by 1, then decrement it by 1.
Best Regards
金杰 (Jay Jin)
On
Hi guys
I would like to volunteer and help with hadoop. Could you point me in the right
direction?
Best regards
Mike
What is the field you want to work in: core hadoop development,
scripting and testing, documentation, tool development, app development,
or benchmarking?
What is your level of experience?
What programming languages do you use?
I think you can just start with building hadoop and its related
It is possible to rebalance across hosts, but I do not believe it is
possible to rebalance within a data node. The best bet is to decommission the
host, have all its data redistribute to other nodes, and then add the node
back into the cluster and rebalance.
Also, the dn will identify when the
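A sketch of that decommission route, assuming 1.x-style configuration; the exclude-file path is illustrative:

1. Point the NameNode at an exclude file in hdfs-site.xml:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>

2. Add the DataNode's hostname to that file, then tell the NameNode to re-read it:

hadoop dfsadmin -refreshNodes

3. Once the node reports Decommissioned, remove it from the exclude file, run -refreshNodes again, restart the DataNode, and run the balancer.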
You can set up a test environment and have a try.
By decrementing by 1, the extra replica blocks will be removed.
Best Regards
金杰 (Jie Jin)
On Thu, Nov 7, 2013 at 10:39 AM, ch huang justlo...@gmail.com wrote:
That doesn't seem very reasonable. If I originally had 3 disks, does going to
6 disks mean changing from 3 replicas to 6 replicas?
And if I change back to 3 replicas afterwards, won't the original data still
be stored as 6 replicas? Isn't that a waste of space?
On Thu, Nov 7,
My driver code is:

FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

and my mapper is:

public void map(Object key, Text value, Context context)
    throws IOException, InterruptedException {

where value.toString()
Yes, but if one of the disks is full, the total I/O bandwidth will also be reduced.
On Thu, Nov 7, 2013 at 11:31 AM, Andrew Wright agwli...@gmail.com wrote:
It is possible to rebalance across hosts, but I do not believe it is
possible to rebalance within a data node. The best bet is to decommission the
host
One more doubt: how do I copy each input split entering the mapper into a
file for computation?
On Thu, Nov 7, 2013 at 10:35 AM, unmesha sreeveni unmeshab...@gmail.com wrote:
My driver code is
FileInputFormat.setInputPaths(job, new Path(args[0]));
Hi Unmesha,
What is the computation you are trying to do? If you are interested in
computing over multiple lines instead of a single line, have a look at
NLineInputFormat.
Best Regards,
Sonal
Nube Technologies http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal
On Thu, Nov 7, 2013
Am I able to get the entire split's data in the mapper? I don't need it line
by line.
My input is, say, 50 lines, so the file can be split across different
mappers, right? How do I get each split's data? Are we able to get that data?
On Thu, Nov 7, 2013 at 11:39 AM, Sonal Goyal sonalgoy...@gmail.com
If you don't need line-by-line input but want a number of lines
together, use NLineInputFormat. If you don't want to split at all, override
isSplitable in FileInputFormat. Or you can use FileInputFormat, get each
line as key/value and compute over it, saving the results and emitting only
as
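To make both options concrete, a minimal driver-side sketch; the class names are illustrative and N=50 just matches the 50-line input mentioned above:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitControl {

  // Option 2: an input format that never splits, so one mapper
  // receives the whole file (reasonable only for small inputs).
  public static class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
      return false;
    }
  }

  public static void configure(Job job) {
    // Option 1: hand each mapper a fixed number of lines per split.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 50);

    // Option 2 (instead): never split the file.
    // job.setInputFormatClass(WholeFileTextInputFormat.class);
  }
}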