I'm still trying to solve this problem. One person mentioned that mappers have
to sort the data and that the sort buffer may be relevant, but I'm seeing the
same linear slowdown from the reducer; more importantly, my data sizes are
so small (a few MBs) that if the Hadoop settings
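(For reference, the map-side sort buffer in Hadoop of this vintage is controlled by io.sort.mb; a minimal mapred-site.xml sketch, where 100 is just the common default in MB, not a recommendation:)

<!-- mapred-site.xml: size of the map-side sort buffer, in MB -->
<property>
  <name>io.sort.mb</name>
  <value>100</value>
</property>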
Hello,
I've been playing around with my single-node cluster.
I'm planning to test my code on a real cluster in the next few weeks.
I've read some manuals on how to deploy it. It seems that a lot still has to be
done manually.
As the cluster I will be working on will probably format
How many nodes?
On 2011-03-10, at 7:05 AM, Lai Will l...@student.ethz.ch wrote:
Hello,
I've been playing around with my single-node cluster.
I'm planning to test my code on a real cluster in the next few weeks.
I've read some
Sorry, and where are you hosting the cluster? Cloud? Physical? Garage?
On 2011-03-10, at 7:05 AM, Lai Will l...@student.ethz.ch wrote:
Hello,
I've been playing around with my single-node cluster.
I'm planning to test my code on a
Hi,
For the last couple of days we have been seeing tens of thousands of these
errors in the logs:
INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file
/offline/working/3/aat/_temporary/_attempt_201103100812_0024_r_03_0/4129371_172307245/part-3
retrying...
When this is going on
[moving to common-user, since this spans both MR and HDFS - probably
easier than cross-posting]
Can you check the DN logs for "exceeds the limit of concurrent
xcievers"? You may need to bump the dfs.datanode.max.xcievers
parameter in hdfs-site.xml, and possibly also the open-files ulimit.
-Todd
On
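(A minimal hdfs-site.xml sketch of the setting Todd mentions; 4096 is a commonly used value, not an official recommendation:)

<!-- hdfs-site.xml: raise the per-DataNode cap on concurrent
     block transfer threads (note the historical misspelling) -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>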
Dear users,
I hope this is the right list for this question; otherwise I apologize.
I'd like to have your opinion about a problem that I'm facing with the MapReduce
framework. I am writing my code in Java and running on a grid.
I have a textual input structured as (key, value) pairs. My task is to
If I understand your problem correctly, you actually need some way of
knowing whether you need to chop a large set with a specific key into
subsets.
In MapReduce, the map only has information about a single key at a
time, so you need something extra.
One way of handling this is to start by doing a
Dear Niels,
thanks for the quick response. So in your opinion there is no
Hadoop built-in tool to do this. This is indeed what I suspected.
Since the (key, value) pairs in the initial dataset are randomly partitioned across
the input files, I suppose that I can avoid the initial statistic
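(For context, the "initial statistic" discussed here would be a counting pass over the keys before the real job; a minimal sketch with the new mapreduce API, class names hypothetical and Text keys assumed:)

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (key, 1) for every record so the reducer can total up
// how many values each key has.
class KeyCountMapper extends Mapper<Text, Text, Text, LongWritable> {
  private static final LongWritable ONE = new LongWritable(1);
  @Override
  protected void map(Text key, Text value, Context ctx)
      throws IOException, InterruptedException {
    ctx.write(key, ONE);
  }
}

class KeyCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
  @Override
  protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
      throws IOException, InterruptedException {
    long total = 0;
    for (LongWritable c : counts) {
      total += c.get();
    }
    ctx.write(key, new LongWritable(total));
  }
}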
Hello,
My main function prepares an HDFS file called inputPaths with all the
input files' paths, one path per line.
I set the job input path to this HDFS file, inputPaths.
Hence each mapper's value is something like this: -
Hi Luca,
2011/3/10 Luca Aiello alu...@yahoo-inc.com:
thanks for the quick response. So in your opinion there is no
Hadoop built-in tool to do this. This is indeed what I suspected.
The MapReduce model simply uses the key as the pivot of the
processing. In your application
Luca,
You can avoid the post-processing step if you use composite keys as output
from the map.
So if you know your input is composed of 70% A keys, 20% Bs, and 10%
Cs, you can emit from the mapper
{{key, mod(key_count++,key_probability/lowest_key_probability)},{value}}.
Number of reducers can
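(A minimal Java sketch of this composite-key idea, assuming Text keys and that the bucket count per key, i.e. key_probability/lowest_key_probability rounded to an int, is known up front; the hard-coded A/B/C numbers are just the 70/20/10 example:)

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Spreads a skewed key over several reduce buckets by appending a
// rotating index: bucket = key_count++ % (p(key) / p_min).
class CompositeKeyMapper extends Mapper<Text, Text, Text, Text> {
  private final Map<String, Integer> buckets = new HashMap<String, Integer>();
  private final Map<String, Integer> counters = new HashMap<String, Integer>();

  @Override
  protected void setup(Context ctx) {
    buckets.put("A", 7); // 70% / 10%
    buckets.put("B", 2); // 20% / 10%
    buckets.put("C", 1); // 10% / 10%
  }

  @Override
  protected void map(Text key, Text value, Context ctx)
      throws IOException, InterruptedException {
    String k = key.toString();
    Integer c = counters.get(k);
    int count = (c == null) ? 0 : c;
    counters.put(k, count + 1);
    Integer n = buckets.get(k);
    int nBuckets = (n == null) ? 1 : n;
    ctx.write(new Text(k + "#" + (count % nBuckets)), value);
  }
}

(Hash-partitioning on the whole composite key then spreads the buckets of a heavy key across reducers.)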
Hi Alex,
sure it helps!
You are right, I can avoid the post-processing by cleverly adding an additional
field just for partitioning purposes.
When I said "calculate probability on the fly" I meant something similar to
what you said: re-calculate the key probability on every row you process in each
On Thu, Mar 10, 2011 at 12:48 AM, Adarsh Sharma
adarsh.sha...@orkash.com wrote:
Thanks Harsh; that is why, if we format the NameNode again after loading some data,
the "Incompatible namespaceIDs" error occurs.
Best Regards,
Adarsh Sharma
Harsh J wrote:
Formatting the NameNode initializes the
On the first run you want the NameNode to initialize its directories (where it
stores the VERSION file, fsimage, and edits).
On subsequent formats you are making sure you have a new EMPTY file
system. If you don't format, the NameNode will load the existing fsimage and edits.
There is also the matter of generating
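(For reference, the command being discussed; run it only when you really do want an empty file system, since it re-initializes the NameNode's metadata directories:)

bin/hadoop namenode -format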
Once you have a JobConf/Configuration conf object in your Mapper (via
setup/configure methods), you can do the following to get the default
file-system impl:
FileSystem fs = FileSystem.get(conf); // Gets the fs.default.name file-system impl.
Then use fs to open/create/etc. any file you need
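(Continuing that sketch: opening and reading the file named by the map value, with the inputPaths setup assumed from the question; the variable names are illustrative:)

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Inside map(): the value holds one path line from inputPaths.
Configuration conf = context.getConfiguration();
FileSystem fs = FileSystem.get(conf); // fs.default.name impl
Path p = new Path(value.toString());
BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(p)));
try {
  String line;
  while ((line = in.readLine()) != null) {
    // process the line ...
  }
} finally {
  in.close();
}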
Thanks for the reply as usual, Harsh :)
Yet the problem is that the mapper's value is
hdfs://localhost:9000/tmp/in/file1; I thought I wasn't using the same HDFS, but
in fact I was, using the same idea you presented.
The problem, however, is that the map value =
How do you store the filenames in the file? Instead of storing the
entire Path URI (if that is the trouble [it mustn't be if both your
driver's and cluster's fs.default.name are the same]), you can store just the
path component (i.e. just /user/me/blah.txt instead of the
whole proper URI).
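(A small sketch of that suggestion, stripping the scheme and authority so only the path component is stored; purely illustrative values:)

import org.apache.hadoop.fs.Path;

Path full = new Path("hdfs://localhost:9000/tmp/in/file1");
String pathOnly = full.toUri().getPath(); // "/tmp/in/file1"
String nameOnly = full.getName();         // "file1"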