Hi,
We have thousands of files, each dedicated to a user. (Each user has
access to other users' files, but accesses them only rarely.)
Each user runs map-reduce jobs on the cluster.
So we should spread his/her files evenly across the cluster,
so that every machine can take part in the
in hadoop-site.xml
change <value>master:54311</value>
to <value>hdfs://master:54311</value>
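For context, the full hadoop-site.xml entry being edited here presumably looks something like the block below; the property name is an assumption drawn from the rest of this digest, not stated in this message.

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54311</value>
</property>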
--nitesh
On Tue, Feb 10, 2009 at 9:50 PM, shefali pawar shefal...@rediffmail.com wrote:
I tried that, but it is not working either!
Shefali
On Sun, 08 Feb 2009 05:27:54 +0530 wrote
I ran into this
[I'm starting a new thread because there is sufficiently new/weird info but
I've attached an initial thread in case that might be useful.]
I'm running Hadoop 0.19.0. I have an input file consisting of several lines
that each contain a name and URL. The idea is pretty simple: download the
contents
Hi,
I'm new to Hadoop and I'm wondering what the recommended method is for
using native libraries in mapred jobs.
I've tried the following separately:
1. set LD_LIBRARY_PATH in .bashrc
2. set LD_LIBRARY_PATH and JAVA_LIBRARY_PATH in hadoop-env.sh
3. set -Djava.library.path=... for
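For what it's worth, a sketch of how the third approach in the list above is often wired up with the old org.apache.hadoop.mapred API: the option is passed to the task JVMs through mapred.child.java.opts. The library path shown is made up, since the real one is elided above.

import org.apache.hadoop.mapred.JobConf;

public class NativeLibJobSetup {
  public static JobConf configure(JobConf conf) {
    // Hypothetical path; replace with wherever the native libraries actually live.
    conf.set("mapred.child.java.opts",
             "-Xmx200m -Djava.library.path=/usr/local/lib/native");
    return conf;
  }
}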
On Feb 10, 2009, at 11:06 AM, Mimi Sun wrote:
Hi,
I'm new to Hadoop and I'm wondering what the recommended method is
for using native libraries in mapred jobs.
I've tried the following separately:
1. set LD_LIBRARY_PATH in .bashrc
2. set LD_LIBRARY_PATH and JAVA_LIBRARY_PATH in
We need to implement a version of Integer.parseInt/atoi from byte[] instead of
String to avoid the high cost of creating a String object.
I wanted to take the open jdk code but the license is GPL:
http://www.docjar.com/html/api/java/lang/Integer.java.html
Does anybody know an implementation
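Since the OpenJDK source is off-limits license-wise, here is a minimal from-scratch sketch of such a routine (class and method names are mine, and overflow handling is omitted): parse a signed decimal int directly out of a byte[] slice with no intermediate String allocation.

public class ByteArrayParser {

  public static int parseInt(byte[] bytes, int start, int length) {
    if (length <= 0) {
      throw new NumberFormatException("empty input");
    }
    int i = start;
    int end = start + length;
    boolean negative = false;
    if (bytes[i] == '-') {
      negative = true;
      i++;
      if (i == end) {
        throw new NumberFormatException("no digits");
      }
    }
    int result = 0;
    for (; i < end; i++) {
      int digit = bytes[i] - '0';
      if (digit < 0 || digit > 9) {
        throw new NumberFormatException("bad digit at offset " + i);
      }
      result = result * 10 + digit;  // note: overflow is not handled in this sketch
    }
    return negative ? -result : result;
  }

  public static void main(String[] args) {
    byte[] raw = "-12345".getBytes();
    System.out.println(parseInt(raw, 0, raw.length));  // prints -12345
  }
}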
I have a question. I've dabbled with different ways of tokenizing an
input file line for processing. I've noticed in my somewhat limited
tests that there seem to be some noticeable performance
differences between different tokenizing methods. For example, roughly
it seems to split a
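For illustration, a rough version of the kind of comparison being described (the input line and delimiter here are made up; this is not the poster's benchmark): String.split() goes through the java.util.regex machinery and allocates an array per line, while StringTokenizer just scans for delimiter characters, which is typically where the gap comes from.

import java.util.StringTokenizer;

public class TokenizeCompare {
  public static void main(String[] args) {
    String line = "alpha\tbeta\tgamma\tdelta";

    // Regex-based: allocates a String[] and uses the regex engine.
    String[] parts = line.split("\t");

    // StringTokenizer: simple character scan, no regex machinery.
    int count = 0;
    StringTokenizer st = new StringTokenizer(line, "\t");
    while (st.hasMoreTokens()) {
      st.nextToken();
      count++;
    }

    System.out.println(parts.length + " fields via split(), " + count + " via StringTokenizer");
  }
}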
I see UnsatisfiedLinkError. Also I'm calling
System.getProperty("java.library.path") in the reducer and logging it.
The only thing that prints out is ...hadoop-0.18.2/bin/../lib/native/Mac_OS_X-i386-32
I'm using Cascading, not sure if that affects anything.
- Mimi
On Feb 10, 2009, at
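For reference, a sketch of that logging step as it might look in the old mapred API (class and logger names are placeholders, and this ignores the Cascading layer mentioned above): dump java.library.path from inside the task JVM so it can be compared with what the job client sees.

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class LibPathLoggingBase extends MapReduceBase {
  private static final Log LOG = LogFactory.getLog(LibPathLoggingBase.class);

  @Override
  public void configure(JobConf job) {
    LOG.info("java.library.path = " + System.getProperty("java.library.path"));
    LOG.info("LD_LIBRARY_PATH   = " + System.getenv("LD_LIBRARY_PATH"));
  }
}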
Hey Rasit,
I'm not sure I fully understand your description of the problem, but
you might want to check out the JIRA ticket for making the replica
placement algorithms in HDFS pluggable
(https://issues.apache.org/jira/browse/HADOOP-3799) and add your use
case there.
Regards,
Jeff
On Tue, Feb
Hey Mark,
In NameNode.java, the DEFAULT_PORT specified for NameNode RPC is 8020.
From my understanding of the code, your fs.default.name setting should
have overridden this port to be 9000. It appears your Hadoop
installation has not picked up the configuration settings
appropriately. You might
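As a rough illustration of the behaviour described above (a simplification, not the actual NameNode.java code): the port in the fs.default.name URI wins, and 8020 only applies when the URI omits a port.

import java.net.URI;

public class PortCheck {
  public static void main(String[] args) {
    int defaultPort = 8020;                       // the built-in NameNode RPC default
    URI fsDefault = URI.create("hdfs://localhost:9000");
    int port = (fsDefault.getPort() == -1) ? defaultPort : fsDefault.getPort();
    System.out.println("NameNode RPC port: " + port);  // 9000 when a port is given
  }
}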
Hi,
Could someone help me find some real figures (transfer rates) for Hadoop
file transfers from the local filesystem to HDFS, S3, etc., and among storage
systems (HDFS to S3, etc.)?
Thanks,
Wasim
On Feb 10, 2009, at 4:10 PM, Wasim Bari wrote:
Hi,
Could someone help me find some real figures (transfer rates)
for Hadoop file transfers from the local filesystem to HDFS, S3, etc.,
and among storage systems (HDFS to S3, etc.)?
Thanks,
Wasim
What are you looking for? Maximum possible
Brian, I have a similar question: why does transfer from a local filesystem
to SequenceFile take so long (about 1 second per megabyte)?
Thank you,
Mark
On Tue, Feb 10, 2009 at 4:46 PM, Brian Bockelman bbock...@cse.unl.edu wrote:
On Feb 10, 2009, at 4:10 PM, Wasim Bari wrote:
Hi,
Could someone
On Feb 10, 2009, at 4:53 PM, Mark Kerzner wrote:
Brian, I have a similar question: why does transfer from a local filesystem
to SequenceFile take so long (about 1 second per megabyte)?
Hey Mark,
I saw your question about speed the other day ... unfortunately, I
didn't have any specific
With my setup, I have been able to get 10MBps write speed, 40MBps read
speed while writing multiple files (ranging a few Bytes to 100MB) into
SequenceFiles, and reading them back. The cluster has 1Gbps backbone.
On Tue, Feb 10, 2009 at 5:53 PM, Mark Kerzner markkerz...@gmail.com wrote:
Brian,
Do you know how to access those counters programmatically after the job has
run?
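In case it helps, a hedged sketch of reading counters once the job finishes, using the old org.apache.hadoop.mapred API these versions ship with. The group and counter names below are the built-in task counters as commonly named in this era of Hadoop; custom counters are looked up the same way, so treat the exact strings as illustrative.

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class CounterDump {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CounterDump.class);
    // ... input/output paths, mapper/reducer classes, etc. elided ...
    RunningJob job = JobClient.runJob(conf);      // blocks until the job completes
    Counters counters = job.getCounters();
    long mapOutput = counters
        .getGroup("org.apache.hadoop.mapred.Task$Counter")
        .getCounter("MAP_OUTPUT_RECORDS");
    System.out.println("map output records: " + mapOutput);
  }
}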
S D-5 wrote:
This does it. Thanks!
On Thu, Feb 5, 2009 at 9:14 PM, Arun C Murthy a...@yahoo-inc.com wrote:
On Feb 5, 2009, at 1:40 PM, S D wrote:
Is there a way to use the Reporter interface (or
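The original question is cut off above, so the following is only a guess at what was being asked, shown as a hedged sketch: bumping a custom counter from inside a map task via the Reporter argument (the enum and class names are made up for illustration).

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CountingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  enum MyCounters { RECORDS_SEEN }   // hypothetical counter enum

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    reporter.incrCounter(MyCounters.RECORDS_SEEN, 1);
    out.collect(value, new LongWritable(1));
  }
}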
I'd like to write a combiner that shares a lot of code with a reducer,
except that the reducer updates an external database at the end. As far as I
can tell, since both combiners and reducers must implement the Reducer
interface, there is no way to have this be the same class. Is there a
Hey Zheng,
Maybe you can try Ragel, which can compile very efficient code for your FSM
from a regex. The atoi function produced by Ragel can run faster than
glibc's. It also targets Java.
http://www.complang.org/ragel/
On Wed, Feb 11, 2009 at 4:18 AM, Zheng Shao zs...@facebook.com wrote:
We
I have some unit tests which run MapReduce jobs and test the inputs/
outputs in standalone mode. I recently started using DistributedCache
in one of these jobs, but now my tests fail with errors such as:
Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/file.data
at
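One plausible way around this in a local-mode test, sketched under the assumption that the cache path is currently being passed as an hdfs:/// URI (the helper class below is hypothetical): qualify the path against whatever FileSystem the test is actually running on, instead of hard-coding an HDFS scheme.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CacheSetup {
  public static void addToCache(Configuration conf, String file) throws Exception {
    FileSystem fs = FileSystem.get(conf);              // local FS in standalone mode
    Path qualified = new Path(file).makeQualified(fs); // e.g. file:/tmp/file.data
    DistributedCache.addCacheFile(qualified.toUri(), conf);
  }
}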
On Feb 10, 2009, at 5:20 PM, Matei Zaharia wrote:
I'd like to write a combiner that shares a lot of code with a reducer,
except that the reducer updates an external database at the end.
The right way to do this is to either do the update in the output
format or do something like:
class
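The class in that reply is cut off, so the following is only a hedged sketch of one shape it could take (all names are mine): the combiner holds the shared aggregation logic, and a reducer subclass adds the database update in close().

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SumCombiner extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {

  public void reduce(Text key, Iterator<LongWritable> values,
                     OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    out.collect(key, new LongWritable(sum));
  }
}

// The reducer reuses the combiner's logic and adds the database update at the end.
class SumReducerWithDbUpdate extends SumCombiner {
  @Override
  public void close() throws IOException {
    // hypothetical hook: flush aggregated results to the external database here
    super.close();
  }
}

The job would then set SumCombiner as the combiner class and SumReducerWithDbUpdate as the reducer class.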
Hi!
Kind of a novice question, but I need to know which Hadoop version is
considered stable. I was trying to run version 0.19, and I've seen numerous
stability issues with it. Maybe version 0.18 is better suited for a
production environment?
Vadim
Nathan Marz wrote:
I have some unit tests which run MapReduce jobs and test the
inputs/outputs in standalone mode. I recently started using
DistributedCache in one of these jobs, but now my tests fail with
errors such as:
Caused by: java.io.IOException: Incomplete HDFS URI, no host:
Hi, the Quick Start (http://hadoop.apache.org/core/docs/current/quickstart.html)
has this sample configuration
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
but it does not seem to work: even though the daemons do listen to 9000, the
following command always uses 8020
hadoop fs
Brian, large files using command-line hadoop go fast, so it is something
about my computer or network. I won't worry about this now, especially in
light of Amit reporting fast writes and reads.
Mark
On Tue, Feb 10, 2009 at 5:00 PM, Brian Bockelman bbock...@cse.unl.edu wrote:
On Feb 10, 2009,
On Feb 10, 2009, at 11:09 PM, Mark Kerzner wrote:
Brian, large files using command-line hadoop go fast, so it is something
about my computer or network. I won't worry about this now, especially in
light of Amit reporting fast writes and reads.
You're creating files using SequenceFile,
Brian, I saw that Stuart here
(http://stuartsierra.com/2008/04/24/a-million-little-files) mentions
slow writes to SequenceFile. If so, I will either use his tar
approach or try to parallelize it if I can.
On Tue, Feb 10, 2009 at 11:14 PM, Brian Bockelman bbock...@cse.unl.edu wrote:
On Feb 10, 2009,
Just to toss out some numbers (and because our users are making
interesting numbers right now)
Here's our external network router:
http://mrtg.unl.edu/~cricket/?target=%2Frouter-interfaces%2Fborder2%2Ftengigabitethernet2_2;view=Octets
Here's the application-level transfer graph:
I say, that's very interesting and useful.
On Tue, Feb 10, 2009 at 11:37 PM, Brian Bockelman bbock...@cse.unl.edu wrote:
Just to toss out some numbers (and because our users are making
interesting numbers right now)
Here's our external network router:
We got the same problem as you when using MultipleOutputFormat on both
hadoop 0.18 and 0.19. On hadoop 0.18, increasing the xceivers count does
not fix the problem. But we found many error messages complaining that
xceiverCount exceeded the limit of concurrent xcievers in the datanode
(running on