I think you may need to use ulimit in addition to setting
dfs.datanode.max.xcievers. For example, on one of our boxes:
~ $ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
open files
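If it helps, here is a quick sketch of checking the effective value from Java;
the 256 fallback below is just an assumed default, and the property name keeps
its historical misspelling ("xcievers"):

  import org.apache.hadoop.conf.Configuration;

  public class CheckXcievers {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      conf.addResource("hdfs-site.xml");  // datanode settings, if on the classpath
      // 256 is an assumed fallback, not necessarily your cluster's default
      System.out.println("dfs.datanode.max.xcievers = "
          + conf.getInt("dfs.datanode.max.xcievers", 256));
    }
  }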
I think the balancing bandwidth property you are looking for is in
hdfs-site.xml:
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>402653184</value>
</property>
Set whatever value makes the most sense for your NIC. Note, though, that this
property only applies to the balancer.
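For what it's worth, the example value above works out to 384 MB/s per
datanode:

  // 402653184 bytes/sec = 384 * 1024 * 1024, i.e. 384 MB/s
  long balancerBandwidthPerSec = 384L * 1024 * 1024;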
On Jan 20, 2012, at 3:43 PM, Michael Segel wrote:
Hi all,
For the TextInputFormat class, the input key is the file position (the byte
offset of each line). This is working well. But when I switch to
LzoTextInputFormat to read LZO files, the key does not make sense: it does not
indicate the file position. Is the file position supported with
LzoTextInputFormat?
Here is a job that
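For reference, a minimal sketch of what "file position" means with
TextInputFormat: the key handed to the mapper is the byte offset of each line
(the class name below is made up):

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class OffsetEchoMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(offset, line);  // offset == byte position of this line in the file
    }
  }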
Our Hadoop journey included a brief stint running on our own virtualised
infrastructure. Our pre-Hadoop application was already running on the VM
infrastructure so we set up a small cluster as virtual machines on the SAN.
It worked OK for a while, but as our usage grew we ditched it for a couple
sudo chown -R hadoop:hadoop /usr/local/hadoop
That will hand ownership of that directory tree over to your hadoop account.
On Fri, Jul 1, 2011 at 5:07 AM, Dhruv Kumar wrote:
> It is a permission issue. Are you sure that the account "hadoop" has read
> and write access to /usr/local/* directories?
>
Hi,
I'm not familiar with wukong, but Mandy has some scripts that wrap the hadoop
commands; the default behaviour, IIRC, is to package the folder the script is in.
This is then distributed so the app carries all its dependencies with it.
Happy to hear -files works for you.
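If it's useful, a bare-bones sketch of a driver that goes through ToolRunner,
so generic options like -files are parsed and the listed files are shipped to
the tasks (class and job names are made up):

  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyJob extends Configured implements Tool {
    public int run(String[] args) throws Exception {
      Job job = new Job(getConf(), "my-job");  // Job.getInstance(...) on newer releases
      job.setJarByClass(MyJob.class);
      // ... set mapper/reducer and input/output paths here ...
      return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new MyJob(), args));
    }
  }

Invoked as, e.g.: hadoop jar myjob.jar MyJob -files lookup.txt in out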
Sent from my iPhone
and compile in your tree.
Hth,
Paul
Sent from my iPhone
On 12 Oct 2010, at 18:10, Steve Lewis wrote:
> Look at the classes org.apache.hadoop.mapreduce.lib.input.LineRecordReader
> and org.apache.hadoop.mapreduce.lib.input.TextInputFormat
>
> What you need to do is copy those and
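As a rough sketch of the "copy and modify" idea, a custom input format that
swaps in your modified reader (names are illustrative; the real work happens
in your copied LineRecordReader):

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.InputSplit;
  import org.apache.hadoop.mapreduce.RecordReader;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

  public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
        InputSplit split, TaskAttemptContext context) {
      return new LineRecordReader();  // replace with your modified copy
    }
  }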
d up using. I
also posted something on my blog about it all [2], and a little about my
understanding (so far) of input formats and record readers etc.
Hope that helps,
Paul
1.
http://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/
Just a heads up on this: we've run into problems when trying to use fuse to
mount DFS running on port :8020. It worked fine, however, when we ran it on
:9000.
-paul
On Thu, Apr 22, 2010 at 7:59 PM, Brian Bockelman wrote:
> Hey Christian,
>
> I've run into this before.
>
our systems guys has recommended
using a PXE boot image? Are there any other similar tools that people could
recommend?
Thanks,
Paul
ed invoking directly doesn't seem
to create the parent directories; it just returns false?
If I want to move a file (creating any parent directories) on HDFS is there an
existing class I can use?
Thanks,
Paul
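In case a sketch helps (paths are made up): since FileSystem.rename() just
returns false rather than creating missing parents, create them first with
mkdirs():

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class MoveWithParents {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path src = new Path("/tmp/part-00000");              // hypothetical
      Path dst = new Path("/archive/2010/10/part-00000");  // hypothetical
      fs.mkdirs(dst.getParent());  // no-op if the directories already exist
      boolean moved = fs.rename(src, dst);
      System.out.println("moved: " + moved);
    }
  }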
the data from the other splits?
Do I need to write a custom InputFormat to perform splits that honour
the record boundaries?
Thanks,
Paul
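One simple workaround, sketched under the assumption that losing parallelism
within a file is acceptable: disable splitting, so each file is read by a
single mapper and no record can straddle a split boundary:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.JobContext;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

  public class NonSplittingTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
      return false;  // one split per file; record boundaries trivially honoured
    }
  }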
fine detail
at any point, looking for correlation between metrics.
cheers,
Paul
On 11/12/2009, at 12:40 PM, Matt Massie wrote:
> If you're looking for ganglia gmetric scripts for Disk I/O, take a look at
> http://ganglia.info/gmetric/ or http://ben.hartshorne.net/ganglia/. At the
>
If your distro is Redhat based, you may also want to consider a system like
Spacewalk:
http://www.redhat.com/spacewalk/
https://fedorahosted.org/spacewalk/
We've found it very useful in our environment for a lot of purposes.
-paul
On Wed, Dec 9, 2009 at 2:26 PM, John Martyniak
On Sat, Nov 7, 2009 at 6:45 AM, Xiance SI(司宪策)
wrote:
I just fell back to the old mapred.* APIs; it seems MultipleOutputs only
works with the old API.
wishes,
Xiance
On Mon, Nov 2, 2009 at 9:12 AM, Paul Smith wrote:
Totally stuck here, I can't seem to find a way to resolve this,
but I
eate a File-per-metric name (there's only 5).
thoughts?
Paul
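For anyone finding this later, a sketch of the old-API usage that does work
(the named output "metricA" is made up; you would register one per metric):

  import java.io.IOException;
  import java.util.Iterator;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.Reporter;
  import org.apache.hadoop.mapred.lib.MultipleOutputs;

  public class MetricReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    private MultipleOutputs mos;

    // in the driver, register one named output per metric:
    //   MultipleOutputs.addNamedOutput(conf, "metricA",
    //       TextOutputFormat.class, Text.class, Text.class);

    public void configure(JobConf job) { mos = new MultipleOutputs(job); }

    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      mos.getCollector("metricA", reporter).collect(key, values.next());
    }

    public void close() throws IOException { mos.close(); }
  }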
Check out the bottom of this page:
http://wiki.apache.org/hadoop/DiskSetup
noatime is all we've done in our environment. I haven't found it worth the
time to optimize further since we're CPU-bound in most of our jobs.
-paul
On Thu, Oct 8, 2009 at 3:26 PM, Stas Oski
s, or a couple of
gigabytes worth of data.
HTH,
Paul
On 7 Oct 2009, at 10:58, Bob Schulze wrote:
I need a cache that is read often by many nodes and written rarely by a few
nodes. It's not too big (200,000 to 2 million records, about 1 GB), but may be
too big to fit on one node (so keeping local
On 23/09/2009, at 10:47 AM, Ravi Phulari wrote:
Hello Paul, here is a quick answer to your question:
you can use the dfs.datanode.du.pct and dfs.datanode.du.reserved
properties in the hdfs-site.xml config file to configure the
maximum local disk space used by HDFS and MapReduce
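As a hypothetical illustration (fallback values invented), the same properties
read back from the loaded configuration:

  import org.apache.hadoop.conf.Configuration;

  public class CheckDataNodeDiskLimits {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      conf.addResource("hdfs-site.xml");  // if it is on the classpath
      // bytes per volume kept free for non-DFS use (fallback invented)
      long reserved = conf.getLong("dfs.datanode.du.reserved", 0L);
      // fraction of the volume DFS may treat as usable (fallback invented)
      float pct = conf.getFloat("dfs.datanode.du.pct", 0.98f);
      System.out.println("reserved=" + reserved + " pct=" + pct);
    }
  }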
more of the less powerful instances. During the early days of
our experiments with Hadoop and EC2, this was by far and away the most
surprising thing (although in retrospect I guess it's not so strange!)
Not sure it answers your question, but food for thought hopefully.
Thanks,
Paul
On 25/09/2009, at 8:55 PM, Steve Loughran wrote:
Paul Smith wrote:
On 25/09/2009, at 3:57 PM, Allen Wittenauer wrote:
On 9/24/09 7:38 PM, "Paul Smith" wrote:
"I think this could be one of these "If we build it, they will
come"
issues. most of the Hadoop
interested in asking questions or suggesting crucial
feature sets we'd appreciate it.
cheers (and thanks for getting this far in the email.. :) )
Paul Smith
psmith at aconex.com
psmith at apache.org
[1] Performance Co-Pilot (PCP)
http://oss.sgi.com/projects/pcp/index.html
On 25/09/2009, at 3:57 PM, Allen Wittenauer wrote:
On 9/24/09 7:38 PM, "Paul Smith" wrote:
"I think this could be one of these "If we build it, they will come"
issues. most of the Hadoop committers are working in large scale
homogenous environments (lucky them).
; systems without problems. Perhaps then the patch
will be accepted.
In summary, I wouldn't wait for the committers."
cheers,
Paul
I can raise one if you like; I've been a bit unwell the last few days and
out of the loop, but I'm happy for this to be my first Hadoop JIRA
contribution. :)
Paul
On 24/09/2009, at 2:44 AM, Eli Collins wrote:
These values determine how much HDFS is *not* allowed to use.
There is no
limit o
with this percentage? ("Only
use 75% of available space on the allocated volumes, leaving 25% free
for non-DFS usage" - is that the correct reading?) If that is the case, I
would only use this option, right, and not the 'reserved' one?
Many thanks for an awesomely quick reply
es, but this would allow me to set up a reasonable-sized cluster for
some good experiments without clobbering existing processes and work
that is being done.
cheers,
Paul Smith
like +=, etc., and then watch for a change in key between records.
If the current key is different from the last key, print out the last key
and its aggregate values.
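Fleshed out slightly (a sketch assuming tab-separated key/value lines that are
already sorted by key, e.g. coming out of a streaming sort):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;

  public class KeyChangeAggregator {
    public static void main(String[] args) throws Exception {
      BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
      String lastKey = null;
      long sum = 0;
      String line;
      while ((line = in.readLine()) != null) {
        String[] kv = line.split("\t", 2);
        if (lastKey != null && !lastKey.equals(kv[0])) {
          System.out.println(lastKey + "\t" + sum);  // key changed: emit group
          sum = 0;
        }
        lastKey = kv[0];
        sum += Long.parseLong(kv[1]);
      }
      if (lastKey != null) System.out.println(lastKey + "\t" + sum);  // last group
    }
  }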
-paul
On Mon, Sep 21, 2009 at 3:00 PM, Alex McLintock wrote:
> I think the default chunk size you are referring to is ab
figure out why it's not being loaded correctly?
Thanks as always!
Paul
Can this session-tagging piece be done using Hadoop? I'm a little
confused about how a mapper would be able to sort the data properly. I'd like
to be able to run it through a mapper and output the results as Hive tables so
we can then run our aggregations from there.
Thank you.
Paul