There's a balancer available to rebalance DataNodes (DNs) across the HDFS cluster
in general; it is available in the $HADOOP_HOME/bin/ directory as
start-balancer.sh.
But what I think Sqoop implies is that your data is balanced by the map tasks
it runs for imports (using a provided split factor
between map
You could enable the counters feature in MultipleOutputs, and then read
each unique output name out of the group of counters it will have created at
the job's end.
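A minimal sketch of that approach, assuming the new-API MultipleOutputs
(org.apache.hadoop.mapreduce.lib.output) and that the counters flag is turned on
before submission; the driver class below is illustrative, not from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class NamedOutputLister {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multiple-outputs-demo");
        // ... set mapper/reducer, paths, and addNamedOutput(...) calls as usual ...

        // Keep one counter per named output (records written to it).
        MultipleOutputs.setCountersEnabled(job, true);
        job.waitForCompletion(true);

        // The counters group is named after the MultipleOutputs class;
        // each counter in it corresponds to one output name.
        CounterGroup group = job.getCounters().getGroup(MultipleOutputs.class.getName());
        for (Counter counter : group) {
            System.out.println(counter.getName() + " -> " + counter.getValue() + " records");
        }
    }
}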
On Thu, Mar 17, 2011 at 7:53 AM, Jun Young Kim wrote:
> hi,
>
> after completing a job, I want to know the output file names because I used
> Multiple
Sorry about that
FYI, about 1GB/day across 4 collectors at the moment.
On 3/16/11 6:55 PM, James Seigel wrote:
I believe, sir, there should be a Flume support group at Cloudera. I'm
guessing most of us here haven't used it and therefore aren't much
help.
This is vanilla Hadoop land. :)
Cheers a
hi,
after completing a job, I want to know the output file names, because I
used the MultipleOutputs class to generate several output files.
do you know how I can get it?
thanks.
--
Junyoung Kim (juneng...@gmail.com)
I believe, sir, there should be a Flume support group at Cloudera. I'm
guessing most of us here haven't used it and therefore aren't much
help.
This is vanilla Hadoop land. :)
Cheers and good luck!
James
On a side note, how much data are you pumping through it?
Sent from my mobile. Please excus
Sorry if this is not the correct list to post this on; it was the
closest I could find.
We are using a taildir('/var/log/foo/') source on all of our agents. If
this agent goes down and data cannot be sent to the collector for some
time, what happens when this agent becomes available again? Wi
Hello,
I have been struggling with decommissioning data nodes. I have a 50+ data
node cluster (no MR) with each server holding about 2TB of storage. I split
the nodes into 2 racks.
I edit the 'exclude' file and then do a -refreshNodes. I see the node
immediately in 'Decommissioned nodes' and I also
The Sqoop documentation seems to imply that it uses the key information
provided to it on the command line to ensure that the SQL data is distributed
evenly across the DFS. However, I cannot see any mechanism for achieving this
explicitly, other than relying on the implicit distribution provided b
On Mar 16, 2011, at 10:35 AM, W.P. McNeill wrote:
> On HDFS, anyone can run hadoop fs -rmr /* and delete everything.
In addition to what everyone else has said, I'm fairly certain that
-rmr / is specifically safeguarded against. But /* might have slipped through
the cracks.
> What ar
Note that that comment is now 7 years old.
See Mahout for a more modern take on numerics using Hadoop (and other tools)
for scalable machine learning and data mining.
On Wed, Mar 16, 2011 at 10:43 AM, baloodevil wrote:
> See this for comment on java handling numeric calculations like sparse
> m
Hi W.P.,
Hadoop does apply permissions, using the user identity taken from the shell. So, if
the directory is owned by user "brian" and user "ted" does an "rmr /user/brian", then
ted gets a permission-denied error.
By default, this is not safeguarded against malicious users. A malicious user
will do whatever they want
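For illustration, a small sketch of relying on those permissions yourself:
tightening a home directory so only its owner can remove its contents. The path
and the 700 mode are examples, not something from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class TightenHomeDir {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path home = new Path("/user/brian");
        // rwx for the owner only; group and others get nothing,
        // so another user's "hadoop fs -rmr /user/brian/*" is denied.
        fs.setPermission(home, new FsPermission((short) 0700));
        System.out.println("Now: " + fs.getFileStatus(home).getPermission());
    }
}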
See this for a comment on Java handling numeric calculations like sparse
matrices...
http://acs.lbl.gov/software/colt/
--
View this message in context:
http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p2688781.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
W.P. is correct, however, that standard techniques like snapshots, mirrors,
and point-in-time backups do not exist in standard Hadoop.
This requires a variety of creative workarounds if you use stock Hadoop.
It is not uncommon for people to have memories of either removing everything
or somebod
Hi all
I am working on the Hadoop scheduler, but I do not know where to get logs from
Hadoop production clusters. Any suggestions?
Bests
Chen
Yes, ${dfs.name.dir} is a NameNode-used property, while the other is a
DataNode-used property.
On Wed, Mar 16, 2011 at 11:41 PM, Mark wrote:
> Ok thanks for the clarification.
>
> Just to be sure though..
>
> - The master will have the ${dfs.name.dir} but not ${dfs.data.dir}
> - The nodes will have ${dfs.
On 03/16/2011 01:35 PM, W.P. McNeill wrote:
On HDFS, anyone can run hadoop fs -rmr /* and delete everything.
Not sure how you have your installation set up, but on ours (we installed
Cloudera CDH), only user "hadoop" has full read/write access to HDFS.
Since we rarely either log in as user hadoop,
Ok thanks for the clarification.
Just to be sure though:
- The master will have the ${dfs.name.dir} but not ${dfs.data.dir}
- The nodes will have ${dfs.data.dir} but not ${dfs.name.dir}
Is that correct?
On 3/16/11 10:43 AM, Harsh J wrote:
NameNode and JobTracker do not require a lot of stora
NameNode and JobTracker do not require a lot of storage space by
themselves. The NameNode needs some space to store its edits and
fsimage, and both require logging space.
However, you may make use of multiple disks for NameNode, in order to
have a redundant backup copy of the NN image available in
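For illustration, a hypothetical hdfs-site.xml along those lines (the paths are
invented, not from this thread): ${dfs.name.dir} is read only by the NameNode and
can list two disks to keep a redundant copy of the image, while ${dfs.data.dir}
is read only by DataNodes:

<configuration>
  <!-- NameNode only: fsimage and edits; two disks for redundancy. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/disk1/hdfs/name,/disk2/hdfs/name</value>
  </property>
  <!-- DataNodes only: block storage; list every data disk. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data1/hdfs/data,/data2/hdfs/data</value>
  </property>
</configuration>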
On HDFS, anyone can run hadoop fs -rmr /* and delete everything. The
permissions system minimizes the danger of accidental global deletion on
UNIX or NT because you're less likely to type an administrator password by
accident. But HDFS has no such safeguard, and the typo corollary to
Murphy's Law
Hello again.
I am guessing, given the lack of response, that there are either no Hadoop people
in Calgary, or they are afraid to meet up :)
How about just speaking up if you use Hadoop in Calgary :)
Cheers
James.
On 2011-03-07, at 8:40 PM, James Seigel wrote:
> Hello,
>
> Just wondering if th
I know the master node is responsible for the NameNode and JobTracker, but
other than that, is there any data stored on that machine? Basically, what
I am asking is: should there be a generous amount of free space on that
machine?
So for example, I have a large drive I want to swap out of my master
Hi,
Just call context.progress() at a small interval inside your
map/reduce code; that will do. If you are using the older package, you can use
reporter.progress().
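A minimal mapper sketch along those lines; the class name and the chunked loop
are hypothetical, just to show where the call goes so the attempt is not killed
after the task timeout:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SlowWorkMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (int chunk = 0; chunk < 100; chunk++) {
            doExpensiveChunk(value, chunk); // placeholder for the long-running algorithm
            context.progress();             // tell the framework this task is still alive
        }
        context.write(value, new LongWritable(1));
    }

    private void doExpensiveChunk(Text value, int chunk) {
        // stand-in for the real per-record computation
    }
}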
Thanks & Regards,
Nitin Khandelwal
On 16 March 2011 21:30, Baran_Cakici wrote:
>
> Hi Everyone,
>
> I make a Project with Hadoo
Hi Everyone,
I am doing a project with Hadoop MapReduce for my master's thesis, and I have a
strange problem on my system.
First of all, I use Hadoop 0.20.2 on Windows XP Pro with the Eclipse plug-in.
When I start a job with a big input (4GB - it may not be too big, but the
algorithm requires some time), then I los
Why don't you write up a typical Hello World in C++, then make it run as a
mapper on Hadoop Streaming (or Pipes)? If you send the "Hello World" to cout
(as opposed to cerr or a file or something like that), it will automatically be
interpreted as Hadoop output. Voila! Your first C++ Hadoop p
Caught something today I missed before:
11/03/16 09:32:49 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 10.120.41.105:50010
11/03/16 09:32:49 INFO hdfs.DFSClient: Abandoning block
blk_-517003810449127046_10039793
11/03/16 09:32:49
Thanks. I spent a lot of time looking at logs and saw nothing on the reducers
until they started complaining about 'could not complete'.
Found this in the jobtracker log file:
2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception for block
blk_3829493
Hi Matthew,
you can use iostat -xm 2 to monitor disk usage.
Look at the %util column. When the numbers are between 90% and 100% for some devices, you
start to have processes that are in disk-sleep status, and you may have
excessive load.
Use htop to monitor disk-sleep processes. Sort on the S column and w
Hi all,
Can someone give pointers on using iostat to account for I/O overheads
(disk reads/writes) in a MapReduce job?
Matthew John
C++ programs run on whichever OS they're written for. Hadoop is to be
used as a platform to make these programs work as part of a Map/Reduce
application.
On Wed, Mar 16, 2011 at 12:53 PM, Manish Yadav wrote:
> please dont give me example of word count.i just want want a simple c++
> program to r
Please don't give me the word count example; I just want a simple C++
program to run on Hadoop.