Re: How does sqoop distribute its data evenly across HDFS?

2011-03-16 Thread Harsh J
There's a balancer available to re-balance DNs across the HDFS cluster in general. It is available in the $HADOOP_HOME/bin/ directory as start-balancer.sh. But what I think sqoop implies is that your data is balanced due to the map jobs it runs for imports (using a provided split factor between map
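
A minimal sketch of the two mechanisms mentioned above, assuming the stock balancer script and Sqoop's --split-by/-m options; the connect string, table, and column names are hypothetical:

    # Re-balance block placement across DataNodes (threshold in percent)
    $HADOOP_HOME/bin/start-balancer.sh -threshold 10

    # Sqoop spreads an import across parallel map tasks by ranging
    # over a split column; -m is the number of maps (the "split factor")
    sqoop import \
      --connect jdbc:mysql://dbhost/mydb \
      --table orders \
      --split-by order_id \
      -m 8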

Re: how am I able to get output file names?

2011-03-16 Thread Harsh J
You could enable counter features in MultipleOutputs, and then get each unique name out from the group of counters it'd have created at job's end. On Thu, Mar 17, 2011 at 7:53 AM, Jun Young Kim wrote: > hi, > > after completing a job, I want to know the output file names because I used > Multiple
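
A minimal sketch of the counter-based approach, assuming the new-API (org.apache.hadoop.mapreduce.lib.output) MultipleOutputs and an already-configured Job named job; the older mapred.lib MultipleOutputs has an equivalent setCountersEnabled(JobConf, boolean):

    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    // Before submitting: keep one counter per named output
    MultipleOutputs.setCountersEnabled(job, true);
    job.waitForCompletion(true);

    // MultipleOutputs groups its counters under its own class name;
    // each counter's name is a named output the job actually wrote
    CounterGroup group =
        job.getCounters().getGroup(MultipleOutputs.class.getName());
    for (Counter counter : group) {
      System.out.println(counter.getName() + ": " + counter.getValue());
    }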

Re: Cloudera Flume

2011-03-16 Thread Mark
Sorry about that. FYI, about 1GB/day across 4 collectors at the moment. On 3/16/11 6:55 PM, James Seigel wrote: I believe sir there should be a flume support group on cloudera. I'm guessing most of us here haven't used it and therefore aren't much help. This is vanilla hadoop land. :) Cheers a

how am I able to get output file names?

2011-03-16 Thread Jun Young Kim
hi, after completing a job, I want to know the output file names, because I used the MultipleOutputs class to generate several output files. Do you know how I can get them? thanks. -- Junyoung Kim (juneng...@gmail.com)

Re: Cloudera Flume

2011-03-16 Thread James Seigel
I believe, sir, there should be a Flume support group at Cloudera. I'm guessing most of us here haven't used it and therefore aren't much help. This is vanilla hadoop land. :) Cheers and good luck! James On a side note, how much data are you pumping through it? Sent from my mobile. Please excus

Cloudera Flume

2011-03-16 Thread Mark
Sorry if this is not the correct list to post this on; it was the closest I could find. We are using a taildir('/var/log/foo/') source on all of our agents. If this agent goes down and data cannot be sent to the collector for some time, what happens when this agent becomes available again? Wi

decommissioning node woes

2011-03-16 Thread Rita
Hello, I have been struggling with decommissioning data nodes. I have a 50+ data node cluster (no MR) with each server holding about 2TB of storage. I split the nodes into 2 racks. I edit the 'exclude' file and then do a -refreshNodes. I see the node immediately in 'Decommissioned nodes' and I also
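
For reference, a sketch of the flow described above, assuming dfs.hosts.exclude is already configured on the NameNode; the hostname and file path are hypothetical:

    # hdfs-site.xml on the NameNode must point dfs.hosts.exclude here
    echo "datanode17.example.com" >> /path/to/dfs.exclude

    # ask the NameNode to re-read its include/exclude lists
    hadoop dfsadmin -refreshNodes

    # the node should show "Decommission In Progress" until all of its
    # blocks are re-replicated elsewhere, then "Decommissioned"
    hadoop dfsadmin -report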

How does sqoop distribute its data evenly across HDFS?

2011-03-16 Thread BeThere
The sqoop documentation seems to imply that it uses the key information provided to it on the command line to ensure that the SQL data is distributed evenly across the DFS. However, I cannot see any mechanism for achieving this explicitly, other than relying on the implicit distribution provided b

Re: hadoop fs -rmr /*?

2011-03-16 Thread Allen Wittenauer
On Mar 16, 2011, at 10:35 AM, W.P. McNeill wrote: > On HDFS, anyone can run hadoop fs -rmr /* and delete everything. In addition to what everyone else has said, I'm fairly certain that -rmr / is specifically safeguarded against. But /* might have slipped through the cracks. > What ar

Re: Why hadoop is written in java?

2011-03-16 Thread Ted Dunning
Note that that comment is now 7 years old. See Mahout for a more modern take on numerics using Hadoop (and other tools) for scalable machine learning and data mining. On Wed, Mar 16, 2011 at 10:43 AM, baloodevil wrote: > See this for comment on java handling numeric calculations like sparse > m

Re: hadoop fs -rmr /*?

2011-03-16 Thread Brian Bockelman
Hi W.P., Hadoop does apply permissions, using the username taken from the shell. So, if the directory is owned by user "brian" and user "ted" does a "rmr /user/brian", then you get a permission-denied error. By default, this is not safeguarded against malicious users. A malicious user will do whatever they want
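
A small sketch of the ownership check described above; the user and path names are hypothetical, and the exact exception text varies by version:

    # as user "brian": keep the home directory private to its owner
    hadoop fs -chmod -R 700 /user/brian

    # as user "ted": HDFS refuses the delete
    hadoop fs -rmr /user/brian
    # rmr: org.apache.hadoop.security.AccessControlException:
    #   Permission denied: user=ted, access=WRITE, inode="brian"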

Re: Why hadoop is written in java?

2011-03-16 Thread baloodevil
See this for comment on java handling numeric calculations like sparse matrices... http://acs.lbl.gov/software/colt/

Re: hadoop fs -rmr /*?

2011-03-16 Thread Ted Dunning
W.P. is correct, however, that standard techniques like snapshots, mirrors, and point-in-time backups do not exist in standard hadoop. This requires a variety of creative work-arounds if you use stock hadoop. It is not uncommon for people to have memories of either removing everything or somebod

Anyone know where to get Hadoop production cluster logs

2011-03-16 Thread He Chen
Hi all, I am working on a Hadoop scheduler, but I do not know where to get logs from Hadoop production clusters. Any suggestions? Best, Chen

Re: Question on Master

2011-03-16 Thread Harsh J
Yes, ${dfs.name.dir} is a NameNode-used prop, while the other's a DataNode-used prop. On Wed, Mar 16, 2011 at 11:41 PM, Mark wrote: > Ok thanks for the clarification. > > Just to be sure though.. > > - The master will have the ${dfs.name.dir} but not ${dfs.data.dir} > - The nodes will have ${dfs.

Re: hadoop fs -rmr /*?

2011-03-16 Thread David Rosenstrauch
On 03/16/2011 01:35 PM, W.P. McNeill wrote: On HDFS, anyone can run hadoop fs -rmr /* and delete everything. Not sure how you have your installation set up, but on ours (we installed Cloudera CDH), only user "hadoop" has full read/write access to HDFS. Since we rarely either log in as user hadoop,

Re: Question on Master

2011-03-16 Thread Mark
Ok thanks for the clarification. Just to be sure though.. - The master will have the ${dfs.name.dir} but not ${dfs.data.dir} - The nodes will have ${dfs.data.dir} but not ${dfs.name.dir} Is that correct? On 3/16/11 10:43 AM, Harsh J wrote: NameNode and JobTracker do not require a lot of stora

Re: Question on Master

2011-03-16 Thread Harsh J
NameNode and JobTracker do not require a lot of storage space by themselves. The NameNode needs some space to store its edits and fsimage, and both require logging space. However, you may make use of multiple disks for NameNode, in order to have a redundant backup copy of the NN image available in
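
A sketch of the corresponding hdfs-site.xml entries (paths hypothetical): every directory listed in dfs.name.dir receives a full, redundant copy of the NN image, while each directory in dfs.data.dir holds only a share of that DataNode's blocks.

    <!-- NameNode (master): each listed dir gets a complete image copy -->
    <property>
      <name>dfs.name.dir</name>
      <value>/disk1/dfs/name,/disk2/dfs/name,/mnt/nfs/dfs/name</value>
    </property>

    <!-- DataNodes (slaves): blocks are spread across the listed dirs -->
    <property>
      <name>dfs.data.dir</name>
      <value>/disk1/dfs/data,/disk2/dfs/data</value>
    </property>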

hadoop fs -rmr /*?

2011-03-16 Thread W.P. McNeill
On HDFS, anyone can run hadoop fs -rmr /* and delete everything. The permissions system minimizes the danger of accidental global deletion on UNIX or NT because you're less likely to type an administrator password by accident. But HDFS has no such safeguard, and the typo corollary to Murphy's Law

Re: YYC/Calgary/Alberta Hadoop Users?

2011-03-16 Thread James Seigel
Hello again. I am guessing, with the lack of response, that there are either no hadoop people from Calgary, or they are afraid to meet up :) How about just speaking up if you use hadoop in Calgary :) Cheers James. On 2011-03-07, at 8:40 PM, James Seigel wrote: > Hello, > > Just wondering if th

Question on Master

2011-03-16 Thread Mark
I know the master node is responsible for the namenode and job tracker, but other than that, is there any data stored on that machine? Basically what I am asking is: should there be a generous amount of free space on that machine? So for example I have a large drive I want to swap out of my master

Re: Lost Task Tracker because of no heartbeat

2011-03-16 Thread Nitin Khandelwal
Hi, Just call context.progress() after a small interval of time inside your map/reduce. That will do. If you are using the older package, then you can use reporter.progress(). Thanks & Regards, Nitin Khandelwal On 16 March 2011 21:30, Baran_Cakici wrote: > > Hi Everyone, > > I make a Project with Hadoo
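
A minimal sketch of the keep-alive call in the new (mapreduce) API; the per-record work is hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SlowMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        for (int i = 0; i < 1000; i++) {
          // ... long-running work on this record ...
          if (i % 100 == 0) {
            context.progress();  // heartbeat so the TaskTracker keeps us alive
          }
        }
      }
    }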

Lost Task Tracker because of no heartbeat

2011-03-16 Thread Baran_Cakici
Hi Everyone, I am doing a project with Hadoop MapReduce for my master's thesis. I have a strange problem on my system. First of all, I use Hadoop-0.20.2 on Windows XP Pro with the Eclipse plug-in. When I start a job with a big input (4GB - it may not be too big, but the algorithm requires some time), then I los

Re: c++ problem

2011-03-16 Thread Keith Wiley
Why don't you write up a typical Hello World in C++, then make that run as a mapper on Hadoop streaming (or pipes)? If you send the "Hello World" to cout (as opposed to cerr or a file or something like that), it will automatically be interpreted as Hadoop output. Voila! Your first C++ Hadoop p
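
A rough sketch of this suggestion; the file names, HDFS paths, and streaming jar location are hypothetical and vary by distribution:

    # hello.cpp - a trivial streaming "mapper": drain stdin, emit one line
    cat > hello.cpp <<'EOF'
    #include <iostream>
    #include <string>
    int main() {
        std::string line;
        while (std::getline(std::cin, line)) { }   // consume the input split
        std::cout << "hello\tworld" << std::endl;  // stdout becomes job output
        return 0;
    }
    EOF
    g++ -o hello hello.cpp

    # ship the binary with the job and use it as the mapper
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
        -input /user/me/input -output /user/me/output \
        -mapper ./hello -file hello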

Re: DFSClient: Could not complete file

2011-03-16 Thread Chris Curtin
Caught something today I missed before: 11/03/16 09:32:49 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.120.41.105:50010 11/03/16 09:32:49 INFO hdfs.DFSClient: Abandoning block blk_-517003810449127046_10039793 11/03/16 09:32:49

Re: DFSClient: Could not complete file

2011-03-16 Thread Chris Curtin
Thanks. Spent a lot of time looking at logs and nothing on the reducers until they start complaining about 'could not complete'. Found this in the jobtracker log file: 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3829493

Re: Iostat on Hadoop

2011-03-16 Thread Jérôme Thièvre INA
Hi Matthew, you can use iostat -xm 2 to monitor disk usage. Look at the %util column. When the numbers are between 90-100% for some devices, you start to have some processes that are in disk-sleep status, and you may have excessive loads. Use htop to monitor disk-sleep processes. Sort on the S column and w

Iostat on Hadoop

2011-03-16 Thread Matthew John
Hi all, Can someone give pointers on using iostat to account for IO overheads (disk reads/writes) in a MapReduce job? Matthew John

Re: c++ problem

2011-03-16 Thread Harsh J
C++ programs run on whichever OS they're written for; Hadoop is to be used as a platform to make these programs work as part of a Map/Reduce application. On Wed, Mar 16, 2011 at 12:53 PM, Manish Yadav wrote: > please dont give me example of word count.i just want want a simple c++ > program to r

c++ problem

2011-03-16 Thread Manish Yadav
Please don't give me the word-count example; I just want a simple C++ program to run on Hadoop.