Re: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Datanode state: LV = -19 CTime = 1294051643891 is newer than the namespace state: LV = -19 CTime = 0

2011-01-09 Thread maha
I also use another solution for the namespace incompatibility, which is to run: rm -Rf /tmp/hadoop-/* and then format the namenode. Hope that helps, Maha On Jan 9, 2011, at 9:08 PM, Adarsh Sharma wrote: > Shuja Rehman wrote: >> hi >> >> i have formatted the namenode and now when i restart the
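
A minimal sketch of that recovery, assuming a 0.20-era test cluster whose storage sits under the /tmp defaults (the path and user name here are assumptions, and the wipe destroys all HDFS data):

    # stop the daemons before touching the storage directories
    stop-all.sh
    # on every datanode: clear the stale storage (this is the default
    # hadoop.tmp.dir layout; adjust to your dfs.data.dir)
    rm -Rf /tmp/hadoop-$USER/*
    # reformat the namenode, then bring the cluster back up
    hadoop namenode -format
    start-all.sh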

Re: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Datanode state: LV = -19 CTime = 1294051643891 is newer than the namespace state: LV = -19 CTime = 0

2011-01-09 Thread Adarsh Sharma
Shuja Rehman wrote: Hi, I have formatted the namenode and now when I restart the cluster I am getting this strange error. Kindly let me know how to fix it. Thanks. / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = hadoop.zoniversal.com/

Re: Too-many fetch failure Reduce Error

2011-01-09 Thread Adarsh Sharma
Esteban Gutierrez Moguel wrote: Adarsh, do you have the hostnames for masters and slaves in /etc/hosts? Yes, I know this issue. But do you think the error occurs while reading the map output? I want to know the proper reason for the lines below: org.apache.hadoop.util.DiskChecker$DiskErr
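
For reference, a hypothetical /etc/hosts of the kind Esteban is asking about; "Too many fetch-failures" often comes down to reducers failing to resolve the hostnames under which map output is advertised, so every node should carry the same entries (addresses and names below are made up):

    127.0.0.1      localhost
    192.168.1.10   hadoop-master
    192.168.1.11   hadoop-slave1
    192.168.1.12   hadoop-slave2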

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Hi Arvind, thanks very much for that. Very good to know. Sounds like Sqoop is just what I'm looking for. cheers, Brian On Sun, Jan 9, 2011 at 9:37 PM, arv...@cloudera.com wrote: > Hi Brian, > > Sqoop supports incremental imports that can be run against a live database > system on a daily basis

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Thanks Jeff, Great info and I really appreciate it. cheers, Brian On Mon, Jan 10, 2011 at 12:00 AM, Jeff Hammerbacher wrote: > Hey Brian, > > One final point about Sqoop: it's a part of Cloudera's Distribution for > Hadoop, so it's Apache 2.0 licensed and tightly integrated with the other > pla

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Hi Ted, I agree about reducing the quadratic cost and hopefully my reply to Michael will show what my idea has been in this regard. I really appreciate the pointers on LSH and Mahout and I'll read up on them and see if they help. thanks very much for your help. cheers, Brian On Sun, Jan 9, 20

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Hi Michael, Firstly, thanks for the reply. Secondly, I have to give you credit as the first person who has ever asked me if I want to open up my kimono a little, and also as the first person on a tech list who has ever made me laugh out loud. :) Ok, I hear you, and you raise some very valid issues s

Re: Import data from mysql

2011-01-09 Thread Jeff Hammerbacher
Hey Brian, One final point about Sqoop: it's a part of Cloudera's Distribution for Hadoop, so it's Apache 2.0 licensed and tightly integrated with the other platform components. This means, for example, that we have added a Sqoop action to Oozie, which makes integrating data import and export into
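
As a sketch of what that integration looks like, a hypothetical Oozie workflow action that runs a Sqoop import (hostnames, database, table, and the exact action schema version are assumptions and vary by release):

    <action name="sqoop-import">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://db.example.com/mydb --table users --target-dir /data/users</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>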

Re: hadoop ondemand feature?

2011-01-09 Thread Savannah Beckett
btw, when I said "user's information", I just meant my system's data. It can be any kind of data, like word counts. E.g., I want to process the count of a specific word immediately. Thanks. From: Savannah Beckett To: common-user@hadoop.apache.org Sent: Sun

hadoop ondemand feature?

2011-01-09 Thread Savannah Beckett
Hi, I know that hadoop is more suitable for batch processing. But is there an on-demand feature in hadoop? I want hadoop to process a specific user's information when the user demands it in the web interface. I am thinking of maybe setting a priority for this specific user's information, so hadoo
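
There is no true on-demand mode, but job priorities come close to what is being asked. A sketch, assuming a 0.20-era cluster and a made-up job id:

    # bump an already-submitted job ahead of the queue
    hadoop job -set-priority job_201101090001_0042 VERY_HIGH

The same thing can be set before submission via JobConf.setJobPriority(JobPriority.VERY_HIGH) in the old API; the scheduler still runs the job as a batch, just sooner.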

Re: Import data from mysql

2011-01-09 Thread arv...@cloudera.com
Hi Brian, Sqoop supports incremental imports that can be run against a live database system on a daily basis for importing the new data. Unless your data is large and cannot be split into comparable slices for parallel imports, I do not see any concerns regarding performance. Regarding the databa
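
A sketch of the daily incremental import Arvind describes, assuming a monotonically increasing id column (connection string, credentials, table, and last value are all hypothetical):

    sqoop import \
      --connect jdbc:mysql://db.example.com/mydb \
      --username dbuser -P \
      --table orders \
      --incremental append \
      --check-column id \
      --last-value 1000000

Each run picks up only rows with id greater than the recorded last value, so the live database is queried for the new slice each day rather than re-exported wholesale.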

Re: Import data from mysql

2011-01-09 Thread Ted Dunning
You still have to knock down the quadratic cost. Any equality checks you have in your problem can be used to limit the problem to growing quadratically in the number of records equal by that comparison. That may be enough to fix things (for now). Unfortunately heavily skewed data are very common

Re: Import data from mysql

2011-01-09 Thread Black, Michael (IS)
All you're doing is delaying the inevitable by going to hadoop. There's no magic to hadoop. It doesn't run as fast as individual processes. There's just the ability to split jobs across a cluster, which works for some problems. You won't even get a linear improvement in speed. At least I as

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Hi Michael, yeah, sorry, I shouldn't have said "a compare", as that oversimplifies the problem. For each pair of rows I have to calculate a score based on multiplying some of the column values together, running some functions against each other, etc. I could do this as the rows are entered into the db,

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Thanks Ted, You're right, but I suppose I was too brief in my initial statement. I should have said that I have to run an operation on all rows with respect to each other. It's not a case of just comparing them and thus sorting them, so unfortunately I don't think this will help much. Some of the va

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Thanks Konstantin, I had seen Sqoop. I wonder whether it is normally used as a once-off process, or whether it can also be used effectively against a live database system on a daily basis for batch export. Are there performance issues with this approach? Or how would it compare to some of the other classes that I have s

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
thanks Sonal, I'll check it out On Sun, Jan 9, 2011 at 2:57 AM, Sonal Goyal wrote: > Hi Brian, > > You can check HIHO at https://github.com/sonalgoyal/hiho which can help you load data from any JDBC database to the Hadoop file system. If your table has a date or id field, or any indicator

RE:Import data from mysql

2011-01-09 Thread Black, Michael (IS)
What kind of compare do you have to do? You should be able to compute a checksum or such for each row when you insert them and only have to look at the subset that matches if you're doing some sort of substring or such. Michael D. Black Senior Scientist Advanced Analytics Directorate Northrop
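
A minimal plain-Java sketch of that checksum idea (field layout, key columns, and the scoring function are all hypothetical): bucket rows by a key built from the columns that must match, then score pairs only within a bucket, so the quadratic cost is confined to each bucket rather than the whole table.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ChecksumBlocking {
        // Hypothetical blocking key: the columns that must be equal
        // for a pair of rows to be worth scoring (here, 0 and 2).
        static String blockingKey(String[] row) {
            return row[0] + "|" + row[2];
        }

        public static void main(String[] args) {
            String[][] rows = { {"a", "1.5", "x"}, {"a", "2.0", "x"}, {"b", "3.0", "y"} };
            Map<String, List<String[]>> buckets = new HashMap<String, List<String[]>>();
            for (String[] row : rows) {
                String key = blockingKey(row);
                if (!buckets.containsKey(key)) {
                    buckets.put(key, new ArrayList<String[]>());
                }
                buckets.get(key).add(row);
            }
            // all-pairs scoring, but only inside each bucket
            for (List<String[]> bucket : buckets.values()) {
                for (int i = 0; i < bucket.size(); i++) {
                    for (int j = i + 1; j < bucket.size(); j++) {
                        // hypothetical pairwise score: product of a numeric column
                        double score = Double.parseDouble(bucket.get(i)[1])
                                     * Double.parseDouble(bucket.get(j)[1]);
                        System.out.println(blockingKey(bucket.get(i)) + " -> " + score);
                    }
                }
            }
        }
    }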

Memory usage in the reducers

2011-01-09 Thread Benjamin . Hiller
Hello everyone, I do not have much practical experience using Hadoop, and there is something I'd like to know but can't test at the moment, and I'm not sure about it theoretically ;) On a Reducer, do all calls to reduce() share the main memory the Reducer has, or are all those calls sequenti
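
For what it is worth: within one reduce task the framework calls reduce() sequentially, once per key, on a single Reducer instance, so instance fields share that task's JVM heap across all of its calls (separate tasks get separate JVMs and share nothing). A sketch against the 0.20 new API; the class and the counting field are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class StatefulReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        // lives in the task's heap for the life of the task,
        // visible to every reduce() call the task makes
        private int keysSeen = 0;

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            keysSeen++; // safe: calls run one after another, not in parallel
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }

        @Override
        protected void cleanup(Context context) {
            System.err.println("task reduced " + keysSeen + " keys");
        }
    }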

Re: Import data from mysql

2011-01-09 Thread Ted Dunning
It is, of course, only quadratic, even if you compare all rows to all other rows. You can reduce this cost to O(n log n) by ordinary sorting, and you can further reduce the cost to O(n) using radix sort on hashes. Practically speaking, in either the parallel or non-parallel setting try sort
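
A small sketch of the sort-based route (key derivation is hypothetical, e.g. a hash of the columns that define equality): after sorting, rows that need comparing sit adjacent, so one linear scan over the sorted keys replaces the all-pairs loop.

    import java.util.Arrays;

    public class SortScan {
        public static void main(String[] args) {
            // hypothetical per-row keys, e.g. hashes of the equality columns
            String[] keys = { "k2", "k1", "k2", "k3", "k1" };
            Arrays.sort(keys); // O(n log n); radix sort on fixed-width hashes gets O(n)
            int runStart = 0;
            for (int i = 1; i <= keys.length; i++) {
                if (i == keys.length || !keys[i].equals(keys[runStart])) {
                    if (i - runStart > 1) {
                        System.out.println(keys[runStart] + ": "
                                + (i - runStart) + " matching rows");
                    }
                    runStart = i;
                }
            }
        }
    }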