Re: copyToLocal

2010-06-10 Thread Joseph Stein
/2010_06_10/81ae7c24745211df9f6d002590008422 already existed because my script created it the first time). Still odd behavior, but it is fine, sorry to bother. I just added a step to my automation script to remove the directory before I do a copyToLocal. On Thu, Jun 10, 2010 at 7:12 PM, Joseph
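
The workaround mentioned above (remove the local directory, then run copyToLocal) might look roughly like the sketch below. It assumes the `hadoop` command is on the PATH; the function name `fresh_copy_to_local` and the injectable `runner` parameter are hypothetical and are not taken from the original script.

```python
import shutil
import subprocess
from pathlib import Path


def fresh_copy_to_local(hdfs_src, local_dst, runner=subprocess.run):
    """Remove any stale local copy, then fetch hdfs_src with copyToLocal.

    `runner` is a hypothetical hook so the command can be intercepted
    in tests; by default the real `hadoop` CLI is invoked.
    """
    dst = Path(local_dst)
    if dst.is_dir():
        # A directory left over from an earlier run is deleted first,
        # so the copy always lands on a clean path.
        shutil.rmtree(dst)
    elif dst.exists():
        dst.unlink()
    cmd = ["hadoop", "fs", "-copyToLocal", str(hdfs_src), str(dst)]
    # check=True raises CalledProcessError if the hadoop command fails.
    runner(cmd, check=True)
    return cmd
```

Deleting the target up front keeps the script idempotent: a second run for the same date behaves the same as the first.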

copyToLocal

2010-06-10 Thread Joseph Stein
Hi, I am using copyToLocal through an automation script we have and am seeing odd results. I am not sure if this is something I am doing wrong, a defect, or behavior with a known good reason. Let me know; I would like to correct this in my own script, or I am happy to try fixing a bug in the fs code

Re: Fundamental question

2010-05-09 Thread Joseph Stein
1) Only the namenode is "formatted"; what happens is basically that the image file is created and prepped. The image file holds the metadata about how your files are stored on the cluster. 2) The datanodes are not formatted in the conventional sense. Their (datanode) disk usage will grow only wh

Re: hanging task

2010-05-07 Thread Joseph Stein
You need to either report status or increment a counter from within your task. In your Java code there is a little trick to help the job be “aware” within the cluster of tasks that are not dead but just working hard. During execution of a task there is no built-in reporting that the job is running
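
The reply is about Java tasks, where these signals go through the task's reporter object. As an illustration of the same two signals, here is a minimal Python sketch for a Hadoop Streaming mapper, where a task reports status or bumps a counter by writing `reporter:status:` and `reporter:counter:` lines to stderr. The function names, the counter group `Sketch`, the counter `RecordsSeen`, and the upper-casing "work" are all made up for the example.

```python
import sys


def report_status(message, err=sys.stderr):
    """Emit a Hadoop Streaming status line so the task shows as alive."""
    err.write("reporter:status:%s\n" % message)
    err.flush()


def increment_counter(group, counter, amount=1, err=sys.stderr):
    """Emit a Hadoop Streaming counter update (group,counter,amount)."""
    err.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))
    err.flush()


def run_mapper(lines, out=sys.stdout, err=sys.stderr, every=1000):
    """Process records and report progress every `every` records."""
    for n, line in enumerate(lines, start=1):
        # Placeholder for the real (possibly slow) per-record work.
        out.write(line.rstrip("\n").upper() + "\n")
        if n % every == 0:
            report_status("processed %d records" % n, err)
            increment_counter("Sketch", "RecordsSeen", every, err)
```

In a streaming job this would be driven with `run_mapper(sys.stdin)`; the reporting interval is a tuning choice and only needs to be shorter than the task timeout.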

Hadoop Podcast, Guest Spots

2010-04-19 Thread Joseph Stein
Hadoopers & Hadooperets, I wanted to see if any folks would be interested in being a guest on a new podcast we (www.medialets.com) are very seriously thinking about producing & hosting specifically to talk about Hadoop. This is still in the pre-production phase, but we are starting to firm it up as

Re: DeDuplication Techniques

2010-03-31 Thread Joseph Stein
ble in Step
> 3, and in Step 2, you just use table.exists(hash-key) to check if it is a
> dup. You still need Step 1 to populate the table with your historical data.
>
> Hope this helps
>
> Cheers,
> jp
>
> -Original Message-
> From: Joseph Stein [mail
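
The lookup described in the quoted reply (check a hash key against a table to decide whether a record is a duplicate) can be sketched as follows. The thread does not say which store or hash function was meant, so `SeenTable` is an in-memory stand-in, and SHA-1 and the function names are assumptions made only for illustration.

```python
import hashlib


class SeenTable:
    """In-memory stand-in for the lookup table (assumption, not the real store)."""

    def __init__(self):
        self._keys = set()

    def exists(self, key):
        return key in self._keys

    def put(self, key):
        self._keys.add(key)


def hash_key(record):
    """Derive a fixed-size key from the record contents (SHA-1 is an assumption)."""
    return hashlib.sha1(record.encode("utf-8")).hexdigest()


def filter_new(records, table):
    """Yield only records whose hash key is not in the table yet."""
    for record in records:
        key = hash_key(record)
        if table.exists(key):
            continue  # seen before: skip the duplicate
        table.put(key)  # remember it so later repeats are dropped too
        yield record
```

Populating the table with historical data first is then a loop of `table.put(hash_key(r))` over the old records, after which `filter_new` passes through only unseen ones.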

Re: DeDuplication Techniques

2010-03-25 Thread Joseph Stein
you can loop
> through them. As the simplest solution, you just take the first one.
>
> Sincerely,
> Mark
>
> On Thu, Mar 25, 2010 at 1:09 PM, Joseph Stein wrote:
>
>> I have been researching ways to handle de-dupping data while running a
>> map/reduce program (so
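
The "take the first one" idea from the quoted reply can be illustrated outside Hadoop with a small sketch: group values by key, as a reduce phase would, and keep only the first value in each group. The function name and the `(key, value)` tuple layout are assumptions; the sort stands in for the shuffle that groups keys in a real job.

```python
from itertools import groupby
from operator import itemgetter


def first_per_key(pairs):
    """Group (key, value) pairs by key and keep the first value of each group."""
    # sorted() is stable, so values keep their input order within a key.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Simplest de-duplication: ignore everything after the first value.
        yield key, next(group)[1]
```

Looping over the whole group instead of calling `next` once is where a smarter choice (newest record, most complete record) would go.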

DeDuplication Techniques

2010-03-25 Thread Joseph Stein
I have been researching ways to handle de-duplicating data while running a map/reduce program (so as to not re-calculate/re-aggregate data that we have seen before [possibly months before]). The data sets we have are littered with repeats of data from mobile devices which continue to come in over time