Re: HDFS out of space

2009-06-22 Thread Kris Jirapinyo
know, there is no way to balance an individual DataNode's hard drives (Hadoop does round-robin scheduling when writing data). Alex. On Mon, Jun 22, 2009 at 10:12 AM, Kris Jirapinyo

HDFS out of space

2009-06-22 Thread Kris Jirapinyo
Hi all, How does one handle a mount running out of space for HDFS? We have two disks mounted on /mnt and /mnt2 respectively on one of the machines that are used for HDFS, and /mnt is at 99% while /mnt2 is at 30%. Is there a way to tell the machine to balance itself out? I know for the cluster
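The reply in this thread notes that Hadoop (as of 2009) round-robins block writes across the dfs.data.dir volumes, which is how one disk can fill while another stays empty. An available-space policy is the alternative; a minimal sketch of the selection logic, where `pickMostFree` is a hypothetical helper and not a Hadoop API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VolumeChooser {
    // Hypothetical helper: given free bytes per data directory, return the
    // directory with the most usable space. Hadoop itself round-robins
    // across dfs.data.dir entries instead of doing this.
    public static String pickMostFree(Map<String, Long> freeBytesByDir) {
        String best = null;
        long bestFree = -1L;
        for (Map.Entry<String, Long> e : freeBytesByDir.entrySet()) {
            if (e.getValue() > bestFree) {
                bestFree = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Long> free = new LinkedHashMap<>();
        free.put("/mnt", 1L << 30);   // ~1 GB free (the 99%-full disk)
        free.put("/mnt2", 70L << 30); // ~70 GB free (the 30%-full disk)
        System.out.println(pickMostFree(free)); // prints /mnt2
    }
}
```

Short of such a policy, the usual workaround in this era was to stop the DataNode and move block files between the dfs.data.dir volumes by hand.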

Re: Fastlz coming?

2009-06-04 Thread Kris Jirapinyo
at some of the previous lzo threads on this list for help. See: http://www.mail-archive.com/search?q=lzo&l=core-user%40hadoop.apache.org -Matt. On Jun 4, 2009, at 10:29 AM, Kris Jirapinyo wrote: Is there any documentation on that site on how we

Re: Fastlz coming?

2009-06-04 Thread Kris Jirapinyo
Thanks, Kris J. On Thu, Jun 4, 2009 at 3:02 AM, Johan Oskarsson wrote: We're using Lzo still, works great for those big log files: http://code.google.com/p/hadoop-gpl-compression/ /Johan. Kris Jirapinyo wrote: Hi all, In the remove lzo JIRA ticket

Fastlz coming?

2009-06-03 Thread Kris Jirapinyo
Hi all, In the remove lzo JIRA ticket https://issues.apache.org/jira/browse/HADOOP-4874 Tatu mentioned he was going to port fastlz from C to Java and provide a patch. Have there been any updates on that? Or is anyone working on any additional custom compression codecs? Thanks, Kris J.

Re: Persistent HDFS On EC2

2009-03-11 Thread Kris Jirapinyo
ect/entry.jspa?externalID=873&categoryID=112 Regards, - Adam. Kris Jirapinyo wrote: Why would you lose the locality of storage-per-machine if one EBS volume is mounted to each machine instance? When that machine goes down, you

Re: Persistent HDFS On EC2

2009-03-11 Thread Kris Jirapinyo
Why would you lose the locality of storage-per-machine if one EBS volume is mounted to each machine instance? When that machine goes down, you can just restart the instance and re-mount the exact same volume. I've tried this idea before successfully on a 10 node cluster on EC2, and didn't see any

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Kris Jirapinyo
2009/2/13 Amandeep Khurana: What you can probably do is have the combine function do some reducing before the single reducer starts off. That might help. Amandeep Khurana, Computer Science Graduate Student, University of California, Santa Cruz. 2009/2/13 Kris

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Kris Jirapinyo
are completed. You can set this in your job conf by using conf.setNumReducers(1). Amandeep Khurana, Computer Science Graduate Student, University of California, Santa Cruz. 2009/2/13 Kris Jirapinyo: What do you mean when I have only 1 reducer?

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Kris Jirapinyo
What do you mean when I have only 1 reducer? On Fri, Feb 13, 2009 at 4:11 PM, Rasit OZDAS wrote: Kris, This is the case when you have only 1 reducer. If it doesn't have any side effects for you.. Rasit. 2009/2/14 Kris Jirapinyo: Is there

Running Map and Reduce Sequentially

2009-02-13 Thread Kris Jirapinyo
Is there a way to tell Hadoop to not run Map and Reduce concurrently? I'm running into a problem where I set the JVM to -Xmx768m and it seems like 2 mappers and 2 reducers are running on each machine that only has 1.7GB of RAM, so it complains of not being able to allocate memory... (which makes sense
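Two knobs from that era bear directly on this question (property names as of Hadoop 0.19/0.20; verify against your version): mapred.reduce.slowstart.completed.maps delays reducer launch until a given fraction of maps have finished, and the per-TaskTracker task maximums cap how many tasks run concurrently on each node. A sketch for mapred-site.xml:

```xml
<!-- Sketch, assuming Hadoop 0.19/0.20 property names -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <!-- Don't launch reducers until all maps have finished -->
  <value>1.00</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value> <!-- one concurrent map per node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value> <!-- one concurrent reduce per node -->
</property>
```

Note that even with slowstart at 1.0, reducers still overlap the shuffle with late map attempts on retries, so the task maximums are the more reliable lever on small-memory nodes.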

Re: Reducer Out of Memory

2009-02-11 Thread Kris Jirapinyo
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452) ... 10 more. On Wed, Feb 11, 2009 at 7:02 PM, Rocks Lei Wang wrote: Maybe you need allocate larger vm memory to use parameter -Xmx1024m. On Thu, Feb 12, 2009 at 10:56 AM, Kris Jirapinyo wrote: Hi all,

Re: Reducer Out of Memory

2009-02-11 Thread Kris Jirapinyo
r one directory. On Wed, Feb 11, 2009 at 6:56 PM, Kris Jirapinyo wrote: Hi all, I am running a data-intensive job on 18 nodes on EC2, each with just 1.7GB of memory. The input size is 50GB, and as a result, my mapper splits it up automatically to 786 map tasks. This ru

Reducer Out of Memory

2009-02-11 Thread Kris Jirapinyo
Hi all, I am running a data-intensive job on 18 nodes on EC2, each with just 1.7GB of memory. The input size is 50GB, and as a result, my mapper splits it up automatically to 786 map tasks. This runs fine. However, I am setting the reduce task number to 18. This is where I get a java heap out of memory error
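The heap ceiling for child task JVMs is set by mapred.child.java.opts (its default in this era was only -Xmx200m), so with 18 reducers each taking 1/18th of 50GB through the shuffle, raising that ceiling, or raising the reduce task count so each reducer handles a smaller partition, are the usual fixes. A sketch, assuming the 0.19/0.20 property name:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- Give each map/reduce child JVM a 1 GB heap; on 1.7 GB EC2 nodes,
       leave headroom for the DataNode and TaskTracker daemons too -->
  <value>-Xmx1024m</value>
</property>
```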

Counters in Hadoop

2009-01-29 Thread Kris Jirapinyo
Hi all, I am using counters in Hadoop via the reporter. I can see this custom counter fine after I run my job. However, if somehow I restart the cluster, then when I look into the Hadoop Job History, I can't seem to find the information of my previous counter values anywhere. Where is it stored?
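Completed-job counters live in the JobTracker's memory and in its job history files, so a restart loses the in-memory view and leaves only what was written to history. One era-appropriate knob (hadoop.job.history.user.location, present in 0.19/0.20; verify for your version) writes a per-job history file, counters included, to a location you choose. A sketch with a hypothetical HDFS path:

```xml
<property>
  <name>hadoop.job.history.user.location</name>
  <!-- Hypothetical path: keep per-job history files (counters included)
       somewhere that survives cluster restarts -->
  <value>hdfs:///user/hadoop/job-history</value>
</property>
```

By default the user-visible copy goes under the job's output directory in _logs/history, so checking there before the output is cleaned up is another option.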