Re: Making sure the tmp directory is cleaned?

2009-06-22 Thread Qin Gao
-only, and our program will generate new temporary files. --Q On Mon, Jun 22, 2009 at 4:19 PM, Pankil Doshi forpan...@gmail.com wrote: Yes, if your job completes successfully. Possibly it removes them after completion of both map and reduce tasks. Pankil On Mon, Jun 22, 2009 at 3:15 PM, Qin

Re: Making sure the tmp directory is cleaned?

2009-06-22 Thread Qin Gao
clean it up yourself it will eat up your disk space. Pankil On Mon, Jun 22, 2009 at 4:24 PM, Qin Gao q...@cs.cmu.edu wrote: Thanks! But what if the jobs get killed or fail? Does Hadoop try to clean it up? We are considering bad situations - if a job gets killed, will the tmp dirs sit

Re: Know how many records remain?

2008-08-21 Thread Qin Gao
, 2008 at 1:43 PM, Qin Gao [EMAIL PROTECTED] wrote: Thanks Chris, that's exactly what I am trying to do. It solves my problem. On Wed, Aug 20, 2008 at 4:36 PM, Chris Dyer [EMAIL PROTECTED] wrote: Qin, since I can guess what you're trying to do with this (emit a bunch of expected counts

Get information of input split from MapRunner?

2008-08-21 Thread Qin Gao
Hi mailing list, I want to get information about the current input split inside the MapRunner object (or the map function). However, the only object I can get from the MapRunner is the RecordReader, and I saw no method defined in RecordReader to fetch the InputSplit object. Do you have any suggestions on this?

Know how many records remain?

2008-08-20 Thread Qin Gao
Hi mailing list, Is there any way to know whether the mapper is processing the last record assigned to this node, or to know how many records remain to be processed on this node? Qin

Re: Linux server clustered HDFS: access from Windows eclipse Java application

2008-08-05 Thread Qin Gao
I think IBM has a plugin that can access HDFS, I don't know whether it contains source code, but maybe it helps. www.alphaworks.ibm.com/tech/mapreducetools On Tue, Aug 5, 2008 at 5:16 AM, Alberto Forcén [EMAIL PROTECTED] wrote: Hi all. I'm running a clustering HDFS on linux and I need to

Re: data partitioning question

2008-08-04 Thread Qin Gao
For the first question, I think it is better to do it at the reduce stage, because the partitioner only considers the size of blocks in bytes. Instead you can output the intermediate key/value pair like this: key: 1 if C=1,3,5,7, 0 otherwise; value: the tuple. In reducer you can have a reducer deal
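A minimal sketch of the key assignment suggested in this reply, written as plain Java with no Hadoop dependency. The class and method names (`GroupKey`, `keyFor`) are hypothetical and not from the thread. In a real job the mapper would emit this value as the intermediate key with the tuple as the value.

```java
public class GroupKey {
    /**
     * Returns the intermediate key for a tuple whose column C has value c:
     * 1 if c is one of 1, 3, 5, 7, and 0 otherwise.
     * (Hypothetical helper; the thread only describes the rule in words.)
     */
    public static int keyFor(int c) {
        switch (c) {
            case 1:
            case 3:
            case 5:
            case 7:
                return 1;
            default:
                return 0;
        }
    }

    public static void main(String[] args) {
        // Print the key each sample value of C would be grouped under.
        for (int c = 0; c <= 8; c++) {
            System.out.println("C=" + c + " -> key " + keyFor(c));
        }
    }
}
```

With this scheme every tuple falls into one of two reduce groups, so the per-group handling can be done in the reducer as the reply suggests.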

allocate/deallocate inside java code?

2008-08-02 Thread Qin Gao
Hi all, I have a question on using HOD. I want to allocate a cluster inside the Java program and deallocate it, because it takes quite a long time only to put and get data to/from HDFS, and thus it uses only one machine. However if I allocate the cluster outside, it will not do anything at the

Re: iterative map-reduce

2008-07-29 Thread Qin Gao
If you are using Java, just create the job configuration again and run it; otherwise you just need to write an iterative script. On Tue, Jul 29, 2008 at 9:57 AM, Shirley Cohen [EMAIL PROTECTED] wrote: Hi, I want to call a map-reduce program recursively until some condition is met. How do I do that?
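As a rough illustration of the Java option in this reply (re-create the job and run it again until a condition is met), here is a driver-loop sketch in plain Java. It has no Hadoop dependency: the `runOneJob` function is a hypothetical stand-in for building a fresh job configuration and submitting it (with the old `mapred` API that would typically be a new `JobConf` passed to `JobClient.runJob`). The convergence test on a single number is also an assumption made for the example.

```java
import java.util.function.DoubleUnaryOperator;

public class IterativeDriver {
    /**
     * Runs one "job" per iteration until the change between successive
     * results drops below tolerance, or maxIterations is reached.
     * Returns the number of iterations that were run.
     */
    public static int runUntilConverged(DoubleUnaryOperator runOneJob,
                                        double initial,
                                        double tolerance,
                                        int maxIterations) {
        double current = initial;
        int iterations = 0;
        while (iterations < maxIterations) {
            // In a real driver: build a new job configuration here, point its
            // input at the previous iteration's output, and submit it.
            double next = runOneJob.applyAsDouble(current);
            iterations++;
            boolean converged = Math.abs(next - current) < tolerance;
            current = next;
            if (converged) {
                break;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Toy stand-in job (assumption): each pass halves the value.
        int n = runUntilConverged(x -> x / 2.0, 1.0, 0.1, 100);
        System.out.println("iterations run: " + n);
    }
}
```

The loop lives in the driver program, outside the map and reduce functions; each pass is an ordinary, independent job submission.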

Re: iterative map-reduce

2008-07-29 Thread Qin Gao
the iterative script be run outside of Hadoop? I was actually trying to figure out if the framework could handle iterations. Shirley On Jul 29, 2008, at 9:10 AM, Qin Gao wrote: if you are using java, just create job configure again and run it, otherwise you just need to write a iterative

Help: specifying different input/output class for combiner and reducer

2008-07-27 Thread Qin Gao
Hi all, I am trying to specify different key/value classes for combiner and reducer in my task, for example, I want the mapper to output integer==(integer,float) pair, and then the combiner outputs integer==some structure. Finally the reducer takes in integer==some structure and output