Get ResourceSchema during putNext in StoreFunc

2011-01-31 Thread Jacob Perkins
Trying to write a simple StoreFunc that makes use of the input data's field names. Is there a way to gain access to these inside the call to putNext? Ostensibly you could set a variable with the schema during the call to checkSchema, e.g. as in HBaseStorage, but as far as I can tell this is null by t


2011-01-31 Thread Dmitriy Ryaboy
In what way are you gathering results? Solutions typically involve a choice of:
* don't -- just read directories
* openIterator (which is the same thing, really)
* use a single reducer
* hadoop fs -cat /path/to/output/* > myoutput
* use HAR
* write your own Pig (and hadoop) don't store your resul
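
The `hadoop fs -cat /path/to/output/* > myoutput` option can also be approximated after the output directory has been copied to the local file system. The following is a minimal Python sketch under that assumption; it also assumes the part files follow Hadoop's usual `part-*` naming. The names `merge_part_files`, `output_dir`, and `merged_path` are illustrative and not part of Pig or Hadoop.

```python
import glob
import os
import shutil


def merge_part_files(output_dir, merged_path):
    """Concatenate the part-* files in output_dir into merged_path.

    Assumption: output_dir is a local copy of the job's output directory.
    Returns the number of part files that were merged.
    """
    # Sort by name so the merged file has a stable, predictable order.
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the bytes across without loading a whole part in memory.
                shutil.copyfileobj(src, merged)
    return len(parts)
```

Sorting by file name only fixes the order of the parts; whether that order is meaningful depends on how the job produced them.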


2011-01-31 Thread anindita.mahapatra
Is there a way to tell Pig, in embedded map-reduce mode, to store all my results in a single results file instead of all the parts that it creates, which I have to merge afterwards? If that is not possible, what is the recommended way to gather the results (using openIterator?)? Thanks, Anindita

Re: failed to produce result

2011-01-31 Thread Dexin Wang
Thanks. That URL doesn't tell much.
Job Name: Job387546913066708402.jar
Job File: hdfs://hadoop-name01/hadoop/mapred/system/job_201101260357_3230/job.xml
Job Setup: None
Status: Failed
Started at: Mon Jan 31 15:48:36 CST 2011
Failed at: Mon Jan 31 15:48:36 CST 2011
Failed in: 0sec

Re: failed to produce result

2011-01-31 Thread Thejas M Nair
The logs say that the map-reduce job failed. Can you check the log files of the failed map-reduce tasks? You can follow the jobtracker URL in the log message - http://hadoop-name02:50030/jobdetails.jsp?jobid=job_201101260357_3230
-Thejas
On 1/31/11 1:54 PM, "Dexin Wang" wrote:
> Hi, I found

failed to produce result

2011-01-31 Thread Dexin Wang
Hi, I found similar problems on the web but didn't find a solution, so I'm asking here. I have a Pig job that had been working fine for a couple of months and has now started failing. But the same job still works if run as another account. I narrowed it down a bit and found that the problematic user

Re: How to throw away aliases ?

2011-01-31 Thread Thejas M Nair
Hi Robert, I am not sure I have understood what you are saying here. An example might help. Not all aliases (i.e. the relations corresponding to them) are stored on disk; only the ones stored explicitly using a store statement, or the ones that happen to be at the boundaries of map and reduce, get (temporarily) store

Re: Problems with STORE

2011-01-31 Thread Dmitriy Ryaboy
The directory it's trying to create is on the local file system of a node (it's temp storage), not in HDFS. Do you have /rawfiles/ set up as temp storage for Hadoop?
-D
On Mon, Jan 31, 2011 at 10:29 AM, Kris Coward wrote:
> So I have a relation apa which when DUMPed, ends up getting output ju

Splitting Strategy When Records Flow From the Net at Runtime

2011-01-31 Thread Andreas Paepcke
I pull records from a remote Web site. I have a subclass of RecordReader, which knows how to retrieve those records one by one from a Web stream. The Web site is set up such that I can run multiple such readers, each pulling a distinct subset of the records from the site. My planned strategy: In my s

StoreFunc Schema

2011-01-31 Thread Dan Harvey
Hey, I'm just porting a JSON StoreFunc class I wrote from Pig 0.6 to Pig 0.8 so I can take advantage of the schema that StoreFuncs can use from 0.7 onwards. I've overridden the method to get the schema when the saving starts, but am finding the ResourceSchema object being sent is always null

Problems with STORE

2011-01-31 Thread Kris Coward
So I have a relation apa which, when DUMPed, is output just fine, but when I run STORE apa INTO '/rawfiles/f3453efd460348bbaeee2e9496e25871/1294311600/apa' USING PigStorage(','); I get the following error: Mkdirs failed to create file:/rawfiles/f3453efd460348bba

Re: grouping data based on variable number of keys

2011-01-31 Thread Jonathan Coveney
The only thing I could think of would be to feed all of your potential keys to a UDF which then processes them, creates a tuple which is the new, actual key, and then you group and whatnot on that.
2011/1/28 Kunal Nawale
> Hi,
> I have a relation R as (a, b, c, d, e)
>
> I need to group data, b
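
The idea in this reply (derive one composite key from the candidate keys, then group on it) can be sketched in plain Python. The rule used here for choosing which candidates enter the key is hypothetical, since the original question is cut off: a field takes part only when its value is not None. The names `make_group_key` and `group_by_key` are illustrative, not Pig APIs.

```python
from collections import defaultdict


def make_group_key(record, candidate_fields):
    """Build the actual grouping key from the candidate fields.

    Hypothetical rule: only candidates whose value is not None are used.
    Each entry is a (name, value) pair so that keys built from different
    fields can never collide.
    """
    return tuple(
        (name, record[name])
        for name in candidate_fields
        if record.get(name) is not None
    )


def group_by_key(records, candidate_fields):
    """Group records (dicts) by their derived composite key."""
    groups = defaultdict(list)
    for record in records:
        groups[make_group_key(record, candidate_fields)].append(record)
    return dict(groups)
```

In Pig the same shape would be a UDF that returns the key tuple, followed by a GROUP on that field; the Python above only illustrates the key-building step.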

How to throw away aliases ?

2011-01-31 Thread Robert Waddell
Hey Guys, I was just wondering if anyone knew a way to essentially de-reference aliases. The reason I am asking is that I am looking for performance improvements, and since the aliases need to be stored and loaded back in at the end/start of each map-reduce job, I was wondering how I can throw away the