Trying to write a simple StoreFunc that makes use of the input data's
field names. Is there a way to gain access to this inside the call to
putNext? Ostensibly you could set a variable with the schema during the
call to checkSchema, e.g. in HBaseStorage, but as far as I can tell this
is null by t
In what way are you gathering results?
Solutions typically involve a choice of:
* don't: just read the output directories
* openIterator (which is the same thing, really)
* use a single reducer
* hadoop fs -cat /path/to/output/* > myoutput
* use HAR (Hadoop Archives)
* write your own
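If the output has been copied to local disk, "write your own" can be as small as the equivalent of hadoop fs -cat /path/to/output/* > myoutput: concatenate the part files in name order. For files still in HDFS, hadoop fs -getmerge does the same job from the shell. A minimal sketch, assuming the part files are named part-* and sit in one local directory; MergeParts is a hypothetical helper, not part of Pig or Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: merges a job's part files (already on the local file system) into one file.
final class MergeParts {

    /** Concatenates every "part-*" file in outputDir, in name order, into target (overwriting it). */
    static void mergeParts(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob skips non-data entries such as log directories or marker files.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Zero-padded names (part-00000, part-00001, ...) sort correctly as strings.
        Collections.sort(parts);
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out); // append this part's bytes to the merged file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MergeParts <local-output-dir> <merged-file>");
            System.exit(1);
        }
        mergeParts(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Usage would look like java MergeParts ./output myoutput (paths illustrative). Sorting by name keeps the parts in task-number order, which matters if the job's output was globally sorted across parts.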
Pig (and Hadoop) don't store your resul
Is there a way to tell Pig, in MapReduce embedded mode,
to store all my results in a single results file
instead of all the part files it creates,
which I then have to merge afterwards?
If it is not possible, then
what is the recommended way to gather the results (using openIterator?)
Thanks,
Anindita
Thanks. That URL doesn't tell much.
*Job Name:* Job387546913066708402.jar
*Job File:*
hdfs://hadoop-name01/hadoop/mapred/system/job_201101260357_3230/job.xml
*Job Setup:* None
*Status:* Failed
*Started at:* Mon Jan 31 15:48:36 CST 2011
*Failed at:* Mon Jan 31 15:48:36 CST 2011
*Failed in:* 0sec
The logs say that the MapReduce job failed. Can you check the log files of the
failed MapReduce tasks?
You can follow the JobTracker URL in the log message:
http://hadoop-name02:50030/jobdetails.jsp?jobid=job_201101260357_3230
-Thejas
On 1/31/11 1:54 PM, "Dexin Wang" wrote:
Hi,
I found similar problems on the web but didn't find a solution, so I'm
asking here.
I have a Pig job that had been working fine for a couple of months, and then it
started failing. But the same job still works if run as another account. I
narrowed it down a bit and found that the problematic user
Hi Robert,
I am not sure I have understood what you are saying here. An example
might help.
Not all aliases (i.e. the relations corresponding to them) are stored on disk; only
the ones stored explicitly using a STORE statement, or ones that happen to be at
the boundaries of map and reduce, get (temporarily) store
The directory it's trying to create is on the local file system of a node
(it's temp storage), not in HDFS.
Do you have /rawfiles/ set up as temp storage for Hadoop?
-D
On Mon, Jan 31, 2011 at 10:29 AM, Kris Coward wrote:
>
> So I have a relation apa which when DUMPed, ends up getting output ju
I pull records from a remote Web site. I have a subclass of
RecordReader, which knows how to retrieve those records one by one
from a Web stream. The Web site is set up such that I can run multiple
such readers, each pulling a distinct subset of the records from the
site.
My strategy plan: In my s
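One common way to give each reader a distinct subset is to hand it a contiguous, non-overlapping range of record positions; in Hadoop terms this is roughly what a custom InputFormat's getSplits would compute, with one InputSplit per reader. The sketch below shows only the range arithmetic, assuming (hypothetically) that the site can serve records by numeric offset and that the total count is known up front; RangeSplitter and Range are illustrative names, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: divides a known number of remote records among N readers.
final class RangeSplitter {

    /** Half-open range [start, end) of record offsets assigned to one reader. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Splits totalRecords offsets into numReaders contiguous, disjoint ranges that cover everything.
     * The first (totalRecords % numReaders) ranges get one extra record, so sizes differ by at most one.
     */
    static List<Range> split(long totalRecords, int numReaders) {
        if (numReaders <= 0) throw new IllegalArgumentException("numReaders must be positive");
        if (totalRecords < 0) throw new IllegalArgumentException("totalRecords must be non-negative");
        List<Range> ranges = new ArrayList<>();
        long base = totalRecords / numReaders;
        long extra = totalRecords % numReaders;
        long start = 0;
        for (int i = 0; i < numReaders; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new Range(start, start + size));
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // prints [[0, 4), [4, 7), [7, 10)]
        System.out.println(split(10, 3));
    }
}
```

Each RecordReader would then fetch only the offsets in its own range, so no two readers pull the same record.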
Hey,
I'm just porting a JSON StoreFunc class I wrote from Pig 0.6 to Pig
0.8 so I can take advantage of the schema that StoreFuncs can use from 0.7
onwards.
I've overloaded the method to get the schema when the saving starts, but I'm
finding the ResourceSchema object being passed is always null
So I have a relation apa which, when DUMPed, gets output just
fine, but when I run
STORE apa INTO '/rawfiles/f3453efd460348bbaeee2e9496e25871/1294311600/apa'
USING PigStorage(',');
I get the following error:
java.io.IOException: Mkdirs failed to create
file:/rawfiles/f3453efd460348bba
The only thing I could think of would be to feed all of your potential keys
to a UDF which then processes them, creates a tuple which is the new, actual
key, and then you group and whatnot on that.
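The idea is: compute one derived key per record from all the candidate key fields, then group on that single key. The plain-Java model below illustrates it outside Pig, for a relation shaped like R (a, b, c, d, e) from the quoted question; it does not use Pig's EvalFunc API. The key rule (a together with b, ignoring case in b) is a hypothetical placeholder, as are the class and method names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of "UDF builds the real key, then GROUP on it", outside of Pig.
final class DerivedKeyGrouping {

    /** One record of a relation shaped like R (a, b, c, d, e); all fields are strings for simplicity. */
    static final class Row {
        final String a, b, c, d, e;

        Row(String a, String b, String c, String d, String e) {
            this.a = a;
            this.b = b;
            this.c = c;
            this.d = d;
            this.e = e;
        }
    }

    /**
     * Plays the role of the UDF: takes the candidate key fields and returns the "actual" key.
     * HYPOTHETICAL rule for illustration: a plus b, with b compared case-insensitively (b must be non-null).
     */
    static List<String> deriveKey(Row r) {
        return Arrays.asList(r.a, r.b.toLowerCase(Locale.ROOT));
    }

    /** Groups rows on the derived key, analogous to grouping on the key column the UDF produced. */
    static Map<List<String>, List<Row>> groupByDerivedKey(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(DerivedKeyGrouping::deriveKey));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("x", "Foo", "1", "2", "3"),
                new Row("x", "foo", "4", "5", "6"),
                new Row("y", "bar", "7", "8", "9"));
        // Print each derived key with the number of rows that fell into its group.
        groupByDerivedKey(rows).forEach((key, group) -> System.out.println(key + " -> " + group.size()));
    }
}
```

Because the derived key is an ordinary List, its equals and hashCode compare element by element, which is what lets rows with "Foo" and "foo" land in the same group here.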
2011/1/28 Kunal Nawale
> Hi,
> I have a relation R as (a, b, c, d, e)
>
> I need to group data, b
Hey Guys,
I was just wondering if anyone knew a way to essentially de-reference
aliases. The reason I am asking is that I am looking for performance
improvements, and since the aliases need to be stored and loaded back in at
the end/start of each MapReduce job, I was wondering how I can throw away the