Re: How to use tuples?

2012-02-05 Thread Daniel Dai
I guess you mean to load a bag. Your input file should be:

{(1,2,3),(2,4,5)}
{(2,3,4),(2,3,5)}

and the load statement should be:

z = load 'tmp.txt' as (b:{(a0:int,a1:int,a2:int)});

Daniel

On Thu, Feb 2, 2012 at 2:43 AM, praveenesh kumar wrote:
> Okay, so it's weird.
>
> I was able to run a pig query

How to load customized map data schema

2012-02-05 Thread Haitao Yao
Hi all, our data format for maps is Key:Value|Key:Value. How can I load this data into a map type? Can Pig define the map delimiter like Hive does? Thanks.
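PigStorage only understands maps in its own [key#value,key#value] form, so a custom delimiter like Key:Value|Key:Value generally means preprocessing the field or parsing it in a UDF. A minimal sketch of that parsing in plain Python (the function name and the exact input format are assumptions, not from the thread; in a Jython UDF the returned dict becomes a Pig map):

```python
def parse_map(field):
    """Parse 'Key:Value|Key:Value' into a dict (the shape a Jython UDF
    would return for Pig's map type)."""
    result = {}
    if not field:
        return result
    for pair in field.split('|'):
        key, sep, value = pair.partition(':')
        if sep:  # skip malformed pairs that have no ':' delimiter
            result[key] = value
    return result

print(parse_map('a:1|b:2'))  # → {'a': '1', 'b': '2'}
```

A downstream Pig script could then treat the UDF's output as a regular map field.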

Re: Passing schema inside Load function

2012-02-05 Thread praveenesh kumar
Okay, so how can I make use of the -schema option with PigStorage? Suppose my JSON schema is:

{ "name":"Student_Data", "properties": { "id": { "type":"INTEGER", "description":"Student id"

Re: Re: Could pig-0.9.1 work with hadoop-1.0.0 and Hbase-0.90.5?

2012-02-05 Thread lulynn_2008
Thank you for your quick response.

At 2012-02-06 14:46:46, "Dmitriy Ryaboy" wrote:
> It should work with hadoop-1.0 and hbase 0.90.*. Zookeeper is only shipped
> as part of hbase. Pig does not use it directly, it's
> a transitive dependency via hbase.
>
> 2012/2/5 lulynn_2008
>
>> Hello,
>> I ha

Re: Async job submission via PigServer

2012-02-05 Thread Daniel Dai
There is no asynchronous API for Pig. However, Pig does have a notification mechanism (see PigRunner.run); you can create a separate thread to simulate the asynchronous call.

Daniel

On Fri, Feb 3, 2012 at 12:34 AM, Michael Lok wrote:
> Hi folks,
>
> I was wondering if it's possible to submit reg
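Daniel's suggestion, wrapping the blocking call in its own thread, is language-agnostic; here is a sketch in Python using concurrent.futures (the run_pig_script function is a stand-in for the blocking PigRunner.run / PigServer call and is an assumption for illustration, not Pig API):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_pig_script(script):
    """Stand-in for the blocking PigRunner.run / PigServer invocation."""
    time.sleep(0.1)  # simulate a long-running job
    return 'done: ' + script

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(run_pig_script, 'my_script.pig')  # returns immediately
# ... the caller is free to do other work here ...
print(future.result())  # blocks only when the result is actually needed
executor.shutdown()
```

The same shape carries over to Java: submit a Callable that invokes PigRunner.run to an ExecutorService and poll the Future.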

Re: Passing schema inside Load function

2012-02-05 Thread Dmitriy Ryaboy
It's a json serialization of the Pig schema object, and isn't really meant to be created by hand. Patches to make it more human-friendly would be quite welcome.

D

On Sun, Feb 5, 2012 at 10:35 PM, praveenesh kumar wrote:
> Thanks,
> I was also looking for -schema option in PigStorage.
> But Can a

Re: Could pig-0.9.1 work with hadoop-1.0.0 and Hbase-0.90.5?

2012-02-05 Thread Dmitriy Ryaboy
It should work with hadoop-1.0 and hbase 0.90.*. Zookeeper is only shipped as part of hbase. Pig does not use it directly, it's a transitive dependency via hbase.

2012/2/5 lulynn_2008
> Hello,
> I have a question about pig-0.9.1:
> Could pig-0.9.1 work with hadoop-1.0.0 and hbase-0.90.5? I plan

Could pig-0.9.1 work with hadoop-1.0.0 and Hbase-0.90.5?

2012-02-05 Thread lulynn_2008
Hello, I have a question about pig-0.9.1: could pig-0.9.1 work with hadoop-1.0.0 and hbase-0.90.5? I planned to verify this by running the unit tests. Please give your suggestions. Besides, I found zookeeper in pig; my questions are:
- what is zookeeper used for in pig?
- is zookeeper used for pig mainline funct

Re: Passing schema inside Load function

2012-02-05 Thread praveenesh kumar
Thanks, I was also looking for the -schema option in PigStorage. But can anyone explain how we can define that JSON schema file? A tutorial or small example would be very helpful.

Praveenesh

On Mon, Feb 6, 2012 at 11:55 AM, Dmitriy Ryaboy wrote:
> It's pretty straightforward, that's why the LoadMet

Re: Way of determining the source of data

2012-02-05 Thread Daniel Dai
Check https://cwiki.apache.org/confluence/display/PIG/FAQ#FAQ-Q%3AIloaddatafromadirectorywhichcontainsdifferentfile.HowdoIfindoutwherethedatacomesfrom%3F

On Thu, Feb 2, 2012 at 5:11 PM, Ranjan Bagchi wrote:
> Hi,
>
> I've a bunch of [for example] apache logfiles that I'm searching through. I

Re: Passing schema inside Load function

2012-02-05 Thread Dmitriy Ryaboy
It's pretty straightforward; that's why the LoadMetadata interface exists. You just have to implement it and translate however you store the schema into a Pig Schema object. PigStorageSchema will read a json file that describes the schema; you can look at how that's done there (actually, PigStorage
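For orientation, the JSON file that PigStorage's schema support reads is a serialization of Pig's Schema object, not a hand-authored format. The sketch below shows an assumed shape for a two-field relation; the field layout and the numeric type codes (10 for INTEGER, 55 for CHARARRAY in org.apache.pig.data.DataType) should be verified against a file Pig itself writes rather than taken as authoritative:

```python
import json

# Assumed shape of a .pig_schema file for a relation (id:int, name:chararray).
# Type codes follow org.apache.pig.data.DataType (10 = INTEGER, 55 = CHARARRAY);
# check against a file produced by Pig itself before relying on this layout.
schema = {
    "fields": [
        {"name": "id",   "type": 10, "schema": None},
        {"name": "name", "type": 55, "schema": None},
    ],
}
text = json.dumps(schema)
print(json.loads(text)["fields"][0]["name"])  # → id
```

As Dmitriy notes, generating such a file from Pig and inspecting it is more reliable than writing one by hand.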

Re: Bug in REGEX_EXTRACT?

2012-02-05 Thread Dmitriy Ryaboy
I think the intent is to behave the same way as the Pig "matches" operator (which, unsurprisingly, uses the Java matches method). RegexExtractAll becomes quite confusing if it means "extract all matched subexpressions of the first match of the expression" (one might expect "all" to refer to all ma
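The distinction Dmitriy draws, Java's matches() requiring the pattern to consume the whole input, can be illustrated with any regex library; in this Python sketch, re.fullmatch plays the role of Java matches (and of Pig's "matches" operator), while re.search is "find a match anywhere" (the sample text and pattern are made up for illustration):

```python
import re

text = 'user=alice id=42'

# Java-style matches(): the pattern must cover the entire string,
# so a partial pattern fails even though the substring is present.
assert re.fullmatch(r'id=(\d+)', text) is None

# Anchoring across the whole line while capturing a subexpression succeeds:
m = re.fullmatch(r'.*id=(\d+)', text)
print(m.group(1))  # → 42

# A search-style call finds the first match anywhere, no full coverage needed:
print(re.search(r'id=(\d+)', text).group(1))  # → 42
```

This is why an "extract all" UDF built on full-match semantics only ever sees subexpressions of one overall match, which is the confusion the thread is discussing.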

Re: ONERROR

2012-02-05 Thread Daniel Dai
No, there is no ONERROR handler right now.

Daniel

On Sat, Feb 4, 2012 at 7:11 PM, Russell Jurney wrote:
> Did ONERROR ever get built? I have a few bad datetimes out of many failing
> to parse, and I don't want my entire pig script dying because I lost a few
> rows.
>
> http://wiki.apache.org/pig
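With no ONERROR handler, a common workaround for Russell's situation (not proposed in the thread, so treat it as a sketch) is a parsing UDF that returns None for bad rows; Pig turns None into null, which a downstream FILTER can drop instead of letting one bad datetime kill the script. The function name and format string here are assumptions:

```python
import time

def safe_parse_hour(s, fmt="%Y-%m-%dT%H:%M:%S"):
    """Return the hour of a datetime string, or None if parsing fails.
    In Pig, a None return becomes null, so bad rows can be removed with
    FILTER data BY hour IS NOT NULL rather than failing the whole job."""
    try:
        return time.strptime(s, fmt)[3]  # index 3 = tm_hour
    except ValueError:
        return None

print(safe_parse_hour('2006-10-16T08:19:39'))  # → 8
print(safe_parse_hour('not-a-date'))           # → None
```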

Re: Jython UDF problem

2012-02-05 Thread Daniel Dai
Seems like a bug in jython:

>>> import time
>>> tuple_time = time.strptime('2006-10-16T08:19:39', "%Y-%m-%dT%H:%M:%S")
>>> tuple_time.tm_hour
Traceback (most recent call last):
  File "", line 1, in
AttributeError: 'tuple' object has no attribute 'tm_hour'
>>> tuple_time[3]
8

Change return str(tu
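The workaround Daniel demonstrates, indexing instead of attribute access, works in both CPython and Jython, because struct_time supports positional access (index 3 is tm_hour) even where the named attributes are missing:

```python
import time

# In Jython, strptime may return a plain tuple without tm_hour etc.;
# positional indexing works in both implementations.
t = time.strptime('2006-10-16T08:19:39', "%Y-%m-%dT%H:%M:%S")
year, hour, minute = t[0], t[3], t[4]
print(year, hour, minute)  # → 2006 8 19
```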

Re: MongoStorage broken

2012-02-05 Thread Russell Jurney
To answer my own question, this is because the schemas differ. The schema in the working case has a named tuple via AvroStorage. Storing to Mongo works when I name the tuple:

... sent_topics = FOREACH froms GENERATE FLATTEN(group) AS (from, to), pairs.subject AS pairs:bag {column:tuple (subject:

Re: MongoStorage broken

2012-02-05 Thread Russell Jurney
sent_topics = LOAD '/tmp/pair_titles.avro' USING AvroStorage();
STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING MongoStorage();

That works. Why is it the case that MongoStorage only works if the intermediate processing doesn't happen? Strangeness.

On Sun, Feb 5, 2012 at 12:31 AM,

Re: Jython UDF problem

2012-02-05 Thread Aniket Mokashi
Looks like this is a jython bug. Btw, afaik, the return type of this function would be a bytearray if a decorator is not specified.

Thanks,
Aniket

On Sat, Feb 4, 2012 at 9:39 PM, Russell Jurney wrote:
> Why am I having tuple objects in my python udfs? This isn't how the
> examples work.
>
> Error: