Re: Schema issue while storing multiple pig outputs using CSVExcelStorage

2016-07-11 Thread Rohini Palaniswamy
rath Sasidharan <ssasidha...@bol.com> wrote: Hi All, I have a script which stores 2 relations with different schemas using CSVExcelStorage. The issue I see is that the script picks up the last store function and takes the schema in that

Re: Schema issue while storing multiple pig outputs using CSVExcelStorage

2016-07-04 Thread Sarath Sasidharan
Hi Eyal, 1. I have created a ticket: PIG-4943 <https://issues.apache.org/jira/browse/PIG-4943> Thanks and Regards, Sarath From: Eyal Allweil Reply-To: Eyal Allweil Date: Monday 4 July 2016 at 16:05 To: "user@pig.apache.org", Sarath Sasidharan Subject: Re: Schema iss

Re: Schema issue while storing multiple pig outputs using CSVExcelStorage

2016-07-04 Thread Eyal Allweil
I can replicate these results on Pig 0.14. Did anyone open a Jira issue for this? On Thursday, March 10, 2016 12:24 PM, Sarath Sasidharan wrote: Hi All, I have a script which stores 2 relations with different schemas using CSVExcelStorage. The issue I see is that the script

Schema issue while storing multiple pig outputs using CSVExcelStorage

2016-03-10 Thread Sarath Sasidharan
Hi All, I have a script which stores 2 relations with different schemas using CSVExcelStorage. The issue I see is that the script picks up the last store function, takes the schema in that, and applies it to all store functions, overriding the previous store schemas. Is this a known

Pig - outputSchema - create schema for tuple

2016-01-12 Thread John Smith
I'm trying to define an output schema which should be a tuple that contains two other tuples, i.e. `stats:tuple(c:tuple(),d:tuple)`. The code below doesn't work as intended. It somehow produces this structure instead: stats:tuple(b:tuple(c:tuple(),d:tuple())) Below is the output produced by describe

problem with the schema geneation - outputSchema

2015-12-16 Thread John Smith
internal logic inside UDF. My problem is that I can't pass any temporary variable from the exec() method into the outputSchema(Schema input) method, which is part of the UDF class. The temporary variable contains information needed to generate a valid output schema inside outputSchema(), e.g. the size of the tuples

Re: Schema changes based on subquery

2015-11-20 Thread Debabrata Pani
; generate COUNT_STAR($1) as TARGET; }; d = limit c 10; e = foreach d generate TARGET; dump e; end output ... (1) Cheers !! Arvind On Sat, Nov 14, 2015 at 12:18 AM, Chr

Re: Schema changes based on subquery

2015-11-20 Thread Debabrata Pani
end output ... (1) Cheers !! Arvind On Sat, Nov 14, 2015 at 12:18 AM, Christopher Maier <christopher.ma...@gm.com> wrote: Hi, I haven't received a response on this, has anyone had a chance to r

Re: Schema changes based on subquery

2015-11-15 Thread Arvind S
oduce the error? Thanks, Kit From: Christopher Maier Sent: Tuesday, October 20, 2015 4:02 PM To: 'user@pig.apache.org' Subject: Schema changes based on subquery Hi, I am getting the wrong counts from Pig for a certain query. I have

RE: Schema changes based on subquery

2015-11-13 Thread Christopher Maier
Hi, I haven't received a response on this. Has anyone had a chance to reproduce the error? Thanks, Kit From: Christopher Maier Sent: Tuesday, October 20, 2015 4:02 PM To: 'user@pig.apache.org' Subject: Schema changes based on subquery Hi, I am getting the wrong counts from Pi

Schema changes based on subquery

2015-10-20 Thread Christopher Maier
Hi, I am getting the wrong counts from Pig for a certain query. I have simplified the query to what's below, which shows as a failure instead of a wrong count. Why does the first line of the subquery cause the output schema to revert to the input schema? This line shoul

Schema detection problems with Pig 0.14.0

2015-01-04 Thread Berin Loritsch
o collection or from MongoDB, the only exception I get is that Pig doesn't know what the schema is. I've attached the Pig Latin script for reference, but it's a pretty simple count of times one person emails another. I can run the equivalent map-reduce directly in MongoDB, but the goal he

RE: Group operator and variable schema (reformatted email)

2014-11-16 Thread Gufran Mohammed Pathan
er 13, 2014 12:51 PM To: user@pig.apache.org Subject: Group operator and variable schema (reformatted email) Hi All, I have the following question: Snippet of my sample.txt. The first column is id; however, each row can have a variable number of columns. id1 100 200 300 400 500 id2 10 20 30 id1 800 900

Group operator and variable schema (reformatted email)

2014-11-12 Thread Sameer Tilak
'sample.txt' [how should I specify schema here] sample_grpd = GROUP sample by $0; sample_result = FOREACH sample_grpd generate group, FLATTEN(TOBAG([what should go here])) group by id so that the result is: id1 100 200 300 400 500 800 900 600 1 2 3 4 5 6 7 8 9 id2 10 20 30 40 50 60 70 80 90 id
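
The result the poster is after can be pinned down with a small plain-Java sketch (not Pig code): group whitespace-separated rows by their first column and append every remaining value, whatever the row length. The class name `GroupById` is hypothetical, and the sample rows are the ones visible in this thread. The sketch keeps input order; as far as I know, Pig does not guarantee the order of values inside a grouped bag, so don't read that guarantee into it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (not Pig): group whitespace-separated rows by their
// first column and concatenate the remaining values in input order.
class GroupById {
    static Map<String, List<String>> group(List<String> lines) {
        // LinkedHashMap keeps ids in first-seen order.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = trimmed.split("\\s+");
            List<String> values = result.computeIfAbsent(cols[0], k -> new ArrayList<>());
            // Every column after the id is a value; rows may have any length.
            for (int i = 1; i < cols.length; i++) {
                values.add(cols[i]);
            }
        }
        return result;
    }
}
```

For the rows `id1 100 200 300 400 500`, `id2 10 20 30` and `id1 800 900`, `group` maps `id1` to `[100, 200, 300, 400, 500, 800, 900]` and `id2` to `[10, 20, 30]`.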

Referencing output data schema from a StoreFunc

2014-08-06 Thread Rodrick
Hi, I would like to create a StoreFunc like MultiStorage but instead of referencing fields to be added to the output path by index, it references them by name (it would construct a map between names and indexes based on the schema of the data to be output). Is there a mechanism for a
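
Once the output field names are known, the name-to-index map Rodrick describes is a small lookup. In Pig, a StoreFunc is handed the output schema through its `checkSchema(ResourceSchema)` method (worth confirming against the Pig version in use); the plain-Java sketch below assumes the field names have already been pulled out of that schema into an array and shows only the lookup step. `FieldIndex`, `pathFor`, and the "values joined by `/`" directory layout are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolve output field names to positional indexes,
// the way a MultiStorage-like StoreFunc could if it knew the schema.
class FieldIndex {
    private final Map<String, Integer> byName = new HashMap<>();

    FieldIndex(String[] fieldNames) {
        for (int i = 0; i < fieldNames.length; i++) {
            // Reject duplicates so a name always maps to exactly one column.
            if (byName.put(fieldNames[i], i) != null) {
                throw new IllegalArgumentException("duplicate field: " + fieldNames[i]);
            }
        }
    }

    int indexOf(String name) {
        Integer idx = byName.get(name);
        if (idx == null) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return idx;
    }

    // Build a relative output path from the named fields of one record,
    // e.g. "US/2014" for partition fields ("country", "year").
    String pathFor(Object[] record, String... partitionFields) {
        StringBuilder sb = new StringBuilder();
        for (String f : partitionFields) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(record[indexOf(f)]);
        }
        return sb.toString();
    }
}
```

Resolving names once, when the schema arrives, keeps the per-record work down to array indexing.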

Re: Schema for STRSPLIT output

2014-06-25 Thread Andrew Musselman
I think you could specify a comma as the delimiter in your load statement: x = load 'file.txt' using PigStorage(','); You could specify the schema if needed on the way in after the PigStorage call with "as (a:chararray, b:chararray, ..., n:chararray)". But if you

Schema for STRSPLIT output

2014-06-25 Thread Ashish Jain
here any way I can specify the schema of y to be a tuple of various numbers of chararrays? Something along the lines of y = FOREACH x GENERATE STRSPLIT(content, ',') as tuple(chararray(*)) 2) If I try to do the above in a UDF, how do I create an output schema which depends on the input? From

Re: Output Schema of Pig UDF that returns a Tuple

2014-06-19 Thread Lorand Bendig
look ugly and error-prone with the definition of schema of all 100 columns. My idea was the UDF will return tuple for each record with a self-explanatory schema returned by outputSchema() and I can use this to write directly into a Hive Table with HCatStorer(). The HCatStorer expects same name for

Re: Output Schema of Pig UDF that returns a Tuple

2014-06-16 Thread Narayanan K
Hi Lorand, thanks for the reply. My use case has around 100 columns and growing, and I didn't want to make the script look ugly and error-prone with the definition of schema of all 100 columns. My idea was the UDF will return tuple for each record with a self-explanatory schema return

Re: Output Schema of Pig UDF that returns a Tuple

2014-06-16 Thread Lorand Bendig
Hi, If you flatten a tuple/bag, Pig will prefix the field with a disambiguation string ([prefix]::). (See: http://pig.apache.org/docs/r0.12.0/basic.html#disambiguate). In your example getSchemaName() returns a generated unique name built from the classname + first input schema field + a

Output Schema of Pig UDF that returns a Tuple

2014-06-14 Thread Narayanan K
Hi, I am writing a Pig UDF that returns a Tuple as per http://wiki.apache.org/pig/UDFManual. I want the output tuple to have a particular schema, say {name:chararray, age:int}, after I FLATTEN it out after using the UDF. As per the UDFManual, the method below public Schema outputSchema(Schema

Re: Pig Schema to load the data

2014-04-05 Thread Koppula, Abhilash Reddy
the outgoing links from a source URL, i.e. Source URL 1 has 2 and 3 outgoing URLs:
1 2 3
2 3 4
3 4
4 1
And I would like to load into Pig as below So, Ou

Re: Pig Schema to load the data

2014-04-05 Thread Lorand Bendig
t the outgoing links from a source URL, i.e. Source URL 1 has 2 and 3 outgoing URLs:
1 2 3
2 3 4
3 4
4 1
And I would like to load into Pig as below So, Ou
(1,(2,3))
(2,(3,4))
(3,(4))
(4,(1))
Can I do this using the default AS schema, or do I have to write a custom loader function? Thanks, Akoppula

Pig Schema to load the data

2014-04-04 Thread Koppula, Abhilash Reddy
Hi All, I have input data in the format below, representing the outgoing links from a source URL, i.e. Source URL 1 has 2 and 3 outgoing URLs:
1 2 3
2 3 4
3 4
4 1
And I would like to load into Pig as below So, Ou
(1,(2,3))
(2,(3,4))
(3,(4))
(4,(1))
Can I do this using the default AS schema, or
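
The target shape, a source id followed by a tuple of its outgoing links, is a simple per-line transformation. The plain-Java sketch below only illustrates that mapping on the text level (it is not a Pig loader and produces no Pig tuples); the class name `OutLinks` and the whitespace-separated input are assumptions taken from the sample above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: turn an adjacency line such as "1 2 3" into the
// text form "(1,(2,3))": source id first, then a tuple of its targets.
class OutLinks {
    static String toPigTuple(String line) {
        String[] parts = line.trim().split("\\s+");
        List<String> targets = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            targets.add(parts[i]); // every column after the first is an outgoing link
        }
        return "(" + parts[0] + ",(" + String.join(",", targets) + "))";
    }
}
```

Applied to the four sample lines, this yields exactly the four tuples the poster lists.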

Re: Pig0.12 gets confused about schema after a nested FOREACH

2014-03-26 Thread Cheolsoo Park
Hi Jamin, Out of bound access. Trying to access non-existent column: 8. Schema activityID:chararray,reqHost:chararray,rspPylByt:long,pylByt:long,reqTime:double,reqDur:double,rspTime:double,rspDur:double has 8 column(s). Did you try to disable ColumnMapKeyPrune optimization? You can do

Pig0.12 gets confused about schema after a nested FOREACH

2014-03-23 Thread XIAMING CHEN
I found that Pig gets confused about the schema after a complicated but correct nested FOREACH operation. My script is attached with no modification and it gives error messages below: Picked up _JAVA_OPTIONS: -Xmx1G 2014-03-24 13:05:18,662 [main] INFO org.apache.pig.Main - Apache Pig version

JsonStorage fails to find schema when LimitAdjuster runs

2014-01-22 Thread Carlo Di Fulco
ed by a limit) yield the following Exception: java.io.IOException: Could not find schema in UDF context at org.apache.pig.builtin.JsonStorage.prepareToWrite(JsonStorage.java:125) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat

Re: A question regarding schema

2013-12-31 Thread centerqi hu
ut.seq' USING $SEQFILE_LOADER ( '-c $TEXT_CONVERTER', '-c $TEXT_CONVERTER') AS (key: chararray, value: chararray); UserItemAssoc = FOREACH A GENERATE myparser.myUDF(key, value) AS {(userid: chararray, itemtid: How to specify this???)}; If I want to specify the schem

A question regarding schema

2013-12-30 Thread Sameer Tilak
myparser.myUDF(key, value) AS {(userid: chararray, itemtid: How to specify this???)}; If I want to specify the schema in the AS clause, how do I do it since the number of fields will differ in each row? Is it possible to somehow do this dynamically?

Re: accessing the schema within a LoadFunc

2013-12-29 Thread Costin Leau
Raised https://issues.apache.org/jira/browse/PIG-3646. Until it gets fixed, though, are there some Pig internal APIs that I can use to get hold of the schema? As I've mentioned in my initial email, I can't seem to find a way to get access to the full declaration - even the POStore contai

Re: accessing the schema within a LoadFunc

2013-12-28 Thread Cheolsoo Park
Like Alan said in the thread that you're referring to, user-defined schema in the as-clause is not available within a LoadFunc. HBaseStorage is different since its schema is passed via a constructor parameter. As far as I know, most popular Pig storages do not require users to define schema

Re: accessing the schema within a LoadFunc

2013-12-24 Thread Costin Leau
Thanks for the pointers regarding 1). Any ideas on 2), namely why only the dereferenced schema is available and how to get hold of the actual user declaration? Cheers and Merry Christmas! On 24/12/2013 1:05 AM, Cheolsoo Park wrote: As for #1, pushdownProject() is called only if it

Re: accessing the schema within a LoadFunc

2013-12-23 Thread Cheolsoo Park
[1] http://www.mail-archive.com/user@pig.apache.org/msg06285.html On 19/12/2013 4:08 PM, Costin Leau wrote: Hi, I'm trying to get a hold of the schema specified for a loader through 'AS' using Apache Pig 0.12 :

accessing the schema within a LoadFunc

2013-12-19 Thread Costin Leau
Hi, I'm trying to get a hold of the schema specified for a loader through 'AS' using Apache Pig 0.12 : A = LOAD 'pig/tupleartists' USING MyStorage() AS (name: chararray, links: (url:chararray, picture:chararray)); B = FOREACH A GENERATE name, links.url; DUMP B;

Re: accessing the schema within a LoadFunc

2013-12-19 Thread Costin Leau
Forgot to specify the aforementioned thread [1] [1] http://www.mail-archive.com/user@pig.apache.org/msg06285.html On 19/12/2013 4:08 PM, Costin Leau wrote: Hi, I'm trying to get a hold of the schema specified for a loader through 'AS' using Apache Pig 0.12 : A = LOAD '

Re: read csv file as schema

2013-11-27 Thread meghana narasimhan
We had to do that as well. Meg On Nov 27, 2013 7:19 AM, "Ruslan Al-Fakikh" wrote: In my company we had to write our own Loader/Storer UDFs for this. On Wed, Nov 27, 2013 at 6:00 PM, Noam Lavie wrote: Hi, Is there a way to

Re: read csv file as schema

2013-11-27 Thread Ruslan Al-Fakikh
In my company we had to write our own Loader/Storer UDFs for this. On Wed, Nov 27, 2013 at 6:00 PM, Noam Lavie wrote: Hi, Is there a way to load a csv file with header as schema? (the header's fields are the properties of the schema and the other list in the csv file

read csv file as schema

2013-11-27 Thread Noam Lavie
Hi, Is there a way to load a CSV file with a header as the schema? (The header's fields are the properties of the schema, and the other lines in the CSV file will be in the schema format.) For example:
Name,last name,age
Noam,lavie,26
Map r

Re: Pig Schema contains a name that is not allowed in Avro

2013-11-19 Thread Ruslan Al-Fakikh
Hey Johannes! Have you solved the problem? I also see it. But I don't see it when I pass the schema as a string parameter to AvroStorage. I see it only when I try to use an external schema file. And if I specify a non-existent external schema file, the error is the same. Ruslan On Tue, O

RE: Java UDF and incompatible schema

2013-11-05 Thread Sameer Tilak
Hi Pradeep, Yes, I implemented the outputSchema method and it fixed that issue. We are also planning to evaluate storing intermediate and final results in Cassandra. Date: Mon, 4 Nov 2013 17:08:56 -0800 Subject: Re: Java UDF and incompatible schema From: pradeep...@gmail.com

Re: Java UDF and incompatible schema

2013-11-04 Thread Pradeep Gollakota
This is most likely because you haven't defined the outputSchema method of the UDF. The AS keyword merges the schema generated by the UDF with the user specified schema. If the UDF does not override the method and specify the output schema, it is considered null and you will not be able to u

Java UDF and incompatible schema

2013-11-04 Thread Sameer Tilak
following script, I get the following error. Any help with this would be great! ERROR 1031: Incompatable field schema: declared is "bag_0:bag{:tuple(id:int,class:chararray,name:chararray,begin:int,end:int,probone:chararray,probtwo:chararray)}", infered is ":Unknown" J

Re: Pig Schema contains a name that is not allowed in Avro

2013-10-22 Thread Johannes Schwenk
Thanks for your answer! Actually, the Avro schema is valid and I can load data with it. The error message states that Pig has a problem with the Pig schema, which has no duplicate names. Johannes On 21.10.2013 19:29, j.barrett Strausser wrote: I'd imagine it is having an issue

Re: Pig Schema contains a name that is not allowed in Avro

2013-10-21 Thread j.barrett Strausser
18:50:15,554 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2116: Output Location Validation Failed for: 'hdfs://path/to/output More info to follow: Pig Schema contains a name that is not allowed in Avro Details at logfile: pig_1382374188771.log L

Pig Schema contains a name that is not allowed in Avro

2013-10-21 Thread Johannes Schwenk
013-10-21 18:50:15,554 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2116: Output Location Validation Failed for: 'hdfs://path/to/output More info to follow: Pig Schema contains a name that is not allowed in Avro Details at logfile: pig_1382374188771.log Logfile contains: Pig Schem

RE: Loading a custom schema

2013-10-02 Thread Siddhi Borkar
Thanks Prashant, will try this out. -----Original Message----- From: Prashant Kommireddi [mailto:prash1...@gmail.com] Sent: Thursday, September 26, 2013 1:47 PM To: user@pig.apache.org Subject: Re: Loading a custom schema Hi Siddhi, PigStorage by default looks for ".pig_schema"

Re: Loading a custom schema

2013-09-26 Thread Prashant Kommireddi
Hi Siddhi, PigStorage by default looks for ".pig_schema" under the input dir. If you would like to use a different filename, you would have to override PigStorage.getSchema(String location, Job job) and define a custom JsonMetadata object. You might want to start here. Using a s

Loading a custom schema

2013-09-26 Thread Siddhi Borkar
Hi, I am trying to load a TSV file using PigStorage: input_data = load 'input.tsv' using PigStorage('\t','-schema'); This loads the TSV file as per the .pig_schema file present in the input folder. Is there any way to load the schema from a custom path? For ex, say

Re: schema definition and subschema

2013-08-14 Thread Cheolsoo Park
alSchema. Here is how you can verify it. 1) Debug Pig main in eclipse. 2) Set a breakpoint in the LogicalFieldSchema constructor. 3) Run "a = load '/dev/null' as (i:int, t:tuple(j:int));" on grunt. Thanks, Cheolsoo On Thu, Aug 8, 2013 at 2:42 PM, Keren Ouaknine wrote: >

schema definition and subschema

2013-08-08 Thread Keren Ouaknine
Hi, A schema in Pig (LogicalSchema.java) is defined as an array list of LogicalFieldSchema whose class members are:
- String alias
- byte type
- long uid
- LogicalSchema schema
I am wondering why LogicalFieldSchema contains a LogicalSchema member. My guess so far is that perhaps there
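
One way to see why a field schema would carry its own schema: complex types such as tuples and bags have inner fields, so the structure is recursive, as in the `(i:int, t:tuple(j:int))` load statement from Cheolsoo Park's reply. The toy model below is not Pig's `LogicalSchema`/`LogicalFieldSchema`; the class names, the string type names, and the printing format are assumptions chosen only to show the recursion.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Pig's classes): a schema is a list of field schemas, and a
// field of a complex type holds a nested schema describing its inner fields.
class ToySchema {
    final List<ToyField> fields = new ArrayList<>();

    ToySchema add(ToyField f) {
        fields.add(f);
        return this;
    }

    @Override
    public String toString() {
        List<String> parts = new ArrayList<>();
        for (ToyField f : fields) {
            parts.add(f.toString());
        }
        return String.join(", ", parts);
    }
}

class ToyField {
    final String alias;
    final String type;      // e.g. "int", "chararray", "tuple"
    final ToySchema inner;  // null for scalar types

    ToyField(String alias, String type, ToySchema inner) {
        this.alias = alias;
        this.type = type;
        this.inner = inner;
    }

    @Override
    public String toString() {
        // Scalars print as alias:type; complex types recurse into the inner schema.
        return inner == null
                ? alias + ":" + type
                : alias + ":" + type + "(" + inner + ")";
    }
}
```

Building `i:int` plus a tuple field `t` whose inner schema holds `j:int` prints `i:int, t:tuple(j:int)`; without the nested member there would be nowhere to record `j`.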

AILED. Error message is: org.apache.pig.impl.plan.PlanValidationException: ERROR 1108: Duplicate schema alias: msisdn

2013-07-25 Thread Serega Sheypak
point_type; STORE projPivotsWithEndPoints INTO '$validPivots' USING org.apache.pig.piggybank.storage.avro.AvroStorage('index', '3', 'schema', '{"name": "valid_pivots", "doc": "version 0.0.1", "type": "

Push tuple/bag schema to UDF function automatically

2013-07-25 Thread Serega Sheypak
int, --10
is_active: boolean, --11
avg_speed: double, --12
distance: int, --13
not_valid: int); --14
} You can see that relation routePivots has

Re: How do I read headers from first line into schema?

2013-07-09 Thread Sajid Raza
Have you considered using a Pig schema? On Jul 9, 2013, at 12:32 PM, "Kimmel, Chad" wrote: Hi, what I am trying to do is read the headers from the first line as the field names into the schema. For instance, given the following tab-delimited file --sa

How do I read headers from first line into schema?

2013-07-09 Thread Kimmel, Chad
Hi, what I am trying to do is read the headers from the first line as the field names into the schema. For instance, given the following tab-delimited file
--samplefile.txt--
Name Job Age
Chad Engineer 23
Mike Stats 34
Chris IT 25
Instead of deleting the first

Re: Override input schema in AvroStorage

2013-05-01 Thread Enns, Steven
Cool thanks! On 4/30/13 9:10 PM, "Cheolsoo Park" wrote: Hi Steven, The new AvroStorage will let you specify the input schema: https://issues.apache.org/jira/browse/PIG-3015 In fact, somebody made the same request in a comment of the jira that I a

Re: Override input schema in AvroStorage

2013-04-30 Thread Cheolsoo Park
Hi Steven, The new AvroStorage will let you specify the input schema: https://issues.apache.org/jira/browse/PIG-3015 In fact, somebody made the same request in a comment of the jira that I am copying and pasting below: Furthermore, we occasionally have issues with pig jobs picking the old

Re: Override input schema in AvroStorage

2013-04-27 Thread Enns, Steven
Resending now that I am subscribed :) On 4/25/13 4:01 PM, "Enns, Steven" wrote: Hi everyone, I would like to override the input schema in AvroStorage to make a pig script robust to schema evolution. For example, suppose a new field is added to an avro schema wit

Override input schema in AvroStorage

2013-04-25 Thread Enns, Steven
Hi everyone, I would like to override the input schema in AvroStorage to make a pig script robust to schema evolution. For example, suppose a new field is added to an avro schema with a default value of null. If the input to a pig script using this field includes both old and new data

Re: JsonLoader schema field order shouldn't matter

2013-04-04 Thread Ruslan Al-Fakikh
lan Gates wrote: I would open a new JIRA, since 1914 is focussed on building an alternative that discovers schema, while you are wanting to improve the existing one. Alan. On Jan 7, 2013, at 5:02 PM, Tim Sell wrote:

Re: String Representation of DataBag and its Schema

2013-03-21 Thread William Oberman
mport org.apache.pig.impl.util.CastUtils; import org.apache.pig.impl.util.Utils; import org.apache.pig.newplan.logical.relational.LogicalSchema; import java.io.IOException; public class CSPigUtils { public static Object getPigRepresentation(String schema, String data) throws IOException { Utf8StorageConv

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
m elements "{(})#," which isn't the case (i.e., a serialized JSON chararray for a field). So I was hoping for a more off-the-shelf (OTS) solution using existing classes and methods, given the String and its Schema a priori. Thank you for your help, and I'll keep this post updated on my progress toward

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
against a functional requirement in the UDF. The UDFs I am testing are part of a larger ETL testing initiative I have been undertaking. To ensure that the various states of legacy data

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
he UDF. The UDFs I am testing are part of a larger ETL testing initiative I have been undertaking. To ensure that the various states of legacy data are correctly extracted and transformed into a Pig context, I am creating

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
s of legacy data are correctly extracted and transformed into a Pig context, I am creating specific JUnit tests per each UDF containing specific use cases as testing methods. Motivation to use String inputs for the Data Objects and Schema

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
correctly extracted and transformed into a Pig context, I am creating specific JUnit tests per each UDF containing specific use cases as testing methods. Motivation to use String inputs for the Data Objects and Schema Objects is the improvement on the conventional approach -

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Pig context, I am creating specific JUnit tests per each UDF containing specific use cases as testing methods. Motivation to use String inputs for the Data Objects and Schema Objects is the improvement on the conventional approach - creating Java Objects and adding and appending nested Objects to

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
esting of my UDFs. -Dan On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney wrote: how was string_databag generated? 2013/3/19 Dan DeCapria, CivicScience Expanding upon this, the follow

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Such that this string_input matches the Schema: String string_databag = "{(apples,(banana,1024),2048)}"; String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}"; Schema schema = Utils.getSchemaFrom

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
ted? 2013/3/19 Dan DeCapria, CivicScience Expanding upon this, the following use case's Schema Object can be resolved from inputs: String string_databag = "{(a,(b,d),f)}"; String string_schema =

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
how was string_databag generated? 2013/3/19 Dan DeCapria, CivicScience Expanding upon this, the following use case's Schema Object can be resolved from inputs: String string_databag = "{(a,(b,d),f)}"; String string_schema = "

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Expanding upon this, the following use case's Schema Object can be resolved from inputs: String string_databag = "{(a,(b,d),f)}"; String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}"; Schema schema =

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Thank you for your reply. The problem is I cannot find a methodology to go from a String representation of a complex data type to a nested Object of Pig DataTypes. I looked over the Pig 0.10.1 docs, but cannot find a way to go from String and Schema to a Pig DataType Object. For context, I am

Re: String Representation of DataBag and its Schema

2013-03-18 Thread Jonathan Coveney
n with its schema String to a valid DataBag Object: String databag_string = "{(apples,1024)}"; String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}"; I've tried implementing something along the lines of this, but I believe

String Representation of DataBag and its Schema

2013-03-18 Thread Dan DeCapria, CivicScience
In Java, I am trying to convert a DataBag from its String representation with its schema String to a valid DataBag Object: String databag_string = "{(apples,1024)}"; String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}"; I've tried implementing something
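
To make the target concrete: in the string form `{(apples,1024)}`, braces delimit a bag, parentheses delimit a tuple, and fields are comma-separated. The plain-Java recursive-descent sketch below parses that shape into nested lists so two values can be compared structurally in a unit test. It is an illustration under strong assumptions, not Pig's own conversion: it handles only bags, tuples, and unquoted scalar text, ignores maps, escaping, and nulls, and does not apply the types from the schema string. All class names are hypothetical. Producing real Pig `DataBag` objects still has to go through Pig's own classes, as the later replies in this thread discuss.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal parser for Pig-like text such as "{(apples,(banana,1024),2048)}".
// Bags and tuples become Bag / Tup (both lists); scalars stay Strings (no typing).
class PigTextParser {
    static class Bag extends ArrayList<Object> {
        @Override public String toString() { return "{" + join(this) + "}"; }
    }

    static class Tup extends ArrayList<Object> {
        @Override public String toString() { return "(" + join(this) + ")"; }
    }

    private final String s;
    private int pos = 0;

    private PigTextParser(String s) { this.s = s; }

    static Object parse(String text) {
        PigTextParser p = new PigTextParser(text.trim());
        Object value = p.parseValue();
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("trailing input at " + p.pos);
        }
        return value;
    }

    private Object parseValue() {
        if (pos < s.length() && s.charAt(pos) == '{') {
            return parseContainer(new Bag(), '{', '}');
        }
        if (pos < s.length() && s.charAt(pos) == '(') {
            return parseContainer(new Tup(), '(', ')');
        }
        // Scalar: read raw text up to the next delimiter.
        int start = pos;
        while (pos < s.length() && ",(){}".indexOf(s.charAt(pos)) < 0) {
            pos++;
        }
        return s.substring(start, pos);
    }

    private List<Object> parseContainer(List<Object> out, char open, char close) {
        expect(open);
        if (pos < s.length() && s.charAt(pos) == close) { // empty bag or tuple
            pos++;
            return out;
        }
        while (true) {
            out.add(parseValue());
            if (pos >= s.length()) {
                throw new IllegalArgumentException("unterminated '" + open + "'");
            }
            char c = s.charAt(pos++);
            if (c == close) {
                return out;
            }
            if (c != ',') {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + (pos - 1));
            }
        }
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        }
        pos++;
    }

    private static String join(List<Object> items) {
        List<String> parts = new ArrayList<>();
        for (Object o : items) {
            parts.add(String.valueOf(o));
        }
        return String.join(",", parts);
    }
}
```

Because `toString` re-emits the same text form, parsing `{(apples,(banana,1024),2048)}` and printing it round-trips, which is handy as a sanity check in JUnit tests.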

Re: Pig job result output and schema

2013-03-05 Thread Johnny Zhang
; On Tue, Mar 5, 2013 at 11:30 AM, Johnny Zhang wrote: Hi, Jeff: Reply inline. On Tue, Mar 5, 2013 at 11:18 AM, Jeff Yuan wrote: I have a couple of questions regarding job result and schema. The context is t

Re: Pig job result output and schema

2013-03-05 Thread Jonathan Coveney
Zhang wrote: Hi, Jeff: Reply inline. On Tue, Mar 5, 2013 at 11:18 AM, Jeff Yuan wrote: I have a couple of questions regarding job result and schema. The context is that I'm trying to create a cus

Re: Pig job result output and schema

2013-03-05 Thread Jeff Yuan
astAlias()); ... Thanks, Jeff On Tue, Mar 5, 2013 at 11:30 AM, Johnny Zhang wrote: Hi, Jeff: Reply inline. On Tue, Mar 5, 2013 at 11:18 AM, Jeff Yuan wrote: I have a couple of questions regarding job result and schema. The context is that I

Re: Pig job result output and schema

2013-03-05 Thread Johnny Zhang
Hi, Jeff: Reply inline. On Tue, Mar 5, 2013 at 11:18 AM, Jeff Yuan wrote: I have a couple of questions regarding job result and schema. The context is that I'm trying to create a custom entry point for Pig that takes a script, executes it, and always stores the last de

Pig job result output and schema

2013-03-05 Thread Jeff Yuan
I have a couple of questions regarding job result and schema. The context is that I'm trying to create a custom entry point for Pig that takes a script, executes it, and always stores the last declared alias/variable in a file. Would appreciate any insights to the 2 questions I have below o

Re: how to stop dereferencing after join - error setting schema

2013-03-02 Thread Michael West
hael West wrote: I would like to set the schema after joining so that I do not have to always dereference. However, I receive an error when I try this. How can I resolve this error? pig version 0.11 Error message:

Re: how to stop dereferencing after join - error setting schema

2013-03-02 Thread Bill Graham
Each field needs to be dereferenced individually: A::name AS name, A::age AS age... On Saturday, March 2, 2013, Michael West wrote: I would like to set the schema after joining so that I do not have to always dereference. However, I receive an error when I try this. How can
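
Bill's suggestion generalizes: after a JOIN, fields carry a `relation::field` prefix, and a FOREACH that renames each one drops it. For wide relations the projection list can be generated instead of typed by hand. The plain-Java helper below is a hypothetical sketch of that (it is not part of Pig). It refuses to strip prefixes when two inputs share a field name, since the resulting aliases would clash (compare the "Duplicate schema alias" error in another thread on this list).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: build "A::name AS name, A::age AS age" from the
// disambiguated field names that follow a Pig JOIN.
class DerefProjection {
    static String build(List<String> qualifiedNames) {
        Set<String> seen = new HashSet<>();
        List<String> parts = new ArrayList<>();
        for (String q : qualifiedNames) {
            int sep = q.lastIndexOf("::");
            String bare = sep < 0 ? q : q.substring(sep + 2);
            if (!seen.add(bare)) {
                // Two inputs share this name; stripping would create a duplicate alias.
                throw new IllegalArgumentException("ambiguous field after stripping: " + bare);
            }
            // Unprefixed names are passed through unchanged.
            parts.add(sep < 0 ? q : q + " AS " + bare);
        }
        return String.join(", ", parts);
    }
}
```

The returned string is meant to be pasted after `GENERATE` in a FOREACH over the joined relation; clashing names such as `A::id` and `B::id` still need explicit, distinct aliases.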

how to stop dereferencing after join - error setting schema

2013-03-02 Thread Michael West
I would like to set the schema after joining so that I do not have to always dereference. However, I receive an error when I try this. How can I resolve this error? pig version 0.11 Error message: [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema

Re: possible to infer schema from TSV header?

2013-01-15 Thread Bill Graham
Take a look at the org.apache.pig.builtin.PigStorage.getSchema(..) method. You can subclass PigStorage and implement that method to read the schema from the first line of the file. Or you can just implement the LoadMetadata interface in the loader you're using. On Tue, Jan 15, 2013 at 2:27 PM,

Re: possible to infer schema from TSV header?

2013-01-15 Thread Mason
Actually, I'll probably just end up computing positions to use, rather than pasting in a schema, but the general point is that I'd love to do it some other way, because little hacks like these make my data pipeline feel fragile. I'm willing to write some Java if anyone could point

possible to infer schema from TSV header?

2013-01-15 Thread Mason
plug that schema string into the AS portion of my LOAD statement. Then I'll project columns I want and manually typecast them. Is there a better, simple way? -Mason
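
Mason's workaround (read the header, build a schema string, paste it into the AS clause of the LOAD statement) can be automated outside Pig. The plain-Java sketch below turns a delimited header line into such a string. It declares every column as chararray, so the later typecasting step stays as in the workaround, and it lower-cases names and replaces characters other than letters, digits, or underscores. The class name, the `f_` prefix for names that don't start with a letter, and the sanitizing rule itself are assumptions; check the output against Pig's identifier rules for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: build a Pig AS-clause schema string from a header line.
class HeaderSchema {
    static String fromHeader(String headerLine, String delimiterRegex) {
        List<String> fields = new ArrayList<>();
        for (String raw : headerLine.split(delimiterRegex)) {
            // Lower-case and replace anything that is not a letter, digit or underscore.
            String name = raw.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
            if (name.isEmpty() || !Character.isLetter(name.charAt(0))) {
                name = "f_" + name; // assumed: identifiers must start with a letter
            }
            fields.add(name + ":chararray");
        }
        return "(" + String.join(", ", fields) + ")";
    }
}
```

For a TSV header `Name\tJob\tAge` this returns `(name:chararray, job:chararray, age:chararray)`. The header row itself is still read as a data row by PigStorage unless it is filtered out.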

Re: JsonLoader schema field order shouldn't matter

2013-01-10 Thread Dmitriy Ryaboy
Tim, can you open a GitHub issue with EB about compiling against 0.10? I think this is an easy fix. On Tue, Jan 8, 2013 at 9:38 AM, Alan Gates wrote: I would open a new JIRA, since 1914 is focussed on building an alternative that discovers schema, while you are wanting to impro

Re: Pig Avrostorage Issue regarding Schema evaluation

2013-01-09 Thread Russell Jurney
' using org.apache.pig.piggybank.storage.avro.AvroStorage( ); dump employee; Schemas : { "type" : "record", "name" :

Re: Pig Avrostorage Issue regarding Schema evaluation

2013-01-09 Thread Cheolsoo Park
; Schemas : { "type" : "record", "name" : "employee", "fields":[ {"name" : "name", "type" : "string"

Re: Pig Avrostorage Issue regarding Schema evaluation

2013-01-09 Thread Russell Jurney
ame", "type" : "string", "default" : "NU"}, {"name" : "age", "type" : "int","default" : 0}, {"name" : "dept", "type": "string","default"

Re: Pig Avrostorage Issue regarding Schema evaluation

2013-01-09 Thread Cheolsoo Park
ult" : 0}, {"name" : "dept", "type": "string","default" : "DU"}, {"name" : "office", "type": "string","default" : "OU"}, {"name" : "salary"

Pig Avrostorage Issue regarding Schema evaluation

2013-01-09 Thread Milind Vaidya
"default" : "OU"}, {"name" : "salary", "type": "float","default" : 0.0} ] } { "type" : "record", "name" : "employee", "fields":[ {"name" : "name", "type"

Re: JsonLoader schema field order shouldn't matter

2013-01-08 Thread Alan Gates
I would open a new JIRA, since 1914 is focussed on building an alternative that discovers schema, while you are wanting to improve the existing one. Alan. On Jan 7, 2013, at 5:02 PM, Tim Sell wrote: > This seems like a bug to me. It makes it risky to work with JSON data > genera

Re: Declaring schema for unknown number of columns

2013-01-07 Thread Jinyuan Zhou
Sorry, it looks like my suggestion won't help unless you were able to specify the schema with the original load statement. If the number of fields is ONLY available at runtime, but each row has the same number of fields and you know the position of the join key, then I have an ugly approach. First, sample

Re: Declaring schema for unknown number of columns

2013-01-07 Thread Chan, Tim
$4 AS sale_month_3, $5 AS sale_month_4, $6 AS sale_month_5, $7 AS sale_month_6, $8 ..; I still get the same error when I try to join on this relation. On Mon, Jan 7, 2013 at 2:27 PM, Jinyuan Zhou wrote: > If you can load it but join operation need the complete schema, then you > ca

Re: JsonLoader schema field order shouldn't matter

2013-01-07 Thread Tim Sell
And I use a = LOAD 'input.json' USING JsonLoader('id:int,date:chararray'); DUMP a; I get errors when it tries to force the date fields into an integer. Shouldn't this work independent of the or

Re: JsonLoader schema field order shouldn't matter

2013-01-07 Thread Tim Sell
https://issues.apache.org/jira/browse/PIG-1914 ~T On 7 January 2013 20:24, Alan Gates wrote: Currently the JsonLoader does assume ordering of the fields. It does not do any name matching against the given schema to find the right field. Alan. On Jan 7, 2013, at 11:56 AM, Ti

Re: Declaring schema for unknown number of columns

2013-01-07 Thread Jinyuan Zhou
If you can load it but the join operation needs the complete schema, then you can try a generate statement that projects your original relation into one for which you can define a schema for all fields. On Mon, Jan 7, 2013 at 2:19 PM, Chan, Tim wrote: Is it possible to declare a schema when do

Declaring schema for unknown number of columns

2013-01-07 Thread Chan, Tim
Is it possible to declare a schema when doing a LOAD for data in which you do not know the total number of columns? For instance, I know the data contains 6 or more columns. These columns are of the same data type. I basically want to join this data with another data set, but I was getting the

Re: JsonLoader schema field order shouldn't matter

2013-01-07 Thread meghana narasimhan
input.json' USING JsonLoader('id:int,date:chararray'); DUMP a; I get errors when it tries to force the date fields into an integer. Shouldn't this work independent of the ordering of the schema fields? JSON writers generally don't make guarante

Re: JsonLoader schema field order shouldn't matter

2013-01-07 Thread Alan Gates
Currently the JsonLoader does assume ordering of the fields. It does not do any name matching against the given schema to find the right field. Alan. On Jan 7, 2013, at 11:56 AM, Tim Sell wrote: When using JsonLoader with Pig 0.10.0 if I have an input.json file that looks
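
Alan's point can be made concrete. A positional loader assigns the k-th value it sees to the k-th declared field, while a name-based one looks each declared field up by key. The plain-Java sketch below contrasts the two on a record whose keys appear in a different order than the schema `id, date`. It simulates an already-parsed JSON object with a `LinkedHashMap` rather than using Pig's JsonLoader or a JSON library; the class, method names, and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Contrast positional vs name-based field matching for one parsed JSON object.
class JsonFieldMatch {
    // Positional: take values in the order the keys appear in the record,
    // padded with nulls or truncated to the schema width.
    static List<Object> positional(Map<String, Object> record, String[] schemaNames) {
        List<Object> out = new ArrayList<>(record.values());
        while (out.size() < schemaNames.length) {
            out.add(null);
        }
        return new ArrayList<>(out.subList(0, schemaNames.length));
    }

    // Name-based: look up each declared field by key; missing keys become null.
    static List<Object> byName(Map<String, Object> record, String[] schemaNames) {
        List<Object> out = new ArrayList<>();
        for (String name : schemaNames) {
            out.add(record.get(name));
        }
        return out;
    }
}
```

With the record `{"date": "2013-01-07", "id": 1}`, `positional` puts the date string in the `id` slot (consistent with the cast errors Tim reports), while `byName` returns `[1, 2013-01-07]` regardless of key order.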
