Re: nested Tuple(Tuple)

2012-06-28 Thread Abhinav Neelam
Yeah, it says so in the manual: "A single element enclosed in parens ( ) like (5) is not considered to be a tuple but rather an arithmetic operator." http://pig.apache.org/docs/r0.10.0/basic.html#type-construction
We can always use TOTUPLE. This should work:
a = load 'a' as (x,y);
b = foreach a gen
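For illustration only, here is a minimal sketch of how TOTUPLE can build a nested tuple. It is not the code from the truncated message above; the input path 'a', the field names x and y, and the alias b are assumptions made for the example.

```
-- Hypothetical input 'a' with two fields; all names are illustrative.
a = LOAD 'a' AS (x, y);
-- (x) on its own is parsed as a parenthesized expression, not a tuple,
-- so TOTUPLE is used to wrap the single field explicitly.
b = FOREACH a GENERATE TOTUPLE(TOTUPLE(x), y);
DESCRIBE b;
```

DESCRIBE is included only to inspect the resulting schema, which should contain a tuple whose first field is itself a one-field tuple.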

Re: Passing a BAG to Pig UDF constructor?

2012-06-28 Thread Abhinav Neelam
You're right, I guess. There's no reason why the two steps should happen on the same nodes. To get around this, you'd have to make the hash available on all the nodes, either through the distributed cache or by putting it on HDFS as Mridul suggested. Speaking of which, what's wrong with Mridul's solution?

Re: suggestion

2012-06-28 Thread Jonathan Coveney
Do you have an example?
2012/6/28 Yang
> thanks
>
> it was simply "blahblah field does not existing in schema for my_var : {..} "
>
> On Thu, Jun 28, 2012 at 8:24 PM, Jonathan Coveney wrote:
> > Pig SHOULD parse the whole script, AFAIK. There are certain errors that
> > will only s

Re: suggestion

2012-06-28 Thread Yang
thanks
it was simply "blahblah field does not existing in schema for my_var : {..} "
On Thu, Jun 28, 2012 at 8:24 PM, Jonathan Coveney wrote:
> Pig SHOULD parse the whole script, AFAIK. There are certain errors that
> will only surface at runtime, but in general, parsing errors should be
>

Re: suggestion

2012-06-28 Thread Jonathan Coveney
Pig SHOULD parse the whole script, AFAIK. There are certain errors that will only surface at runtime, but in general, parsing errors should be surfacing early. Do you happen to have an example?
2012/6/28 Yang
> let's say my pig script generates 2 MR jobs.
>
> it seems that currently pig parser w

Re: Custom storage function on python

2012-06-28 Thread Alan Gates
Right now load and store functions have to be in Java. I am not aware of any existing store functions that write to a socket.
Alan.
On Jun 28, 2012, at 6:15 AM, Fernando Doglio wrote:
> Hello everyone, I've been toying around the idea of sending the output of
> my pig scripts to Carbon and f

Re: Hive error when loading csv data.

2012-06-28 Thread Thejas Nair
More options - Official Apache instructions for 1.0 - http://hadoop.apache.org/common/docs/r1.0.3/single_node_setup.html If you want to try it out on a single node on Amazon EC2 - Instructions for HDP distro - http://hortonworks.com/community/virtual-sandbox/ If you want a wizard based guided i

Re: Passing a BAG to Pig UDF constructor?

2012-06-28 Thread Dexin Wang
This (your second method) is very neat, thanks a lot Abhinav. Some problems though. First, I would have to do a STORE or DUMP of bag_dummy. Otherwise, Pig won't even run the bag_dummy line. Another problem, is it possible that the invocation of "build" step (that iterates through the bag1) and th

Re: Best Practice: store depending on data content

2012-06-28 Thread Ruslan Al-Fakikh
Hi Markus, Currently I am doing almost the same task, but in Hive. In Hive you can use the native Avro+Hive integration: https://issues.apache.org/jira/browse/HIVE-895 Or the haivvreo project if you are not using the latest version of Hive. Also there is a Dynamic Partition feature in Hive that can se

Re: Best Practice: store depending on data content

2012-06-28 Thread Thejas Nair
I am not aware of any work on adding those features to MultiStorage. I think the best way to do this is to use HCatalog. (It makes the Hive metastore available for all of Hadoop, so you get metadata for your data as well.) You can associate an outputformat+serde for a table (instead of file name

Re: Unable to open iterator for alias A

2012-06-28 Thread Ruslan Al-Fakikh
Hi, It seems that you are using MapReduce 2.0. Why? As far as I know it is an alpha version. Also, an extract from here: http://hortonworks.com/blog/new-features-in-apache-pig-0-10/
Hadoop 0.23 (a.k.a. Hadoop 2.0) Support
Pig 0.10.0 supports Hadoop 0.23.X. All unit and end-to-end tests passed with

Regarding MRAppMaster issue

2012-06-28 Thread Ravi Gurbaxani
Dear All, I am trying to run a Java program that invokes a Pig script using PigServer.registerScript(a.pig) from Eclipse on Windows, and the program has to run on a Cloudera VM. When I run it, I get the following error in Eclipse on Windows: Exception in thread "main" org.apache.pig.impl.logicalLayer.Fro

Custom storage function on python

2012-06-28 Thread Fernando Doglio
Hello everyone, I've been toying around with the idea of sending the output of my pig scripts to Carbon and, from there, creating some graphs using Graphite. Right now, what I'm doing is streaming that output to a python script, which in turn sends that information to the correct socket (the one Carbon i

Re: Passing a BAG to Pig UDF constructor?

2012-06-28 Thread Abhinav Neelam
You're not passing a bag to your UDF, you're passing a relation. I believe the FOREACH ... GENERATE looks for columns within the relation being iterated on, meaning that it's looking for 'bag1' within the schema of 'a'. One way of doing this is generating a bag containing all the tuples in relation b,

Re: Best Practice: store depending on data content

2012-06-28 Thread Markus Resch
Thanks Thejas, This _really_ helped a lot :) An additional question on this: As far as I see, MultiStorage is currently only capable of writing CSV output, right? Is there any ongoing effort to make this storage more generic regarding the format of the output data? For our needs we