Yeah, says so in the manual -
"A single element enclosed in parens ( ) like (5) is not considered to be a
tuple but rather an arithmetic operator."
http://pig.apache.org/docs/r0.10.0/basic.html#type-construction
We can always use TOTUPLE. This should work:
a = load 'a' as (x,y);
b = foreach a generate TOTUPLE(x, y);
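For contrast, a minimal sketch of the single-element case the manual describes
(same relation; field names assumed):
c = foreach a generate (x);        -- just x with parentheses around it, not a tuple
d = foreach a generate TOTUPLE(x); -- a genuine one-field tuple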
You're right, I guess. There's no reason why the two steps should happen on
the same nodes. To get around this, you'd have to make the hash available
on all the nodes - through the distributed cache or by putting it on HDFS
as Mridul suggested. Speaking of which, what's wrong with Mridul's
solution?
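For reference, a sketch of the Pig-level route, with made-up relation and
field names; a fragment-replicate join ships the small relation to every node
through the distributed cache for you:
big = load 'big_data' as (k, v);
small = load 'small_hash' as (k, mapped);         -- the "hash"; must fit in memory
j = join big by k, small by k using 'replicated'; -- small is shipped to all map tasks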
Do you have an example?
2012/6/28 Yang
> thanks
>
>
> it was simply "blahblah field does not existing in schema for my_var :
> {..} "
>
>
> On Thu, Jun 28, 2012 at 8:24 PM, Jonathan Coveney wrote:
>
> > Pig SHOULD parse the whole script, AFAIK. There are certain errors that
> > will only surface at runtime, but in general, parsing errors should be
> > surfacing early. Do you happen to have an example?
thanks
it was simply "blahblah field does not existing in schema for my_var :
{..} "
On Thu, Jun 28, 2012 at 8:24 PM, Jonathan Coveney wrote:
> Pig SHOULD parse the whole script, AFAIK. There are certain errors that
> will only surface at runtime, but in general, parsing errors should be
> surfacing early. Do you happen to have an example?
Pig SHOULD parse the whole script, AFAIK. There are certain errors that
will only surface at runtime, but in general, parsing errors should be
surfacing early. Do you happen to have an example?
2012/6/28 Yang
> let's say my pig script generates 2 MR jobs.
>
> it seems that currently pig parser w
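For example (names made up), a script along these lines is rejected while the
plan is built, before any MR job is launched:
a = load 'a' as (x, y);
b = foreach a generate z; -- z is not in a's schema
store b into 'out';       -- fails at parse/plan time, not at runtime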
Right now load and store functions have to be in Java. I am not aware of any
existing store functions that write to a socket.
Alan.
On Jun 28, 2012, at 6:15 AM, Fernando Doglio wrote:
> Hello everyone, I've been toying around the idea of sending the output of
> my pig scripts to Carbon and from there, create some graphs using Graphite.
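Until such a store function exists, streaming is probably the closest
pure-Pig route; a rough sketch, where send_to_carbon.py is a hypothetical
script that forwards each line to Carbon's socket:
define to_carbon `send_to_carbon.py` ship('send_to_carbon.py');
metrics = load 'metrics' as (name, value, ts);
out = stream metrics through to_carbon; -- each tuple is written to the script's stdin
dump out; -- something must consume 'out' or the stream never executes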
More options -
Official Apache instructions for 1.0 -
http://hadoop.apache.org/common/docs/r1.0.3/single_node_setup.html
If you want to try it out on a single node on Amazon EC2 -
Instructions for the HDP distro -
http://hortonworks.com/community/virtual-sandbox/
If you want a wizard based guided i
This (your second method) is very neat, thanks a lot Abhinav.
Some problems, though. First, I would have to do a STORE or DUMP of
bag_dummy; otherwise, Pig won't even run the bag_dummy line.
Another problem: is it possible that the invocation of the "build" step (that
iterates through bag1) and the step that uses the result happen on the same
nodes?
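To make the first point concrete, a sketch with made-up names, where
BuildHash stands in for the build UDF:
bag_dummy = foreach (group bag1 all) generate BuildHash(bag1);
-- Pig plans lazily: without the store below, the line above never runs
store bag_dummy into '/tmp/bag_dummy';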
Hi Markus,
Currently I am doing almost the same task, but in Hive.
In Hive you can use the native Avro+Hive integration:
https://issues.apache.org/jira/browse/HIVE-895
Or the haivvreo project if you are not using the latest version of Hive.
Also there is a Dynamic Partition feature in Hive that can se
I am not aware of any work on adding those features to MultiStorage.
I think the best way to do this is to use HCatalog. (It makes the Hive
metastore available to all of Hadoop, so you get metadata for your data
as well.)
You can associate an OutputFormat + SerDe with a table (instead of a file name
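For example (table name made up; assumes the HCatalog jars are registered and
the table already exists in the metastore), storing from Pig looks roughly like:
store a into 'my_table' using org.apache.hcatalog.pig.HCatStorer();
-- the table's OutputFormat/SerDe from the metastore determine the on-disk format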
Hi,
It seems that you are using MapReduce 2.0. Why? As far as I know it is
an alpha version. Also, an extract from
http://hortonworks.com/blog/new-features-in-apache-pig-0-10/:
Hadoop 0.23 (a.k.a. Hadoop 2.0) Support
Pig 0.10.0 supports Hadoop 0.23.X. All unit and end-to-end tests
passed with
Dear All,
I am trying to run a Java program that invokes a Pig script using
PigServer.registerScript("a.pig") from Windows Eclipse, and the program has to
run on a Cloudera VM.
When I run it, I get the following error in Windows Eclipse:
Exception in thread "main"
org.apache.pig.impl.logicalLayer.FrontendException
Hello everyone, I've been toying around the idea of sending the output of
my pig scripts to Carbon and from there, create some graphs using Graphite.
Right now, what I'm doing is streaming that output to a Python script,
which in turn sends that information to the correct socket (the one Carbon
is listening on).
You're not passing a bag to your UDF, you're passing a relation. I believe
FOREACH ... GENERATE looks for columns within the relation being iterated
over, meaning that it's looking for 'bag1' within the schema of 'a'.
One way of doing this is generating a bag containing all the tuples in
relation b,
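Roughly, with made-up UDF and field names:
b_all = foreach (group b all) generate b as everything; -- one tuple holding a bag of all of b
c = cross a, b_all;                                     -- attaches that bag to every row of a
d = foreach c generate a::x, MyUDF(b_all::everything);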
Thanks Thejas,
This _really_ helped a lot :)
Some additional questions on this:
As far as I can see, MultiStorage is currently only capable of writing CSV
output, right? Is there any ongoing attempt to make this
storage more generic regarding the format of the output data? For our
needs we
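For context, a typical MultiStorage invocation today looks roughly like this
(paths and the split-field index are made up):
register piggybank.jar;
a = load 'data' as (country, name, value);
-- one delimited output file per distinct value of field 0 (country)
store a into '/out' using org.apache.pig.piggybank.storage.MultiStorage('/out', '0', 'none', ',');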