I have the same bug. I read the code... there is no obvious fix. Arg.
On Feb 2, 2012, at 10:07 PM, Something Something wrote:
> In my Pig script I have something like this...
>
> %default MY_SCHEMA '/user/xyz/my-schema.json';
>
> %default MY_AVRO
> 'org.apache.pig.piggybank.storage.avro.Avr
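For context, a minimal sketch of the parameter-substitution pattern the truncated script appears to use. The completion of the class name to AvroStorage and the 'schema_file' constructor option are assumptions, not taken from the original message:
%default MY_SCHEMA '/user/xyz/my-schema.json';
%default MY_AVRO 'org.apache.pig.piggybank.storage.avro.AvroStorage';
-- %default values are substituted textually before the script is parsed,
-- and can be overridden on the command line with -param
records = LOAD '/user/xyz/input' USING $MY_AVRO('schema_file', '$MY_SCHEMA');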
Ah, yeah, if you can shrink the data down that much, going outside of Pig (or
doing things in a UDF) is the way to go.
D
On Thu, Feb 2, 2012 at 3:45 PM, Grig Gheorghiu wrote:
> Hey Dmitriy! Unfortunately that's the requirement. The solution I
> found so far is to do all the pre-filtering and grouping
Hi,
I have a bunch of [for example] Apache logfiles that I'm searching through. I
can process them with:
logs = load 's3://bucket/directory/*' USING LogLoader as (remoteAddr,
remoteLogname, user, time:chararray, method, uri:chararray, proto, status,
bytes, referer, userAgent);
Is there any w
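The question is cut off above, but as a hedged illustration of working with the loaded relation (the aggregation target is an assumption, since the original question is truncated):
-- e.g. count requests per URI from the loaded logs
byUri = GROUP logs BY uri;
counts = FOREACH byUri GENERATE group AS uri, COUNT(logs) AS hits;
DUMP counts;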
Cleaned up my environment by unsetting HADOOP_HOME and removing some old
Jackson jars from my CLASSPATH, and Pig's AvroStorage works again.
Woot!
On Thu, Feb 2, 2012 at 3:47 PM, Russell Jurney wrote:
> Spoken too soon... this happens no matter what avros I load now. I can't
> see that anything has changed regarding jars, etc.
Thanks for your reply. WritableByteArrayComparable is in the same place in
HBase 0.93. I also registered my HBase jar from within Pig, i.e. register
/opt/hbase/hbase-0.93.jar, and that command returned without error. I also
edited the line in the pig startup script that specifies the path to HBase.
Spoken too soon... this happens no matter what avros I load now. I can't
see that anything has changed regarding jars, etc. Confused.
I think this happens when Avro is parsing the schema?
Pig Stack Trace
---
ERROR 2998: Unhandled internal error.
org.codehaus.jackson.JsonFactory.e
Hey Dmitriy! Unfortunately that's the requirement. The solution I
found so far is to do all the pre-filtering and grouping I can in Pig,
and then run Python on the output file generated by Pig. That file is
~ 300 MB, so it's not a problem to just run through Python.
Thanks for getting back to me.
"records before" is kind of hard do define in an MR paradigm.
I suppose you could group and then run the records through an accumulative
UDF. But this is feeling very hacky. Is there a more scalable
(order-independent) way you can do what you need?
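A rough sketch of the group-then-accumulate shape suggested above; the relation and field names (records, key, ts) and the UDF name MyAccumulatorUDF are hypothetical:
grouped = GROUP records BY key;
-- ordering inside a nested FOREACH restores a per-group sequence that an
-- Accumulator-implementing UDF can then consume in order
result = FOREACH grouped {
    ordered = ORDER records BY ts;
    GENERATE group, MyAccumulatorUDF(ordered);
};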
On Thu, Jan 26, 2012 at 4:32 PM, Grig Gheorghiu wrote:
"Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.filter.WritableByteArrayComparable" indicates that
you don't have HBase on your classpath, or that the version of HBase you
are testing against moved this class someplace else. We've tested against
the 0.90 series, but not 0.92+.
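If it is a pure classpath problem, a hedged sketch of the usual fix (the jar path and version are examples, not taken from the thread):
-- make the HBase classes visible to the backend map-reduce tasks
REGISTER /opt/hbase/hbase-0.90.4.jar;
Registering the jar inside the script ships it to the map-reduce tasks; the Pig client itself also needs the jar on its classpath (e.g. via PIG_CLASSPATH) before the script is parsed.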
Hi,
I'm trying to use Pig 0.9.2 with HBase 0.93 (i.e. the latest from HBase
trunk) and following the tutorial.
This line loads the sample file from HDFS successfully:
raw = LOAD 'test/excite-small.log' USING PigStorage('\t') AS (user, time,
query);
This line seems to work:
T = FOREACH
A bag of one tuple is still a bag; you need to flatten it:
generate group, FLATTEN(biggest);
Shawn
On Wed, Feb 1, 2012 at 10:07 PM, Sid Stuart wrote:
> Hi,
>
> I'm using Pig to analyze some log files. We would like to find the last
> time a URL has been accessed. I've pulled out the path and the
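Putting Shawn's hint together with the question, a hedged sketch of the whole last-access-per-URL pattern; the relation and field names (entries, path, time) are assumptions based on Sid's description:
byPath = GROUP entries BY path;
latest = FOREACH byPath {
    sorted = ORDER entries BY time DESC;
    biggest = LIMIT sorted 1;
    -- biggest is a bag holding one tuple; FLATTEN unwraps it into plain fields
    GENERATE group AS path, FLATTEN(biggest);
};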
On Feb 1, 2012, at 5:04 PM, Aleksandr Elbakyan wrote:
> Hello All,
>
> I am trying to understand how Pig's group partitioning works. I was not
> able to find any documentation regarding what happens under the hood.
>
>
> For example
>
> B = GROUP A BY age;
>
> Does pig partition data by a
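The short answer, hedged since the documentation is indeed thin here: on Hadoop, GROUP BY compiles to a map-reduce job whose shuffle hash-partitions records by the group key, so all records with the same age land on the same reducer. The reducer count is the knob the language exposes:
-- a minimal sketch: spread the groups over 10 reducers
B = GROUP A BY age PARALLEL 10;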
Okay, so it's weird.
I was able to run a pig query using $0.$0
The Pig script I wrote for the data (tmp.txt):
(1,2,3) (2,4,5)
(2,3,4) (2,3,5)
z = load 'tmp.txt';
x = foreach z generate $0.$0;
dump x;
It ran fine the first time, but now it's giving me an error:
ERROR 1066: Unable to open iterator
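One way to make the access less fragile, sketched under the assumption that the two tuples on each line are tab-separated (the default PigStorage delimiter): give the load an explicit schema and project by name.
z = LOAD 'tmp.txt' AS (t1:tuple(a:int, b:int, c:int), t2:tuple(d:int, e:int, f:int));
-- t1.a is the same projection as $0.$0, but typed and named
x = FOREACH z GENERATE t1.a;
DUMP x;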
Okay, got it. Thanks for guiding me.
Without a schema, we can refer to the fields as $0.$0 or $1.$0 and so on,
based on their positions.
Thanks,
Praveenesh
On Thu, Feb 2, 2012 at 3:28 PM, praveenesh kumar wrote:
> One more thing, suppose I have data - tmp.txt like
> (1,2,3) (2,4,5)
> (2,3,4) (2,3,5)
>
> So if I w
One more thing, suppose I have data - tmp.txt like
(1,2,3) (2,4,5)
(2,3,4) (2,3,5)
So if I use Z1 = Load 'tmp.txt',
The data will get stored in a bag (right?)
( (1,2,3), (2,4,5) )
( (2,3,4), (2,3,5) )
Now can I refer to the fields in this case (without a schema)?
B = Foreach Z1 generate
Thanks, Daniel.
So it means that for all the other complex datatypes, we need the file
contents to be in that format too:
tuples in ( ), bags in { }, maps in [ ]
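A small illustration of those literal forms, with assumed file and field names; each line of complex.txt would hold three tab-separated fields such as (1,2) {(3),(4)} [a#5]:
data = LOAD 'complex.txt'
       AS (t:tuple(x:int, y:int), b:bag{r:tuple(z:int)}, m:map[]);
-- dereference each kind: tuple field by name, bag as a whole, map by key
vals = FOREACH data GENERATE t.x, b, m#'a';
DUMP vals;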
On Thu, Feb 2, 2012 at 2:49 PM, Daniel Dai wrote:
> Hi, Praveenesh,
> Your tmp.txt should be:
> (1,2,3,4)
> (2,3,4,5)
> (4,5,5,6)
>
> And you c
Hi, Praveenesh,
Your tmp.txt should be:
(1,2,3,4)
(2,3,4,5)
(4,5,5,6)
And you cannot use "," as the delimiter for PigStorage; otherwise,
PigStorage will split the line on commas first and then parse the tuples.
Daniel
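A minimal sketch of why the delimiter matters here, assuming tmp.txt holds one parenthesized tuple per line as above: with the default tab delimiter the whole line arrives as a single field and parses as one tuple, whereas PigStorage(',') would split at the commas first, so the parentheses would no longer match up.
A = LOAD 'tmp.txt' AS (t:tuple(a:int, b:int, c:int, d:int));
B = FOREACH A GENERATE t.a, t.d;
DUMP B;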
On Thu, Feb 2, 2012 at 1:05 AM, praveenesh kumar wrote:
> Hi,
>
> I am trying to lear