Looks like it's covered:
public ProtobufBytesToTuple(TypeRef typeRef,
                            ProtobufExtensionRegistry extensionRegistry) {
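For reference, here is a rough sketch of how I'm planning to use it from Pig (the proto class name and paths are made up, and I haven't checked which string-argument constructor the UDF exposes to Pig scripts, so treat this as a guess rather than documented usage):

-- hypothetical: bind the UDF to a protobuf class; check the class for the
-- exact constructor arguments it accepts from a Pig DEFINE
DEFINE ProtoToTuple com.twitter.elephantbird.pig.piggybank.ProtobufBytesToTuple('com.example.proto.MyMessage');

raw     = LOAD '/data/protos' AS (msg:bytearray);
decoded = FOREACH raw GENERATE FLATTEN(ProtoToTuple(msg));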
Thanks,
Ben
On Apr 3, 2012, at 4:41 PM, Raghu Angadi wrote:
> Extensions are not supported yet. There is a patch pending:
> https://github.com/kevinweil/elephant-bird/pull/143
>
Extensions are not supported yet. There is a patch pending:
https://github.com/kevinweil/elephant-bird/pull/143
Can you check if that covers your use case?
On Tue, Apr 3, 2012 at 4:32 PM, Benjamin Juhn wrote:
> Thanks Dmitriy. Doesn't look like that class supports extensions. Am I
> missing s
Thanks, Dmitriy. Doesn't look like that class supports extensions. Am I
missing something?
- Ben
On Mar 27, 2012, at 10:01 PM, Dmitriy Ryaboy wrote:
> I think you want ProtobufBytesToTuple
> (https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/piggybank
Why not pipe the multi-line XML from the executable through another script
that understands it?
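Something like this is what I have in mind (all names here are placeholders, and I'm assuming the wrapper script invokes the executable itself, reads its multi-line XML output, and emits one tab-separated record per fragment):

-- hypothetical sketch: stream through a wrapper that runs the executable
-- and flattens each multi-line XML fragment back into a single record
DEFINE xml_flatten `flatten_xml.py` SHIP('flatten_xml.py', 'my_executable');

lines = LOAD 'input.txt' AS (line:chararray);
recs  = STREAM lines THROUGH xml_flatten AS (id:chararray, payload:chararray);
STORE recs INTO 'output';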
On Wed, Mar 28, 2012 at 8:24 AM, Ahmed Sobhi wrote:
> I'm streaming data in a pig script through an executable that returns an
> xml fragment for each line of input I stream to it. That xml fragment
> hap
SequenceFileStorage in elephant-bird lets you load from and store to sequence
files.
If your input is text lines, you can store each line as 'value'.
You can experiment with different codecs.
Depending on your use case, simple bzip2 files may not be a bad choice.
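A rough sketch of what I mean (the converter arguments are from memory, so please check them against the elephant-bird SequenceFileStorage docs; the key/value layout and codec are just examples):

-- make sequence file output block-compressed with the codec of your choice
SET mapred.output.compress true;
SET mapred.output.compression.type BLOCK;
SET mapred.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

kv = LOAD 'input.txt' USING PigStorage('\t') AS (key:chararray, value:chararray);
STORE kv INTO 'seq_out' USING com.twitter.elephantbird.pig.store.SequenceFileStorage(
    '-c com.twitter.elephantbird.pig.util.TextConverter',
    '-c com.twitter.elephantbird.pig.util.TextConverter');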
On Tue, Apr 3, 2012 at 1:57 PM, Mohit
Thanks for the examples. It appears that Snappy is not splittable and the
suggested approach is to write to sequence files.
I know how to load from sequence files, but in Pig I can't find a way to
write to sequence files using Snappy compression.
On Tue, Apr 3, 2012 at 1:30 PM, Prashant Kommireddi
Does that mean Snappy is splittable?
http://www.cloudera.com/blog/2011/09/snappy-and-hadoop/
If so, how can I use it in Pig?
http://hadoopified.wordpress.com/2012/01/24/snappy-compression-with-pig/
On Tue, Apr 3, 2012 at 1:02 PM, Mohit Anchlia wrote:
> I am currently using Snappy in sequence
I am currently using Snappy in sequence files. I wasn't aware Snappy uses
block compression. Does that mean Snappy is splittable? If so, how can I
use it in Pig?
Thanks again
On Tue, Apr 3, 2012 at 12:42 PM, Prashant Kommireddi wrote:
> Most companies handling BigData use LZO, a few have start
Most companies handling big data use LZO; a few have started exploring/using
Snappy as well (which is not any easier to configure). These are the two
splittable fast-compression algorithms. Note that Snappy is not space-efficient
compared to gzip or other compression algorithms, but it is a lot faster
(ideal for
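To give a concrete idea of where these usually get plugged into a Pig job (these are the standard Hadoop/Pig property names as I remember them; please double-check them against your versions):

-- compress map output with Snappy (splittability doesn't matter here)
SET mapred.compress.map.output true;
SET mapred.map.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- compress Pig's intermediate tmp files between MR jobs (gz or lzo)
SET pig.tmpfilecompression true;
SET pig.tmpfilecompression.codec lzo;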
Thanks for your input.
It looks like it takes some work to configure LZO. What are the other
alternatives? We read new sequence files and generate output continuously.
What are my options? Should I split the output into small pieces and gzip
them? How do people solve similar problems where there is cont
Actually, I don't expect lots of rows; there should be only one row in the
output. I will try with groups rather than distinct.
On Tue, Apr 3, 2012 at 12:09 PM, Jonathan Coveney wrote:
> woops hit enter. just to see, how long does it take if you just store h?
>
> 2012/4/3 Jonathan Coveney
>
> >
Whoops, hit enter. Just to see: how long does it take if you just store h?
2012/4/3 Jonathan Coveney
> point 1: doing dump is dangerous, depending on how many rows you expect in
> the relation. you're going to serialize every row in the output to your
> console
> point 2: the issue is that you're
Point 1: doing dump is dangerous, depending on how many rows you expect in
the relation. You're going to serialize every row in the output to your
console.
Point 2: the issue is that you're doing a nested DISTINCT. This is done in
memory, and for large data sets it can be quite slow. The scalable solut
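Roughly what I have in mind, with made-up relation and field names:

-- project just the fields you need, DISTINCT the whole relation in MR,
-- then group and count instead of a nested DISTINCT inside a FOREACH
pairs   = FOREACH data GENERATE user_id, item_id;
d_pairs = DISTINCT pairs;
grouped = GROUP d_pairs BY user_id;
counts  = FOREACH grouped GENERATE group AS user_id, COUNT(d_pairs) AS n_items;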
Yes, it is splittable.
Bzip2 consumes a lot of CPU in decompression. With Hadoop jobs generally
being I/O bound, Bzip2 can sometimes become the performance bottleneck due to
this slow decompression rate (the algorithm is unable to decompress at disk
read rate).
On Tue, Apr 3, 2012 at 11:
Hi,
I use the Windows 7 operating system. I have recently started working on Pig
and Hadoop; I have no previous experience with either. I have installed
Cygwin so that I can make Hadoop and Pig work on my Windows system. I have
untarred Pig 0.9.2 as well as Hadoop 1.0.0. I have set envi
Is bzip2 not advisable? I think it is splittable too and is supported out of
the box.
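For what it's worth, this is the kind of thing I meant; if I remember right, PigStorage picks up bzip2 on both ends, and the .bz2 extension on the output path is what triggers compressed output (paths are made up):

-- read bzip2-compressed input and write bzip2-compressed output
raw = LOAD '/logs/input.bz2' USING PigStorage('\t');
STORE raw INTO '/logs/output.bz2' USING PigStorage('\t');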
On Thu, Mar 29, 2012 at 8:08 PM, 帝归 wrote:
> When I use LzoPigStorage, it will load all files under a directory. But I
> want to compress every file under a directory and keep the file name
> unchanged, just with a .l
Thanks, guys.
This is the Pig script I am running. The dataset is also small for the
filtered date, around 2 million rows, but I am aiming to write this script
for a larger scope. Here, titles is an array of JSON objects stored as a
string datatype, so I am using a Python UDF to split it into
c
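In case it helps to see the shape of it, the UDF wiring looks roughly like this (script, function, and field names are placeholders, not my actual code):

-- register a Jython UDF and use it to explode the JSON string column,
-- assuming split_titles returns a bag of tuples
REGISTER 'title_udfs.py' USING jython AS title_udfs;

rows     = LOAD 'input' USING PigStorage('\t') AS (id:chararray, titles:chararray);
exploded = FOREACH rows GENERATE id, FLATTEN(title_udfs.split_titles(titles));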
Dooohh... thank you for pointing that out. I thought I ran that through
jsonlint. That seemed to fix it.
Regards,
Dano
On Tue, Apr 3, 2012 at 12:11 PM, Bill Graham wrote:
> In the schema approach the error is that your json is invalid. You're
> missing a second '}' before the last ']'
In the schema approach, the error is that your JSON is invalid. You're
missing a second '}' before the last ']'.
On Tue, Apr 3, 2012 at 10:32 AM, Dan Young wrote:
> I just updated my pig from svn repo and now am using the latest from trunk:
>
> pig -i
> Apache Pig version 0.11.0-SNAPSHOT (r1309
I just updated my Pig from the svn repo and am now using the latest from trunk:
pig -i
Apache Pig version 0.11.0-SNAPSHOT (r1309051)
compiled Apr 03 2012, 11:18:53
Here's the gist with stack traces, both with and without specifying the
schema. I'm using piggybank from trunk.
https://gist.github.com/22939
Here's the version of Pig I'm using:
pig -i
Apache Pig version 0.11.0-SNAPSHOT (r1304979)
compiled Mar 24 2012, 21:48:44
The version of Hadoop:
Version: 1.0.0, r1214675
Regards,
Dan
On Tue, Apr 3, 2012 at 11:07 AM, Russell Jurney wrote:
> This looks like a bug fixed in 0.10. Mind trying i
This looks like a bug fixed in 0.10. Mind trying it?
Russell Jurney http://datasyndrome.com
On Apr 3, 2012, at 9:13 AM, Dan Young wrote:
> Hello Stan,
>
> I'm back from Mexico now, and here's my GIST with all the information.
>
> https://gist.github.com/2293226
>
> Any insight into what I'm
Hello Stan,
I'm back from Mexico now, and here's my GIST with all the information.
https://gist.github.com/2293226
Any insight into what I'm not doing correctly would be greatly appreciated.
Regards,
Dan
On Mon, Mar 26, 2012 at 9:11 AM, Stan Rosenberg wrote:
> Hi Dan,
>
> Could you attach yo
Hey everyone,
we're facing a problem while reading Avro files written by Flume (using the
Avro Java API 1.5.4) into a Hadoop cluster. The Avro data store complains
about a missing sync marker. Investigating the problem shows that this is
indeed the case: the sync marker is missing. Thus we have a bloc