Re: Converting xml to csv

2013-09-16 ajay kumar
Yeah, thank you... now I'm also stuck. If possible, can you share the solution? On Mon, Sep 16, 2013 at 7:21 PM, wrote: > Your example had newlines in the element. The regular > expression .* does not match newlines. One way to remove newlines is > REPLACE(x,'[\\n]',''). If the text ranges

Re: how to load custom Writable class from sequence file?

2013-09-16 Pradeep Gollakota
It doesn't look like the SequenceFileLoader from the piggybank has much support. The elephant bird version looks like it does what you need it to do. https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java You'll have to wr

Re: how to load custom Writable class from sequence file?

2013-09-16 Pradeep Gollakota
That's correct... The "load ... AS (k:chararray, v:chararray);" doesn't actually do what you think it does. The AS clause tells Pig what the schema types are, so Pig calls the appropriate LoadCaster method to convert each field to the right type. A LoadCaster object defines how to map byte[] into appropr
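A minimal, self-contained Java sketch of this point, assuming the cast behaves roughly like a UTF-8 decode of the raw field bytes (an assumption about the caster in use, not a quote of Pig's source). `PageView` is a hypothetical stand-in for a custom Writable: its `write(DataOutput)` has the same signature as Hadoop's `Writable.write`, but the class has no Hadoop dependency. Decoding the serialized bytes as text does not give back what `toString()` returns, which is why declaring `v:chararray` alone is not enough.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical record standing in for a custom Writable (no Hadoop dependency).
class PageView {
    final String url;
    final int hits;

    PageView(String url, int hits) {
        this.url = url;
        this.hits = hits;
    }

    // Same signature as Writable#write: a compact binary encoding.
    void write(DataOutput out) throws IOException {
        out.writeUTF(url);   // 2-byte length prefix + modified UTF-8 bytes
        out.writeInt(hits);  // 4-byte big-endian int
    }

    @Override
    public String toString() {
        return url + "\t" + hits;
    }

    static byte[] serialize(PageView pv) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            pv.write(new DataOutputStream(buf));
        } catch (IOException e) {
            // ByteArrayOutputStream does not throw; wrap to satisfy the compiler
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Assumed approximation of a bytes-to-chararray cast: decode raw bytes as UTF-8.
    static String castBytesToChararray(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PageView pv = new PageView("a.com", 3);
        byte[] raw = serialize(pv);
        System.out.println(raw.length); // 11: 2-byte length + 5 chars + 4-byte int
        System.out.println(castBytesToChararray(raw).equals(pv.toString())); // false
    }
}
```

The binary layout (length prefixes, big-endian ints) only makes sense to the class's own deserialization code; a generic cast has no way to turn it back into the `toString()` form.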

Re: how to load custom Writable class from sequence file?

2013-09-16 Yang
I think my custom type has a toString(); at least, the Writable interface says it can be written to bytes. So presumably, if I force it to bytes or a string, Pig should be able to cast it, like load ... AS (k:chararray, v:chararray); but this actually fails. On Mon, Sep 16, 2013 at 6:22 PM, Pradeep Gollakota wrote:

Re: how to load custom Writable class from sequence file?

2013-09-16 Pradeep Gollakota
The problem is that Pig only understands its own data types, so you need to tell it how to translate your custom Writable into a Pig data type. Apparently elephant-bird has some support for doing this type of thing... take a look at this SO post http://stackoverflow.com/questions/16540651/apache-pig-can-
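One way to picture the missing translation step, sketched in plain Java under stated assumptions: `PageView` and `PageViewConverter` are hypothetical names, not elephant-bird or Pig classes. The converter decodes the raw value bytes with the record's own `readFields()` and returns plain Java values (a `String` and an `int`) that correspond to Pig's chararray and int. In a real job this logic would sit in a loader or converter plugged into Pig, which the thread suggests elephant-bird supports; the exact hook is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical custom record with Writable-style write/readFields (no Hadoop dependency).
class PageView {
    String url;
    int hits;

    void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeInt(hits);
    }

    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        hits = in.readInt();
    }
}

// Hypothetical converter: the type-specific knowledge Pig cannot supply on its own.
class PageViewConverter {
    // Decode raw bytes with the record's own readFields(), then expose
    // plain values that map onto Pig types (chararray, int).
    static List<Object> toPigFields(byte[] raw) {
        PageView pv = new PageView();
        try {
            pv.readFields(new DataInputStream(new ByteArrayInputStream(raw)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.<Object>asList(pv.url, pv.hits);
    }

    // Demo helper: produce the bytes that the record's write() would emit.
    static byte[] serialize(String url, int hits) {
        PageView pv = new PageView();
        pv.url = url;
        pv.hits = hits;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            pv.write(new DataOutputStream(buf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = serialize("a.com", 3);
        System.out.println(toPigFields(raw)); // [a.com, 3]
    }
}
```

The design choice is that only the record class knows its own wire format, so the conversion reuses `readFields()` rather than guessing at the byte layout.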

how to load custom Writable class from sequence file?

2013-09-16 Yang
I tried to do a quick and dirty inspection of some of our data feeds, which are encoded as gzipped SequenceFiles. Basically I did a = load 'myfile' using ..SequenceFileLoader() AS (mykey, myvalue); but it gave me an error: 2013-09-16 17:34:28,915 [Thread-5] INFO org.apache.hadoop.io.compr

Re: Pig Parameter Substitution

2013-09-16 Ruslan Al-Fakikh
Hi, Not sure whether it helps, but I did a lot of testing in such cases; "test and see" was my main approach, because it is really tricky sometimes. You can also try the -dryrun option when launching Pig, which produces the script with parameters substituted without executing it. Best Regards, Ruslan Al-Fakikh https://www.odesk.com/users/~015b7b5f617eb89923 On Tue, Sep 17, 2013

Pig Parameter Substitution

2013-09-16 Siddhi Mehta
Hey all, How does Pig handle null parameter values? Should a null parameter value raise an exception? Currently it just translates it to the string "null", e.g. InputStream queryStream = IOUtils.toInputStream("A = LOAD '$VAL' using PigStorage()", "UTF-8"); Map paramMap = Maps.ne
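A hypothetical fail-fast check one could run before handing a script to Pig, sketched in Java. `StrictParams.substitute` is an illustrative helper, not Pig's parameter-substitution API, and it only handles simple `$NAME` placeholders. The `main` method also shows that in Java, concatenating a null reference yields the text "null", one plausible way such a value ends up in the script.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT Pig's parameter substitution: fills $NAME
// placeholders from a map and fails fast on missing or null values.
class StrictParams {
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("parameter $" + name + " is null or undefined");
            }
            // quoteReplacement keeps any '$' or '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("VAL", null);
        // Plain string concatenation silently turns null into "null":
        System.out.println("LOAD '" + params.get("VAL") + "'"); // LOAD 'null'

        params.put("VAL", "/data/in");
        // Prints: A = LOAD '/data/in' using PigStorage();
        System.out.println(substitute("A = LOAD '$VAL' using PigStorage();", params));
    }
}
```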

RE: Converting xml to csv

2013-09-16 william.dowling
Your example had newlines in the element. The regular expression .* does not match newlines: by default . matches any character except a line terminator, so .* cannot span lines. One way to remove newlines is REPLACE(x,'[\\n]',''). If the text ranges you are interested in do not contain newlines, for example if you are interested in but do not care about its relation to other
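A quick Java check of the regex behavior described above, relevant on the assumption that Pig's REPLACE and REGEX_EXTRACT delegate to java.util.regex in the Pig version in use. The element name `<name>` is hypothetical, since the original element names did not survive in the archive. Without the DOTALL flag, `.` stops at a newline, so the plain pattern finds nothing until newlines are stripped; the `(?s)` inline flag is the Java-level alternative (whether Pig passes it through unchanged is untested here).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NewlineRegexDemo {
    // Same regex as REPLACE(x, '[\\n]', '') in Pig Latin: drop every '\n'.
    static String stripNewlines(String s) {
        return s.replaceAll("[\\n]", "");
    }

    // Return capture group 1 of the first match, or null if there is none.
    static String extract(String xml, Pattern p) {
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<name>first\nsecond</name>"; // hypothetical element
        Pattern plain = Pattern.compile("<name>(.*)</name>");
        Pattern dotall = Pattern.compile("(?s)<name>(.*)</name>");

        System.out.println(extract(xml, plain));                // null: '.' does not match '\n'
        System.out.println(extract(stripNewlines(xml), plain)); // firstsecond
        System.out.println(extract(xml, dotall));               // "first", then "second" on the next line
    }
}
```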

unittest for Jython pig UDFs

2013-09-16 Serega Sheypak
Hi, I'm trying to integrate a Jython UDF into my Maven project. I have a problem running scripts where @outputSchema("blabla") is defined. Here is the error: File "/home/ssa/devel/etl-masterdata/gsmcell-merger/src/test/python/Test.py", line 3, in <module> from pig.udf.mergerUDF import * File "__py