I don't know what's up, but I figure it's worth drawing your attention to
https://issues.apache.org/jira/browse/DATAFU-8
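
One thing that might help narrow it down: in the quoted output, the streaming
bags seem to have the same number of tuples as the Jython ones (16 for the
first line, 13 for the next two I counted), so the split itself looks fine and
the values appear to get mangled somewhere between the Python process and Pig.
Below is a minimal sketch that runs the same function as plain Python, outside
Pig, to confirm what it should return. The outputSchema stand-in and the to_bag
helper are assumptions for illustration only, not part of pig_util.

```python
# Stand-in for pig_util.outputSchema so this runs outside Pig.
# Assumption for illustration: it simply returns the function unchanged.
def outputSchema(schema):
    def decorator(func):
        return func
    return decorator


@outputSchema("tokens:{(word:chararray)}")
def word_tokenize(line):
    if line is None:
        return []
    return line.split()


def to_bag(tokens):
    # Hypothetical helper: render tokens the way the stored output shows a bag.
    return "{" + ",".join("(%s)" % t for t in tokens) + "}"


line = "YOU don't know about me without you have read a book by the name of The"
tokens = word_tokenize(line)
print(len(tokens))     # number of tuples expected in the bag
print(to_bag(tokens))  # should match the Jython column for this line
```

If that matches the Jython column, the UDF logic is fine and the problem is
more likely in how the streaming output is read back.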


On Thu, Feb 13, 2014 at 11:03 AM, Donald Miner <dmi...@clearedgeit.com> wrote:

> I'm trying to get the Python streaming UDFs working for the first time and
> I'm not quite sure what is going wrong. The job runs fine (I'm running in
> local mode, haven't tested nonlocal yet), but the output from the Python
> streaming UDF is incorrect. Any ideas? My Python UDFs, Pig script, and
> output are below.
>
> $ jython --version
> Jython 2.5.2
> $ python --version
> Python 2.7.5+
> $ pig --version
> Apache Pig version 0.12.0 (r1529718)
> compiled Oct 07 2013, 12:20:14
>
>
> ===== My pigudf.py: =====
>
> try:
>     from pig_util import outputSchema
> except ImportError:
>     pass # must be using jython...
>
> @outputSchema("tokens:{(word:chararray)}")
> def word_tokenize(line):
>     if line is None:
>         return []
>
>     return line.split()
>
>
> ===== My test.pig: =====
>
> Register 'pigudf.py' using streaming_python as pyudfc;
> Register 'pigudf.py' using jython as pyudfj;
>
> A = LOAD 'huckfinn_ascii.txt' AS (line:chararray);
>
> B = FOREACH A GENERATE $0, pyudfc.word_tokenize($0) as a,
> pyudfj.word_tokenize($0) as b;
>
> STORE B INTO 'huckfinn_out.txt';
>
>
> ===== output =====
>
> YOU don't know about me without you have read a book by the name of The
> {(),(),(),(),(),(h),(),(),(),(),(),(),(),(),(),()}
> {(YOU),(don't),(know),(about),(me),(without),(you),(have),(read),(a),(book),(by),(the),(name),(of),(The)}
> Adventures of Tom Sawyer; but that ain't no matter. That book was made
> {(entu),(),(),(y),(),(),(),(),(t),(),(),(),()}
> {(Adventures),(of),(Tom),(Sawyer;),(but),(that),(ain't),(no),(matter.),(That),(book),(was),(made)}
> by Mr. Mark Twain, and he told the truth, mainly. There was things
> {(),(),(),(),(),(),(),(),(),(n),(),(),()}
> {(by),(Mr.),(Mark),(Twain,),(and),(he),(told),(the),(truth,),(mainly.),(There),(was),(things)}
> which he stretched, but mainly he told the truth. That is nothing. I
> {(),(),(etch),(),(),(),(),(),(),(),(),(hi),()}
> {(which),(he),(stretched,),(but),(mainly),(he),(told),(the),(truth.),(That),(is),(nothing.),(I)}
> never seen anybody but lied one time or another, without it was Aunt
> {(),(),(b),(),(),(),(),(),(th),(h),(),(),()}
> {(never),(seen),(anybody),(but),(lied),(one),(time),(or),(another,),(without),(it),(was),(Aunt)}
> Polly, or the widow, or maybe Mary. Aunt PollyTom's Aunt Polly, she
> {(),(),(),(),(),(),(),(),(lyTo),(),(),()}
> {(Polly,),(or),(the),(widow,),(or),(maybe),(Mary.),(Aunt),(PollyTom's),(Aunt),(Polly,),(she)}
> isand Mary, and the Widow Douglas is all told about in that book, which
> {(),(),(),(),(),(g),(),(),(),(),(),(),(),()}
> {(isand),(Mary,),(and),(the),(Widow),(Douglas),(is),(all),(told),(about),(in),(that),(book,),(which)}
> is mostly a true book, with some stretchers, as I said before.
> {(),(),(),(),(),(),(),(etche),(),(),(),(o)}
> {(is),(mostly),(a),(true),(book,),(with),(some),(stretchers,),(as),(I),(said),(before.)}
>



-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
