I'm trying to get the Python streaming UDFs working for the first time and
I'm not quite sure what is going wrong. The job runs fine (I'm running in
local mode, haven't tested nonlocal yet), but the output from the Python
streaming UDF isn't outputting correctly. Any ideas? My python UDFs, pig
script, and output is below.

$ jython --version
Jython 2.5.2
$ python --version
Python 2.7.5+
$ pig --version
Apache Pig version 0.12.0 (r1529718)
compiled Oct 07 2013, 12:20:14


===== My pigudf.py: =====

try:
    from pig_util import outputSchema
except ImportError:
    pass # must be using jython...

@outputSchema("tokens:{(word:chararray)}")
def word_tokenize(line):
    if line is None:
        return []

    return line.split()


===== My test.pig: =====

Register 'pigudf.py' using streaming_python as pyudfc;
Register 'pigudf.py' using jython as pyudfj;

A = LOAD 'huckfinn_ascii.txt' AS (line:chararray);

B = FOREACH A GENERATE $0, pyudfc.word_tokenize($0) as a,
pyudfj.word_tokenize($0) as b;

STORE B INTO 'huckfinn_out.txt';


===== output =====

YOU don't know about me without you have read a book by the name of The
{(),(),(),(),(),(h),(),(),(),(),(),(),(),(),(),()}
 {(YOU),(don't),(know),(
about),(me),(without),(you),(have),(read),(a),(book),(by),(the),(name),(of),(The)}
Adventures of Tom Sawyer; but that ain't no matter. That book was made
 {(entu),(),(),(y),(),(),(),(),(t),(),(),(),()}
 {(Adventures),(of),(Tom),(Sawye
r;),(but),(that),(ain't),(no),(matter.),(That),(book),(was),(made)}
by Mr. Mark Twain, and he told the truth, mainly. There was things
 {(),(),(),(),(),(),(),(),(),(n),(),(),()}
{(by),(Mr.),(Mark),(Twain,),(an
d),(he),(told),(the),(truth,),(mainly.),(There),(was),(things)}
which he stretched, but mainly he told the truth. That is nothing. I
 {(),(),(etch),(),(),(),(),(),(),(),(),(hi),()}
 {(which),(he),(stretched,),(but
),(mainly),(he),(told),(the),(truth.),(That),(is),(nothing.),(I)}
never seen anybody but lied one time or another, without it was Aunt
 {(),(),(b),(),(),(),(),(),(th),(h),(),(),()}
 {(never),(seen),(anybody),(but)
,(lied),(one),(time),(or),(another,),(without),(it),(was),(Aunt)}
Polly, or the widow, or maybe Mary. Aunt PollyTom's Aunt Polly, she
{(),(),(),(),(),(),(),(),(lyTo),(),(),()}
{(Polly,),(or),(the),(widow,),(or),(maybe),(Mary.),(Aunt),(PollyTom's),(Aunt),(Polly,),(she)}
isand Mary, and the Widow Douglas is all told about in that book, which
{(),(),(),(),(),(g),(),(),(),(),(),(),(),()}
 
{(isand),(Mary,),(and),(the),(Widow),(Douglas),(is),(all),(told),(about),(in),(that),(book,),(which)}
is mostly a true book, with some stretchers, as I said before.
 {(),(),(),(),(),(),(),(etche),(),(),(),(o)}
{(is),(mostly),(a),(true),(book,),(with),(some),(stretchers,),(as),(I),(said),(before.)}

Reply via email to