I'm trying to get the Python streaming UDFs working for the first time and I'm not quite sure what is going wrong. The job runs fine (I'm running in local mode, haven't tested nonlocal yet), but the output from the Python streaming UDF isn't outputting correctly. Any ideas? My python UDFs, pig script, and output is below.
$ jython --version Jython 2.5.2 $ python --version Python 2.7.5+ $ pig --version Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14 ===== My pigudf.py: ===== try: from pig_util import outputSchema except ImportError: pass # must be using jython... @outputSchema("tokens:{(word:chararray)}") def word_tokenize(line): if line is None: return [] return line.split() ===== My test.pig: ===== Register 'pigudf.py' using streaming_python as pyudfc; Register 'pigudf.py' using jython as pyudfj; A = LOAD 'huckfinn_ascii.txt' AS (line:chararray); B = FOREACH A GENERATE $0, pyudfc.word_tokenize($0) as a, pyudfj.word_tokenize($0) as b; STORE B INTO 'huckfinn_out.txt'; ===== output ===== YOU don't know about me without you have read a book by the name of The {(),(),(),(),(),(h),(),(),(),(),(),(),(),(),(),()} {(YOU),(don't),(know),( about),(me),(without),(you),(have),(read),(a),(book),(by),(the),(name),(of),(The)} Adventures of Tom Sawyer; but that ain't no matter. That book was made {(entu),(),(),(y),(),(),(),(),(t),(),(),(),()} {(Adventures),(of),(Tom),(Sawye r;),(but),(that),(ain't),(no),(matter.),(That),(book),(was),(made)} by Mr. Mark Twain, and he told the truth, mainly. There was things {(),(),(),(),(),(),(),(),(),(n),(),(),()} {(by),(Mr.),(Mark),(Twain,),(an d),(he),(told),(the),(truth,),(mainly.),(There),(was),(things)} which he stretched, but mainly he told the truth. That is nothing. I {(),(),(etch),(),(),(),(),(),(),(),(),(hi),()} {(which),(he),(stretched,),(but ),(mainly),(he),(told),(the),(truth.),(That),(is),(nothing.),(I)} never seen anybody but lied one time or another, without it was Aunt {(),(),(b),(),(),(),(),(),(th),(h),(),(),()} {(never),(seen),(anybody),(but) ,(lied),(one),(time),(or),(another,),(without),(it),(was),(Aunt)} Polly, or the widow, or maybe Mary. Aunt PollyTom's Aunt Polly, she {(),(),(),(),(),(),(),(),(lyTo),(),(),()} {(Polly,),(or),(the),(widow,),(or),(maybe),(Mary.),(Aunt),(PollyTom's),(Aunt),(Polly,),(she)} isand Mary, and the Widow Douglas is all told about in that book, which {(),(),(),(),(),(g),(),(),(),(),(),(),(),()} {(isand),(Mary,),(and),(the),(Widow),(Douglas),(is),(all),(told),(about),(in),(that),(book,),(which)} is mostly a true book, with some stretchers, as I said before. {(),(),(),(),(),(),(),(etche),(),(),(),(o)} {(is),(mostly),(a),(true),(book,),(with),(some),(stretchers,),(as),(I),(said),(before.)}