I don't know what's up, but I figure it's worth drawing your attention to https://issues.apache.org/jira/browse/DATAFU-8
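If it helps in the meantime, here is a minimal sketch of one thing that may be worth trying. It rests on an unverified assumption: that the streaming_python serializer in Pig 0.12.0 expects each element of a returned bag to be an explicit tuple, while the Jython engine quietly turns a list of bare strings into a bag of 1-tuples. The variant below differs from Donald's word_tokenize only in wrapping each token as `(word,)`. The nested fallback for `outputSchema` is a hypothetical addition, not part of pig_util. It only exists so the file can be imported and tested outside Pig, and it leaves any decorator the Jython engine already provides untouched.

```python
# Hypothetical variant of pigudf.py. ASSUMPTION: streaming_python wants each
# bag element returned as an explicit tuple rather than a bare string.
try:
    from pig_util import outputSchema
except ImportError:
    try:
        outputSchema  # the Jython engine may already provide this decorator
    except NameError:
        # Hypothetical no-op fallback so the module imports outside Pig.
        def outputSchema(schema):
            def wrap(func):
                return func
            return wrap


@outputSchema("tokens:{(word:chararray)}")
def word_tokenize(line):
    if line is None:
        return []
    # Wrap each token in a 1-tuple so every bag element matches (word:chararray).
    return [(word,) for word in line.split()]
```

Outside Pig, `word_tokenize("a b")` returns `[('a',), ('b',)]` and `word_tokenize(None)` returns `[]`. Whether this changes the streaming output is exactly the part that still needs testing against the Huck Finn input.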
On Thu, Feb 13, 2014 at 11:03 AM, Donald Miner <dmi...@clearedgeit.com> wrote:

> I'm trying to get the Python streaming UDFs working for the first time and
> I'm not quite sure what is going wrong. The job runs fine (I'm running in
> local mode and haven't tested non-local mode yet), but the output from the
> Python streaming UDF isn't coming out correctly. Any ideas? My Python UDFs,
> Pig script, and output are below.
>
> $ jython --version
> Jython 2.5.2
> $ python --version
> Python 2.7.5+
> $ pig --version
> Apache Pig version 0.12.0 (r1529718)
> compiled Oct 07 2013, 12:20:14
>
>
> ===== My pigudf.py: =====
>
> try:
>     from pig_util import outputSchema
> except ImportError:
>     pass # must be using jython...
>
> @outputSchema("tokens:{(word:chararray)}")
> def word_tokenize(line):
>     if line is None:
>         return []
>
>     return line.split()
>
>
> ===== My test.pig: =====
>
> Register 'pigudf.py' using streaming_python as pyudfc;
> Register 'pigudf.py' using jython as pyudfj;
>
> A = LOAD 'huckfinn_ascii.txt' AS (line:chararray);
>
> B = FOREACH A GENERATE $0, pyudfc.word_tokenize($0) as a,
>     pyudfj.word_tokenize($0) as b;
>
> STORE B INTO 'huckfinn_out.txt';
>
>
> ===== output =====
>
> YOU don't know about me without you have read a book by the name of The
> {(),(),(),(),(),(h),(),(),(),(),(),(),(),(),(),()}
> {(YOU),(don't),(know),(about),(me),(without),(you),(have),(read),(a),(book),(by),(the),(name),(of),(The)}
>
> Adventures of Tom Sawyer; but that ain't no matter. That book was made
> {(entu),(),(),(y),(),(),(),(),(t),(),(),(),()}
> {(Adventures),(of),(Tom),(Sawyer;),(but),(that),(ain't),(no),(matter.),(That),(book),(was),(made)}
>
> by Mr. Mark Twain, and he told the truth, mainly. There was things
> {(),(),(),(),(),(),(),(),(),(n),(),(),()}
> {(by),(Mr.),(Mark),(Twain,),(and),(he),(told),(the),(truth,),(mainly.),(There),(was),(things)}
>
> which he stretched, but mainly he told the truth. That is nothing. I
> {(),(),(etch),(),(),(),(),(),(),(),(),(hi),()}
> {(which),(he),(stretched,),(but),(mainly),(he),(told),(the),(truth.),(That),(is),(nothing.),(I)}
>
> never seen anybody but lied one time or another, without it was Aunt
> {(),(),(b),(),(),(),(),(),(th),(h),(),(),()}
> {(never),(seen),(anybody),(but),(lied),(one),(time),(or),(another,),(without),(it),(was),(Aunt)}
>
> Polly, or the widow, or maybe Mary. Aunt PollyTom's Aunt Polly, she
> {(),(),(),(),(),(),(),(),(lyTo),(),(),()}
> {(Polly,),(or),(the),(widow,),(or),(maybe),(Mary.),(Aunt),(PollyTom's),(Aunt),(Polly,),(she)}
>
> isand Mary, and the Widow Douglas is all told about in that book, which
> {(),(),(),(),(),(g),(),(),(),(),(),(),(),()}
> {(isand),(Mary,),(and),(the),(Widow),(Douglas),(is),(all),(told),(about),(in),(that),(book,),(which)}
>
> is mostly a true book, with some stretchers, as I said before.
> {(),(),(),(),(),(),(),(etche),(),(),(),(o)}
> {(is),(mostly),(a),(true),(book,),(with),(some),(stretchers,),(as),(I),(said),(before.)}

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com