Re: python modules

2012-03-12 Thread Aniket Mokashi
I spent some time debugging this. The reason is that sys.path on the TT for Jython is ['__classpath__', '__pyclasspath__/'], while on the client it is ['', '/users/lib/Lib', '/users/lib/jython_simplejson.jar/Lib', '__classpath__', '__pyclasspath__/']. I am still figuring out why CLASSPATH (java.class.path

Re: python modules

2012-03-12 Thread Aniket Mokashi
This looks like a bug to me. Jython cuts the jython.jar location out of the classpath and appends Lib to it. But in general, on the TT, jython.jar is not available because Pig merges it into job.jar. Hence, imports will always fail. ~Aniket On Mon, Mar 12, 2012 at 12:54 AM, Aniket Mokashi
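The behaviour described in the message above can be sketched in plain Python. This is only an illustration of the description, not Jython's actual startup code: the function name `derive_lib_dir`, the `:` separator, and both example classpaths are hypothetical.

```python
import posixpath


def derive_lib_dir(classpath, sep=":"):
    """Return '<directory of jython.jar>/Lib', or None if no entry is jython.jar.

    Mirrors the description in the thread: the location of jython.jar is
    taken from the classpath and 'Lib' is appended to its directory.
    """
    for entry in classpath.split(sep):
        if posixpath.basename(entry) == "jython.jar":
            return posixpath.join(posixpath.dirname(entry), "Lib")
    return None


# Hypothetical client classpath: jython.jar is present as its own entry.
client_classpath = "/users/lib/jython.jar:/users/lib/pig.jar"

# Hypothetical TT classpath: only the merged job.jar, no jython.jar entry.
tt_classpath = "/tmp/job.jar"

print(derive_lib_dir(client_classpath))
print(derive_lib_dir(tt_classpath))
```

Under these assumptions the client gets a Lib directory to add to sys.path, while the TT gets nothing, which is consistent with the two sys.path listings reported earlier in the thread.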

UDF for LOAD SimpleTextLoader without mapreduce.

2012-03-12 Thread chethan
Hi, can I write a UDF that overrides LOAD SimpleTextLoader without mapreduce? I am a bit confused about the use of mapreduce, because I am not able to follow the flow of the LOAD SimpleTextLoader when the command is invoked. Command: A = LOAD 'data' using myudf.SimpleTextLoader(); I want to know the step

Re: UDF allowed return data types?

2012-03-12 Thread Jonathan Coveney
It is restricted to the Pig types, yes. You could serialize it to a DataByteArray and manually manage that, or you could just convert it to a DataBag, or you could make it a HashMap with null values, or a tuple... but yeah. 2012/3/12 Yang tedd...@gmail.com I tried to return a Set<String>
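The conversions suggested above (a bag, or a map with null values) can be sketched as plain Python data shapes. This is a minimal sketch, not Pig API code: the helper names are hypothetical, and how a list of tuples or a dict maps to Pig's bag and map types in a given UDF runtime is an assumption to check against the Pig documentation.

```python
def set_to_bag(values):
    """Represent a set as a bag-like list of single-field tuples.

    Sorting is only there to make the output deterministic; a bag
    itself is unordered.
    """
    return [(v,) for v in sorted(values)]


def set_to_null_map(values):
    """Represent a set as a map whose keys are the members and whose values are null."""
    return dict((v, None) for v in values)


words = set(["pig", "hadoop", "jython"])
print(set_to_bag(words))
print(set_to_null_map(words))
```

The bag form keeps each member as its own tuple, while the map form gives cheap membership lookups by key at the cost of carrying unused values.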

Re: want to do Linear regression analysis to achieve Interpolation using PIG Scripts.

2012-03-12 Thread Dmitriy Lyubimov
No good public attempts known to me exist to put ML kind of stuff on top of Pig (well, almost none). There are some statistical packages written at Yahoo, but afaik they don't directly do what you need. Pig is a rather excellent data prep pipeline, but IMO is not as excellent as something

Re: UDF allowed return data types?

2012-03-12 Thread Yang
Thanks! On Mon, Mar 12, 2012 at 11:36 AM, Jonathan Coveney jcove...@gmail.com wrote: It is restricted to the pig types, yes. You could serialize it to a DataByteArray and manually manage that, or you could just convert it to a databag, or you could make it a hashmap with null values, or a

Re: want to do Linear regression analysis to achieve Interpolation using PIG Scripts.

2012-03-12 Thread Dmitriy Lyubimov
Yes, that's what I meant by almost none. It would seem to me that pig vector is technically a bridge from Pig schema to some (and at the moment perhaps quite limited) Mahout functionality rather than something fundamentally leaning on Pig's own capability. It would seem to me for that

Re: want to do Linear regression analysis to achieve Interpolation using PIG Scripts.

2012-03-12 Thread Dmitriy Ryaboy
Well, that's not entirely true -- you can in fact train in parallel on different segments of your dataset, thereby creating an ensemble. Pair the outputs with a classifier UDF that knows how to take advantage of that, and suddenly you have a massively parallel ETL engine that can do ML as part of
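The ensemble idea above can be sketched in plain Python for the linear regression case the thread asks about. This is a rough illustration under stated assumptions, not anyone's actual implementation: the function names and the sample data are hypothetical, the segments are fitted in an ordinary loop (in Pig each segment would be a group processed in parallel), and averaging the per-segment predictions is just one simple way to combine the models.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept for one segment."""
    n = float(len(points))
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_ensemble(segments):
    """Fit one independent model per segment of the dataset."""
    return [fit_line(segment) for segment in segments]


def predict(models, x):
    """Combine the ensemble by averaging the per-model predictions."""
    return sum(slope * x + intercept for slope, intercept in models) / float(len(models))


# Hypothetical data lying on y = 2x + 1, split into two segments.
segments = [
    [(0, 1), (1, 3), (2, 5)],
    [(3, 7), (4, 9), (5, 11)],
]
models = train_ensemble(segments)
print(predict(models, 10))
```

Because each segment is fitted without reference to the others, the training step has no shared state, which is what makes it a natural fit for a per-group UDF.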