Right now UDFs are not working. They are at the top of the list, though, so you should be able to use them soon :) Is there any other Pig functionality you use often apart from the usual suspects?
Existing Java MR jobs would be an easier move. Are these cascading jobs or single MapReduce jobs? If single, you should be able to write Scala wrapper code that calls your map & reduce functions with some magic & leave your core code as it is. It would be interesting to see an actual example & get it to work.

Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, Apr 24, 2014 at 2:46 AM, suman bharadwaj <suman....@gmail.com> wrote:

> We are currently in the process of converting PIG and Java map reduce jobs
> to SPARK jobs. We have written a couple of PIG UDFs as well. Hence I was
> checking if we can leverage SPORK without converting to SPARK jobs.
>
> And is there any way I can port my existing Java MR jobs to SPARK?
> I know this thread has a different subject; let me know if I need to ask
> this question in a separate thread.
>
> Thanks in advance.
>
> On Thu, Apr 24, 2014 at 2:13 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>
>> UDF
>> Generate
>> & many many more are not working :)
>>
>> Several of them work. Joins, filters, group by etc.
>> I am translating the ones we need, would be happy to get help on others.
>> Will host a jira to track them if you are interested.
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>> On Thu, Apr 24, 2014 at 2:10 AM, suman bharadwaj <suman....@gmail.com> wrote:
>>
>>> Are all the features available in PIG working in SPORK? For example,
>>> UDFs?
>>>
>>> Thanks.
>>>
>>> On Thu, Apr 24, 2014 at 1:54 AM, Mayur Rustagi
>>> <mayur.rust...@gmail.com> wrote:
>>>
>>>> There are two benefits I get as of now
>>>> 1. Most of the time a lot of customers don't want the full power but
>>>> they want something dead simple with which they can do dsl. They end up
>>>> using Hive for a lot of ETL just because it's SQL & they understand it.
>>>> Pig is close & wraps up a lot of framework level semantics away from
>>>> the user & lets him focus on data flow
>>>> 2. Some have codebases in Pig already & are just looking to do it
>>>> faster. I am yet to benchmark that on Pig on spark.
>>>>
>>>> I agree that pig on spark cannot solve a lot of problems but it can solve
>>>> some without forcing the end customer to do anything even close to coding.
>>>> I believe there is quite some value in making Spark accessible to a larger
>>>> audience.
>>>> End of the day to each his own :)
>>>>
>>>> Regards
>>>> Mayur
>>>>
>>>> Mayur Rustagi
>>>> Ph: +1 (760) 203 3257
>>>> http://www.sigmoidanalytics.com
>>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>>
>>>> On Thu, Apr 24, 2014 at 1:24 AM, Bharath Mundlapudi <
>>>> mundlap...@gmail.com> wrote:
>>>>
>>>>> This seems like an interesting question.
>>>>>
>>>>> I love Apache Pig. It is so natural and the language flows with nice
>>>>> syntax.
>>>>>
>>>>> While I was at Yahoo! in core Hadoop Engineering, I used Pig a
>>>>> lot for analytics and gave the Pig team feedback asking for much more
>>>>> functionality when it was at version 0.7. Lots of new functionality is
>>>>> offered now.
>>>>>
>>>>> End of the day, Pig is a DSL for data flows. There will always be gaps
>>>>> and enhancements. I have often wondered whether a DSL is the right way to
>>>>> solve data flow problems. Maybe not; we need complete language constructs.
>>>>> We may have found the answer: Scala. With Scala's dynamic compilation, we
>>>>> can write much more powerful constructs than any DSL can provide.
>>>>>
>>>>> If I were a new organization and beginning to choose, I would go with
>>>>> Scala.
>>>>>
>>>>> Here is the example:
>>>>>
>>>>> #!/bin/sh
>>>>> exec scala "$0" "$@"
>>>>> !#
>>>>> YOUR DSL GOES HERE BUT IN SCALA!
>>>>>
>>>>> You have DSL like scripting, functional and complete language power!
>>>>> If we can improve the first 3 lines, here you go: you have the most
>>>>> powerful DSL to solve data problems.
>>>>>
>>>>> -Bharath
>>>>>
>>>>> On Mon, Mar 10, 2014 at 11:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>>
>>>>>> Hi Sameer,
>>>>>>
>>>>>> Lin (cc'ed) could also give you some updates about Pig on Spark
>>>>>> development on her side.
>>>>>>
>>>>>> Best,
>>>>>> Xiangrui
>>>>>>
>>>>>> On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <ssti...@live.com>
>>>>>> wrote:
>>>>>> > Hi Mayur,
>>>>>> > We are planning to upgrade our distribution from MR1 to MR2 (YARN) and
>>>>>> > the goal is to get Spork set up next month. I will keep you posted. Can
>>>>>> > you please keep me informed about your progress as well?
>>>>>> >
>>>>>> > ________________________________
>>>>>> > From: mayur.rust...@gmail.com
>>>>>> > Date: Mon, 10 Mar 2014 11:47:56 -0700
>>>>>> > Subject: Re: Pig on Spark
>>>>>> > To: user@spark.apache.org
>>>>>> >
>>>>>> > Hi Sameer,
>>>>>> > Did you make any progress on this? My team is also trying it out and
>>>>>> > would love to know some details on your progress.
>>>>>> >
>>>>>> > Mayur Rustagi
>>>>>> > Ph: +1 (760) 203 3257
>>>>>> > http://www.sigmoidanalytics.com
>>>>>> > @mayur_rustagi
>>>>>> >
>>>>>> > On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>> >
>>>>>> > Hi Aniket,
>>>>>> > Many thanks! I will check this out.
>>>>>> >
>>>>>> > ________________________________
>>>>>> > Date: Thu, 6 Mar 2014 13:46:50 -0800
>>>>>> > Subject: Re: Pig on Spark
>>>>>> > From: aniket...@gmail.com
>>>>>> > To: user@spark.apache.org; tgraves...@yahoo.com
>>>>>> >
>>>>>> > There is some work to make this work on yarn at
>>>>>> > https://github.com/aniket486/pig.
>>>>>> > (So, compile pig with ant -Dhadoopversion=23)
>>>>>> >
>>>>>> > You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
>>>>>> > find out what sort of env variables you need (sorry, I haven't been able
>>>>>> > to clean this up; it is in progress). There are a few known issues with
>>>>>> > this, and I will work on fixing them soon.
>>>>>> >
>>>>>> > Known issues:
>>>>>> > 1. Limit does not work (spork-fix)
>>>>>> > 2. Foreach requires turning off schema-tuple-backend (should be a
>>>>>> > pig-jira)
>>>>>> > 3. Algebraic udfs don't work (spork-fix in-progress)
>>>>>> > 4. Group by rework (to avoid OOMs)
>>>>>> > 5. UDF Classloader issue (requires SPARK-1053, then you can put
>>>>>> > pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)
>>>>>> >
>>>>>> > ~Aniket
>>>>>> >
>>>>>> > On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>>>>> >
>>>>>> > I had asked a similar question on the dev mailing list a while back (Jan
>>>>>> > 22nd).
>>>>>> >
>>>>>> > See the archives:
>>>>>> > http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
>>>>>> > look for spork.
>>>>>> >
>>>>>> > Basically Matei said:
>>>>>> >
>>>>>> > Yup, that was it, though I believe people at Twitter picked it up again
>>>>>> > recently. I'd suggest asking Dmitriy if you know him. I've seen interest
>>>>>> > in this from several other groups, and if there's enough of it, maybe we
>>>>>> > can start another open source repo to track it. The work in that repo you
>>>>>> > pointed to was done over one week, and already had most of Pig's
>>>>>> > operators working. (I helped out with this prototype over Twitter's hack
>>>>>> > week.) That work also calls the Scala API directly, because it was done
>>>>>> > before we had a Java API; it should be easier with the Java one.
>>>>>> >
>>>>>> > Tom
>>>>>> >
>>>>>> > On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>> > Hi everyone,
>>>>>> >
>>>>>> > We are using Pig to build our data pipeline. I came across Spork (Pig
>>>>>> > on Spark) at https://github.com/dvryaboy/pig and am not sure if it is
>>>>>> > still active.
>>>>>> >
>>>>>> > Can someone please let me know the status of Spork or any other effort
>>>>>> > that will let us run Pig on Spark? We can significantly benefit by using
>>>>>> > Spark, but we would like to keep using the existing Pig scripts.
>>>>>> >
>>>>>> > --
>>>>>> > "...:::Aniket:::... Quetzalco@tl"
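
A rough sketch of the wrapper idea from the top of this thread: the existing map and reduce functions stay untouched and a thin generic wrapper drives them. It is written in Java since the jobs in question are Java MR jobs. Everything named here is an assumption for illustration only: the class `MapReduceWrapper`, the word-count style `map` and `reduce` functions, and the `run` helper are hypothetical, and plain Java collections stand in for a cluster. On Spark the same shape would roughly correspond to a flat-map step followed by a group-by-key and a per-key map, but that wiring is not shown here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

final class MapReduceWrapper {

    // Hypothetical "core code": a word-count map function, left as-is.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Hypothetical "core code": a reduce function that sums all values of one key.
    static Integer reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Generic wrapper: apply the mapper to every record, group the emitted
    // pairs by key (the "shuffle"), then call the reducer once per key.
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> input,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I record : input) {
            for (Map.Entry<K, V> kv : mapper.apply(record)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            result.put(e.getKey(), reducer.apply(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(
                Arrays.asList("pig on spark", "spark on yarn"),
                MapReduceWrapper::map,
                MapReduceWrapper::reduce);
        System.out.println(counts); // prints {on=2, pig=1, spark=2, yarn=1}
    }
}
```

The point of the design is that `map` and `reduce` know nothing about the engine; only `run` would need to change when moving from local collections to a distributed backend. Cascading (multi-stage) jobs would need the output of one `run` fed into the next, which this sketch does not attempt.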