Hey Mayur,

We use HiveColumnarLoader and XMLLoader. Are these working as well?
Will try a few things regarding porting Java MR.

Regards,
Suman Bharadwaj S

On Thu, Apr 24, 2014 at 3:09 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
> Right now UDFs are not working. They are at the top of the list, though, so you should be able to use them soon :)
> Is there any other Pig functionality you use often, apart from the usual suspects?
>
> Existing Java MR jobs would be an easier move. Are these cascading jobs or single MapReduce jobs? If they are single jobs, you should be able to write Scala wrapper code that calls the map & reduce functions with some magic & leaves your core code as it is. It would be interesting to see an actual example & get it to work.
>
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
> On Thu, Apr 24, 2014 at 2:46 AM, suman bharadwaj <suman....@gmail.com> wrote:
>
>> We are currently in the process of converting Pig and Java MapReduce jobs to Spark jobs. We have also written a couple of Pig UDFs, so I was checking whether we can leverage Spork without converting them to Spark jobs.
>>
>> And is there any way I can port my existing Java MR jobs to Spark?
>> I know this thread has a different subject; let me know if I need to ask this question in a separate thread.
>>
>> Thanks in advance.
>>
>> On Thu, Apr 24, 2014 at 2:13 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>
>>> UDF
>>> Generate
>>> & many, many more are not working :)
>>>
>>> Several of them work: joins, filters, group by, etc.
>>> I am translating the ones we need and would be happy to get help on the others.
>>> I will host a JIRA to track them if you are interested.
>>>
>>> Mayur Rustagi
>>>
>>> On Thu, Apr 24, 2014 at 2:10 AM, suman bharadwaj <suman....@gmail.com> wrote:
>>>
>>>> Are all the features available in Pig working in Spork? For example, UDFs?
>>>>
>>>> Thanks.
>>>>
>>>> On Thu, Apr 24, 2014 at 1:54 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>>>
>>>>> There are two benefits I see as of now:
>>>>> 1. Most of the time, a lot of customers don't want the full power; they want something dead simple, a DSL they can work in. They end up using Hive for a lot of ETL just because it's SQL & they understand it. Pig is close & hides a lot of framework-level semantics from the user & lets them focus on data flow.
>>>>> 2. Some have codebases in Pig already & are just looking to run them faster. I have yet to benchmark that with Pig on Spark.
>>>>>
>>>>> I agree that Pig on Spark cannot solve a lot of problems, but it can solve some without forcing the end customer to do anything even close to coding. I believe there is quite some value in making Spark accessible to a larger audience.
>>>>> At the end of the day, to each his own :)
>>>>>
>>>>> Regards
>>>>> Mayur
>>>>>
>>>>> On Thu, Apr 24, 2014 at 1:24 AM, Bharath Mundlapudi <mundlap...@gmail.com> wrote:
>>>>>
>>>>>> This seems like an interesting question.
>>>>>>
>>>>>> I love Apache Pig. It is so natural, and the language flows with a nice syntax.
>>>>>>
>>>>>> While I was at Yahoo! in core Hadoop Engineering, I used Pig a lot for analytics and gave the Pig team feedback asking for much more functionality when it was at version 0.7. A lot of new functionality is offered now.
>>>>>>
>>>>>> At the end of the day, Pig is a DSL for data flows. There will always be gaps and enhancements. I have often wondered: is a DSL the right way to solve data flow problems? Maybe not; we need complete language constructs. We may have found the answer: Scala.
>>>>>> With Scala's dynamic compilation, we can write much more powerful constructs than any DSL can provide.
>>>>>>
>>>>>> If I were a new organization just beginning to choose, I would go with Scala.
>>>>>>
>>>>>> Here is an example:
>>>>>>
>>>>>> #!/bin/sh
>>>>>> exec scala "$0" "$@"
>>>>>> !#
>>>>>> YOUR DSL GOES HERE BUT IN SCALA!
>>>>>>
>>>>>> You have DSL-like scripting, a functional style, and complete language power! If we can improve the first 3 lines, there you go: you have the most powerful DSL to solve data problems.
>>>>>>
>>>>>> -Bharath
>>>>>>
>>>>>> On Mon, Mar 10, 2014 at 11:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Sameer,
>>>>>>>
>>>>>>> Lin (cc'ed) could also give you some updates about Pig on Spark development on her side.
>>>>>>>
>>>>>>> Best,
>>>>>>> Xiangrui
>>>>>>>
>>>>>>> On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>>> > Hi Mayur,
>>>>>>> > We are planning to upgrade our distribution from MR1 to MR2 (YARN), and the goal is to get Spork set up next month. I will keep you posted. Can you please keep me informed about your progress as well?
>>>>>>> >
>>>>>>> > ________________________________
>>>>>>> > From: mayur.rust...@gmail.com
>>>>>>> > Date: Mon, 10 Mar 2014 11:47:56 -0700
>>>>>>> > Subject: Re: Pig on Spark
>>>>>>> > To: user@spark.apache.org
>>>>>>> >
>>>>>>> > Hi Sameer,
>>>>>>> > Did you make any progress on this? My team is also trying it out; we would love to know some details on your progress.
>>>>>>> >
>>>>>>> > Mayur Rustagi
>>>>>>> >
>>>>>>> > On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>>> >
>>>>>>> > Hi Aniket,
>>>>>>> > Many thanks! I will check this out.
>>>>>>> >
>>>>>>> > ________________________________
>>>>>>> > Date: Thu, 6 Mar 2014 13:46:50 -0800
>>>>>>> > Subject: Re: Pig on Spark
>>>>>>> > From: aniket...@gmail.com
>>>>>>> > To: user@spark.apache.org; tgraves...@yahoo.com
>>>>>>> >
>>>>>>> > There is some work to make this work on YARN at https://github.com/aniket486/pig. (So, compile Pig with: ant -Dhadoopversion=23)
>>>>>>> >
>>>>>>> > You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to find out what sort of environment variables you need (sorry, I haven't been able to clean this up; it's in progress). There are a few known issues with this; I will work on fixing them soon.
>>>>>>> >
>>>>>>> > Known issues:
>>>>>>> > 1. Limit does not work (spork-fix)
>>>>>>> > 2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
>>>>>>> > 3. Algebraic UDFs don't work (spork-fix in progress)
>>>>>>> > 4. Group by rework (to avoid OOMs)
>>>>>>> > 5. UDF classloader issue (requires SPARK-1053; then you can put pig-withouthadoop.jar as SPARK_JARS in SparkContext along with the UDF jars)
>>>>>>> >
>>>>>>> > ~Aniket
>>>>>>> >
>>>>>>> > On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>>>>>>> >
>>>>>>> > I had asked a similar question on the dev mailing list a while back (Jan 22nd).
>>>>>>> >
>>>>>>> > See the archives: http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser -> look for spork.
>>>>>>> >
>>>>>>> > Basically Matei said:
>>>>>>> >
>>>>>>> > Yup, that was it, though I believe people at Twitter picked it up again recently. I'd suggest asking Dmitriy if you know him.
>>>>>>> > I've seen interest in this from several other groups, and if there's enough of it, maybe we can start another open source repo to track it. The work in that repo you pointed to was done over one week, and already had most of Pig's operators working. (I helped out with this prototype over Twitter's hack week.) That work also calls the Scala API directly, because it was done before we had a Java API; it should be easier with the Java one.
>>>>>>> >
>>>>>>> > Tom
>>>>>>> >
>>>>>>> > On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:
>>>>>>> > Hi everyone,
>>>>>>> >
>>>>>>> > We are using Pig to build our data pipeline. I came across Spork (Pig on Spark) at https://github.com/dvryaboy/pig and am not sure whether it is still active.
>>>>>>> >
>>>>>>> > Can someone please let me know the status of Spork or any other effort that will let us run Pig on Spark? We could benefit significantly from using Spark, but we would like to keep using the existing Pig scripts.
>>>>>>> >
>>>>>>> > --
>>>>>>> > "...:::Aniket:::... Quetzalco@tl"
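Mayur's suggestion in the thread, wrapping the map and reduce functions of a single (non-chained) Java MR job in Scala code so the core logic stays untouched, could look roughly like the sketch below. This is a minimal sketch under stated assumptions, not a tested port: plain Scala collections stand in for a Spark RDD, and runMapReduce, wordCountMap and wordCountReduce are hypothetical names. A real Hadoop Mapper or Reducer writes its output through a Context object, so the adapter "magic" Mayur mentions would still be needed on top of this.

```scala
// Sketch only: plain Scala collections stand in for a Spark RDD, and
// wordCountMap / wordCountReduce are HYPOTHETICAL stand-ins for the
// map and reduce logic of an existing single Java MR job.

// Map side: (input key, input value) => zero or more intermediate pairs.
def wordCountMap(offset: Long, line: String): Seq[(String, Int)] =
  line.split("\\s+").filter(_.nonEmpty).map(word => (word, 1)).toSeq

// Reduce side: (intermediate key, all values for that key) => output pairs.
def wordCountReduce(word: String, counts: Iterable[Int]): Seq[(String, Int)] =
  Seq((word, counts.sum))

// Hypothetical generic wrapper: apply the mapper to every record, group the
// intermediate pairs by key (the "shuffle"), then call the reducer once per key.
def runMapReduce[KI, VI, KM, VM, KO, VO](input: Seq[(KI, VI)])(
    mapper: (KI, VI) => Seq[(KM, VM)],
    reducer: (KM, Iterable[VM]) => Seq[(KO, VO)]): Seq[(KO, VO)] = {
  val mapped = input.flatMap { case (k, v) => mapper(k, v) }
  val shuffled = mapped.groupBy(_._1).map { case (k, pairs) => (k, pairs.map(_._2)) }
  shuffled.toSeq.flatMap { case (k, values) => reducer(k, values) }
}

// Usage: two lines keyed by byte offset, the way TextInputFormat supplies them.
val lines = Seq((0L, "to be or"), (9L, "not to be"))
val counts = runMapReduce(lines)(wordCountMap, wordCountReduce).toMap
println(counts("to"))  // 2
println(counts("not")) // 1
```

The point of the design is that only the driver changes while the job's map and reduce logic is reused as-is. On a cluster, the same three steps would correspond to flatMap, groupByKey and flatMap on an RDD of pairs, which this sketch does not show.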
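Bharath's "YOUR DSL GOES HERE BUT IN SCALA!" placeholder can be filled in with something like the following minimal sketch of a typical Pig data flow (LOAD, FILTER, GROUP, FOREACH ... GENERATE) written with plain Scala collections. The User record, the 'users' path in the Pig comment, and the sample rows are all invented for illustration. In a Scala 2 script using the exec scala "$0" "$@" header from the thread, these lines would go below the !# line.

```scala
// HYPOTHETICAL record type standing in for a Pig schema (name, age, city).
case class User(name: String, age: Int, city: String)

// users = LOAD 'users' AS (name:chararray, age:int, city:chararray);
// (sample rows invented for this sketch instead of loading a file)
val users = Seq(
  User("ann", 34, "SF"),
  User("bob", 17, "SF"),
  User("cy", 25, "NYC"),
  User("di", 41, "SF"))

// adults = FILTER users BY age >= 18;
val adults = users.filter(_.age >= 18)

// byCity = GROUP adults BY city;
val byCity = adults.groupBy(_.city)

// cityCounts = FOREACH byCity GENERATE group, COUNT(adults);
val cityCounts = byCity.map { case (city, members) => (city, members.size) }

println(cityCounts("SF"))  // 2
println(cityCounts("NYC")) // 1
```

Each Pig statement maps onto one ordinary collection method, which is the "complete language" advantage Bharath describes: the same filter, groupBy and map calls also exist on Spark RDDs.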