Re: ml ALS.fit(..) issue

2016-07-22 Thread Pedro Rodriguez
ror;
>>>> at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:452)
>>>> at yelp.TestUser.main(TestUser.java:101)
>>>>
>>>> here line 101 in the above error is the following in code.
>>>>
>>>> ALSModel model = als.fit(training)

Re: Spark 2.0 Dataset Documentation

2016-06-18 Thread Pedro Rodriguez
, Jun 18, 2016 at 6:03 AM, Jacek Laskowski wrote: > On Sat, Jun 18, 2016 at 6:13 AM, Pedro Rodriguez > wrote: > > > using Datasets (eg using $ to select columns). > > Or even my favourite one - the tick ` :-) > > Jacek > -- Pedro Rodriguez PhD Student in Distri

Re: Spark 2.0 Dataset Documentation

2016-06-17 Thread Pedro Rodriguez
an add more contents > incrementally. > > We should definitely cover more about Dataset. > > > Cheng > > On 6/17/16 10:28 PM, Pedro Rodriguez wrote: > > The updates look great! > > Looks like many places are updated to the new APIs, but there still isn't > a s

Re: Spark 2.0 Dataset Documentation

2016-06-17 Thread Pedro Rodriguez
; Cheng > On 6/17/16 9:13 PM, Pedro Rodriguez wrote: > > Hi All, > > At my workplace we are starting to use Datasets in 1.6.1 and even more > with Spark 2.0 in place of Dataframes. I looked at the 1.6.1 documentation > then the 2.0 documentation and it looks like not much time

Re: Skew data

2016-06-17 Thread Pedro Rodriguez
hat if the data was skewed while joining it would take long time > to finish the job.(99 percent finished in seconds where 1 percent of task > taking minutes to hour). > > How to handle skewed data in spark. > > Thanks, > Selvam R > +91-97877-87724 > -- Pedro Rodriguez P

Spark 2.0 Dataset Documentation

2016-06-17 Thread Pedro Rodriguez
creating and using Datasets (eg using $ to select columns). Is this of value, and if so what should my next step be to get this going (create JIRA etc)? -- Pedro Rodriguez PhD Student in Distributed Machine Learning | CU Boulder R&D Data Science Intern at Oracle Data Cloud UC Berkeley AM

Re: Hello

2016-06-17 Thread Pedro Rodriguez
be helpful to know what to look for or if it's better to ask library maintainers directly. Thanks, Pedro Rodriguez On Fri, Jun 17, 2016 at 10:46 AM, Xinh Huynh wrote: > Here are some guidelines about contributing to Spark: > > https://cwiki.apache.org/confluence/display/SPARK/Contri

Re: Open Issues for Contributors

2015-09-22 Thread Pedro Rodriguez
> > https://issues.apache.org/jira/issues/?filter=12333428 > > For a specific release, you can also filter the release, and I Reynold had > sent this a few days ago for 1.5.1 > > https://issues.apache.org/jira/issues/?filter=1221 > > > On Tue, Sep 22, 2015 at 8:50 A

Open Issues for Contributors

2015-09-22 Thread Pedro Rodriguez
ls for the next release (be it 1.5.1 or 1.6) with some parent issues along with smaller child issues to work on (like the built ins ticket from 1.5)? Thanks, -- Pedro Rodriguez PhD Student in Distributed Machine Learning | CU Boulder UC Berkeley AMPLab Alumni ski.rodrig...@gmail.com | pedrorodrigue

Re: "Spree": Live-updating web UI for Spark

2015-07-27 Thread Pedro Rodriguez
g the web UI; Spree currently involves two JS servers so
> some rewriting of things would probably have to happen), why it might be
> good to do, and why it might be not good or not worth it (e.g. Spark should
> make sure it’s possible and easy to do sophisticated things like this
> outside of the Spark repo, putting more work in the driver process is a bad
> idea, etc.).
>
> OK, that’s my brain dump, I’d love to hear peoples’ thoughts on any/all of
> this, otherwise thanks for the APIs and sorry for having to cheat them a
> bit! :)
>
> -Ryan
>
-- Pedro Rodriguez UCBerkeley 2014 | Computer Science SnowGeek <http://SnowGeek.org> pedro-rodriguez.com ski.rodrig...@gmail.com 208-340-1703

Re: Is `dev/lint-python` broken?

2015-07-27 Thread Pedro Rodriguez
Mon, Jul 27, 2015 at 1:09 PM, Pedro Rodriguez > wrote: > >> I am having the same issue, but the python style checks are failing on >> the Jenkins build server. Is anyone else having this problem? Failed build >> is here: >> https://amplab.cs.berkeley.edu/jenkins/j

Re: Is `dev/lint-python` broken?

2015-07-27 Thread Pedro Rodriguez
I am having the same issue, but the python style checks are failing on the Jenkins build server. Is anyone else having this problem? Failed build is here: https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/121/console Pedro Rodriguez On Mon, Jul 27, 2015 at 7:10 AM, Yu

PySpark addPyFile for directories

2015-07-22 Thread Pedro Rodriguez
? Does this seem like a good idea? The implementation would be to have python zip the given directory into a tmp directory, then ship that to the cluster. -- Pedro Rodriguez CU Boulder PhD Student UCBerkeley 2014 | Computer Science
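
The approach described in this post (zip the given directory into a tmp directory, then ship the archive) might look roughly like the sketch below. It uses only the Python standard library; the function name `zip_directory_for_shipping` is hypothetical and not part of PySpark, and the layout of the archive (package folder at the top level) is an assumption about what would keep the package importable on the workers.

```python
import os
import shutil
import tempfile


def zip_directory_for_shipping(directory):
    """Zip `directory` into a fresh temporary directory; return the archive path.

    Hypothetical helper for illustration only, not an existing PySpark API.
    """
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        raise ValueError("not a directory: %s" % directory)

    # A new temporary directory holds the archive, so nothing is written
    # next to the user's source tree.
    tmp_dir = tempfile.mkdtemp(prefix="pyfiles-")
    package_name = os.path.basename(directory)
    archive_base = os.path.join(tmp_dir, package_name)

    # root_dir is the parent and base_dir the package name, so the package
    # folder itself sits at the top level of the zip.
    return shutil.make_archive(
        archive_base,
        "zip",
        root_dir=os.path.dirname(directory),
        base_dir=package_name,
    )
```

With a live `SparkContext` named `sc` (assumed here, not created by the sketch), the returned path could then be passed to the existing `sc.addPyFile(...)`, which already accepts `.zip` files.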