Re: Spark Local Pipelines

2017-05-18 Thread Asher Krim
much less pleasant and safe (due to possible train-serve skews) than it can be. Internally, the lack of this feature has caused debates about how appropriate Spark really is for production ML. Asher Krim Senior Software Engineer On Thu, May 18, 2017 at 4:24 AM, Cristian Opris wrote: > Reviving

Re: Outstanding Spark 2.1.1 issues

2017-03-28 Thread Asher Krim
Hey Michael, any update on this? We're itching for a 2.1.1 release (specifically SPARK-14804, which is currently blocking us). Thanks, Asher Krim Senior Software Engineer On Wed, Mar 22, 2017 at 7:44 PM, Michael Armbrust wrote: > An update: I cut the tag for RC1 last night. Currently

Re: Spark Local Pipelines

2017-03-13 Thread Asher Krim
hat's the crux of my proposal - expose the implementation so users can use Spark models with the same exact code that was used to train I think this is one of those things that could live outside the project, because it's more not-Spark than Spark. Remember too that building a solution

Spark Local Pipelines

2017-03-12 Thread Asher Krim
continue the discussion on. Thanks, Asher Krim Senior Software Engineer

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-13 Thread Asher Krim
Congrats! Asher Krim Senior Software Engineer On Mon, Feb 13, 2017 at 6:24 PM, Kousuke Saruta wrote: > Congratulations, Takuya! > > - Kousuke > On 2017/02/14 7:38, Herman van Hövell tot Westerflier wrote: > > Congrats Takuya! > > On Mon, Feb 13, 2017 at 11:27 PM,

Re: ml word2vec finSynonyms return type

2017-02-05 Thread Asher Krim
It took me a while, but I finally got around to this: https://github.com/apache/spark/pull/16811/files On Fri, Jan 6, 2017 at 4:03 AM, Asher Krim wrote: > Felix - I'm not sure I understand your example about pipeline models, > could you elaborate? I'm talking about the `findS

Re: MLlib mission and goals

2017-01-24 Thread Asher Krim
e algorithms* > A less exciting but still very important item will be constantly improving > the core set of algorithms in MLlib. This could mean speed, scaling, > robustness, and usability for the few algorithms which cover 90% of use > cases. > > There are plenty of other possibilities, and it will be great to hear the > community's thoughts! > > Thanks, > Joseph > > > > -- > > Joseph Bradley > > Software Engineer - Machine Learning > > Databricks, Inc. -- Asher Krim Senior Software Engineer

Spark 1.6.3 Driver OOM on createDataFrame

2017-01-22 Thread Asher Krim
sed to be based on RDDs. This makes these algorithms unusable for anything larger than toy examples in < Spark 2. If anyone is familiar with this bug, I would really appreciate it if they could point me in the direction of the pr that fixed it. Is a 1.6.4 release planned? Would be possible to ba

Re: Possible bug - Java iterator/iterable inconsistency

2017-01-19 Thread Asher Krim
blem with respect to Java 8 >> lambdas, but if that's settled, I think this could be fixed without >> breaking the API. >> >> On Wed, Jan 18, 2017 at 8:50 PM Asher Krim wrote: >> >> In Spark 2 + Java + RDD api, the use of iterables was replaced with >> itera

Possible bug - Java iterator/iterable inconsistency

2017-01-18 Thread Asher Krim
using these constructs correctly? Is there a workaround other than converting the iterator to an iterable outside of the function? Thanks, -- Asher Krim Senior Software Engineer
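
For readers unfamiliar with the conversion mentioned above, here is a minimal sketch of one common way to wrap an Iterator in a single-use Iterable in plain Java 8. It does not use Spark at all; the class name IteratorToIterable and the helpers once and upperCaseAll are hypothetical names chosen for illustration, and this is not presented as the fix discussed in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorToIterable {

    // Wraps an Iterator in an Iterable. Iterable has a single abstract
    // method, iterator(), so a lambda can implement it. The result is
    // single-use: every call to iterator() returns the same Iterator,
    // so a second traversal sees no elements.
    public static <T> Iterable<T> once(Iterator<T> it) {
        return () -> it;
    }

    // Hypothetical stand-in for a function body that receives an Iterator
    // but wants to use a for-each loop over it.
    public static List<String> upperCaseAll(Iterator<String> words) {
        List<String> out = new ArrayList<>();
        for (String w : once(words)) {
            out.add(w.toUpperCase());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = upperCaseAll(Arrays.asList("a", "b").iterator());
        System.out.println(result); // prints [A, B]
    }
}
```

Because the wrapper can be traversed only once, it is only safe where the code iterates a single time; anything that needs repeated passes would have to copy the elements into a collection first.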

Re: Why are ml models repartition(1)'d in save methods?

2017-01-16 Thread Asher Krim
del could probably easily be serialized as > individual vectors in this case. It would introduce a > backwards-compatibility issue but it's possible to read old and new > formats, I believe. > > On Fri, Jan 13, 2017 at 8:16 PM Asher Krim wrote: > >> I guess it depends on

Re: Why are ml models repartition(1)'d in save methods?

2017-01-13 Thread Asher Krim
re quite small. >> For example a PCA model consists of a few principal component vectors. It's >> a Dataset of just one element being saved here. It's re-using the code path >> normally used to save big data sets, to output 1 file with 1 thing as >> Parquet. >>

Re: Why are ml models repartition(1)'d in save methods?

2017-01-13 Thread Asher Krim
Fri, Jan 13, 2017 at 5:23 PM Asher Krim wrote: > >> Hi, >> >> I'm curious why it's common for data to be repartitioned to 1 partition >> when saving ml models: >> >> sqlContext.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)

Why are ml models repartition(1)'d in save methods?

2017-01-13 Thread Asher Krim
cala#L605>). Am I missing some benefit of repartitioning like this? Thanks, -- Asher Krim Senior Software Engineer

Re: ml word2vec finSynonyms return type

2017-01-06 Thread Asher Krim
hub.com/apache/spark/blob/master/examples/src/ >> main/scala/org/apache/spark/examples/ml/Word2VecExample.scala >> >> >> _ >> From: Asher Krim >> Sent: Tuesday, January 3, 2017 11:58 PM >> Subject: Re: ml word2vec finSyno

Re: ml word2vec finSynonyms return type

2017-01-03 Thread Asher Krim
turn type of > the existing one. > > > _____ > From: Asher Krim > Sent: Wednesday, December 28, 2016 11:52 AM > Subject: ml word2vec finSynonyms return type > To: > Cc: , Joseph Bradley < > jos...@databricks.com> > > > > Hey

ml word2vec finSynonyms return type

2016-12-28 Thread Asher Krim
ss it, so here we are.) Thanks, -- Asher Krim Senior Software Engineer

Re: [SPARK-15717][GraphX] status

2016-09-23 Thread Asher Krim
son de Andrade < > adeandrad...@gmail.com> wrote: > >> I have updates to that PR that cover other cases. Let me update it. >> >> On Thu, Sep 22, 2016 at 5:51 PM, Reynold Xin wrote: >> >>> Did you try the proposed fix? Would be good to know whether it fixes th

[SPARK-15717][GraphX] status

2016-09-22 Thread Asher Krim
Does anyone know what the status of SPARK-15717 is? It's a simple enough looking PR, but there has been no activity on it since June 16th. I believe that we are hitting that bug with checkpointed distributed LDA. It's a blocker for us and we would really appreciate getting it fixed. Jira: https:/