much less pleasant and safe
(due to possible train-serve skew) than it could be. Internally, the lack of
this feature has caused debates about how appropriate Spark really is for
production ML.
Asher Krim
Senior Software Engineer
On Thu, May 18, 2017 at 4:24 AM, Cristian Opris wrote:
> Reviving
Hey Michael,
Any update on this? We're itching for a 2.1.1 release (specifically
SPARK-14804, which is currently blocking us).
Thanks,
Asher Krim
Senior Software Engineer
On Wed, Mar 22, 2017 at 7:44 PM, Michael Armbrust wrote:
> An update: I cut the tag for RC1 last night. Currently
hat's
the crux of my proposal: expose the implementation so users can use Spark
models with the exact same code that was used to train
I think this is one of those things that could live outside the project,
because it's more not-Spark than Spark. Remember too that building a solution
continue the discussion on.
Thanks,
Asher Krim
Senior Software Engineer
Congrats!
Asher Krim
Senior Software Engineer
On Mon, Feb 13, 2017 at 6:24 PM, Kousuke Saruta wrote:
> Congratulations, Takuya!
>
> - Kousuke
> On 2017/02/14 7:38, Herman van Hövell tot Westerflier wrote:
>
> Congrats Takuya!
>
> On Mon, Feb 13, 2017 at 11:27 PM,
It took me a while, but I finally got around to this:
https://github.com/apache/spark/pull/16811/files
On Fri, Jan 6, 2017 at 4:03 AM, Asher Krim wrote:
> Felix - I'm not sure I understand your example about pipeline models,
> could you elaborate? I'm talking about the `findS
e algorithms*
> A less exciting but still very important item will be constantly improving
> the core set of algorithms in MLlib. This could mean speed, scaling,
> robustness, and usability for the few algorithms which cover 90% of use
> cases.
>
> There are plenty of other possibilities, and it will be great to hear the
> community's thoughts!
>
> Thanks,
> Joseph
>
> --
> Joseph Bradley
> Software Engineer - Machine Learning
> Databricks, Inc.
--
Asher Krim
Senior Software Engineer
sed to be based on RDDs. This makes these algorithms
unusable for anything larger than toy examples in Spark versions before 2.0.
If anyone is familiar with this bug, I would really appreciate it if they
could point me to the PR that fixed it.
Is a 1.6.4 release planned?
Would it be possible to ba
blem with respect to Java 8
>> lambdas, but if that's settled, I think this could be fixed without
>> breaking the API.
>>
>> On Wed, Jan 18, 2017 at 8:50 PM Asher Krim wrote:
>>
>> In Spark 2 + Java + RDD API, the use of iterables was replaced with
>> itera
using these
constructs correctly? Is there a workaround other than converting the
iterator to an iterable outside of the function?
Thanks,
--
Asher Krim
Senior Software Engineer
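The iterator question above concerns the Spark 2 Java API, where `FlatMapFunction.call` (and therefore `JavaRDD.mapPartitions`) works with `Iterator` rather than `Iterable`. A minimal plain-Java sketch of one common workaround, assuming the goal is to keep for-each loops inside the function: wrap the incoming `Iterator` in a lambda-backed `Iterable`. The class `IteratorAdapters` and its methods are hypothetical names for illustration; no Spark classes are used, so `doubled` only mimics the shape of a per-partition function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class; not part of Spark.
public class IteratorAdapters {

    // Wrap a one-shot Iterator as an Iterable so it can drive a for-each loop.
    // Iterable has a single abstract method (iterator()), so a lambda works.
    // Caveat: every call to iterator() returns the same underlying Iterator,
    // so the result can only be traversed once.
    public static <T> Iterable<T> once(Iterator<T> it) {
        return () -> it;
    }

    // Mimics a per-partition function: Iterator in, Iterator out,
    // matching the shape Spark 2's Java FlatMapFunction expects.
    public static Iterator<Integer> doubled(Iterator<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (Integer x : once(input)) {
            out.add(x * 2);
        }
        // Return an Iterator rather than the List itself.
        return out.iterator();
    }

    public static void main(String[] args) {
        Iterator<Integer> result = doubled(Arrays.asList(1, 2, 3).iterator());
        List<Integer> collected = new ArrayList<>();
        result.forEachRemaining(collected::add);
        System.out.println(collected); // prints [2, 4, 6]
    }
}
```

The wrapper is single-use: a second for-each over the same `Iterable` would see an exhausted iterator, which is usually fine for a one-pass partition function but worth keeping in mind.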
del could probably easily be serialized as
> individual vectors in this case. It would introduce a
> backwards-compatibility issue, but it's possible to read old and new
> formats, I believe.
>
> On Fri, Jan 13, 2017 at 8:16 PM Asher Krim wrote:
>
>> I guess it depends on
re quite small.
>> For example, a PCA model consists of a few principal component vectors. It's
>> a Dataset of just one element being saved here. It's reusing the code path
>> normally used to save big data sets to output one file with one thing as
>> Parquet.
>>
> On Fri, Jan 13, 2017 at 5:23 PM Asher Krim wrote:
>
>> Hi,
>>
>> I'm curious why it's common for data to be repartitioned to 1 partition
>> when saving ML models:
>>
>> sqlContext.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)
cala#L605>).
Am I missing some benefit of repartitioning like this?
Thanks,
--
Asher Krim
Senior Software Engineer
hub.com/apache/spark/blob/master/examples/src/
>> main/scala/org/apache/spark/examples/ml/Word2VecExample.scala
>>
>>
>> _
>> From: Asher Krim
>> Sent: Tuesday, January 3, 2017 11:58 PM
>> Subject: Re: ml word2vec findSyno
turn type of
> the existing one.
>
>
> _____
> From: Asher Krim
> Sent: Wednesday, December 28, 2016 11:52 AM
> Subject: ml word2vec findSynonyms return type
> To:
> Cc: , Joseph Bradley <jos...@databricks.com>
>
> Hey
ss it, so here we are.)
Thanks,
--
Asher Krim
Senior Software Engineer
son de Andrade <adeandrad...@gmail.com> wrote:
>
>> I have updates to that PR that cover other cases. Let me update it.
>>
>> On Thu, Sep 22, 2016 at 5:51 PM, Reynold Xin wrote:
>>
>>> Did you try the proposed fix? Would be good to know whether it fixes th
Does anyone know what the status of SPARK-15717 is? It's a simple-enough-looking
PR, but there has been no activity on it since June 16th.
I believe that we are hitting that bug with checkpointed distributed LDA.
It's a blocker for us and we would really appreciate getting it fixed.
Jira: https:/