I agree with you that this is needed. There is a JIRA for it: https://issues.apache.org/jira/browse/SPARK-10413
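Until that JIRA lands, the usual workaround for single-instance scoring in the new ml package is to wrap the features in a one-row DataFrame and call transform. A minimal sketch in Scala; the model path, feature values, and column name are illustrative assumptions, not from this thread:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Local session just for scoring; in a real service this would be shared.
val spark = SparkSession.builder().master("local[1]").appName("score").getOrCreate()
import spark.implicits._

// Load a previously trained PipelineModel (path is hypothetical).
val model = PipelineModel.load("/path/to/saved/pipeline-model")

// Wrap the single feature vector in a one-row DataFrame.
val single = Seq(Tuple1(Vectors.dense(0.5, 1.2, 3.4))).toDF("features")

// transform() runs every fitted stage; the per-call overhead of planning
// a DataFrame job is exactly what a raw predict(vector) API would avoid.
val prediction = model.transform(single).select("prediction").head.getDouble(0)
```

This works today, but as discussed below, the DataFrame round-trip dominates latency for one-row requests, which is the motivation for both SPARK-10413 and MLeap.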
On Sun, Feb 5, 2017 at 11:21 PM, Debasish Das <debasish.da...@gmail.com> wrote:

> Hi Aseem,
>
> Due to our production deploy we did not upgrade to 2.0 yet, but that's a
> critical item on our list.
>
> For exposing models out of PipelineModel, let me look into the ML tasks.
> We should add it, since a DataFrame should not be a must for model
> scoring. Models are often scored on API or streaming paths that have no
> micro-batching involved; data lands directly from HTTP or Kafka/message
> queues. For such cases raw access to the ml model is essential, similar
> to mllib model access.
>
> Thanks,
> Deb
>
> On Feb 4, 2017 9:58 PM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
>
>> @Debasish
>>
>> I see that the Spark version used in the project you mentioned is
>> 1.6.0. I would suggest taking a look at some blogs about Spark 2.0
>> Pipelines and Models in the new ml package. As of the latest Spark
>> 2.1.0 release, the new ml package's API has no way to call predict on
>> a single vector; no such API is exposed. It is work in progress but
>> not yet released.
>>
>> On Sat, Feb 4, 2017 at 11:07 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>>
>>> If we expose an API to access the raw models out of PipelineModel,
>>> can't we call predict directly on it from an API? Is there a task
>>> open to expose the model out of PipelineModel so that predict can be
>>> called on it? There is no dependency on a Spark context in an ml
>>> model.
>>>
>>> On Feb 4, 2017 9:11 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
>>>
>>>> - In Spark 2.0 there is a class called PipelineModel. I know that
>>>>   the title says pipeline, but it is actually about a PipelineModel
>>>>   trained via a Pipeline.
>>>> - Why PipelineModel instead of Pipeline? Because doing ML usually
>>>>   involves a series of steps that warrants an ordered sequence of
>>>>   operations. Read the new Spark ML docs or one of the Databricks
>>>>   blogs related to Spark pipelines.
>>>>   If you have used Python's sklearn library, the concept is
>>>>   inspired from there.
>>>> - "once model is deserialized as ml model from the store of choice
>>>>   within ms" - The time taken to load the model was not what I was
>>>>   referring to when I was talking about timing.
>>>> - "it can be used on incoming features to score through
>>>>   spark.ml.Model predict API" - The predict API is in the old mllib
>>>>   package, not the new ml package.
>>>> - "why r we using dataframe and not the ML model directly from API"
>>>>   - Because as of now the new ml package does not have a direct API.
>>>>
>>>> On Sat, Feb 4, 2017 at 10:24 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>>>>
>>>>> I am not sure why I would use a pipeline to do scoring. The idea is
>>>>> to build a model, use the model ser/deser feature to put it in the
>>>>> row or column store of choice, and provide API access to the model.
>>>>> We support these primitives in github.com/Verizon/trapezium; the
>>>>> API has access to a Spark context in local or distributed mode.
>>>>> Once model is deserialized as ml model from the store of choice
>>>>> within ms, it can be used on incoming features to score through
>>>>> spark.ml.Model predict API. I am not clear on the 2200x speedup:
>>>>> why r we using dataframe and not the ML model directly from API?
>>>>>
>>>>> On Feb 4, 2017 7:52 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
>>>>>
>>>>>> Does this support Java 7?
>>>>>> What is your timezone in case someone wanted to talk?
>>>>>>
>>>>>> On Fri, Feb 3, 2017 at 10:23 PM, Hollin Wilkins <hol...@combust.ml> wrote:
>>>>>>
>>>>>>> Hey Aseem,
>>>>>>>
>>>>>>> We have built pipelines that execute several string indexers,
>>>>>>> one-hot encoders, scaling, and a random forest or linear
>>>>>>> regression at the end. Execution time for the linear regression
>>>>>>> was on the order of 11 microseconds, a bit longer for random
>>>>>>> forest.
>>>>>>> This can be further optimized to around 2-3 microseconds by
>>>>>>> using row-based transformations if your pipeline is simple. The
>>>>>>> pipeline operated on roughly 12 input features, and by the time
>>>>>>> all the processing was done we had somewhere around 1000
>>>>>>> features going into the linear regression after one-hot encoding
>>>>>>> and everything else.
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>> Hollin
>>>>>>>
>>>>>>> On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal <asmbans...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Does this support Java 7?
>>>>>>>>
>>>>>>>> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Is computational time for predictions on the order of a few
>>>>>>>>> milliseconds (< 10 ms) like the old mllib library?
>>>>>>>>>
>>>>>>>>> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins <hol...@combust.ml> wrote:
>>>>>>>>>
>>>>>>>>>> Hey everyone,
>>>>>>>>>>
>>>>>>>>>> Some of you may have seen Mikhail and me talk at Spark/Hadoop
>>>>>>>>>> Summits about MLeap and how you can use it to build
>>>>>>>>>> production services from your Spark-trained ML pipelines.
>>>>>>>>>> MLeap is an open-source technology that allows data
>>>>>>>>>> scientists and engineers to deploy Spark-trained ML pipelines
>>>>>>>>>> and models to a scoring engine instantly. The MLeap execution
>>>>>>>>>> engine has no dependencies on a Spark context, and the
>>>>>>>>>> serialization format is entirely based on Protobuf 3 and
>>>>>>>>>> JSON.
>>>>>>>>>>
>>>>>>>>>> The recent 0.5.0 release provides serialization and inference
>>>>>>>>>> support for close to 100% of Spark transformers (we don't yet
>>>>>>>>>> support ALS and LDA).
>>>>>>>>>> MLeap is open-source; take a look at our GitHub page:
>>>>>>>>>> https://github.com/combust/mleap
>>>>>>>>>>
>>>>>>>>>> Or join the conversation on Gitter:
>>>>>>>>>> https://gitter.im/combust/mleap
>>>>>>>>>>
>>>>>>>>>> We have a set of documentation to help get you started here:
>>>>>>>>>> http://mleap-docs.combust.ml/
>>>>>>>>>>
>>>>>>>>>> We even have a set of demos for training ML pipelines and
>>>>>>>>>> linear, logistic, and random forest models:
>>>>>>>>>> https://github.com/combust/mleap-demo
>>>>>>>>>>
>>>>>>>>>> Check out our latest MLeap-serving Docker image, which allows
>>>>>>>>>> you to expose a REST interface to your Spark ML pipeline
>>>>>>>>>> models:
>>>>>>>>>> http://mleap-docs.combust.ml/mleap-serving/
>>>>>>>>>>
>>>>>>>>>> Several companies are using MLeap in production, and even
>>>>>>>>>> more are currently evaluating it. Take a look and tell us
>>>>>>>>>> what you think! We hope to talk with you soon and welcome
>>>>>>>>>> feedback/suggestions!
>>>>>>>>>>
>>>>>>>>>> Sincerely,
>>>>>>>>>> Hollin and Mikhail
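For readers following along, the kind of pipeline Hollin describes earlier in the thread (string indexers, one-hot encoders, scaling, then a linear regression) looks roughly like this in Spark 2.1's ml package. This is a sketch only: the column names ("category", "num1", "num2", "label") and the training DataFrame are illustrative assumptions, not taken from the thread:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StandardScaler, StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.DataFrame

// Categorical column -> integer index -> one-hot vector.
val indexer = new StringIndexer().setInputCol("category").setOutputCol("categoryIdx")
val encoder = new OneHotEncoder().setInputCol("categoryIdx").setOutputCol("categoryVec")

// Assemble the encoded vector plus numeric columns, then scale.
val assembler = new VectorAssembler()
  .setInputCols(Array("categoryVec", "num1", "num2"))
  .setOutputCol("rawFeatures")
val scaler = new StandardScaler().setInputCol("rawFeatures").setOutputCol("features")

// Linear regression at the end, as in the pipelines described above.
val lr = new LinearRegression().setFeaturesCol("features").setLabelCol("label")

val pipeline = new Pipeline().setStages(Array(indexer, encoder, assembler, scaler, lr))

// fit() on a Pipeline (an Estimator) returns a PipelineModel (a
// Transformer) -- the distinction discussed earlier in the thread.
def train(trainingDf: DataFrame) = pipeline.fit(trainingDf)
```

A PipelineModel produced this way is what MLeap serializes to its Spark-independent bundle format for low-latency scoring; see the MLeap docs linked above for the serialization API.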