Except of course LDA, ALS, and neural net models. For those, the model needs
to be either pre-scored and cached in a KV store, or the matrices/graph have
to be kept in the KV store and accessed through a REST API to serve the
output. Neural nets are more fun, since there the model is a distributed or
local graph over which TensorFlow compute needs to run...
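
For illustration, a rough sketch of the ALS pre-scoring path in Scala
(kvPut is a made-up stand-in for whatever KV client you use, e.g. Cassandra
or Redis):

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Hypothetical KV client; swap in your real Cassandra/Redis writer.
    def kvPut(key: String, value: String): Unit = ???

    def prescoreAls(sc: SparkContext): Unit = {
      // user,item,rating triples
      val ratings = sc.textFile("ratings.csv").map { line =>
        val Array(user, item, rating) = line.split(',')
        Rating(user.toInt, item.toInt, rating.toDouble)
      }
      val model = ALS.train(ratings, 10, 10, 0.01) // rank, iterations, lambda
      // Cache top-10 items per user so the REST layer answers with one KV get.
      model.recommendProductsForUsers(10).toLocalIterator.foreach {
        case (user, recs) =>
          kvPut(s"recs:$user",
            recs.map(r => s"${r.product}:${r.rating}").mkString(","))
      }
    }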

In Trapezium we support writing these models to stores like Cassandra and
Lucene, and then provide a config-driven akka-http based API where you add
the business logic that reads the model back from the store and exposes
model serving as a REST endpoint.
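
Not the actual Trapezium config API, but a minimal hand-rolled sketch of the
shape (ScoringModel and loadModelFromStore are hypothetical stand-ins for
the store-backed ser/deser primitives):

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.server.Directives._
    import akka.stream.ActorMaterializer

    // Hypothetical: whatever your store integration rebuilds at startup.
    trait ScoringModel { def predict(features: Array[Double]): Double }
    def loadModelFromStore(name: String): ScoringModel = ???

    object ModelServer extends App {
      implicit val system = ActorSystem("model-serving")
      implicit val materializer = ActorMaterializer()

      val model = loadModelFromStore("my-model") // e.g. from Cassandra/Lucene

      // POST /predict with body "0.5,1.2,-0.3" returns the score as text.
      val route = path("predict") {
        post {
          entity(as[String]) { body =>
            val features = body.split(',').map(_.trim.toDouble)
            complete(model.predict(features).toString)
          }
        }
      }

      Http().bindAndHandle(route, "0.0.0.0", 8080)
    }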

We use matrix, graph, and kernel models a lot, and for those it turned out
that the mllib-style model.predict API was useful once we changed the
underlying store...
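
As a concrete (made-up weights) example of why that predict call is handy
at serving time:

    import org.apache.spark.mllib.classification.LogisticRegressionModel
    import org.apache.spark.mllib.linalg.Vectors

    // Rebuild the model from coefficients fetched out of the store, then
    // score with a plain method call: no SparkContext, no DataFrame.
    val weights = Vectors.dense(0.1, -0.4, 0.7) // e.g. read from Cassandra
    val model = new LogisticRegressionModel(weights, 0.0)
    val score = model.predict(Vectors.dense(0.5, 1.2, -0.3))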
On Feb 4, 2017 9:37 AM, "Debasish Das" <debasish.da...@gmail.com> wrote:

> If we expose an API to access the raw models inside a PipelineModel, can't
> we call predict directly on them from an API? Is there a task open to
> expose the model out of PipelineModel so that predict can be called on it?
> There is no dependency on a Spark context in the ml model...
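>
> Something like the following is what I have in mind (path is made up; note
> that as of Spark 2.x the ml predict method exists but is protected, which
> is exactly the gap):
>
>     import org.apache.spark.ml.PipelineModel
>     import org.apache.spark.ml.regression.LinearRegressionModel
>
>     // The stages of a fitted pipeline are already reachable...
>     val pipeline = PipelineModel.load("/models/my-pipeline")
>     val lr = pipeline.stages.last.asInstanceOf[LinearRegressionModel]
>     // ...but single-row scoring is not public yet:
>     // lr.predict(features) // protected in ml as of Spark 2.x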
> On Feb 4, 2017 9:11 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
>
>>
>>    - In Spark 2.0 there is a class called PipelineModel. I know the
>>    title says Pipeline, but I am actually talking about a PipelineModel
>>    trained via a Pipeline (see the sketch after this list).
>>    - Why PipelineModel instead of Pipeline? Because doing ML usually
>>    involves a series of steps, which warrants an ordered sequence of
>>    operations. Read the new Spark ML docs or one of the Databricks blogs
>>    on Spark pipelines. If you have used Python's sklearn library, the
>>    concept is inspired from there.
>>    - "once model is deserialized as ml model from the store of choice
>>    within ms" - The time taken to load the model was not what I was
>>    referring to when I was talking about timing.
>>    - "it can be used on incoming features to score through
>>    spark.ml.Model predict API" - The predict API is in the old mllib
>>    package, not the new ml package.
>>    - "why r we using dataframe and not the ML model directly from API" -
>>    Because, as of now, the new ml package does not expose a direct
>>    predict API.
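>>
>>    A minimal sketch of the Pipeline -> PipelineModel distinction (column
>>    names and trainingDf are made up):
>>
>>    import org.apache.spark.ml.{Pipeline, PipelineModel}
>>    import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
>>    import org.apache.spark.ml.regression.LinearRegression
>>
>>    // A Pipeline is the ordered recipe; fit() trains every stage and
>>    // returns a PipelineModel whose transform() replays the same steps.
>>    val pipeline = new Pipeline().setStages(Array(
>>      new StringIndexer().setInputCol("city").setOutputCol("cityIdx"),
>>      new OneHotEncoder().setInputCol("cityIdx").setOutputCol("cityVec"),
>>      new VectorAssembler().setInputCols(Array("cityVec", "age"))
>>        .setOutputCol("features"),
>>      new LinearRegression().setFeaturesCol("features").setLabelCol("label")
>>    ))
>>    val model: PipelineModel = pipeline.fit(trainingDf) // trainingDf: DataFrame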
>>
>>
>> On Sat, Feb 4, 2017 at 10:24 PM, Debasish Das <debasish.da...@gmail.com>
>> wrote:
>>
>>> I am not sure why I would use a pipeline to do scoring... The idea is to
>>> build a model, use the model ser/deser feature to put it in the row or
>>> column store of choice, and provide API access to the model... We support
>>> these primitives in github.com/Verizon/trapezium... The API has access to
>>> a Spark context in local or distributed mode... Once the model is
>>> deserialized as an ml model from the store of choice within milliseconds,
>>> it can be used on incoming features to score through the spark.ml.Model
>>> predict API... I am not clear on the 2200x speedup... Why are we using a
>>> DataFrame and not the ML model directly from the API?
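>>>
>>> The flow I mean, using Spark's built-in ser/deser to a path (model is a
>>> fitted PipelineModel; names are made up; a row/column store needs its own
>>> writer, which is what the Trapezium primitives add):
>>>
>>>     model.write.overwrite().save("/models/my-pipeline")    // train time
>>>     val loaded = PipelineModel.load("/models/my-pipeline") // serve time
>>>     val scored = loaded.transform(incomingDf) // still DataFrame in/out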
>>> On Feb 4, 2017 7:52 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
>>>
>>>> Does this support Java 7?
>>>> What is your timezone, in case someone wants to talk?
>>>>
>>>> On Fri, Feb 3, 2017 at 10:23 PM, Hollin Wilkins <hol...@combust.ml>
>>>> wrote:
>>>>
>>>>> Hey Aseem,
>>>>>
>>>>> We have built pipelines that execute several string indexers, one-hot
>>>>> encoders, scaling, and a random forest or linear regression at the end.
>>>>> Execution time for the linear regression was on the order of 11
>>>>> microseconds, a bit longer for random forest. If your pipeline is
>>>>> simple, this can be further optimized to around 2-3 microseconds by
>>>>> using row-based transformations. The pipeline operated on roughly 12
>>>>> input features, and by the time all the processing was done, we had
>>>>> somewhere around 1000 features or so going into the linear regression
>>>>> after one-hot encoding and everything else.
>>>>>
>>>>> Hope this helps,
>>>>> Hollin
>>>>>
>>>>> On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal <asmbans...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Does this support Java 7?
>>>>>>
>>>>>> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal <asmbans...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Is the computational time for predictions on the order of a few
>>>>>>> milliseconds (< 10 ms), like the old mllib library?
>>>>>>>
>>>>>>> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins <hol...@combust.ml>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>>
>>>>>>>> Some of you may have seen Mikhail and me talk at Spark/Hadoop
>>>>>>>> Summits about MLeap and how you can use it to build production services
>>>>>>>> from your Spark-trained ML pipelines. MLeap is an open-source 
>>>>>>>> technology
>>>>>>>> that allows Data Scientists and Engineers to deploy Spark-trained ML
>>>>>>>> Pipelines and Models to a scoring engine instantly. The MLeap execution
>>>>>>>> engine has no dependencies on a Spark context and the serialization 
>>>>>>>> format
>>>>>>>> is entirely based on Protobuf 3 and JSON.
>>>>>>>>
>>>>>>>>
>>>>>>>> The recent 0.5.0 release provides serialization and inference
>>>>>>>> support for close to 100% of Spark transformers (we don’t yet support 
>>>>>>>> ALS
>>>>>>>> and LDA).
>>>>>>>>
>>>>>>>>
>>>>>>>> MLeap is open-source, take a look at our Github page:
>>>>>>>>
>>>>>>>> https://github.com/combust/mleap
>>>>>>>>
>>>>>>>>
>>>>>>>> Or join the conversation on Gitter:
>>>>>>>>
>>>>>>>> https://gitter.im/combust/mleap
>>>>>>>>
>>>>>>>>
>>>>>>>> We have a set of documentation to help get you started here:
>>>>>>>>
>>>>>>>> http://mleap-docs.combust.ml/
>>>>>>>>
>>>>>>>>
>>>>>>>> We even have a set of demos for training ML Pipelines and linear,
>>>>>>>> logistic, and random forest models:
>>>>>>>>
>>>>>>>> https://github.com/combust/mleap-demo
>>>>>>>>
>>>>>>>>
>>>>>>>> Check out our latest MLeap-serving Docker image, which allows you
>>>>>>>> to expose a REST interface to your Spark ML pipeline models:
>>>>>>>>
>>>>>>>> http://mleap-docs.combust.ml/mleap-serving/
>>>>>>>>
>>>>>>>>
>>>>>>>> Several companies are using MLeap in production and even more are
>>>>>>>> currently evaluating it. Take a look and tell us what you think! We 
>>>>>>>> hope to
>>>>>>>> talk with you soon and welcome feedback/suggestions!
>>>>>>>>
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>>
>>>>>>>> Hollin and Mikhail
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
