[ 
https://issues.apache.org/jira/browse/MINIFICPP-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Parisi resolved MINIFICPP-1201.
------------------------------------
    Resolution: Fixed

> Integrates MiNiFi C++ with H2O Driverless AI MOJO Scoring Pipeline (C++ 
> Runtime Python Wrapper) To Do ML Inference on Edge
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MINIFICPP-1201
>                 URL: https://issues.apache.org/jira/browse/MINIFICPP-1201
>             Project: Apache NiFi MiNiFi C++
>          Issue Type: New Feature
>    Affects Versions: master
>         Environment: Ubuntu 18.04 in AWS EC2
> MiNiFi C++ 0.7.0
>            Reporter: James Medel
>            Priority: Blocker
>             Fix For: 0.8.0
>
>
> *MiNiFi C++ and H2O Driverless AI Integration* via Custom Python Processors:
> Integrates MiNiFi C++ with H2O Driverless AI by using Driverless AI's MOJO 
> Scoring Pipeline (in C++ Runtime Python Wrapper) and MiNiFi's Custom Python 
> Processor. Uses a Python Processor to execute the MOJO Scoring Pipeline to do 
> batch scoring or real-time scoring for one or more predicted labels on 
> tabular test data in the incoming flow file content. If the tabular data is 
> one row, then the MOJO does real-time scoring. If the tabular data is 
> multiple rows, then the MOJO does batch scoring. I would like to contribute 
> my processors to MiNiFi C++ as a new feature.
> *1 custom python processor* created for MiNiFi:
> *H2oMojoPwScoring* - Executes H2O Driverless AI's MOJO Scoring Pipeline in 
> C++ Runtime Python Wrapper to do batch scoring or real-time scoring on a 
> frame of data within each incoming flow file. Requires the user to add the 
> *pipeline.mojo* filepath into the "MOJO Pipeline Filepath" property. This 
> property is used in the onTrigger(context, session) function to get the 
> pipeline.mojo filepath, so we can *pass it into* the 
> *daimojo.model(pipeline_mojo_filepath)* function to instantiate our 
> *mojo_scorer*. MOJO creation time and uuid are added as individual flow file 
> attributes. Then the *flow file content* is *loaded into Datatable* *frame* 
> to hold the test data. Then a Python lambda function called compare is used 
> to compare whether the datatable frame header column names equals the 
> expected header column names from the mojo scorer. This check is done because 
> the datatable frame could have a missing header, which is true when the 
> header does not equal the expected header and so we update the datatable 
> frame header with the mojo scorer's expected header. Having the correct 
> header works nicely because the *mojo scorer's* *predict(datatable_frame)* 
> function needs the header and then does the prediction returning a 
> predictions datatable frame. The mojo scorer's predict function is *capable 
> of doing real-time scoring or batch scoring*, it just depends on the amount 
> of rows that the tabular data has. This predictions datatable frame is then 
> converted to pandas dataframe, so we can use pandas' to_string(index=False) 
> function to convert the dataframe to a string without the dataframe's index. 
> Then *the prediction string is written to flow file content*. A flow file 
> attribute is added for the number of rows scored. Another one or more flow 
> file attributes are added for the predicted label name and its associated 
> score. Finally, the flow file is transferred on a success relationship.
>  
> *Hydraulic System Condition Monitoring* Data used in MiNiFi Flow:
> The sensor test data I used in this integration comes from Kaggle: Condition 
> Monitoring of Hydraulic Systems. I was able to predict hydraulic system 
> cooling efficiency through MiNiFi and H2O integration described above. This 
> use case here is hydraulic system predictive maintenance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to