SPOILER: release 2.8 will be published after New Year, and all answers below
relate to that new release.

Speaking about the 2.8 release (the latest update of ML functionality in master
and the release branch):

+++ I assume that I would start by extracting features from my JSON records in
a cache into a vectorizer - how does this impact memory usage? +++

The answer is here:
https://apacheignite.readme.io/docs/ml-partition-based-dataset

The cache will be in memory, and additional data will be located in heap too
(not in caches, but near them).
Of course, more memory is required (how much depends on the training algorithm).

If the heap is small, you have a chance to get an OOM.
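
To get a feeling for the extra heap, here is a rough back-of-the-envelope sketch.
It is not an Ignite API; the class and method names are hypothetical, and it
assumes every row is held as a dense array of doubles plus one double label,
ignoring object overhead and anything the training algorithm itself allocates.
Treat the result as a lower bound only.

```java
/** Hypothetical helper, not part of Ignite: lower-bound heap estimate for dense double features. */
final class HeapEstimate {
    /**
     * Bytes needed for {@code rows} records, each with {@code features} doubles
     * and one double label. Object headers and algorithm state are ignored.
     */
    static long denseBytes(long rows, int features) {
        return rows * (features + 1L) * Double.BYTES;
    }

    public static void main(String[] args) {
        // Example: 1 million records with 20 features each.
        long bytes = denseBytes(1_000_000L, 20);
        System.out.printf("at least ~%d MiB of extra heap%n", bytes / (1024 * 1024));
    }
}
```

Comparing such a number with the configured -Xmx gives an early hint whether an
OOM is likely before training starts.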

+++Are there any built-in algorithms or recommended strategies for
sampling+++
Please have a look here 
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial/Step_7_Split_train_test.java

You could use the same mechanism to get a random sample.

But we have no ready-made sampling tool to get sample rows from a cache. It is
not a part of the ML functionality now.
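
Until such a tool exists, a fixed-size random sample can be taken on the client
side. Below is a minimal sketch of reservoir sampling in plain Java. It is not
Ignite code: the class name ReservoirSampler is hypothetical, and it only
assumes that the rows can be walked through with a java.util.Iterator (for
example, an iterator over cache entries).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Ignite: uniform sample of k rows in one pass. */
final class ReservoirSampler {
    static <T> List<T> sample(Iterator<T> rows, int k, Random rnd) {
        if (k < 0)
            throw new IllegalArgumentException("k must be non-negative");

        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;

        while (rows.hasNext()) {
            T row = rows.next();
            seen++;

            if (reservoir.size() < k) {
                // The first k rows fill the reservoir.
                reservoir.add(row);
            }
            else {
                // Pick a slot in [0, seen); the row is kept with probability k / seen.
                long j = (long)(rnd.nextDouble() * seen);
                if (j < k)
                    reservoir.set((int)j, row);
            }
        }

        return reservoir;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e", "f");
        System.out.println(sample(rows.iterator(), 3, new Random()));
    }
}
```

The whole data set is read once, and only k rows are kept in memory, so the
sample itself does not add noticeable heap pressure.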

+++ Are there any dataset statistical functions like those provided by
Python's ML libraries, for high-level evaluation of specific features in a
dataset (to assess things like missing-data, cardinality, min-max, mean,
mode, standard-deviation, percentiles, etc)? +++

We do not manipulate the data in caches directly; we build new data in a new
format for training purposes, but ML doesn't support pandas-like operations.

We have preprocessing algorithms, but they are intended to be used as a first
step in a training Pipeline:
https://apacheignite.readme.io/docs/preprocessing

Hopefully, a summary for the dataset and a few stats (like those described
above) will be added in 2.9.
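
In the meantime, simple per-feature statistics can be computed on the client
side. The sketch below is plain Java, not an Ignite API: the class name
FeatureSummary is hypothetical, and it assumes missing values are encoded as
NaN. It collects the count, missing count, min, max, mean and sample standard
deviation in a single pass (Welford's method).

```java
/** Hypothetical helper, not part of Ignite: one-pass summary of a single numeric feature. */
final class FeatureSummary {
    long count;                               // Non-missing values seen so far.
    long missing;                             // Values encoded as NaN (assumption).
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double mean;
    private double m2;                        // Sum of squared deviations from the running mean.

    void accept(double x) {
        if (Double.isNaN(x)) {
            missing++;
            return;
        }

        count++;
        min = Math.min(min, x);
        max = Math.max(max, x);

        // Welford's update: numerically stable running mean and variance.
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    /** Sample standard deviation (n - 1 in the denominator); 0 for fewer than two values. */
    double stdDev() {
        return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
    }

    public static void main(String[] args) {
        FeatureSummary age = new FeatureSummary();
        for (double v : new double[] {22, 38, Double.NaN, 26, 35})
            age.accept(v);

        System.out.printf("count=%d missing=%d min=%.1f max=%.1f mean=%.2f std=%.2f%n",
            age.count, age.missing, age.min, age.max, age.mean, age.stdDev());
    }
}
```

Mode, cardinality and percentiles need more state (for example, a value
histogram) and are left out of this sketch.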

+++ - Is there any doc/video tutorial that would provide a guide for the
complete workflow pipeline for an ML example (encompassing the
abovementioned operations)? +++

First of all, please have a look at the Titanic Tutorial
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial
and other examples
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml

Also, a few videos are available on my channel:
https://www.youtube.com/watch?v=3CmnV6IQtTw
https://www.youtube.com/watch?v=DmoMBsiHxf8

Jose, great questions. I hope to share more docs and papers about Ignite ML
after New Year and the 2.8 release.








--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
