Hi Guys,

A few questions as I progress through my ML learning journey with Ignite...

- I assume that I would start by extracting features from the JSON records in
a cache into a vectorizer. How does this impact memory usage? If the
vectorizer data needs more memory than is available, will the original cache
records be moved to disk? Or will the vectorizer data start to use swap? Or
will I get OOM exceptions?

- Are there any built-in algorithms or recommended strategies for sampling?

- Are there any dataset statistics functions like those provided by
Python's ML libraries, for high-level evaluation of specific features in a
dataset (to assess things like missing data, cardinality, min/max, mean,
mode, standard deviation, percentiles, etc.)?

- Is there a doc or video tutorial that walks through the complete workflow
pipeline for an ML example (encompassing the operations mentioned above)?
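
To make the statistics question concrete, here is a minimal Python sketch
(standard library only, not Ignite code) of the kind of per-feature summary I
mean. The `summarize` helper and the sample values are made up for
illustration.

```python
import statistics
from collections import Counter


def summarize(values):
    """Summarize one feature column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "cardinality": len(set(present)),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        # Most frequent value; ties resolve to the first one encountered.
        "mode": Counter(present).most_common(1)[0][0],
        # Sample standard deviation (n - 1 in the denominator).
        "stdev": statistics.stdev(present),
        # Three cut points: the 25th, 50th and 75th percentiles.
        "percentiles": statistics.quantiles(present, n=4),
    }


# Hypothetical feature values with two missing entries.
ages = [23, 35, None, 35, 41, 29, None, 52]
print(summarize(ages))
```

I am looking for something in Ignite ML that gives an equivalent summary over
a cache without pulling all the data to one node.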

Thanks,
Jose



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/