Hi, Igniters!

I am looking for a way to load data from a Spark RDD or DataFrame into an
Ignite cache declared as IgniteCache<Integer, Object[]> dataCache, so that I
can run Ignite ML algorithms on it.

As I understand it, the current Ignite-Spark integration is meant to store
Spark RDDs/DataFrames in Ignite to speed up Spark jobs, so it doesn't cover
my case. Am I correct?

Do you know how to make this small ETL more efficient, i.e. without
collecting all the data on a single node as in the example below?

        IgniteCache<Integer, Object[]> cache = getCache(ignite);

        SparkSession spark = SparkSession
            .builder()
            .appName("SparkForIgnite")
            .master("local")
            .config("spark.executor.instances", "2")
            .getOrCreate();

        Dataset<Row> ds = <ds in Spark>;

        ds.show();

        List<Row> data = ds.collectAsList(); // stupid solution: pulls the whole dataset to the driver

        for (int i = 0; i < data.size(); i++) {
            // allocate a fresh array per row; reusing a single array would
            // make every cache entry reference the same (last) row
            Object[] parsedRow = new Object[14];
            for (int j = 0; j < 14; j++)
                parsedRow[j] = data.get(i).get(j);
            cache.put(i, parsedRow);
        }

        spark.stop();
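
One idea I am considering, to avoid the driver-side collect: write each
partition directly from the executors through an IgniteDataStreamer. This is
only a sketch, assuming the executors can reach the Ignite cluster, that the
cache "dataCache" already exists, and that row indices fit into an int; the
names are illustrative.

        // Sketch: stream rows per partition instead of collecting on the driver.
        ds.toJavaRDD()
            .zipWithIndex() // assigns each row a unique Long index to use as the key
            .foreachPartition(rows -> {
                // start (or reuse) an Ignite client node inside the executor JVM
                Ignite executorIgnite =
                    Ignition.getOrStart(new IgniteConfiguration().setClientMode(true));

                try (IgniteDataStreamer<Integer, Object[]> streamer =
                         executorIgnite.dataStreamer("dataCache")) {
                    while (rows.hasNext()) {
                        Tuple2<Row, Long> t = rows.next();

                        Object[] parsedRow = new Object[14];
                        for (int j = 0; j < 14; j++)
                            parsedRow[j] = t._1().get(j);

                        // assumes the dataset has fewer than Integer.MAX_VALUE rows
                        streamer.addData(t._2().intValue(), parsedRow);
                    }
                }
            });

This keeps the data distributed end to end, at the cost of starting a client
node per executor. Would something like this be the recommended approach?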



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
