Hello,

Since I'm new to data science, I'm not sure whether this is a bug or a problem with my input data, so I decided to ask here for advice before submitting a ticket. I tried to apply the k-means algorithm to my bag-of-words data with ~8k features, so I copy-pasted a few lines from the example:

    IgniteCache<String, double[]> dataCache = ignite.cache(storageName);

    KMeansTrainer trainer = new KMeansTrainer().withSeed(1234L);

    KMeansModel mdl = trainer.fit(
        ignite,
        dataCache,
        (k, v) -> Arrays.copyOfRange(v, 1, v.length),
        (k, v) -> v[0]
    );

But this leads to a NullPointerException in KMeansTrainer:

    Caused by: java.lang.NullPointerException
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.lambda$initClusterCentersRandomly$4dba08e1$1(KMeansTrainer.java:190)
        at org.apache.ignite.ml.dataset.impl.cache.CacheBasedDataset.computeForAllPartitions(CacheBasedDataset.java:158)
        at org.apache.ignite.ml.dataset.impl.cache.CacheBasedDataset.compute(CacheBasedDataset.java:122)
        at org.apache.ignite.ml.dataset.Dataset.compute(Dataset.java:102)
        at org.apache.ignite.ml.dataset.Dataset.compute(Dataset.java:156)
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.initClusterCentersRandomly(KMeansTrainer.java:186)
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:86)

The failing statement is:

    List<LabeledVector> rndPnts = dataset.compute(data -> {
        List<LabeledVector> rndPnt = new ArrayList<>();
        rndPnt.add(data.getRow(new Random(seed).nextInt(data.rowSize())));
        return rndPnt;
    }, (a, b) -> a == null ? b : Stream.concat(a.stream(), b.stream()).collect(Collectors.toList()));

The reducer receives a null value for b, and since only a is checked for null, b.stream() throws the NPE.

The Ignite version is 2.6. This looks like a bug to me. Is there any way to work around this issue?
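
For reference, here is a minimal sketch of the kind of null-safe reducer I would have expected, written in plain Java with no Ignite dependency. The class name NullSafeReduce and the method merge are my own hypothetical names, not part of Ignite, and I am only assuming that a null partial result can arrive on either side (for example from a partition that produced no data). Since the reducer is inside KMeansTrainer, this illustrates a possible fix rather than something I can apply from my own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical helper, not part of Ignite: merges two partial results,
// tolerating a null on either side.
class NullSafeReduce {
    static <T> List<T> merge(List<T> a, List<T> b) {
        if (a == null)
            return b; // also covers the case where both are null
        if (b == null)
            return a; // the check that the reducer quoted above does not have

        List<T> res = new ArrayList<>(a.size() + b.size());
        res.addAll(a);
        res.addAll(b);
        return res;
    }

    public static void main(String[] args) {
        BinaryOperator<List<String>> reducer = NullSafeReduce::merge;
        List<String> partial = Arrays.asList("p1", "p2");

        System.out.println(reducer.apply(partial, null));                // [p1, p2]
        System.out.println(reducer.apply(null, partial));                // [p1, p2]
        System.out.println(reducer.apply(partial, Arrays.asList("p3"))); // [p1, p2, p3]
    }
}
```

With a reducer shaped like this, a null on either side is simply skipped instead of being dereferenced.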