I asked this question on StackOverflow <https://stackoverflow.com/questions/58759138/apache-ignite-analogue-of-spark-vector-udf-and-distributed-compute-in-general/58766331#58766331>
However, I probably put too much weight on Spark. My real question is: how can I load a large CSV file into a cache and send compute actions to the nodes so that they work in a similar way to a Pandas UDF? That is, each action works on a subset of the data (rows).

In Ignite, I imagine I could load the CSV into a cache in PARTITIONED mode and then use affinity compute to send functions to the nodes where the data is, so that each node processes only the data that exists on it. This seems like a nice way to go: each node only ever processes local data, and the results of those actions would be added back to the cache, so presumably they would only be added locally as well.

However, I am not entirely sure how the partitioning works. The affinity examples show using a single key value. Is there a way to load a CSV into a cache in PARTITIONED mode, so that Ignite distributes it evenly across the grid, and then run a compute job on every node that works ONLY with the data in its own local cache? That way I won't need to care about keys.

For example, imagine a CSV file that is a matrix of numbers. My distributed cache would really be a dataframe representation of that file. For argument's sake, let's say my cache is keyed by an incrementing ID, the value is an array of doubles, and the column names are A, B, C. That ID key is pretty much irrelevant; it is meaningless to my application.

Now let's say I want to perform the same maths on every row in that dataframe, with the results becoming a new column in the cache. If the formula is D = A * B * C, then D becomes the new column. Ignoring Spark SQL, in Spark I could easily write a UDF that creates column D by passing in columns [A, B, C]. Spark doesn't care about keys or ID columns in this instance; it just gives you a vector of data and you return a vector of results.

So in Ignite, how can I replicate that behaviour most elegantly in code (.NET): send compute to the grid that collectively processes all rows without caring about the keys?
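
To make the behaviour I am after concrete, here is a rough plain-Python sketch. It does not use Ignite at all: the "nodes" are just dictionaries standing in for each node's local part of a partitioned cache, and every name in it (partition_rows, run_on_every_node, product_udf) is made up for illustration. The key-modulo distribution is also only an assumption for the sketch, not how Ignite actually assigns partitions.

```python
from typing import Callable, Dict, List

Row = List[float]  # columns A, B, C (plus D once it has been computed)


def partition_rows(rows: Dict[int, Row], node_count: int) -> List[Dict[int, Row]]:
    """Spread rows over nodes by key; a stand-in for a partitioned cache."""
    nodes: List[Dict[int, Row]] = [dict() for _ in range(node_count)]
    for key, row in rows.items():
        # Assumption for this sketch only: key modulo node count picks the node.
        nodes[key % node_count][key] = row
    return nodes


def run_on_every_node(nodes: List[Dict[int, Row]], udf: Callable[[Row], float]) -> None:
    """Apply udf to each node's local rows and append the result as a new column."""
    for local_rows in nodes:
        # Each "node" touches only the rows it holds; the caller never names a key.
        for key in list(local_rows):
            row = local_rows[key]
            local_rows[key] = row + [udf(row)]


def product_udf(row: Row) -> float:
    """D = A * B * C for a single row."""
    a, b, c = row[:3]
    return a * b * c


rows = {
    0: [1.0, 2.0, 3.0],
    1: [2.0, 2.0, 2.0],
    2: [0.5, 4.0, 1.0],
    3: [3.0, 3.0, 3.0],
}
nodes = partition_rows(rows, node_count=2)
run_on_every_node(nodes, product_udf)
```

The part I care about is run_on_every_node: the function is shipped to every node, each node iterates over whatever rows it happens to hold, and the new column D is written back next to the row it came from. What I am asking is which Ignite.NET calls give me the equivalent of that.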