Hello!

 

Have you tried the ML package for Apache Ignite [1]?

Internally it works on partition-based datasets, which are similar to data frames.

 

> So cache keys [0...99999] have affinity key 0, keys [100000...199999] have affinity key 1, etc.?

 

Sure, the easiest way would be to add a new field to your cache key class:

[AffinityKeyMapped] public long AffKey { get; set; }   // = rowNumber / itemsPerPartition
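
Spelled out a bit more, it could look roughly like this (only a sketch, not tested; RowKey, Row, the "rows" cache name and the row count are made-up names for the example):

using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache.Configuration;

// Load the data, 100,000 rows per affinity batch.
const long itemsPerPartition = 100000;
const long rowCount = 1000000;                  // however many rows you have

IIgnite ignite = Ignition.Start();
var cache = ignite.GetOrCreateCache<RowKey, Row>("rows");

for (long rowNumber = 0; rowNumber < rowCount; rowNumber++)
{
    var key = new RowKey
    {
        RowId = rowNumber,
        AffKey = rowNumber / itemsPerPartition  // 0..99999 -> 0, 100000..199999 -> 1, ...
    };

    cache.Put(key, new Row());                  // put your real row value here
}

// Cache key: RowId identifies the row, AffKey groups rows into batches.
// Entries with the same AffKey are always stored on the same node.
public class RowKey
{
    public long RowId { get; set; }

    [AffinityKeyMapped]
    public long AffKey { get; set; }
}

public class Row { /* your row fields */ }

(For a big load, IDataStreamer is faster than cache.Put, but Put keeps the example short.)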

 

You may also refer to GridGain’s documentation [2].

It’s compatible with Apache Ignite.
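
Regarding the compute itself: once the rows are batched like this, ICompute.AffinityRun lets you send an action to the node that owns a given affinity key, so each batch is processed where it is stored (roughly the role the vector UDF batch plays in Spark). Again just a sketch with the same made-up names; the per-batch processing is left as a comment:

using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Compute;
using Apache.Ignite.Core.Resource;

// Submit one job per batch; each job runs on the node that owns that batch.
IIgnite ignite = Ignition.Start();
ICompute compute = ignite.GetCompute();
const long batchCount = 10;                     // number of distinct AffKey values created above

for (long affKey = 0; affKey < batchCount; affKey++)
{
    // The affinity key passed here must have the same type as the AffKey field (long).
    compute.AffinityRun("rows", affKey, new ProcessBatchAction(affKey));
}

[Serializable]
public class ProcessBatchAction : IComputeAction
{
    // Injected by Ignite on the node where the action runs.
    [InstanceResource]
    private IIgnite _ignite;

    private long _affKey;

    public ProcessBatchAction(long affKey)
    {
        _affKey = affKey;
    }

    public void Invoke()
    {
        var rows = _ignite.GetCache<RowKey, Row>("rows");

        // All entries with AffKey == _affKey are primary on this node, so the whole
        // batch can be read locally here (e.g. a local ScanQuery filtered on AffKey)
        // and processed in one go, similar to a vector UDF receiving a batch of rows.
    }
}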

 

[1] - https://apacheignite.readme.io/docs/ml-partition-based-dataset

[2] - https://www.gridgain.com/docs/latest/developers-guide/data-modeling/affinity-collocation#configuring-affinity-key

 

From: camer314
Sent: Monday, November 18, 2019 6:43 AM
To: user@ignite.apache.org
Subject: Re: How to perform distributed compute in similar way to Spark vector UDF

 

Reading a little more in the Java docs about AffinityKey, I am thinking that, much like vector UDF batch sizing, one way I could easily achieve my result is to batch my rows into affinity keys. That is, for every 100,000 rows the affinity key changes, for example.

 

So cache keys [0...99999] have affinity key 0, keys [100000...199999] have affinity key 1, etc.?

 

If that is the case, may I suggest you update the .NET documentation for Data Grid regarding Affinity Colocation, as it does not mention the use of AffinityKey or go into anywhere near as much detail as the Java docs.

--

Sent from: http://apache-ignite-users.70518.x6.nabble.com/