Hi, how can I convert this Python NumPy code to use Spark RDDs, so that the operations leverage Spark's distributed architecture for big data?
The code is as follows:

```python
import numpy as np

def gini(array):
    """Calculate the Gini coefficient of a numpy array."""
    array = array.flatten()                    # all values are treated equally, arrays must be 1d
    if np.amin(array) < 0:
        array -= np.amin(array)                # values cannot be negative
    array += 0.0000001                         # values cannot be 0
    array = np.sort(array)                     # values must be sorted
    index = np.arange(1, array.shape[0] + 1)   # index per array element
    n = array.shape[0]                         # number of array elements
    return (np.sum((2 * index - n - 1) * array)) / (n * np.sum(array))  # Gini coefficient
```
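In case it helps frame the question, here is a rough, untested sketch of the direction I'm imagining with plain RDD operations. The function name `gini_rdd` is my own, and it assumes the input is an RDD of plain numeric values and that a SparkContext already exists; I'm not sure this is the idiomatic or most efficient way to do it on Spark:

```python
from pyspark import SparkContext

def gini_rdd(rdd):
    """Rough sketch: Gini coefficient of an RDD of numeric values."""
    min_val = rdd.min()
    if min_val < 0:
        rdd = rdd.map(lambda x: x - min_val)   # values cannot be negative
    rdd = rdd.map(lambda x: x + 0.0000001)     # values cannot be 0
    rdd.cache()                                # reused for count, sum, and sort

    n = rdd.count()                            # number of elements
    total = rdd.sum()                          # sum of all (shifted) values

    # sort the values, then zipWithIndex gives each value its 0-based rank
    ranked = rdd.sortBy(lambda x: x).zipWithIndex()

    # same formula as the NumPy version, with 1-based index i = rank + 1
    weighted_sum = ranked.map(lambda vi: (2 * (vi[1] + 1) - n - 1) * vi[0]).sum()
    return weighted_sum / (n * total)

# example usage (hypothetical data, just to check it runs)
if __name__ == "__main__":
    sc = SparkContext.getOrCreate()
    values = sc.parallelize([1.0, 2.0, 3.0, 4.0, 5.0])
    print(gini_rdd(values))
```

Is sortBy followed by zipWithIndex a reasonable way to reproduce the sorted-index part of the NumPy formula, or is there a better-distributed approach?

Thanks in advance, Aakash.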