Hi Felix!

We had a similar use case: I trained multiple models on partitions of my data with mapPartition, passing the model parameters (weights) as a broadcast variable. If I understood broadcast variables in Flink correctly, you should end up with one model on each TaskManager.
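A rough sketch of that idea in plain Scala, without Flink, under several assumptions: each inner `Seq` stands in for one data partition, `broadcastWeights` stands in for the broadcast set, and batch gradient descent on squared loss is swapped in as a stand-in training routine. All names (`PartitionTraining`, `Point`, `trainPartition`, `stepSize`, `iterations`) are hypothetical. In the real DataSet API this logic would, as far as I know, live in a `RichMapPartitionFunction` that reads the weights via `getRuntimeContext.getBroadcastVariable` in `open`, with the set attached through `withBroadcastSet`.

```scala
// Plain-Scala sketch (no Flink): inner Seqs simulate partitions, and
// `broadcastWeights` simulates the broadcast variable every partition reads.
object PartitionTraining {
  // A labeled point: features (first entry 1.0 acts as the bias term) and target.
  final case class Point(features: Array[Double], label: Double)

  // Linear prediction: dot product of weights and features.
  def predict(w: Array[Double], x: Array[Double]): Double =
    w.indices.map(i => w(i) * x(i)).sum

  // Batch gradient descent on squared loss for one partition, starting from a
  // copy of the broadcast weights so partitions never share mutable state.
  def trainPartition(points: Seq[Point], initial: Array[Double],
                     stepSize: Double, iterations: Int): Array[Double] = {
    val w = initial.clone()
    for (_ <- 1 to iterations) {
      val grad = new Array[Double](w.length)
      for (p <- points) {
        val err = predict(w, p.features) - p.label
        for (i <- w.indices) grad(i) += err * p.features(i)
      }
      for (i <- w.indices) w(i) -= stepSize * grad(i) / points.size
    }
    w
  }

  // Hypothetical data: partition 0 follows y = 2x, partition 1 follows y = -x.
  val partitions: Seq[Seq[Point]] = Seq(
    Seq(Point(Array(1.0, 1.0), 2.0), Point(Array(1.0, 2.0), 4.0), Point(Array(1.0, 3.0), 6.0)),
    Seq(Point(Array(1.0, 1.0), -1.0), Point(Array(1.0, 2.0), -2.0), Point(Array(1.0, 3.0), -3.0))
  )

  // Shared initial model, playing the role of the broadcast variable.
  val broadcastWeights: Array[Double] = Array(0.0, 0.0)

  // The "mapPartition" step: one trained model per partition.
  def trainAll(): Seq[Array[Double]] =
    partitions.map(p => trainPartition(p, broadcastWeights, 0.1, 2000))

  def main(args: Array[String]): Unit =
    trainAll().zipWithIndex.foreach { case (w, i) =>
      println(s"partition $i: weights = ${w.mkString("[", ", ", "]")}")
    }
}
```

Note that this yields one model per partition; how partitions map onto TaskManagers depends on the job's parallelism.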
Does that work?

Felix

On 07.07.2015 at 17:32, Felix Neutatz wrote:
> Hi,
>
> At the moment I have a dataset which looks like this:
>
> DataSet[model_ID, DataVector] data
>
> What I want to do is group by model_ID and build one regression model
> for each model_ID.
>
> In pseudo code:
> data.groupBy(model_ID)
> --> MultipleLinearRegression().fit(data_grouped)
>
> Is there any way besides an iteration to do this at the moment?
>
> Thanks for your help,
>
> Felix Neutatz
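For the per-model_ID grouping in the pseudo code above, a local sketch of "group, then fit once per group" might look like the following. Assumptions: a single feature per sample instead of a full DataVector, closed-form simple least squares instead of FlinkML's `MultipleLinearRegression` (whose `fit` takes a whole DataSet, so as far as I know it cannot be called inside a group function), and hypothetical names `GroupedRegression`, `Sample`, `fit`, `fitPerModel`. In Flink the same per-group function could presumably be passed to `groupBy(...).reduceGroup(...)`.

```scala
// Local sketch: group samples by model ID and fit one simple linear
// regression (y ≈ intercept + slope * x) per group.
object GroupedRegression {
  // Hypothetical record: model ID, one feature value x, and target y.
  final case class Sample(modelId: Int, x: Double, y: Double)

  // Closed-form least squares for one feature; returns (intercept, slope).
  def fit(samples: Seq[Sample]): (Double, Double) = {
    val n = samples.size.toDouble
    val meanX = samples.map(_.x).sum / n
    val meanY = samples.map(_.y).sum / n
    val cov = samples.map(s => (s.x - meanX) * (s.y - meanY)).sum
    val varX = samples.map(s => (s.x - meanX) * (s.x - meanX)).sum
    val slope = cov / varX
    (meanY - slope * meanX, slope)
  }

  // Local analogue of data.groupBy(model_ID) followed by a per-group fit.
  def fitPerModel(data: Seq[Sample]): Map[Int, (Double, Double)] =
    data.groupBy(_.modelId).map { case (id, group) => id -> fit(group) }

  // Hypothetical data: model 1 follows y = 1 + 2x, model 2 follows y = 1 - x.
  val data: Seq[Sample] = Seq(
    Sample(1, 1.0, 3.0), Sample(1, 2.0, 5.0), Sample(1, 3.0, 7.0),
    Sample(2, 1.0, 0.0), Sample(2, 2.0, -1.0), Sample(2, 3.0, -2.0)
  )

  def main(args: Array[String]): Unit =
    fitPerModel(data).toSeq.sortBy(_._1).foreach { case (id, (b, m)) =>
      println(s"model $id: intercept=$b slope=$m")
    }
}
```

Because each group is fitted independently, no iteration over model IDs is needed; the grouping does the partitioning.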