Hi there Spark users,

I've been trying to follow along with this posted xgboost Spark Databricks notebook (https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1526931011080774/3624187670661048/6320440561800420/latest.html), but I keep getting ValueError: bad input shape ().
I've tried a few things to fix it; the complete SO post with details is here: https://stackoverflow.com/questions/58595442/xgboost-spark-one-model-per-worker-integration

##################################
features = inputTrainingDF.select("features").collect()
labels = inputTrainingDF.select("label").collect()

X = np.asarray(map(lambda v: v[0].toArray(), features))
Y = np.asarray(map(lambda v: v[0], labels))

xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')
model = xgbClassifier.fit(X, Y)

ValueError: bad input shape ()
##################################

##################################
def trainXGbModel(partitionKey, labelAndFeatures):
    X = np.asarray(map(lambda v: v[1].toArray(), labelAndFeatures))
    Y = np.asarray(map(lambda v: v[0], labelAndFeatures))
    xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')
    model = xgbClassifier.fit(X, Y)
    return [partitionKey, model]

xgbModels = inputTrainingDF\
    .select("education", "label", "features")\
    .rdd\
    .map(lambda row: [row[0], [row[1], row[2]]])\
    .groupByKey()\
    .map(lambda v: trainXGbModel(v[0], list(v[1])))

xgbModels.take(1)

ValueError: bad input shape ()
##################################

Could someone please take a look at this? Thank you for your time!
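For what it's worth, here is a minimal standalone sketch of what I suspect is going on, assuming the cluster runs Python 3 (the notebook was likely written for Python 2). In Python 3, map() returns a lazy iterator rather than a list, so np.asarray wraps the iterator itself as a 0-dimensional object array, and the classifier's input validation then complains about shape (). The toy data below stands in for the rows returned by collect() and is not from the actual notebook.

```python
import numpy as np

# Stand-in for rows from inputTrainingDF.select("features").collect():
# each "row" is a tuple whose first element is a feature vector.
features = [([1.0, 2.0],), ([3.0, 4.0],)]

# Python 3: map() is lazy, so np.asarray sees a single opaque object
# and produces a 0-d array -- the source of "bad input shape ()".
X_bad = np.asarray(map(lambda v: np.array(v[0]), features))
print(X_bad.shape)  # ()

# Materializing the iterator with list() (or a list comprehension)
# gives the expected 2-D feature matrix.
X_good = np.asarray([np.array(v[0]) for v in features])
print(X_good.shape)  # (2, 2)
```

If that is indeed the cause, wrapping each map(...) in list(...) inside both snippets (for X and Y) should restore the shapes xgb.XGBClassifier.fit expects, but I'd appreciate confirmation from anyone who has run this notebook recently.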