Hi there, Spark users,

I've been trying to follow along with this posted XGBoost Spark Databricks notebook
(https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1526931011080774/3624187670661048/6320440561800420/latest.html),
but I keep getting ValueError: bad input shape ().

I've tried a few things to fix it; the complete Stack Overflow post with details is here:
https://stackoverflow.com/questions/58595442/xgboost-spark-one-model-per-worker-integration

##################################
# Attempt 1: collect the training data to the driver and fit a single model

import numpy as np
import xgboost as xgb

features = inputTrainingDF.select("features").collect()
labels = inputTrainingDF.select("label").collect()

# Convert the Spark Rows into numpy arrays for xgboost
X = np.asarray(map(lambda v: v[0].toArray(), features))
Y = np.asarray(map(lambda v: v[0], labels))

xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238,
                                  objective='binary:logistic')

model = xgbClassifier.fit(X, Y)
# raises: ValueError: bad input shape ()
##################################
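
One thing I've started to suspect (though I'm not certain this is the actual cause): the notebook looks like it was written for Python 2, where map() returns a list, but on Python 3 map() returns an iterator, and np.asarray() turns that iterator into a 0-dimensional object array rather than a 2-D feature matrix, which would match the "bad input shape ()" message. Here's a standalone sketch of what I mean, plain numpy with no Spark involved:

##################################
# Plain-numpy sketch of my suspicion (no Spark): on Python 3,
# np.asarray(map(...)) wraps the map iterator in a 0-d object array
# instead of building the 2-D matrix that XGBClassifier.fit() expects.

import numpy as np

rows = [[1.0, 2.0], [3.0, 4.0]]

X_bad = np.asarray(map(lambda v: v, rows))    # Python 3: shape (), dtype object
X_ok  = np.asarray([v for v in rows])         # shape (2, 2), dtype float64

print(X_bad.shape, X_bad.dtype)
print(X_ok.shape, X_ok.dtype)
##################################

If that really is the problem, wrapping the map() calls in list(...) (or switching to list comprehensions) in both snippets should give a proper 2-D X and 1-D Y, but I haven't confirmed that it fixes everything.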

##################################
# Attempt 2: group by "education" and train one model per group on the workers

def trainXGbModel(partitionKey, labelAndFeatures):
  # labelAndFeatures is a list of [label, features] pairs for this key
  X = np.asarray(map(lambda v: v[1].toArray(), labelAndFeatures))
  Y = np.asarray(map(lambda v: v[0], labelAndFeatures))
  xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238,
                                    objective='binary:logistic')
  model = xgbClassifier.fit(X, Y)
  return [partitionKey, model]

xgbModels = inputTrainingDF\
  .select("education", "label", "features")\
  .rdd\
  .map(lambda row: [row[0], [row[1], row[2]]])\
  .groupByKey()\
  .map(lambda v: trainXGbModel(v[0], list(v[1])))

xgbModels.take(1)
# raises: ValueError: bad input shape ()
##################################
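
For the per-group version, I was planning to try a debug variant along these lines (hypothetical, not tested yet) that builds the arrays with list comprehensions instead of map() and returns their shapes alongside the model, so I can see whether the arrays ever have the dimensions XGBClassifier expects:

##################################
# Hypothetical debug variant of trainXGbModel: build X/Y with list
# comprehensions and return their shapes along with the fitted model.

import numpy as np
import xgboost as xgb

def trainXGbModelDebug(partitionKey, labelAndFeatures):
  X = np.asarray([v[1].toArray() for v in labelAndFeatures])  # expect (n_rows, n_features)
  Y = np.asarray([v[0] for v in labelAndFeatures])            # expect (n_rows,)
  xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238,
                                    objective='binary:logistic')
  model = xgbClassifier.fit(X, Y)
  return [partitionKey, X.shape, Y.shape, model]
##################################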

Could someone please take a look at this?

Thank you for your time and research!
