Hi,

I am trying to create dummy variables in SparkR by creating new columns for
categorical variables, but the new columns are not being appended.


df <- createDataFrame(sqlContext, iris)
class(dtypes(df))

cat.column<-vector(mode="character",length=nrow(df))
cat.column<-collect(select(df,df$Species))
lev<-length(levels(as.factor(unlist(cat.column))))
varb.names<-vector(mode="character",length=lev)
for (i in 1:lev){
  varb.names[i]<-paste0(colnames(cat.column),i)
}

for (j in 1:lev)
{
   dummy.df.new<-withColumn(df,paste0(colnames(cat.column),j),
      ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0))
}

I am getting the below output for

head(dummy.df.new)

output:

  Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
1          5.1         3.5          1.4         0.2  setosa        1
2          4.9         3.0          1.4         0.2  setosa        1
3          4.7         3.2          1.3         0.2  setosa        1
4          4.6         3.1          1.5         0.2  setosa        1
5          5.0         3.6          1.4         0.2  setosa        1
6          5.4         3.9          1.7         0.4  setosa        1

Problem: the Species2 and Species3 columns are not getting added to the data frame.
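
One possible cause, offered as a sketch rather than a confirmed diagnosis: each pass of the second loop calls withColumn on the original df, so every result holds only one extra column and replaces the previous dummy.df.new. Reassigning the accumulated data frame on each pass should keep all the columns. The session setup below assumes a SparkR 1.x installation (to match the sqlContext used above), and lev.names is a name introduced here for illustration.

```r
library(SparkR)

# Assumption: a SparkR 1.x session, matching the sqlContext used above.
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, iris)

# Collect only the distinct category values instead of the whole column.
lev.names <- sort(collect(distinct(select(df, "Species")))$Species)

# Start from df and reassign the result on every pass so columns accumulate.
dummy.df.new <- df
for (j in seq_along(lev.names)) {
  dummy.df.new <- withColumn(dummy.df.new,
                             paste0("Species", j),
                             ifelse(dummy.df.new$Species == lev.names[j], 1, 0))
}

head(dummy.df.new)
```

The only substantive change from the loop above is that withColumn receives dummy.df.new instead of df, and its result is assigned back to dummy.df.new.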

-- 
Warm regards,
Devesh.
