[ https://issues.apache.org/jira/browse/SPARK-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-17043.
-------------------------------
    Resolution: Duplicate

> Cannot call zipWithIndex on RDD with more than 200 columns (get wrong result)
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-17043
>                 URL: https://issues.apache.org/jira/browse/SPARK-17043
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Barry Becker
>
> I have a method that adds a row index column to a DataFrame. It works
> correctly only if the DataFrame has fewer than 200 columns. With more than 200
> columns, nearly all the data becomes empty (values come back as "").
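> {code}
> import org.apache.spark.sql.{DataFrame, Row}
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
>
> // Appends a non-nullable Long column holding each row's position (0, 1, 2, ...).
> // Spark 2.x API; on 1.6 use df.sqlContext.createDataFrame instead.
> def zipWithIndex(df: DataFrame, rowIdxColName: String): DataFrame = {
>   val nullable = false
>   df.sparkSession.createDataFrame(
>     df.rdd.zipWithIndex.map { case (row, i) => Row.fromSeq(row.toSeq :+ i) },
>     StructType(df.schema.fields :+ StructField(rowIdxColName, LongType, nullable))
>   )
> }
> {code}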
> This might be related to https://issues.apache.org/jira/browse/SPARK-16664,
> since that issue also involves a 200-column threshold, but I'm not sure. I saw
> this problem in Spark 1.6.2 and 2.0.0. It may be fixed in 2.0.1 (I have not
> tried it yet). I have no idea why the 200-column threshold is significant.
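For comparison with the empty values reported above, here is a plain-Scala model (no Spark) of the result the zipWithIndex method in the description should produce. It follows the ordering documented for RDD.zipWithIndex: indices run first by partition, then by element order within each partition. Partitions are modeled as nested Seqs and each row as a Seq[Any]. The names ZipWithIndexModel and appendIndex are hypothetical. This is not Spark's implementation. The usage in main deliberately uses 250 columns, which is above the threshold where the report sees data disappear.

```scala
// Hypothetical plain-Scala model (no Spark dependency) of the expected output of
// the zipWithIndex method: every original value kept, plus a trailing Long index.
object ZipWithIndexModel {

  // Index order follows the RDD.zipWithIndex documentation: partition order first,
  // then element order inside each partition. Each partition's start offset is the
  // running total (prefix sum) of the sizes of the partitions before it.
  def zipWithIndex[T](partitions: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    val starts: Seq[Long] = partitions.map(_.size.toLong).scanLeft(0L)(_ + _)
    partitions.zip(starts).map { case (part, start) =>
      part.zipWithIndex.map { case (row, i) => (row, start + i) }
    }
  }

  // Appends the index as a trailing column, analogous to Row.fromSeq(row.toSeq :+ i).
  def appendIndex(partitions: Seq[Seq[Seq[Any]]]): Seq[Seq[Any]] =
    zipWithIndex(partitions).flatten.map { case (row, i) => row :+ i }

  def main(args: Array[String]): Unit = {
    val numCols = 250 // above the 200-column threshold mentioned in the report
    def row(r: Int): Seq[Any] = (0 until numCols).map(c => s"r${r}c${c}")
    val partitions = Seq(Seq(row(0), row(1)), Seq(row(2)), Seq(row(3), row(4)))
    val out = appendIndex(partitions)
    println(out.map(_.last))                          // indices 0 through 4
    println(out.forall(_.size == numCols + 1))        // original columns + index
    println(out.forall(r => r.init.forall(_ != ""))) // no values emptied
  }
}
```

In this model, widening the rows cannot change the original values. If the Spark version returns "" for some of them, the problem is outside the index-appending logic itself.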



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

