Hasil Sharma created SPARK-30006:
------------------------------------

             Summary: printSchema indeterministic output
                 Key: SPARK-30006
                 URL: https://issues.apache.org/jira/browse/SPARK-30006
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Hasil Sharma

printSchema does not give consistent output in the following example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()

l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
rdd = spark.sparkContext.parallelize(l)
people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)
# printSchema() prints the schema itself and returns None,
# so it is called directly instead of being wrapped in print()
df1.printSchema()

df2 = df1.select("name", "age")
df2.printSchema()
```

The first call outputs

```
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
```

The second call outputs

```
root
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)
```

Expectation: The output should be the same because the column names are the same.
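
To make the difference between the two outputs concrete, here is a minimal sketch in plain Python that parses the two schema listings quoted above and compares them. The helper `parse_schema_tree` is hypothetical and is not part of PySpark; it only handles flat schemas like the ones in this report. The comparison suggests that both DataFrames have the same columns and types and differ only in column order. One likely explanation, based on the PySpark 2.4 documentation for `Row`, is that fields passed as named arguments are sorted by name, which would put `age` before `name` in `df1`, while `select("name", "age")` keeps the order requested.

```python
def parse_schema_tree(text):
    """Hypothetical helper: parse flat printSchema()-style text
    into a list of (name, type, nullable) tuples, in listed order."""
    columns = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|--"):
            continue  # skip the "root" line
        name, rest = line[len("|--"):].split(":", 1)
        dtype, _, nullable_part = rest.strip().partition(" ")
        nullable = "nullable = true" in nullable_part
        columns.append((name.strip(), dtype, nullable))
    return columns


# The two outputs quoted in the report
first_output = """
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
"""

second_output = """
root
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)
"""

first_cols = parse_schema_tree(first_output)
second_cols = parse_schema_tree(second_output)

# Same set of columns and types, ignoring order
same_columns = sorted(first_cols) == sorted(second_cols)
# Same columns in the same order
same_order = first_cols == second_cols

print("same columns:", same_columns)
print("same order:", same_order)
```

If this reading is right, the two schemas are each deterministic, and the mismatch comes from how the `Row` objects were built rather than from `printSchema` itself.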