[ https://issues.apache.org/jira/browse/SPARK-21978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162749#comment-16162749 ]
Hyukjin Kwon commented on SPARK-21978:
--------------------------------------

Not sure; it sounds like a rather niche use case. As a workaround, we could simply disable {{inferSchema}}, or read once just to get the inferred schema, manually adjust it, and set it on the next read. For example:

{code}
schema = spark.read.csv("...", inferSchema=True).schema
# Update `schema`
spark.read.schema(schema).csv("...").show()
{code}

Do you maybe have a reference that supports this idea, for example in R's {{read.csv}} or other CSV parsing libraries?

> schemaInference option not to convert strings with leading zeros to int/long
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-21978
>                 URL: https://issues.apache.org/jira/browse/SPARK-21978
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0
>            Reporter: Ruslan Dautkhanov
>              Labels: csv, csvparser, easy-fix, inference, ramp-up, schema
>
> It would be great to have an option in Spark's schema inference *not* to
> convert a column that has leading zeros to an int/long datatype. Think zip
> codes, for example.
> {code}
> df = (sqlc.read.format('csv')
>       .option('inferSchema', True)
>       .option('header', True)
>       .option('delimiter', '|')
>       .option('leadingZeros', 'KEEP')  # this is the new proposed option
>       .option('mode', 'FAILFAST')
>       .load('csvfile_withzipcodes_to_ingest.csv')
>       )
> {code}
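
A slightly fuller sketch of the two-pass workaround suggested above, kept as strings for the zip code case. The {{zip_code}} column name and {{zips.csv}} file are hypothetical placeholders, not from the original thread, and this is untested:

{code}
from pyspark.sql.types import StringType, StructField, StructType

# First pass: let Spark infer the schema (a leading-zero zip column
# would typically come back as int here).
inferred = spark.read.csv("zips.csv", header=True, inferSchema=True).schema

# Rebuild the schema, forcing the zip code column back to a string.
fixed = StructType([
    StructField(f.name, StringType(), f.nullable) if f.name == "zip_code" else f
    for f in inferred.fields
])

# Second pass: read with the corrected schema so leading zeros are preserved.
df = spark.read.schema(fixed).csv("zips.csv", header=True)
df.show()
{code}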