[ https://issues.apache.org/jira/browse/SPARK-16170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346420#comment-15346420 ]
Federico Ponzi commented on SPARK-16170:
----------------------------------------

Hi, and thanks for the response. I've set this as a bug instead of an improvement because if I do:

{code}
i = [(1, "rol"), (2.4, "str")]
rdd = sc.parallelize(i)
sqlContext.createDataFrame(i, schema=sch)
{code}

running it gives this output:

{code}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark/python/pyspark/sql/context.py", line 438, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/usr/local/spark/python/pyspark/sql/context.py", line 306, in _createFromLocal
    data = list(data)
  File "/usr/local/spark/python/pyspark/sql/context.py", line 423, in prepare
    _verify_type(obj, schema)
  File "/usr/local/spark/python/pyspark/sql/types.py", line 1311, in _verify_type
    _verify_type(v, f.dataType, f.nullable)
  File "/usr/local/spark/python/pyspark/sql/types.py", line 1283, in _verify_type
    raise TypeError("%s can not accept object %r in type %s" % (dataType, obj, type(obj)))
TypeError: LongType can not accept object 2.4 in type <type 'float'>
{code}

> Throw error when row is not schema-compatible
> ---------------------------------------------
>
>                 Key: SPARK-16170
>                 URL: https://issues.apache.org/jira/browse/SPARK-16170
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Federico Ponzi
>            Priority: Minor
>
> We are using Spark to import some data from MySQL.
> We just found that many of our imports are useless, because our import
> function was wrongly applying LongType to a float column.
> Consider this example:
> {code}
> from pyspark.sql.types import *
> sqlContext = SQLContext(sc)
> sch = StructType([StructField("id", LongType(), True),
>                   StructField("rol", StringType(), True)])
> i = ['{"id": 1, "rol": "str"}', '{"id": 2.4, "rol": "str"}']
> rdd = sc.parallelize(i)
> df = sqlContext.read.json(rdd, schema=sch)
> print df.collect()
> {code}
> The output is:
> {code}
> [Row(id=1, rol=u'str'), Row(id=None, rol=None)]
> {code}
> Every column in the second row is null, not only "id" (the field with the
> wrong datatype), and no error is triggered.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
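The per-row check that makes {{createDataFrame}} fail in the comment above can be sketched without Spark. This is a minimal, hypothetical stand-in for the kind of verification that {{pyspark.sql.types._verify_type}} performs; the {{verify_row}} helper and its list-of-pairs schema format are illustrative assumptions, not Spark API.

```python
def verify_row(row, schema):
    """Raise TypeError if a value's type does not match the schema.

    schema: list of (field_name, expected_python_type) pairs -- a
    simplified, hypothetical stand-in for a StructType of StructFields.
    """
    for value, (name, expected) in zip(row, schema):
        # None is allowed (nullable fields); any other value must
        # match the expected Python type exactly, with no coercion.
        if value is not None and not isinstance(value, expected):
            raise TypeError("%s can not accept object %r in type %s"
                            % (name, value, type(value)))


schema = [("id", int), ("rol", str)]

verify_row((1, "rol"), schema)        # passes silently
try:
    verify_row((2.4, "str"), schema)  # float where int is expected
except TypeError as e:
    print(e)
```

The point of the sketch is the contrast in the report: the local-data path rejects the mismatched row loudly, while the JSON-reading path in the quoted example silently nulls out the whole row.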