Thanks Nicholas. It makes sense. Now that I have a hint, I can play with it too!

jg

> On Feb 11, 2018, at 19:15, Nicholas Hakobian 
> <nicholas.hakob...@rallyhealth.com> wrote:
> 
> I spent a few minutes poking around in the source code and found this:
> 
> The data type representing None, used for the types that cannot be inferred.
> 
> https://github.com/apache/spark/blob/branch-2.1/python/pyspark/sql/types.py#L107-L113
> 
> Playing around a bit, this is the only use case I could immediately come 
> up with: you have some kind of placeholder field already in the data, but 
> it is always null. If you let createDataFrame (and I bet other things like 
> DataFrameReader would behave similarly) try to infer its type directly, it 
> will error out, since it can't infer the schema automatically. Doing 
> something like the example below will allow the data to be used. And, if 
> memory serves, Hive also has a concept of a Null data type for these kinds 
> of situations.
> 
> In [9]: df = spark.createDataFrame(
>    ...:     [Row(id=1, val=None), Row(id=2, val=None)],
>    ...:     schema=StructType([StructField('id', LongType()),
>    ...:                        StructField('val', NullType())]))
> 
> In [10]: df.show()
> +---+----+
> | id| val|
> +---+----+
> |  1|null|
> |  2|null|
> +---+----+
> 
> 
> In [11]: df.printSchema()
> root
>  |-- id: long (nullable = true)
>  |-- val: null (nullable = true)
> 
> 
> Nicholas Szandor Hakobian, Ph.D.
> Staff Data Scientist
> Rally Health
> nicholas.hakob...@rallyhealth.com
> 
> 
> On Sun, Feb 11, 2018 at 5:40 AM, Jean Georges Perrin <j...@jgp.net> wrote:
> What is the purpose of DataTypes.NullType, especially as you are building a 
> schema? Has anyone used it or seen it as part of a schema auto-generation?
> 
> 
> (If I keep asking long enough, I may get an answer, no? :) )
> 
> 
> > On Feb 4, 2018, at 13:15, Jean Georges Perrin <j...@jgp.net> wrote:
> >
> > Any taker on this one? ;)
> >
> >> On Jan 29, 2018, at 16:05, Jean Georges Perrin <j...@jgp.net> wrote:
> >>
> >> Hi Sparkians,
> >>
> >> Can someone tell me what the purpose of DataTypes.NullType is, especially 
> >> as you are building a schema?
> >>
> >> Thanks
> >>
> >> jg
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
> >> <mailto:user-unsubscr...@spark.apache.org>
> >>
> >
> >
> >
> 
> 
> 
> 
