Thanks, Nicholas. That makes sense. Now that I have a hint, I can play with it too!
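
To make the hint concrete, here is a minimal toy sketch in plain Python of the behavior described in the quoted reply: a column that is always None gives inference nothing to work with, so inference fails unless a type is declared up front. This is not Spark's code. The names `infer_schema`, `merge_types`, `infer_value_type`, and the `declared` parameter are hypothetical and only for illustration; the string "null" stands in for NullType.

```python
# Toy model of per-column type inference. NOT Spark's implementation;
# every name here is hypothetical and exists only for illustration.
NULL = "null"  # stands in for NullType: "no type could be inferred"


def infer_value_type(value):
    """Map a Python value to a simple type name; None carries no type."""
    if value is None:
        return NULL
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def merge_types(a, b):
    """Combine two observations of one column; the null placeholder yields to any real type."""
    if a == NULL:
        return b
    if b == NULL or a == b:
        return a
    raise TypeError(f"cannot merge {a} and {b}")


def infer_schema(rows, declared=None):
    """Infer a {column: type} mapping from a list of dicts.

    `declared` pins some column types up front, which is the role the
    explicit StructType plays in the quoted session.
    """
    schema = dict(declared or {})
    inferred = {}
    for row in rows:
        for name, value in row.items():
            if name in schema:
                continue  # the caller already declared this column
            inferred[name] = merge_types(inferred.get(name, NULL), infer_value_type(value))
    unresolved = sorted(n for n, t in inferred.items() if t == NULL)
    if unresolved:
        raise ValueError(f"cannot infer a type for columns: {unresolved}")
    schema.update(inferred)
    return schema


rows = [{"id": 1, "val": None}, {"id": 2, "val": None}]

try:
    infer_schema(rows)  # 'val' is always None, so inference alone fails
except ValueError as err:
    print("inference failed:", err)

# Declaring the always-null column explicitly lets the data through.
print(infer_schema(rows, declared={"val": NULL}))
```

In this sketch the null placeholder only survives when the caller asks for it explicitly, which mirrors the one use case Nicholas came up with.
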
jg

> On Feb 11, 2018, at 19:15, Nicholas Hakobian <nicholas.hakob...@rallyhealth.com> wrote:
>
> I spent a few minutes poking around in the source code and found this:
>
> The data type representing None, used for the types that cannot be inferred.
> https://github.com/apache/spark/blob/branch-2.1/python/pyspark/sql/types.py#L107-L113
>
> Playing around a bit, this is the only use case that I could immediately come
> up with; you have some type of a placeholder field already in data, but it's
> always null. If you let createDataFrame (and I bet other things like
> DataFrameReader would behave similarly) try to infer it directly, it will
> error out since it can't infer the schema automatically. Doing something like
> below will allow the data to be used. And, if memory serves, Hive has a
> concept of a Null data type also for these types of situations.
>
> In [9]: df = spark.createDataFrame([Row(id=1, val=None), Row(id=2, val=None)], schema=StructType([StructField('id', LongType()), StructField('val', NullType())]))
>
> In [10]: df.show()
> +---+----+
> | id| val|
> +---+----+
> |  1|null|
> |  2|null|
> +---+----+
>
> In [11]: df.printSchema()
> root
>  |-- id: long (nullable = true)
>  |-- val: null (nullable = true)
>
> Nicholas Szandor Hakobian, Ph.D.
> Staff Data Scientist
> Rally Health
> nicholas.hakob...@rallyhealth.com
>
> On Sun, Feb 11, 2018 at 5:40 AM, Jean Georges Perrin <j...@jgp.net> wrote:
> What is the purpose of DataTypes.NullType, especially as you are building a
> schema? Has anyone used it or seen it as part of a schema auto-generation?
>
> (If I keep asking long enough, I may get an answer, no? :) )
>
> > On Feb 4, 2018, at 13:15, Jean Georges Perrin <j...@jgp.net> wrote:
> >
> > Any taker on this one? ;)
> >
> >> On Jan 29, 2018, at 16:05, Jean Georges Perrin <j...@jgp.net> wrote:
> >>
> >> Hi Sparkians,
> >>
> >> Can someone tell me what is the purpose of DataTypes.NullType, especially
> >> as you are building a schema?
> >>
> >> Thanks
> >>
> >> jg
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org