Hi,

We solved this the ugly way, when parsing external column definitions:

import org.apache.spark.sql.types._

// Maps a type name from an external column definition to a Spark DataType.
private def columnTypeToFieldType(columnType: String): DataType = {
  columnType match {
    case "IntegerType" => IntegerType
    case "StringType" => StringType
    case "DateType" => DateType
    case "FloatType" => FloatType
    // DecimalType needs a precision and scale, so use Spark's system default.
    case "DecimalType" => DecimalType.SYSTEM_DEFAULT
    case "TimeStampType" => TimestampType
    case "BooleanType" => BooleanType
    case _ => throw new IllegalArgumentException(
      s"ColumnType $columnType is not known, " +
        s"please add it in the ${this.getClass.getName} class!")
  }
}

There may be a prettier solution than this, but there are limitations,
especially with DecimalType, where even with reflection and Class.forName
(i.e. Class.forName(s"org.apache.spark.sql.types.$columnType")) it's not
trivial. Furthermore, getting a companion object for a class name is a bit
uglier than getting just the class, see
https://stackoverflow.com/questions/11020746/get-companion-object-instance-with-new-scala-reflection-api
Since the number of types can be expected to be roughly constant, the only
overhead in the ugly solution is that of Scala's matching engine. In our
case, the effort of engineering something was outshone by a simple method
that might rarely fail, but then does so in a mostly understandable way.
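
For illustration only, the reflection route mentioned above could be
sketched roughly as follows. ReflectiveTypeLookup and lookup are
hypothetical names, not part of Spark, and the sketch assumes the simple
types are Scala singleton objects whose compiled class name ends in "$"
and exposes a static MODULE$ field. It is not what we ended up using.

```scala
import org.apache.spark.sql.types.DataType
import scala.util.Try

// Hypothetical helper, not part of Spark: resolves a type name such as
// "IntegerType" to the matching singleton object via Java reflection.
object ReflectiveTypeLookup {
  def lookup(columnType: String): Option[DataType] =
    Try {
      // A Scala object compiles to a class whose name ends in '$';
      // the singleton instance lives in its static MODULE$ field.
      val cls = Class.forName(s"org.apache.spark.sql.types.$columnType$$")
      cls.getField("MODULE$").get(null)
    }.toOption.collect {
      // Keep the result only if the object really is a DataType.
      case dt: DataType => dt
    }
}
```

With this sketch, lookup("IntegerType") should resolve to the IntegerType
object, while lookup("DecimalType") comes back empty: the DecimalType
companion object is not itself a DataType, because a precision and scale
are still needed to build one. That is the kind of special case that made
the plain match look acceptable to us.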

N.B.: mapping isn't complete -- complex types weren't in our scope.

On Fri, Jan 26, 2018 at 8:11 AM, Kurt Fehlhauer <kfehl...@gmail.com> wrote:

> Can you share your code and a sample of your data? Without seeing it, I
> can't give a definitive answer. I can offer some hints. If you have a
> column of strings, you should be able to create a new column cast
> to Integer. This can be accomplished in two ways:
>
> df.withColumn("newColumn", df("currentColumn").cast(IntegerType))
>
> or
>
> val df2 = df.selectExpr("cast(currentColumn as int) as newColumn")
>
>
> Without seeing your json, I really can't offer assistance.
>
>
> On Thu, Jan 25, 2018 at 11:39 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> It seems like it's hard to construct a DataType given its String literal
>> representation.
>>
>> dataframe.dtypes() returns column names and their corresponding types. For
>> example, say I have an integer column named "sum"; doing dataframe.dtypes()
>> would return "sum" and "IntegerType", but this string representation
>> "IntegerType" doesn't seem to be very useful because I cannot do
>> DataType.fromJson("IntegerType"). This will throw an error, so I am not
>> quite sure how to construct a DataType given its String representation.
>>
>> On Thu, Jan 25, 2018 at 4:22 PM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I have a datatype "IntegerType" represented as a String and now I want
>>> to create a DataType object out of that. I couldn't find how to do that
>>> in the DataType or DataTypes API.
>>>
>>> Thanks!
>>>
>>
>>
>
