Hi,

I'm working on supporting SchemaRDD in Elasticsearch Hadoop [1], but I'm having some issues with the SQL API, in particular with what the DataTypes translate to at runtime.

1. A SchemaRDD is composed of Rows plus a StructType - I'm using the latter to decompose a Row into primitives. I'm not clear, however, on how to deal with the _rich_ types, namely array, map and struct.
MapType gives me type information about the key and its value, but what is the actual Map object at runtime - j.u.Map, scala.Map? For example, assuming row(0) has a MapType associated with it, to what do I cast row(0)? Same goes for StructType: if row(1) has a StructType associated with it, do I cast the value to Row? Roughly, I'm trying to do something like the sketch below.
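
A minimal sketch of what I mean (the asInstanceOf targets are guesses on my part, and I'm assuming the types can be imported from org.apache.spark.sql like this):

    import org.apache.spark.sql._

    // Walk a Row alongside its schema and unwrap each value based on the
    // declared DataType. The cast targets are exactly what I'm unsure about.
    def decompose(row: Row, schema: StructType): Seq[(String, Any)] =
      schema.fields.zipWithIndex.map { case (field, i) =>
        val value = field.dataType match {
          case _: MapType    => row(i).asInstanceOf[scala.collection.Map[Any, Any]] // or j.u.Map?
          case _: StructType => row(i).asInstanceOf[Row]                            // a nested Row?
          case _: ArrayType  => row(i).asInstanceOf[Seq[Any]]                       // or Array?
          case _             => row(i) // primitives - the typed getters cover these
        }
        field.name -> value
      }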

2. Similar to the above, I've noticed the Row interface has typed getters, so ideally one should use row.getFloat(index)/getInt(index)/getBoolean(index) etc., but I didn't see any methods for Binary or Decimal. The _rich_ types are missing as well; I presume this is for pluggability reasons, but what's the generic way to access/unwrap the underlying Any/Object in that case to the desired DataType? My current fallback is the one below.
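
For now I'm falling back to a raw cast (the target types here are my assumptions about what Spark actually hands back, not documented behavior):

    import org.apache.spark.sql.Row

    // Pull the untyped value out with apply() and cast it ourselves.
    def getDecimal(row: Row, i: Int): BigDecimal = row(i).asInstanceOf[BigDecimal]  // Decimal as scala BigDecimal?
    def getBinary(row: Row, i: Int): Array[Byte] = row(i).asInstanceOf[Array[Byte]] // Binary as a byte array?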

3. On a separate note, for RDDs containing just values (think CSV/TSV files), is there an option to associate a header with them without having to wrap each row in a case class? Since every entry has exactly the same structure, the wrapping is just overhead that doesn't provide any extra information (if you know the structure of one row, you know it for all of them). I'm thinking of something along the lines of the sketch below.
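
Something like this, where the schema is declared once for the whole RDD (applySchema is hypothetical - it's the kind of API I'm after, not one I've found in the current release):

    import org.apache.spark.sql._

    val sqlContext = new SQLContext(sc) // sc: an existing SparkContext

    // Parse each CSV line into a plain Row of values - no case class per row.
    val rows = sc.textFile("people.csv").map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))

    // Declare the "header" once for the whole RDD.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)))

    val schemaRDD = sqlContext.applySchema(rows, schema)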

Thanks,

[1] github.com/elasticsearch/elasticsearch-hadoop
--
Costin
