Hi,

I'm working on supporting SchemaRDD in Elasticsearch Hadoop [1], but I'm having some issues with the SQL API, in particular with what the DataTypes translate to at runtime.

1. A SchemaRDD is composed of Rows plus a StructType - I'm using the latter to decompose a Row into primitives. I'm not clear, however, on how to deal with the _rich_ types, namely array, map and struct.
MapType gives me type information about the key and its value, but what is the actual Map object at runtime - j.u.Map, scala.Map? For example, assuming row(0) has a MapType associated with it, to what do I cast row(0)? Same goes for StructType: if row(1) has a StructType associated with it, do I cast the value to Row? Roughly, I'm trying to do something like the sketch below.
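
A minimal sketch of what I mean (the asInstanceOf targets are guesses on my part, and I'm assuming the types can be imported from org.apache.spark.sql like this):

    import org.apache.spark.sql._

    // Walk a Row alongside its schema and unwrap each value based on the
    // declared DataType. The cast targets are exactly what I'm unsure about.
    def decompose(row: Row, schema: StructType): Seq[(String, Any)] =
      schema.fields.zipWithIndex.map { case (field, i) =>
        val value = field.dataType match {
          case _: MapType    => row(i).asInstanceOf[scala.collection.Map[Any, Any]] // or j.u.Map?
          case _: StructType => row(i).asInstanceOf[Row]                            // a nested Row?
          case _: ArrayType  => row(i).asInstanceOf[Seq[Any]]                       // or Array?
          case _             => row(i) // primitives - the typed getters cover these
        }
        field.name -> value
      }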

2. Similar to the above, I've noticed the Row interface has typed getters, so ideally one should use row.getFloat(index)/getInt(index)/getBoolean(index) etc., but I didn't see any methods for Binary or Decimal. The _rich_ types are missing as well; I presume this is for pluggability reasons, but what's the generic way to access/unwrap the underlying Any/Object in that case to the desired DataType? My current fallback is the one below.
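
For now I'm falling back to a raw cast (the target types here are my assumptions about what Spark actually hands back, not documented behavior):

    import org.apache.spark.sql.Row

    // Pull the untyped value out with apply() and cast it ourselves.
    def getDecimal(row: Row, i: Int): BigDecimal = row(i).asInstanceOf[BigDecimal]  // Decimal as scala BigDecimal?
    def getBinary(row: Row, i: Int): Array[Byte] = row(i).asInstanceOf[Array[Byte]] // Binary as a byte array?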

3. On a separate note, for RDDs containing just values (think CSV/TSV files), is there an option to associate a header with them without having to wrap each row in a case class? Since every entry has exactly the same structure, the wrapping is just overhead that doesn't provide any extra information (if you know the structure of one row, you know it for all of them). I'm thinking of something along the lines of the sketch below.
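
Something like this, where the schema is declared once for the whole RDD (applySchema is hypothetical - it's the kind of API I'm after, not one I've found in the current release):

    import org.apache.spark.sql._

    val sqlContext = new SQLContext(sc) // sc: an existing SparkContext

    // Parse each CSV line into a plain Row of values - no case class per row.
    val rows = sc.textFile("people.csv").map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))

    // Declare the "header" once for the whole RDD.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)))

    val schemaRDD = sqlContext.applySchema(rows, schema)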

Thanks,

[1] github.com/elasticsearch/elasticsearch-hadoop
--
Costin
