You can resort to |SQLContext.jsonFile(path: String, samplingRatio:
Double)| and set |samplingRatio| to 1.0, so that schema inference scans
every record and picks up all the columns, including those that appear
in only some rows.
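A minimal sketch of the first suggestion, assuming Spark 1.1-era APIs, a |HiveContext| (which extends |SQLContext|), newline-delimited JSON at a hypothetical path, and a hypothetical table name |events|. The ratio is passed positionally, so the sketch does not depend on the parameter's exact name.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD}
import org.apache.spark.sql.hive.HiveContext

object InferAllColumns {
  // A sampling ratio of 1.0 makes schema inference look at every record,
  // so a field that appears in only a few rows still becomes a column.
  def loadAllColumns(sqlContext: SQLContext, path: String): SchemaRDD =
    sqlContext.jsonFile(path, 1.0)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("infer-all-json-columns"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical input path: one complete JSON object per line.
    val events = loadAllColumns(hiveContext, "hdfs:///data/events.json")
    events.printSchema()

    // Hypothetical table name; called through a HiveContext, the table is
    // registered in the Hive metastore, where the Thrift JDBC server can see it.
    events.saveAsTable("events")
    sc.stop()
  }
}
```

Scanning every record costs an extra full pass over the input, which is the price of not missing rare columns.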
You can also use |SQLContext.applySchema| to specify your own schema
(which is a |StructType|).
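A minimal sketch of the second suggestion, assuming Spark 1.1-era APIs: fix one |StructType| up front that lists every column you expect (the three nullable |StringType| columns here are hypothetical), turn each record into a |Row| in that column order, and call |SQLContext.applySchema|. It also assumes the JSON has already been parsed into |Map[String, String]| records by some means not shown.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._

object FixedSchemaExample {
  // Hypothetical full column list. Every field is nullable, so a record
  // that lacks a column simply gets null in that position.
  val schema = StructType(Seq(
    StructField("id", StringType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("country", StringType, nullable = true)))

  // Map each parsed record onto the schema's column order.
  // Keys that are not in the schema are dropped; missing keys become null.
  def toRows(records: RDD[Map[String, String]]): RDD[Row] = {
    val columns = schema.fields.map(_.name)
    records.map(r => Row(columns.map(c => r.getOrElse(c, null)): _*))
  }

  def load(sqlContext: SQLContext, records: RDD[Map[String, String]]): SchemaRDD =
    sqlContext.applySchema(toRows(records), schema)
}
```

Declaring every column nullable is what lets sparse rows share one table; the trade-off is that a field missing from the declared schema is silently dropped until you add it there.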
On 10/22/14 5:56 AM, Harivardan Jayaraman wrote:
Hi,
I have unstructured JSON as my input, in which the set of columns may
vary from row to row. I want to store these JSON rows using HiveContext
so that they can be accessed from the JDBC Thrift Server.
I notice there are only two main methods available on a SchemaRDD for
storing data: saveAsTable and insertInto. The first defines the schema
while the second inserts into an existing table, but there is no way
to alter the table and add columns to it.
How do I do this?
One option I thought of is to write native "CREATE TABLE..." and
"ALTER TABLE..." statements, but that does not seem feasible, because at
every step I would need to query Hive for the current schema and
decide whether to add columns to it.
Any thoughts? Has anyone been able to do this?
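The per-batch check described in the question can be fairly small. A minimal sketch in plain Scala, assuming you can obtain the table's current column names (for example from the Hive metastore) and the incoming batch's column names (for example from the inferred schema). The object and method names are hypothetical, and every new column is typed |STRING| purely for illustration.

```scala
object SchemaEvolution {
  // Columns present in the incoming batch but absent from the table,
  // kept in batch order. Hive column names are case-insensitive, so the
  // comparison is done on lower-cased names.
  def missingColumns(tableColumns: Seq[String], batchColumns: Seq[String]): Seq[String] = {
    val existing = tableColumns.map(_.toLowerCase).toSet
    batchColumns.filterNot(c => existing.contains(c.toLowerCase)).distinct
  }

  // One ALTER TABLE statement covering all new columns, or None when the
  // table already has every column. STRING is an illustrative assumption.
  def alterStatement(table: String, newColumns: Seq[String]): Option[String] =
    if (newColumns.isEmpty) None
    else Some(newColumns.map(c => s"`$c` STRING")
      .mkString(s"ALTER TABLE $table ADD COLUMNS (", ", ", ")"))
}
```

For example, |missingColumns(Seq("id", "name"), Seq("id", "name", "country"))| yields |Seq("country")|, and |alterStatement("events", Seq("country"))| yields |Some("ALTER TABLE events ADD COLUMNS (`country` STRING)")|, so at most one DDL statement is issued per batch.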