You can use |SQLContext.jsonFile(path: String, samplingRatio: Double)| and set |samplingRatio| to 1.0, so that every record is scanned and all the columns can be inferred.
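
A minimal sketch of this first option, assuming Spark 1.1 built with Hive support. The input path |/data/events.json| and the table name |events| are made up for illustration and are not from the original message. The second argument to |jsonFile| is the sampling fraction; 1.0 means no record is skipped during inference.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object InferFullJsonSchema {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("InferFullJsonSchema"))
    // HiveContext extends SQLContext, so jsonFile is available on it.
    val hiveContext = new HiveContext(sc)

    // Hypothetical path. Passing 1.0 scans every record when inferring the schema.
    val events = hiveContext.jsonFile("/data/events.json", 1.0)

    // Print the inferred schema to check that the optional columns were picked up.
    events.printSchema()

    // Hypothetical table name. Persists the data as a Hive table.
    events.saveAsTable("events")

    sc.stop()
  }
}
```

With full sampling, the inferred schema should cover every field that appears in any record, and records that lack a field should carry null for it. The cost is a complete pass over the input before any data is written.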

You can also use |SQLContext.applySchema| to specify your own schema (which is a |StructType|).
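
A rough sketch of the second option, again assuming Spark 1.1. The field names (|id|, |name|, |extra|) and the sample rows are hypothetical. In practice the |Row| objects would be built from your parsed JSON records.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext

object ExplicitSchemaExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ExplicitSchemaExample"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical schema. Every column is nullable so that rows
    // missing an optional field still fit.
    val schema = StructType(Seq(
      StructField("id", StringType, true),
      StructField("name", StringType, true),
      StructField("extra", StringType, true)))

    // Hypothetical rows. The first one has no value for the optional column.
    val rows = sc.parallelize(Seq(
      Row("1", "alice", null),
      Row("2", "bob", "only-on-some-rows")))

    // Attach the schema to the RDD[Row] to obtain a SchemaRDD.
    val table = hiveContext.applySchema(rows, schema)
    table.printSchema()

    sc.stop()
  }
}
```

Declaring the widest schema you expect up front avoids the inference pass, but any field that is not listed in the |StructType| has to be dropped or mapped somewhere when you build each |Row|.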

On 10/22/14 5:56 AM, Harivardan Jayaraman wrote:

Hi,
I have unstructured JSON as my input, which may have extra columns from row to row. I want to store these JSON rows using HiveContext so that they can be accessed from the JDBC Thrift Server. I notice there are primarily only two methods available on the SchemaRDD for data: saveAsTable and insertInto. One defines the schema while the other can be used to insert into the table, but there is no way to alter the table and add columns to it.
How do I do this?

One option that I thought of is to write native "CREATE TABLE..." and "ALTER TABLE..." statements, but that does not seem feasible because at every step I would need to query Hive to determine the current schema and decide whether to add columns to it.

Any thoughts? Has anyone been able to do this?
