I had a very similar problem and solved it with Hive and ORC files via the
Spark SQLContext:
* Create a table in Hive stored as an ORC file (I recommend partitioning it
too)
* Use SQLContext.sql to insert data into the table
* Use SQLContext.sql to periodically run ALTER TABLE...CONCATENATE to merge
the small files
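As a sketch, the statements passed to SQLContext.sql might look like the
following (table, column, and partition names here are illustrative, not from
the original setup; CONCATENATE requires an ORC table and Hive support to be
enabled in Spark):

```sql
-- Create a partitioned Hive table stored as ORC (names are hypothetical)
CREATE TABLE IF NOT EXISTS events (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (event_date STRING)
STORED AS ORC;

-- Insert incoming data into the matching partition
INSERT INTO TABLE events PARTITION (event_date = '2016-01-01')
SELECT id, payload FROM staging_events;

-- Periodically merge the many small ORC files within a partition
ALTER TABLE events PARTITION (event_date = '2016-01-01') CONCATENATE;
```

Running the CONCATENATE step on a schedule keeps the number of files per
partition down without rewriting the whole table.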
Hello,
Given the following example customers.json file:
{
  "name": "Sherlock Holmes",
  "customerNumber": 12345,
  "address": {
    "street": "221b Baker Street",
    "city": "London",
    "zipcode": "NW1 6XE",
    "country": "United Kingdom"
  }
},
{
  "name": "Big Bird",
  "customerNumber": 10001,
  "address": {
    "street": "123 Sesame Street",
    "city":