Re: HDFS small file generation problem

2015-10-02 Thread Brett Antonides
I had a very similar problem and solved it with Hive and ORC files using the Spark SQLContext:
* Create a table in Hive stored as an ORC file (I recommend using partitioning too)
* Use SQLContext.sql to insert data into the table
* Use SQLContext.sql to periodically run ALTER TABLE...CONCATENATE
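
The three steps above could look roughly like the sketch below. It is not taken from the original message: the table name `events`, the staging table `staging_events`, the column list, and the partition column `dt` are all hypothetical, and the helpers only build HiveQL strings. It assumes a Hive-enabled SQLContext whose `sql` method accepts these statements; whether ALTER TABLE ... CONCATENATE is passed through to Hive may depend on the Spark and Hive versions in use.

```python
# Hypothetical helpers that build the HiveQL for each step.
# All table, column, and partition names are illustrative assumptions.

def create_table_sql(table, columns, partition_col):
    """Step 1: a partitioned Hive table stored as ORC."""
    cols = ", ".join("{0} {1}".format(name, typ) for name, typ in columns)
    return ("CREATE TABLE IF NOT EXISTS {0} ({1}) "
            "PARTITIONED BY ({2} STRING) STORED AS ORC").format(
                table, cols, partition_col)


def insert_sql(table, columns, partition_col, partition_value, source):
    """Step 2: append rows from a staging table into one partition."""
    names = ", ".join(name for name, _ in columns)
    return ("INSERT INTO TABLE {0} PARTITION ({1}='{2}') "
            "SELECT {3} FROM {4}").format(
                table, partition_col, partition_value, names, source)


def concatenate_sql(table, partition_col, partition_value):
    """Step 3: merge the small ORC files of one partition."""
    return "ALTER TABLE {0} PARTITION ({1}='{2}') CONCATENATE".format(
        table, partition_col, partition_value)


def run_statements(sql, statements):
    """Run each statement through a callable such as sqlContext.sql."""
    for statement in statements:
        sql(statement)


COLUMNS = [("id", "BIGINT"), ("payload", "STRING")]

# Usage, assuming sqlContext is a Hive-enabled SQLContext:
# run_statements(sqlContext.sql, [
#     create_table_sql("events", COLUMNS, "dt"),
#     insert_sql("events", COLUMNS, "dt", "2015-10-02", "staging_events"),
# ])
# ...and periodically, e.g. from a scheduled job:
# run_statements(sqlContext.sql,
#                [concatenate_sql("events", "dt", "2015-10-02")])
```

Running the CONCATENATE step per partition keeps each merge small; partitions that no longer receive inserts only need it once.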

Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Brett Antonides
Hello,

Given the following example customers.json file:

{ "name": "Sherlock Holmes", "customerNumber": 12345, "address": { "street": "221b Baker Street", "city": "London", "zipcode": "NW1 6XE", "country": "United Kingdom" } },
{ "name": "Big Bird", "customerNumber": 10001, "address": { "street": "123 Sesame Street", "city":