Re: [Spark SQL] Making InferSchema and JacksonParser public

2017-01-18 Thread Brian Hong
Yes, that is the option I took while implementing this under Spark 1.4. But every time there is a major Spark update, I need to re-copy the needed parts, which is very time-consuming. The reason is that InferSchema and JacksonParser use many more Spark internal methods, which makes this very

Re: [Spark SQL] Making InferSchema and JacksonParser public

2017-01-18 Thread Reynold Xin
That is internal, but the amount of code is not a lot. Can you just copy the relevant classes over to your project?

On Wed, Jan 18, 2017 at 5:52 AM Brian Hong wrote:
> I work for a mobile game company. I'm solving a simple question: "Can we
> efficiently/cheaply query for the log of a particular

Re: [Spark SQL] Making InferSchema and JacksonParser public

2017-01-18 Thread Michael Allman
Personally, I'd love to see some kind of pluggability or configurability in the JSON schema parsing, maybe as an option in the DataFrameReader. Perhaps you can propose an API?

> On Jan 18, 2017, at 5:51 AM, Brian Hong wrote:
>
> I work for a mobile game company. I'm solving a simple question: "Ca

[Spark SQL] Making InferSchema and JacksonParser public

2017-01-18 Thread Brian Hong
I work for a mobile game company. I'm solving a simple question: "Can we efficiently/cheaply query for the log of a particular user within a given date period?" I've created a special JSON text-based file format that has these traits:
- Snappy compressed, saved in AWS S3
- Partitioned by date, i.e.
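To make the query above concrete, here is a minimal sketch of why date partitioning makes it cheap: partitions outside the requested period are skipped before any JSON is parsed, and only the remaining records are filtered by user. It assumes a hypothetical in-memory layout mapping an ISO date string to that day's newline-delimited JSON records, and a hypothetical `user_id` field; Snappy decompression and S3 access are left out, and this is not the actual file format described in the thread.

```python
import json
from datetime import date

# Hypothetical layout (assumption, not the thread's real format): one partition
# per day, keyed by ISO date string, holding that day's already-decompressed
# newline-delimited JSON records.
PARTITIONS = {
    "2017-01-15": ['{"user_id": "u1", "event": "login"}',
                   '{"user_id": "u2", "event": "login"}'],
    "2017-01-16": ['{"user_id": "u1", "event": "purchase"}'],
    "2017-01-17": ['{"user_id": "u1", "event": "logout"}'],
}


def query_user_logs(partitions, user_id, start, end):
    """Return records for user_id whose partition date lies in [start, end].

    Partitions outside the range are skipped without parsing any JSON,
    which is the saving that date partitioning provides.
    """
    results = []
    for day, lines in sorted(partitions.items()):
        if not (start <= date.fromisoformat(day) <= end):
            continue  # partition pruning: this day's data is never read
        for line in lines:
            record = json.loads(line)
            if record.get("user_id") == user_id:
                results.append(record)
    return results


if __name__ == "__main__":
    hits = query_user_logs(PARTITIONS, "u1", date(2017, 1, 15), date(2017, 1, 16))
    print([r["event"] for r in hits])  # ['login', 'purchase']
```

Spark SQL's directory-based partition discovery can prune partitions in a similar way when the partition column appears in a filter; the per-record parsing cost is where schema inference and the JSON parser come into play.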