Re: Read multiline JSON/XML

2019-12-01 Thread vino yang
Also, my apologies to Flavio! Best, Vino vino yang wrote on Mon, Dec 2, 2019 at 10:29 AM: > Hi Chesnay, > > Sorry, yes, I missed the word "like". I mistakenly thought he wanted to > ask how to use Spark to accomplish this job. > > Best, > Vino > > Chesnay Schepler wrote on Fri, Nov 29, 2019 at 10:01 PM: > >> Why vino? >> >

Re: Read multiline JSON/XML

2019-11-29 Thread Flavio Pompermaier
Parallel processing across files would be enough; parallelism within a single file would be awesome, but it's just a plus. On Fri, Nov 29, 2019 at 3:46 PM Arvid Heise wrote: > A while ago, I implemented XML and JSON input formats. However, having > proper split support for structured formats without sync markers is not
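To make the "parallel across files, no parallelism inside a file" idea concrete, here is a minimal sketch in plain JDK Java. It is not a Flink API and not anything proposed in the thread; the class name WholeFileParallelRead and the method readWholeFiles are made up for illustration. Each file is read in full by one worker, so a multiline document is never cut in the middle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only (plain JDK, not Flink).
public class WholeFileParallelRead {

    /** Reads every file in full; parallelism is across files, never within one. */
    public static Map<Path, String> readWholeFiles(List<Path> files) {
        return files.parallelStream()
                .collect(Collectors.toMap(p -> p, WholeFileParallelRead::readAll));
    }

    private static String readAll(Path p) {
        try {
            return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("multiline-json");
        Path a = Files.write(dir.resolve("a.json"),
                "{\n  \"id\": 1\n}\n".getBytes(StandardCharsets.UTF_8));
        Path b = Files.write(dir.resolve("b.json"),
                "{\n  \"id\": 2\n}\n".getBytes(StandardCharsets.UTF_8));

        // Each value is one complete multiline document, ready for a JSON/XML parser.
        Map<Path, String> docs = readWholeFiles(Arrays.asList(a, b));
        docs.forEach((path, text) ->
                System.out.println(path.getFileName() + ": " + text.length() + " chars"));
    }
}
```

The trade-off is the one named in the message: a single very large file is handled by one worker only, since nothing here attempts to split inside a file.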

Re: Read multiline JSON/XML

2019-11-29 Thread Arvid Heise
A while ago, I implemented XML and JSON input formats. However, having proper split support for structured formats without sync markers is not that easy. Any split that has a random start offset needs to figure out the start of the next record on its own, which is fragile by definition. That's why s
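The fragility described above can be shown with a small, self-contained sketch. This is plain Java, not code from the input formats mentioned in the message; the class SplitResyncSketch and both method names are hypothetical. It compares a naive resync heuristic (take the first '{' at or after the split offset) with the true top-level record starts, which can only be found by scanning from the beginning of the file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of why resyncing at a random offset is fragile.
public class SplitResyncSketch {

    /** Naive resync: assume the first '{' at or after the split offset starts a record. */
    public static int naiveNextRecordStart(String data, int splitOffset) {
        return data.indexOf('{', splitOffset);
    }

    /** Ground truth: scan from offset 0, tracking nesting depth and string state. */
    public static List<Integer> trueRecordStarts(String data) {
        List<Integer> starts = new ArrayList<>();
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    starts.add(i); // only depth-0 braces open a new record
                }
                depth++;
            } else if (c == '}') {
                depth--;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Two multiline records; the first one contains a nested object.
        String data = "{\n  \"id\": 1,\n  \"address\": {\"city\": \"Rome\"}\n}\n"
                + "{\n  \"id\": 2\n}\n";
        int splitOffset = 5; // an arbitrary offset inside the first record

        int guess = naiveNextRecordStart(data, splitOffset);
        List<Integer> starts = trueRecordStarts(data);

        System.out.println("naive guess: " + guess + ", true starts: " + starts);
        System.out.println("guess is a real record start: " + starts.contains(guess));
    }
}
```

In this input the naive heuristic lands on the nested object's opening brace rather than on a record boundary. A '{' inside a string value would mislead it in the same way, and without a sync marker the reader of a split has no local way to tell these cases apart.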

Re: Read multiline JSON/XML

2019-11-29 Thread Chesnay Schepler
I know that at least the Table API can read JSON, but I don't know how well this translates to the other APIs. On 29/11/2019 12:09, Flavio Pompermaier wrote: Hi to all, is there any out-of-the-box opt

Re: Read multiline JSON/XML

2019-11-29 Thread Suneel Marthi
For XML, you could look at Mahout's XMLInputFormat (if you are using HadoopInputFormat). On Fri, Nov 29, 2019 at 9:01 AM Chesnay Schepler wrote: > Why vino? > > He's specifically asking whether Flink offers something _like_ Spark. > > On 29/11/2019 14:39, vino yang wrote: > > Hi Flavio, > > IMO, it w

Re: Read multiline JSON/XML

2019-11-29 Thread Chesnay Schepler
Why vino? He's specifically asking whether Flink offers something _like_ Spark. On 29/11/2019 14:39, vino yang wrote: Hi Flavio, IMO, it would be more effective to ask this question on the Spark user mailing list. WDYT? Best, Vino Flavio Pompermaier wrote on 2019-11

Re: Read multiline JSON/XML

2019-11-29 Thread vino yang
Hi Flavio, IMO, it would be more effective to ask this question on the Spark user mailing list. WDYT? Best, Vino Flavio Pompermaier wrote on Fri, Nov 29, 2019 at 7:09 PM: > Hi to all, > is there any out-of-the-box option to read multiline JSON or XML like in > Spark? > It would be awesome to have something

Read multiline JSON/XML

2019-11-29 Thread Flavio Pompermaier
Hi to all, is there any out-of-the-box option to read multiline JSON or XML like in Spark? It would be awesome to have something like spark.read.option("multiline", true).json("/path/to/user.json") Best, Flavio