RE: How to read a Multi Line json object via Spark

2016-11-14 Thread Kappaganthu, Sivaram (ES)
.map(lambda x: re.sub(r"\s+", "", x, flags=re.UNICODE)) Thanks, Sivaram From: Hyukjin Kwon [mailto:gurwls...@gmail.com] Sent: Wednesday, October 12, 2016 11:45 AM To: Kappaganthu, Sivaram (ES) Cc: Luciano Resende; Jean Georges Perrin; user @spa
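The regex in the quoted reply collapses all whitespace so that each JSON record fits on a single line, which is what Spark's line-oriented JSON reader expects (newer Spark versions also offer a multiLine option on the JSON reader). Note that a blanket re.sub(r"\s+", "", ...) also strips spaces inside string values; a safer plain-Python sketch (file contents invented for illustration) re-serializes the record with the json module instead:

```python
import json


def to_single_line(record_text: str) -> str:
    r"""Collapse a pretty-printed JSON record onto one line.

    Re-serializing with json preserves spaces inside string values,
    which a blanket re.sub(r"\s+", "", ...) would destroy.
    """
    return json.dumps(json.loads(record_text), separators=(",", ":"))


# A multi-line record of the kind the thread describes (invented sample).
pretty = """{
    "name": "pay group",
    "amount": 100
}"""

line = to_single_line(pretty)
```

In PySpark this kind of function would typically be mapped over the values of wholeTextFiles before handing the result to the JSON reader.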

Help needed in parsing JSon with nested structures

2016-10-31 Thread Kappaganthu, Sivaram (ES)
Hello All, I am processing a nested, complex JSON and below is its schema.
root
 |-- businessEntity: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- payGroup: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |
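In the schema above, arrays are nested inside arrays (businessEntity holds structs that each hold a payGroup array), so in Spark one would typically chain two explode calls to flatten it. A plain-Python sketch of that same chained flattening, with invented sample data mirroring the schema:

```python
# Invented sample mirroring the schema: businessEntity is an array of
# structs, each holding a payGroup array of structs.
record = {
    "businessEntity": [
        {"payGroup": [{"id": 1}, {"id": 2}]},
        {"payGroup": [{"id": 3}]},
    ]
}

# Chained flattening -- the plain-Python analogue of calling
# explode("businessEntity") and then explode("payGroup") in Spark:
# one output row per innermost array element.
flat_rows = [
    pay
    for entity in record["businessEntity"]
    for pay in entity["payGroup"]
]
```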

RE: how to extract arraytype data to file

2016-10-18 Thread Kappaganthu, Sivaram (ES)
There is an option called explode for this. From: lk_spark [mailto:lk_sp...@163.com] Sent: Wednesday, October 19, 2016 9:06 AM To: user.spark Subject: how to extract arraytype data to file Hi, all: I want to read a JSON file and search it by SQL. The data struct should be: bid: string
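Spark's explode turns each element of an array column into its own row, after which the rows can be written out normally. A stdlib sketch of that expansion feeding a CSV writer (the field names bid and tags are invented for illustration):

```python
import csv
import io

# Invented rows: each has a scalar key and an arraytype column.
rows = [
    {"bid": "b1", "tags": ["red", "blue"]},
    {"bid": "b2", "tags": ["green"]},
]

# explode-style expansion: emit one output row per array element,
# repeating the scalar column alongside it.
buf = io.StringIO()
writer = csv.writer(buf)
for row in rows:
    for tag in row["tags"]:
        writer.writerow([row["bid"], tag])

output = buf.getvalue()
```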

RE: JSON Arrays and Spark

2016-10-12 Thread Kappaganthu, Sivaram (ES)
Hi, does this mean that handling any JSON with the below kind of schema is not a good fit for Spark? I have a requirement to parse JSON that spans multiple lines. What's the best way to parse JSONs of this kind? Please suggest.
root
 |-- maindate: struct (nullable = true)
 |
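Line-oriented Spark JSON readers expect one record per line, so a top-level JSON array spanning multiple lines is usually rewritten as JSON Lines first. A stdlib sketch of that conversion (the sample array is invented):

```python
import json

# Invented multi-line input: one JSON array spread across several lines.
multi_line = """[
    {"maindate": {"d": "2016-10-12"}},
    {"maindate": {"d": "2016-10-13"}}
]"""

# Parse the whole array, then emit one compact record per line --
# the JSON Lines layout that line-oriented readers expect.
json_lines = "\n".join(
    json.dumps(rec, separators=(",", ":")) for rec in json.loads(multi_line)
)
```

Newer Spark versions can instead read such files directly by enabling the JSON reader's multiLine option.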

Spark Streaming-- for each new file in HDFS

2016-09-15 Thread Kappaganthu, Sivaram (ES)
Hello, I am a newbie to Spark and I have the below requirement. Problem statement: a third-party application is continuously dumping files onto a server, typically about 100 files per hour, each under 50 MB. My application has to process those files. Here
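For this kind of requirement, Spark Streaming's file stream (e.g. textFileStream) monitors a directory and hands each batch only the files that appeared since the last poll. A plain-Python sketch of that new-files-only polling logic, under the assumption that new files simply show up in a watched directory:

```python
import os


def poll_new_files(directory: str, seen: set) -> list:
    """Return files that appeared since the last poll, oldest name first.

    Mirrors what Spark Streaming's file stream does each batch interval:
    only files not seen in a previous poll are handed on for processing.
    """
    current = set(os.listdir(directory))
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files
```

In use, this would be called once per batch interval in a loop; Spark additionally handles fault tolerance and distributes the per-file processing, which this sketch does not attempt.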