Hi Lian, "What have you tried?" would be a good starting point. Any help on this?
How do you read the JSONs? `readStream.json`? You could use `readStream.text` followed by a `filter` to include/exclude the good/bad JSONs.

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Sat, Feb 9, 2019 at 8:25 PM Lian Jiang <jiangok2...@gmail.com> wrote:
> Hi,
>
> We have a structured streaming job that converts JSON into Parquet. We
> want to validate the JSON records. If a JSON record is not valid, we want
> to log a message and refuse to write it to Parquet. Also, the JSON contains
> nested JSONs, and we want to flatten the nested JSONs into other Parquet
> files using the same streaming job. My questions are:
>
> 1. How do we validate the JSON records in a structured streaming job?
> 2. How do we flatten the nested JSONs in a structured streaming job?
> 3. Is it possible to use one structured streaming job to validate the
> JSON, convert it into Parquet, and convert the nested JSONs into other
> Parquet files?
>
> I think the older Spark Streaming (DStream) API can achieve these goals,
> but structured streaming is what the Spark community recommends.
>
> Appreciate your feedback!
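The read-as-text-then-filter approach Jacek suggests can be sketched outside Spark with plain Python: parse each line yourself, log and drop records that fail to parse, and flatten nested objects into dotted column names. This is only an illustration of the logic, not Spark API; in a real structured streaming job you would apply the same idea via `spark.readStream.text` plus `from_json` (rows where `from_json` returns null are the invalid ones), and the `validate`/`flatten` helper names below are hypothetical.

```python
import json

def validate(line):
    """Return the parsed record, or None if the line is not a valid JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        # In a real job you would log this instead of printing.
        print(f"skipping bad record: {err}")
        return None
    return record if isinstance(record, dict) else None

def flatten(record, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

lines = ['{"id": 1, "user": {"name": "a", "age": 2}}', "not json"]
records = [r for r in (validate(line) for line in lines) if r is not None]
flat = [flatten(r) for r in records]
# flat[0] == {"id": 1, "user.name": "a", "user.age": 2}
```

Because validation, flattening, and writing are all just transformations on the same streamed DataFrame, one job can in principle do all three: filter out the invalid rows, write the top-level columns to one Parquet sink, and write the flattened nested columns to another.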