The code in the link writes the data into files. Did you check the output
location?

By the way, if you want to see the data on the console, you can use the
console sink by changing this line *format("parquet").option("path",
outputPath + "/ETL").partitionBy("creationTime").start()* to
*format("console").start()*.
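
For reference, a minimal sketch of both sink variants, assuming a streaming
DataFrame named `df` and illustrative paths (the checkpointLocation option
shown is an assumption I added, not part of the original code):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

// File sink: writes each micro-batch out as parquet files, partitioned by
// creationTime. `df`, `outputPath`, and `checkpointPath` are placeholders.
def writeToFiles(df: DataFrame, outputPath: String, checkpointPath: String): StreamingQuery =
  df.writeStream
    .format("parquet")
    .option("path", outputPath + "/ETL")
    .option("checkpointLocation", checkpointPath)
    .partitionBy("creationTime")
    .start()

// Console sink: prints each micro-batch to stdout, handy for debugging.
def writeToConsole(df: DataFrame): StreamingQuery =
  df.writeStream
    .format("console")
    .start()
```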

On Fri, Oct 27, 2017 at 8:41 AM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Hi TathagataDas,
>
> I was trying to use eventhub with spark streaming. It looks like I was able
> to make the connection successfully, but I cannot see any data on the
> console. I am not sure whether eventhub is supported or not.
>
> https://github.com/Azure/spark-eventhubs/blob/master/examples/src/main/scala/com/microsoft/spark/sql/examples/EventHubsStructuredStreamingExample.scala
> is the code snippet I have used to connect to eventhub.
>
> Thanks,
> Asmath
>
>
>
> On Thu, Oct 26, 2017 at 9:39 AM, KhajaAsmath Mohammed <
> mdkhajaasm...@gmail.com> wrote:
>
>> Thanks TD.
>>
>> On Wed, Oct 25, 2017 at 6:42 PM, Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> Please do not confuse old Spark Streaming (DStreams) with Structured
>>> Streaming. Structured Streaming's offset and checkpoint management is far
>>> more robust than DStreams.
>>> Take a look at my talk - https://spark-summit.org/2017/speakers/tathagata-das/
>>>
>>> On Wed, Oct 25, 2017 at 9:29 PM, KhajaAsmath Mohammed <
>>> mdkhajaasm...@gmail.com> wrote:
>>>
>>>> Thanks Subhash.
>>>>
>>>> Have you ever used the zero-data-loss concept with streaming? I am a
>>>> bit worried about using streaming when it comes to data loss.
>>>>
>>>> https://blog.cloudera.com/blog/2017/06/offset-management-for-apache-kafka-with-apache-spark-streaming/
>>>>
>>>> Does Structured Streaming handle it internally?
>>>>
>>>> On Wed, Oct 25, 2017 at 3:10 PM, Subhash Sriram <
>>>> subhash.sri...@gmail.com> wrote:
>>>>
>>>>> No problem! Take a look at this:
>>>>>
>>>>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing
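>>>>>
>>>>> As a hedged sketch, the checkpoint directory is specified per query as a
>>>>> writeStream option (the DataFrame name and paths below are illustrative
>>>>> placeholders, not from your setup):
>>>>>
>>>>> ```scala
>>>>> // Offsets and state for this query are persisted under
>>>>> // checkpointLocation, which enables recovery after a failure/restart.
>>>>> val query = df.writeStream
>>>>>   .format("parquet")
>>>>>   .option("path", "/data/out")
>>>>>   .option("checkpointLocation", "/data/checkpoints/my-query")
>>>>>   .start()
>>>>> ```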
>>>>>
>>>>> Thanks,
>>>>> Subhash
>>>>>
>>>>> On Wed, Oct 25, 2017 at 4:08 PM, KhajaAsmath Mohammed <
>>>>> mdkhajaasm...@gmail.com> wrote:
>>>>>
>>>>>> Hi Sriram,
>>>>>>
>>>>>> Thanks. This is what I was looking for.
>>>>>>
>>>>>> One question: where do we need to specify the checkpoint directory in
>>>>>> the case of Structured Streaming?
>>>>>>
>>>>>> Thanks,
>>>>>> Asmath
>>>>>>
>>>>>> On Wed, Oct 25, 2017 at 2:52 PM, Subhash Sriram <
>>>>>> subhash.sri...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Asmath,
>>>>>>>
>>>>>>> Here is an example of using structured streaming to read from Kafka:
>>>>>>>
>>>>>>> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala
>>>>>>>
>>>>>>> In terms of parsing the JSON, there is a from_json function that you
>>>>>>> can use. The following might help:
>>>>>>>
>>>>>>> https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
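>>>>>>>
>>>>>>> As a rough sketch of the idea (the topic name, servers, and schema
>>>>>>> below are illustrative assumptions, not from your setup):
>>>>>>>
>>>>>>> ```scala
>>>>>>> import org.apache.spark.sql.functions.from_json
>>>>>>> import org.apache.spark.sql.types.{StructType, StringType, TimestampType}
>>>>>>> import spark.implicits._
>>>>>>>
>>>>>>> // Illustrative schema; adjust to match your actual JSON payload.
>>>>>>> val schema = new StructType()
>>>>>>>   .add("id", StringType)
>>>>>>>   .add("creationTime", TimestampType)
>>>>>>>
>>>>>>> val parsed = spark.readStream
>>>>>>>   .format("kafka")
>>>>>>>   .option("kafka.bootstrap.servers", "host:9092")
>>>>>>>   .option("subscribe", "events")
>>>>>>>   .load()
>>>>>>>   .selectExpr("CAST(value AS STRING) AS json") // Kafka value is bytes
>>>>>>>   .select(from_json($"json", schema).as("data"))
>>>>>>>   .select("data.*")
>>>>>>> ```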
>>>>>>>
>>>>>>> I hope this helps.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Subhash
>>>>>>>
>>>>>>> On Wed, Oct 25, 2017 at 2:59 PM, KhajaAsmath Mohammed <
>>>>>>> mdkhajaasm...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Could anyone provide suggestions on how to parse JSON data from
>>>>>>>> Kafka and load it back into Hive?
>>>>>>>>
>>>>>>>> I have read about Structured Streaming but didn't find any
>>>>>>>> examples. Is there any best practice on how to read and parse it
>>>>>>>> with Structured Streaming for this use case?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Asmath
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
