my little program prints out query.lastProgress every 10 seconds, and this is what it shows:
{ "id" : "4cc09da1-c002-4fa7-86dd-ba03018e53a0", "runId" : "bf8cdde2-44d0-4bff-ad90-7cbac0432099", "name" : "wiki", "timestamp" : "2017-01-26T22:54:45.732Z", "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0, "durationMs" : { "getOffset" : 9, "triggerExecution" : 10 }, "stateOperators" : [ ], "sources" : [ { "description" : "KafkaSource[Subscribe[wikipedia]]", "startOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "endOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0 } ], "sink" : { "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@ 4818d2d9" } } { "id" : "4cc09da1-c002-4fa7-86dd-ba03018e53a0", "runId" : "bf8cdde2-44d0-4bff-ad90-7cbac0432099", "name" : "wiki", "timestamp" : "2017-01-26T22:54:55.745Z", "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0, "durationMs" : { "getOffset" : 5, "triggerExecution" : 5 }, "stateOperators" : [ ], "sources" : [ { "description" : "KafkaSource[Subscribe[wikipedia]]", "startOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "endOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0 } ], "sink" : { "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@ 4818d2d9" } } { "id" : "4cc09da1-c002-4fa7-86dd-ba03018e53a0", "runId" : "bf8cdde2-44d0-4bff-ad90-7cbac0432099", "name" : "wiki", "timestamp" : "2017-01-26T22:55:05.748Z", "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0, "durationMs" : { "getOffset" : 5, "triggerExecution" : 5 }, "stateOperators" : [ ], "sources" : [ { "description" : "KafkaSource[Subscribe[wikipedia]]", "startOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "endOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0 } ], "sink" : { "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@ 4818d2d9" } } { "id" : "4cc09da1-c002-4fa7-86dd-ba03018e53a0", "runId" : "bf8cdde2-44d0-4bff-ad90-7cbac0432099", "name" : "wiki", "timestamp" : "2017-01-26T22:55:15.758Z", "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0, "durationMs" : { "getOffset" : 4, "triggerExecution" : 4 }, "stateOperators" : [ ], "sources" : [ { "description" : "KafkaSource[Subscribe[wikipedia]]", "startOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "endOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0 } ], "sink" : { "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@ 4818d2d9" } } { "id" : "4cc09da1-c002-4fa7-86dd-ba03018e53a0", "runId" : "bf8cdde2-44d0-4bff-ad90-7cbac0432099", "name" : "wiki", "timestamp" : "2017-01-26T22:55:25.760Z", "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0, "durationMs" : { "getOffset" : 4, "triggerExecution" : 4 }, "stateOperators" : [ ], "sources" : [ { "description" : "KafkaSource[Subscribe[wikipedia]]", "startOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "endOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0 } ], "sink" : { "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@ 4818d2d9" } } { "id" : "4cc09da1-c002-4fa7-86dd-ba03018e53a0", "runId" : "bf8cdde2-44d0-4bff-ad90-7cbac0432099", "name" : "wiki", "timestamp" : "2017-01-26T22:55:35.766Z", "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0, "durationMs" : { "getOffset" : 4, "triggerExecution" : 4 }, "stateOperators" : [ ], "sources" : [ { "description" : "KafkaSource[Subscribe[wikipedia]]", "startOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "endOffset" : { "wikipedia" : { "2" : 0, "4" : 0, "1" : 0, "3" : 0, "0" : 0 } }, "numInputRows" : 0, "inputRowsPerSecond" : 0.0, "processedRowsPerSecond" : 0.0 } ], "sink" : { "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@ 4818d2d9" } } On Thu, Jan 26, 2017 at 10:33 PM, Koert Kuipers <ko...@tresata.com> wrote: > hey, > i am just getting started with kafka + spark structured streaming. so this > is probably a pretty dumb mistake. > > i wrote a little program in spark to read messages from a kafka topic and > display them in the console, using the kafka source and console sink. i run > it it in spark local mode. > > i hooked it up to a test topic that i send messages to using the kafka > console producer, and everything works great. i type a message in the > console producer, and it pops up in my spark program. very neat! > > next i point it to another topic instead on which a kafka-connect program > is writing lots of irc messages. i can see kafka connect to the topic > successfully, the partitions are discovered etc., and then... nothing. it > just keeps stuck at offsets 0 for all partitions. at the same time in > another terminal i can see messages coming in just fine using the kafka > console consumer. > > i dont get it. why doesnt kafka want to consume from this topic in spark > structured streaming? > > thanks! koert > > >