Hi guys, could anyone explain to me how Kafka works with Spark? I am using JavaKafkaWordCount.java as a test, and the command line is:
./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3

As the producer I am using this command:

rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt -l 100 -n 10

rdkafka_cachesender is a program I developed myself which sends the contents of output.txt to Kafka, where -l is the length of each send (upper bound) and -n is the number of lines to send in a row. Below is the throughput reported by the program:

File is 2235755 bytes
throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745
throughput (b/s) = 505028200
throughput (b/s) = 471263416
throughput (b/s) = 446837266
throughput (b/s) = 409856716
throughput (b/s) = 373994467
throughput (b/s) = 366343097
throughput (b/s) = 373240017
throughput (b/s) = 386139016
throughput (b/s) = 373802209
throughput (b/s) = 369308515
throughput (b/s) = 366935820
throughput (b/s) = 365175388
throughput (b/s) = 362175419
throughput (b/s) = 358356633
throughput (b/s) = 357219124
throughput (b/s) = 352174125
throughput (b/s) = 348313093
throughput (b/s) = 355099099
throughput (b/s) = 348069777
throughput (b/s) = 348478302
throughput (b/s) = 340404276
throughput (b/s) = 339876031
throughput (b/s) = 339175102
throughput (b/s) = 327555252
throughput (b/s) = 324272374
throughput (b/s) = 322479222
throughput (b/s) = 319544906
throughput (b/s) = 317201853
throughput (b/s) = 317351399
throughput (b/s) = 315027978
throughput (b/s) = 313831014
throughput (b/s) = 310050384
throughput (b/s) = 307654601
throughput (b/s) = 305707061
throughput (b/s) = 307961102
throughput (b/s) = 296898200
throughput (b/s) = 296409904
throughput (b/s) = 294609332
throughput (b/s) = 293397843
throughput (b/s) = 293194876
throughput (b/s) = 291724886
throughput (b/s) = 290031314
throughput (b/s) = 289747022
throughput (b/s) = 289299632

The throughput drops after a few seconds and does not maintain the performance of the
initial values:

throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745

Another question is about Spark: after I start the Spark command, from about 15 seconds onward Spark just keeps repeating the same word counts, but my program keeps sending words to Kafka, so I would expect the word counts in Spark to keep growing. I have attached the log from Spark.

My setup is: ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + ZooKeeper) -> ComputerC (Spark)

If I have not explained myself well, please send me a reply. Thanks, guys

--
Privacy notice: http://www.unibs.it/node/8155
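For reference, the kind of running throughput figure printed above can be computed as total bytes sent divided by elapsed seconds. A minimal sketch in Java, assuming a cumulative measurement; the class and method names here are my own illustration, not taken from rdkafka_cachesender:

```java
// ThroughputMeter.java — minimal running-throughput calculator.
// Tracks cumulative bytes sent and reports bytes/second since start,
// the same shape of number as the "throughput (b/s)" lines above.
public class ThroughputMeter {
    private long totalBytes = 0;
    private final long startNanos;

    public ThroughputMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    // Call after each send with the number of bytes written.
    public void record(long bytes) {
        totalBytes += bytes;
    }

    // Cumulative throughput in bytes/second as of nowNanos.
    public long throughput(long nowNanos) {
        double elapsedSec = (nowNanos - startNanos) / 1e9;
        return (long) (totalBytes / elapsedSec);
    }

    public static void main(String[] args) {
        ThroughputMeter m = new ThroughputMeter(0);
        m.record(2235755);                         // one full send of output.txt
        System.out.println(m.throughput(1_000_000_000L)); // prints 2235755 (after 1 s)
    }
}
```

A cumulative meter like this naturally decays toward a steady-state average if the early sends were faster than later ones, which is one possible (non-Kafka) explanation for a smoothly declining curve; measuring per-interval instead of since-start would show whether the send rate itself is actually dropping.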
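On the second question: JavaKafkaWordCount counts words per micro-batch, so each printed count covers only the words received in that batch interval; the numbers are not supposed to accumulate across batches. To get growing totals, the streaming job would need a stateful transformation such as Spark Streaming's updateStateByKey. The per-key merge that such a transformation performs can be sketched in plain Java (class and method names here are illustrative, not Spark's API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// RunningWordCount.java — sketch of the per-key state update that a
// stateful streaming word count performs: the words of each new batch
// are folded into running totals kept from previous batches.
public class RunningWordCount {
    private final Map<String, Integer> totals = new HashMap<>();

    // Called once per batch with the words seen in that batch.
    public void updateBatch(List<String> batchWords) {
        for (String w : batchWords) {
            totals.merge(w, 1, Integer::sum); // running total, not per-batch count
        }
    }

    public int count(String word) {
        return totals.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        RunningWordCount rc = new RunningWordCount();
        rc.updateBatch(List.of("kafka", "spark", "kafka")); // batch 1
        rc.updateBatch(List.of("kafka"));                   // batch 2
        System.out.println(rc.count("kafka")); // prints 3: totals grow across batches
    }
}
```

With the stock example, identical counts repeating every interval usually just means each batch is seeing a similar amount of data, not that the consumer has stalled.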