Hi guys, could anyone explain to me how Kafka works with Spark? I am using JavaKafkaWordCount.java as a test, and the command line is:
./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3

As the producer I am using this command:

rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt -l 100 -n 10

rdkafka_cachesender is a program I developed myself which sends the contents of output.txt to Kafka, where -l is the length of each send (upper bound) and -n is the number of lines to send in a row. Below is the throughput reported by the program:

File is 2235755 bytes
throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745
throughput (b/s) = 505028200
throughput (b/s) = 471263416
throughput (b/s) = 446837266
throughput (b/s) = 409856716
throughput (b/s) = 373994467
throughput (b/s) = 366343097
throughput (b/s) = 373240017
throughput (b/s) = 386139016
throughput (b/s) = 373802209
throughput (b/s) = 369308515
throughput (b/s) = 366935820
throughput (b/s) = 365175388
throughput (b/s) = 362175419
throughput (b/s) = 358356633
throughput (b/s) = 357219124
throughput (b/s) = 352174125
throughput (b/s) = 348313093
throughput (b/s) = 355099099
throughput (b/s) = 348069777
throughput (b/s) = 348478302
throughput (b/s) = 340404276
throughput (b/s) = 339876031
throughput (b/s) = 339175102
throughput (b/s) = 327555252
throughput (b/s) = 324272374
throughput (b/s) = 322479222
throughput (b/s) = 319544906
throughput (b/s) = 317201853
throughput (b/s) = 317351399
throughput (b/s) = 315027978
throughput (b/s) = 313831014
throughput (b/s) = 310050384
throughput (b/s) = 307654601
throughput (b/s) = 305707061
throughput (b/s) = 307961102
throughput (b/s) = 296898200
throughput (b/s) = 296409904
throughput (b/s) = 294609332
throughput (b/s) = 293397843
throughput (b/s) = 293194876
throughput (b/s) = 291724886
throughput (b/s) = 290031314
throughput (b/s) = 289747022
throughput (b/s) = 289299632

The throughput drops after a few seconds and does not maintain the performance of the
initial values:

throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745

Another question is about Spark: after I start the Spark command, from about 15 seconds onward Spark just keeps repeating the same word counts, but my program keeps sending words to Kafka, so I would expect the word counts in Spark to keep growing. I have attached the log from Spark.

My setup is: ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + ZooKeeper) -> ComputerC (Spark)

If I have not explained myself well, please send me a reply. Thanks, guys

--
Privacy notice: http://www.unibs.it/node/8155
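For reference, the kind of running throughput figure printed above can be computed as total bytes sent divided by elapsed seconds. A minimal sketch in Java, assuming a cumulative measurement; the class and method names here are my own illustration, not taken from rdkafka_cachesender:

```java
// ThroughputMeter.java — minimal running-throughput calculator.
// Tracks cumulative bytes sent and reports bytes/second since start,
// the same shape of number as the "throughput (b/s)" lines above.
public class ThroughputMeter {
    private long totalBytes = 0;
    private final long startNanos;

    public ThroughputMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    // Call after each send with the number of bytes written.
    public void record(long bytes) {
        totalBytes += bytes;
    }

    // Cumulative throughput in bytes/second as of nowNanos.
    public long throughput(long nowNanos) {
        double elapsedSec = (nowNanos - startNanos) / 1e9;
        return (long) (totalBytes / elapsedSec);
    }

    public static void main(String[] args) {
        ThroughputMeter m = new ThroughputMeter(0);
        m.record(2235755);                         // one full send of output.txt
        System.out.println(m.throughput(1_000_000_000L)); // prints 2235755 (after 1 s)
    }
}
```

A cumulative meter like this naturally decays toward a steady-state average if the early sends were faster than later ones, which is one possible (non-Kafka) explanation for a smoothly declining curve; measuring per-interval instead of since-start would show whether the send rate itself is actually dropping.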
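On the second question: JavaKafkaWordCount counts words per micro-batch, so each printed count covers only the words received in that batch interval; the numbers are not supposed to accumulate across batches. To get growing totals, the streaming job would need a stateful transformation such as Spark Streaming's updateStateByKey. The per-key merge that such a transformation performs can be sketched in plain Java (class and method names here are illustrative, not Spark's API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// RunningWordCount.java — sketch of the per-key state update that a
// stateful streaming word count performs: the words of each new batch
// are folded into running totals kept from previous batches.
public class RunningWordCount {
    private final Map<String, Integer> totals = new HashMap<>();

    // Called once per batch with the words seen in that batch.
    public void updateBatch(List<String> batchWords) {
        for (String w : batchWords) {
            totals.merge(w, 1, Integer::sum); // running total, not per-batch count
        }
    }

    public int count(String word) {
        return totals.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        RunningWordCount rc = new RunningWordCount();
        rc.updateBatch(List.of("kafka", "spark", "kafka")); // batch 1
        rc.updateBatch(List.of("kafka"));                   // batch 2
        System.out.println(rc.count("kafka")); // prints 3: totals grow across batches
    }
}
```

With the stock example, identical counts repeating every interval usually just means each batch is seeing a similar amount of data, not that the consumer has stalled.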