Can this Docker image be used to spin up a Kafka cluster in a CI/CD pipeline such as Jenkins to run the integration tests, or can it only be done on a local machine that has Docker installed? I assume the box where the CI/CD pipeline runs would need Docker installed, correct?
On Mon, Jul 4, 2016 at 5:20 AM, Lars Albertsson <la...@mapflat.com> wrote:

> I created such a setup for a client a few months ago. It is pretty
> straightforward, but it can take some work to get all the wires
> connected.
>
> I suggest that you start with the spotify/kafka
> (https://github.com/spotify/docker-kafka) Docker image, since it
> includes a bundled ZooKeeper. The alternative would be to spin up a
> separate ZooKeeper Docker container and connect them, but for testing
> purposes that would make the setup more complex.
>
> You'll need to inform Kafka of the external address it exposes by
> setting ADVERTISED_HOST to the output of "docker-machine ip" (on Mac)
> or to the address printed by "ip addr show docker0" (on Linux). I also
> suggest setting AUTO_CREATE_TOPICS to true.
>
> You can choose to run your Spark Streaming application under test
> (SUT) and your test harness in Docker containers as well, or directly
> on your host.
>
> In the former case, it is easiest to set up a Docker Compose file
> linking the harness and the SUT to Kafka. This variant provides better
> isolation, and it might integrate better if you have existing, similar
> test frameworks.
>
> If you want to run the harness and the SUT outside Docker, I suggest
> that you build your harness with a standard test framework, e.g.
> ScalaTest or JUnit, and run both the harness and the SUT in the same
> JVM. In this case, you put the code that brings up the Kafka Docker
> container in the test framework's setup methods. This strategy
> integrates better with IDEs and build tools (mvn/sbt/gradle), since
> they will run (and debug) your tests without any special integration.
> I therefore prefer it.
>
> What is the output of your application? If it is messages on a
> different Kafka topic, the test harness can simply subscribe and
> verify the output. If you emit output to a database, you'll need
> another Docker container, integrated with Docker Compose, and your
> test oracle will need to poll the database frequently for the expected
> records, with a timeout so that it does not hang on failing tests.
>
> I hope this is comprehensible. Let me know if you have follow-up
> questions.
>
> Regards,
>
> Lars Albertsson
> Data engineering consultant
> www.mapflat.com
> +46 70 7687109
> Calendar: https://goo.gl/6FBtlS
>
>
> On Thu, Jun 30, 2016 at 8:19 PM, SRK <swethakasire...@gmail.com> wrote:
> > Hi,
> >
> > I need to do integration tests using Spark Streaming. My idea is to
> > spin up Kafka using Docker locally and use it to feed the stream to
> > my Streaming job. Any suggestions on how to do this would be of
> > great help.
> >
> > Thanks,
> > Swetha
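To make the same-JVM variant concrete, below is a minimal ScalaTest sketch that shells out to docker in the suite's setup and teardown methods. The container name, the sleep-based readiness wait, and the docker0 address parsing are illustrative assumptions; the docker run flags follow the spotify/docker-kafka README.

    import org.scalatest.{BeforeAndAfterAll, FunSuite}
    import scala.sys.process._

    class KafkaStreamingIntegrationSuite extends FunSuite with BeforeAndAfterAll {

      val containerName = "kafka-it"  // arbitrary; used only for teardown

      // Linux variant; on Mac, use the output of "docker-machine ip" instead.
      val advertisedHost: String = "ip addr show docker0".!!
        .split("\\s+").dropWhile(_ != "inet").drop(1).head.takeWhile(_ != '/')

      override def beforeAll(): Unit = {
        // Environment variables as documented in the spotify/docker-kafka README.
        Seq("docker", "run", "-d", "--name", containerName,
            "-p", "2181:2181", "-p", "9092:9092",
            "--env", s"ADVERTISED_HOST=$advertisedHost",
            "--env", "ADVERTISED_PORT=9092",
            "--env", "AUTO_CREATE_TOPICS=true",
            "spotify/kafka").!!
        Thread.sleep(10000)  // crude broker wait; poll the port in real code
      }

      override def afterAll(): Unit = {
        Seq("docker", "rm", "-f", containerName).!!
      }

      test("streaming job consumes from Kafka") {
        // Point producers and the SUT at s"$advertisedHost:9092" here.
      }
    }

Because harness and SUT share one JVM, mvn/sbt/gradle and the IDE can run (and debug) this suite directly, which is the advantage the post describes.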
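For the Docker Compose variant, a compose file might be sketched as follows. The sut and harness image names are placeholders for your own build artifacts, and ADVERTISED_HOST is set to the Kafka service name, since containers on the compose network reach the broker that way rather than via the host address.

    # docker-compose.yml -- sketch only; sut/harness images are placeholders
    version: "2"
    services:
      kafka:
        image: spotify/kafka
        ports:
          - "2181:2181"
          - "9092:9092"
        environment:
          ADVERTISED_HOST: kafka      # reachable by service name inside the network
          ADVERTISED_PORT: "9092"
          AUTO_CREATE_TOPICS: "true"
      sut:
        image: my-streaming-job       # placeholder: your Spark Streaming app
        depends_on:
          - kafka
      harness:
        image: my-test-harness        # placeholder: your test driver
        depends_on:
          - kafka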
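For the database-output case, the polling test oracle can be expressed with ScalaTest's Eventually trait, which retries a block until it passes or a timeout expires. fetchRecords below is a hypothetical helper standing in for a query against whatever store your streaming job writes to.

    import org.scalatest.FunSuite
    import org.scalatest.concurrent.Eventually
    import org.scalatest.time.{Seconds, Span}

    class OutputOracleSuite extends FunSuite with Eventually {

      // Hypothetical helper: query the database the streaming job writes to.
      def fetchRecords(key: String): Seq[String] = ???

      test("streaming job writes the expected records") {
        // Retry once per second; give up after 60 seconds so a broken
        // job fails the build instead of hanging it.
        eventually(timeout(Span(60, Seconds)), interval(Span(1, Seconds))) {
          assert(fetchRecords("some-key").contains("expected-value"))
        }
      }
    }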