Hi Bert,

What you are describing could be done partially with the console producer. It will read from a file and send each line to the Kafka broker as a message. You could make a really big file, or alter that code to repeat the file a certain number of times. The source is pretty readable, so I think that might be the easier route to take.
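As a rough illustration of the "really big file" route, here is a minimal Python sketch that builds such a file by pulling random rows from a small sample file. It is not taken from the Kafka source; the function names (`load_rows`, `sample_messages`, `write_payload`) and the file paths are hypothetical, and it assumes one message per line.

```python
import random


def load_rows(path):
    """Read the sample file into a list, one message per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def sample_messages(rows, count, seed=None):
    """Yield `count` messages, each a randomly chosen sample row."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.choice(rows)


def write_payload(sample_path, out_path, count, seed=None):
    """Write `count` randomly sampled rows to out_path, one per line."""
    rows = load_rows(sample_path)
    with open(out_path, "w", encoding="utf-8") as out:
        for msg in sample_messages(rows, count, seed):
            out.write(msg + "\n")


if __name__ == "__main__":
    # Hypothetical file names; replace with your own sample data.
    write_payload("sample.txt", "payload.txt", count=1_000_000)
```

The resulting file can then be fed to the console producer, so the messages on the wire have realistic content rather than identical bytes.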
Daniel.

> On 1/07/2014, at 2:07 am, Bert Corderman <[email protected]> wrote:
>
> Daniel,
>
> We have the same question. We noticed that the compression tests we ran
> using the built in performance tester was not realistic. I think on disk
> compression was 200:1. (yes that is two hundred to one) I had planned to
> try and edit the producer performance tester source and do the following
>
> 1. Add an option to read sample data from provided text file.
> (thought would be to add a file with 1-5000 rows, whatever I thought my
> batch size might be)
>
> 2. Load sample file into array
>
> 3. Change code that creates message to pull a random row from array
>
> I also am not a Scala developer so would take me a little bit to figure
> this out. This is on hold right now as I am looking at options of
> compression of the message before sending to kafka. We had originally not
> wanted to do this as we are assuming that we would not get efficient
> compression ratios as we are only doing a single message however we are
> also talking about sending multiple messages from our application as a
> single Kafka message. Our concern with using kafka compression is the
> overhead required from decompression on the broker to assign Ids. Here is
> a good article that describes this
> http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/
>
> But again we haven’t decided just yet. Would like to test and evaluate.
>
> Bert
>
> On Mon, Jun 30, 2014 at 2:24 AM, Daniel Compton <[email protected]>
> wrote:
>
>> Hi folks
>>
>> I was doing some performance testing using the built in Kafka performance
>> tester and it seems like it sends messages of size n bytes but with all
>> bytes having the value 0x0. Is that correct? Reading the source seemed to
>> indicate that too but I'm not a Scala developer so I could be wrong.
>>
>> Would this affect the performance compared to a real world scenario?
>> Obviously you will get very efficient compression rates but apart from
>> that, is there likely to be optimisations carried out anywhere between the
>> JVM and the network card that won't hold for messages with non zero entropy?
>>
>> We're going to test this against our production workload so it's not a big
>> deal for us but I wondered if this could give others skewed results?
>>
>> ---
>> Daniel
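
The compressibility concern raised in this thread can be checked in isolation with a few lines of Python. This is only a sketch: it measures plain gzip on a single buffer, not Kafka's own message-set compression, and the 100,000-byte `size` is an arbitrary, hypothetical batch size. Real message data would be expected to fall somewhere between the two extremes shown.

```python
import gzip
import os


def compression_ratio(payload: bytes) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload))


size = 100_000            # hypothetical batch size in bytes
zeros = bytes(size)       # every byte is 0x0, like the payload described above
noise = os.urandom(size)  # high-entropy bytes, a worst case for compression

print(f"all-zero: {compression_ratio(zeros):.0f}:1")
print(f"random:   {compression_ratio(noise):.2f}:1")
```

Running the same function over a sample of real messages gives a quick sense of how far a zero-filled benchmark overstates the compression a production workload would see.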
