Double check how you are pushing data into Kafka. You are probably pushing
one line at a time.
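If the JSON is pasted into a line-oriented console producer, each line of the pretty-printed document is likely sent as its own message, which would explain the one-line tuples. Below is a minimal sketch of one workaround: collapse the document onto a single line before sending it. It assumes the payload is valid JSON (so raw line breaks can only occur outside string values); `JsonOneLine` and `toSingleLine` are made-up names, not part of Kafka or Storm.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class JsonOneLine {

    /**
     * Collapses pretty-printed JSON onto one line.
     * Assumption: the input is valid JSON, where raw line breaks are never
     * inside a string value, so trimming each line is safe.
     */
    public static String toSingleLine(String prettyJson) {
        return Arrays.stream(prettyJson.split("\\R"))   // split on any line break
                .map(String::trim)                      // drop indentation
                .filter(line -> !line.isEmpty())        // skip blank lines
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String pretty = "{\n  \"id\": 1,\n  \"name\": \"abc\"\n}";
        // The whole document is now one line, i.e. one message for a
        // producer that treats every input line as a separate record.
        System.out.println(toSingleLine(pretty));
    }
}
```

The resulting string can be pasted as one line, or sent as a single record from your own producer code.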
On Jul 4, 2016 12:30 PM, "Navin Ipe" <>

> I haven't worked with Kafka, so perhaps someone else here would be able
> to help you with it.
> What I could suggest though, is to search for how to emit more than one
> sentence using the Kafka spout.
> If you still can emit only one sentence, then I'd recommend not using a
> separate SaveBolt. Instead, use FieldsGrouping where you group tuples based
> on the name of the CSV file, and emit sentences to TransformBolt. When
> TransformBolt completes receiving all tuples from a CSV, it can save to
> If you still want to use a separate TransformBolt and SaveBolt, then use
> fields grouping as I mentioned above when emitting to both bolts. This way,
> you can have multiple spouts which read from multiple files, and whatever
> they emit will go only to specific bolts.
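To illustrate the fields-grouping idea above: tuples that share the same value of the grouping field always land on the same bolt task, so a single task sees every line of a given CSV and can buffer them until the file is complete. The sketch below imitates that routing with a hash modulo the task count. It is a plain-Java illustration and not Storm's actual code; `FieldsGroupingSketch`, `taskFor`, `route`, and the file names are hypothetical. In a real topology the wiring would go through `fieldsGrouping(...)` on the bolt declarer, which assumes the upstream component emits a field carrying the file name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldsGroupingSketch {

    // The same key always maps to the same task index (illustrative hashing only).
    static int taskFor(String fileName, int numTasks) {
        return Math.floorMod(fileName.hashCode(), numTasks);
    }

    // Routes {fileName, line} pairs to simulated tasks; each task buffers lines per file.
    static List<Map<String, List<String>>> route(String[][] tuples, int numTasks) {
        List<Map<String, List<String>>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new LinkedHashMap<>());
        }
        for (String[] tuple : tuples) {
            String fileName = tuple[0];
            String line = tuple[1];
            tasks.get(taskFor(fileName, numTasks))
                 .computeIfAbsent(fileName, key -> new ArrayList<>())
                 .add(line);
        }
        return tasks;
    }

    public static void main(String[] args) {
        String[][] tuples = {
            {"a.csv", "1,x"}, {"b.csv", "9,z"}, {"a.csv", "2,y"}
        };
        List<Map<String, List<String>>> tasks = route(tuples, 3);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + " -> " + tasks.get(i));
        }
    }
}
```

All lines of `a.csv` end up in one task's buffer, in arrival order, which is what lets that task write the complete file.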
> On Mon, Jul 4, 2016 at 9:21 PM, praveen reddy <>
> wrote:
>> I want to add a bit more:
>> I am posting the JSON data using a file: I copy the
>> JSON data and paste it on the console.
>> On Mon, Jul 4, 2016 at 11:44 AM, praveen reddy <>
>> wrote:
>>> Thanks, Navin, for the response. I was on mobile, so I couldn't see the typos.
>>> Here is my requirement. This is my first POC on Kafka/Storm, so please help
>>> me if I can design it in a better way.
>>> I need to read JSON data from Kafka, then convert the JSON data to a CSV
>>> file and save it on HDFS.
>>> This is my initial design, and I am having a lot of issues with it.
>>>         builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
>>>         builder.setBolt("TransformBolt", new TransformationBolt()).shuffleGrouping("kafka-spout");
>>>         builder.setBolt("Savebolt", new SaveBolt()).shuffleGrouping("TransformBolt");
>>> KafkaSpout reads the data from the Kafka topic, TransformationBolt
>>> converts the JSON to a CSV file, and SaveBolt saves the CSV file.
>>> KafkaSpout was able to read data from the Kafka topic. I was expecting
>>> the spout to deliver the complete JSON data, but I am getting one line at
>>> a time from the JSON data I sent to the topic.
>>> Here is my transform bolt:
>>>     @Override
>>>     public void execute(Tuple input) {
>>>         String sentence = input.getString(0);
>>>         collector.emit(new Values(sentence));
>>>         System.out.println("emitted " + sentence);
>>>     }
>>> I was expecting getString(0) to return the complete JSON data, but I am
>>> getting only one line at a time.
>>> I am also not sure how to emit the CSV file so that SaveBolt can save it.
>>> Can you please let me know how to get the complete JSON data in a single
>>> request rather than line by line, and how to emit a CSV file from a bolt?
>>> If you can help me design this better, it would be really helpful.
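On emitting CSV from a bolt: one option is to emit each CSV row as an ordinary string value rather than a file, and let the downstream bolt append the rows it receives. Below is a minimal sketch of building one row. It assumes the JSON has already been parsed into a list of field values (parsing is out of scope here) and uses RFC 4180 style quoting; `CsvRow`, `escape`, and `toCsvLine` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvRow {

    // Quotes a field only when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields into a single CSV line (no trailing line break).
    static String toCsvLine(List<String> fields) {
        return fields.stream().map(CsvRow::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("1", "Smith, J", "plain");
        System.out.println(toCsvLine(fields));
    }
}
```

Inside `execute`, the resulting line could be passed to `collector.emit(new Values(...))` in the same way the bolt above emits `sentence`.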
>>> On Mon, Jul 4, 2016 at 5:59 AM, Navin Ipe <
>>>> wrote:
>>>> Dear Praveen,
>>>> The questions aren't silly, but it is rather tough to understand what
>>>> you are trying to convey. When you say "omit", do you mean "emit"?
>>>> Bolts can emit data even without having to write to disk (I think
>>>> there's a 2MB limit to the size of that data that can be emitted, because
>>>> Thrift can't handle more than that).
>>>> If you want one bolt to write to disk and then want another bolt to
>>>> read from disk, then that's also possible.
>>>> The first bolt can just send to the second bolt, whatever information
>>>> is necessary to read from file.
>>>> As far as I know, basic data types are serialized automatically.
>>>> If you have a more complex class, make it implement Serializable.
>>>> If you could re-phrase your question and make it clearer, people here
>>>> would be able to help you better.
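On the serialization remark above: a minimal sketch of a more complex class that implements `Serializable`, with a round trip through Java's built-in object streams to check that it survives. `SerializableSketch`, `CsvRecord`, and its fields are hypothetical. As far as I know, Storm serializes tuples with Kryo and can fall back to Java serialization for classes that are not registered, so treat this as a check of the class itself, not of Storm's wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableSketch {

    // Hypothetical payload: one CSV line plus the file it belongs to.
    static class CsvRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String fileName;
        final int lineNumber;
        final String line;

        CsvRecord(String fileName, int lineNumber, String line) {
            this.fileName = fileName;
            this.lineNumber = lineNumber;
            this.line = line;
        }
    }

    // Serializes the record to bytes and reads it back with plain Java serialization.
    static CsvRecord roundTrip(CsvRecord record) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(record);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (CsvRecord) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("record is not serializable", e);
        }
    }

    public static void main(String[] args) {
        CsvRecord copy = roundTrip(new CsvRecord("a.csv", 7, "1,x"));
        System.out.println(copy.fileName + ":" + copy.lineNumber + " " + copy.line);
    }
}
```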
>>>> On Sat, Jul 2, 2016 at 7:16 AM, praveen reddy <
>>>>> wrote:
>>>>> Hi All,
>>>>> I am new to Storm and Kafka and am working on a POC.
>>>>> My requirement is to get a message from Kafka in JSON format, with a spout
>>>>> reading that message, the first bolt converting the JSON message to a
>>>>> different format like CSV, and the second bolt saving it to Hadoop.
>>>>> I came up with an initial design where I use KafkaSpout to read
>>>>> Kafka topics, one bolt converts the data to a CSV file, and the next bolt
>>>>> saves it in Hadoop.
>>>>> I have the following questions:
>>>>> Can the first bolt, which converts the message to a CSV file, omit it?
>>>>> The file would be saved on disk. Can a file which is saved on disk be
>>>>> omitted?
>>>>> How does the second bolt read the file which is saved on disk by the first
>>>>> bolt?
>>>>> Do we need to serialize the message omitted by the spout and/or bolt?
>>>>> Sorry if the questions sound silly; this is my first topology, with
>>>>> minimal knowledge of Storm.
>>>>> If you can think of a proper design for implementing my requirement,
>>>>> can you please let me know?
>>>>> Thanks in advance
>>>>> -Praveen
>>>> --
>>>> Regards,
>>>> Navin
> --
> Regards,
> Navin
