Double-check how you are pushing data into Kafka. You are probably pushing one line at a time.

On Jul 4, 2016 12:30 PM, "Navin Ipe" <navin....@searchlighthealth.com> wrote:
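kafka-console-producer.sh reads standard input line by line and turns every line into a separate record, so a pretty-printed JSON document pasted into it becomes several messages. One workaround is to collapse the JSON onto a single line before producing it. A minimal sketch in plain Java (the class name `JsonOneLiner` is made up; this assumes valid JSON, where raw newlines can only occur between tokens, never inside string values):

```java
// Collapse pretty-printed JSON onto one line so the console producer
// sends it as a single Kafka record. Assumes valid JSON: raw newlines
// are only ever inter-token whitespace, so trimming each line is safe.
public class JsonOneLiner {
    static String toSingleLine(String prettyJson) {
        StringBuilder out = new StringBuilder();
        for (String line : prettyJson.split("\\R")) {
            out.append(line.trim());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pretty = "{\n  \"id\": 1,\n  \"name\": \"praveen\"\n}";
        System.out.println(toSingleLine(pretty));
        // → {"id": 1,"name": "praveen"}
    }
}
```

Alternatively, producing the message programmatically with a `KafkaProducer` sends whatever string you give it as one record, regardless of embedded newlines.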
> I haven't worked with Kafka, so perhaps someone else here would be able
> to help you with it. What I could suggest, though, is to search for how
> to emit more than one sentence using the Kafka spout.
>
> If you can still emit only one sentence, then I'd recommend not using a
> separate SaveBolt. Instead, use fields grouping, where you group tuples
> based on the name of the CSV file, and emit sentences to TransformBolt.
> When TransformBolt finishes receiving all tuples from a CSV, it can
> save to HDFS.
>
> If you still want to use a separate TransformBolt and SaveBolt, then
> use fields grouping as mentioned above when emitting to both bolts.
> This way, you can have multiple spouts which read from multiple files,
> and whatever they emit will go only to specific bolts.
>
> On Mon, Jul 4, 2016 at 9:21 PM, praveen reddy <onlineid8...@gmail.com> wrote:
>
>> I want to add a bit more: I am posting the JSON data using
>> kafka-console-producer.sh, copying the JSON data and pasting it on the
>> console.
>>
>> On Mon, Jul 4, 2016 at 11:44 AM, praveen reddy <onlineid8...@gmail.com> wrote:
>>
>>> Thanks Navin for the response; I was on mobile, so I couldn't see the
>>> typos. Here is my requirement. This is my first POC on Kafka/Storm,
>>> so please help me if I can design it in a better way.
>>>
>>> I need to read JSON data from Kafka, then convert the JSON data to a
>>> CSV file and save it on HDFS.
>>>
>>> This is my initial design, and I am having a lot of issues with it:
>>>
>>>     builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
>>>     builder.setBolt("TransformBolt", new TransformationBolt()).shuffleGrouping("kafka-spout");
>>>     builder.setBolt("Savebolt", new SaveBolt()).shuffleGrouping("TransformBolt");
>>>
>>> KafkaSpout reads the data from the Kafka topic, TransformationBolt
>>> converts the JSON to a CSV file, and SaveBolt saves the CSV file.
>>>
>>> KafkaSpout was able to read data from the Kafka topic.
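The buffering that the fields-grouping suggestion enables can be sketched in plain Java, independent of Storm: because tuples grouped on the file name all reach the same bolt instance, that instance can accumulate lines until the file is complete and only then write to HDFS. (`FileAggregator` and the `"EOF"` end marker below are hypothetical names; Storm does not signal end-of-file by itself, so the spout would have to emit such a marker.)

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the per-file buffering a TransformBolt could do when tuples
// are routed with fieldsGrouping on the file name: collect lines per
// file, and hand back the whole document once an end-of-file marker
// arrives. Not a Storm API -- just the accumulation logic.
public class FileAggregator {
    private final Map<String, List<String>> buffers = new HashMap<>();

    // Returns the complete file contents when `line` is the EOF marker,
    // or null while the file is still being accumulated.
    public List<String> add(String fileName, String line) {
        if ("EOF".equals(line)) {
            return buffers.remove(fileName);   // flush and clear
        }
        buffers.computeIfAbsent(fileName, k -> new ArrayList<>()).add(line);
        return null;
    }
}
```

In a real bolt, `add` would be called from `execute`, and a non-null return value would trigger the HDFS write (plus acking of the buffered tuples).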
>>> What I was expecting from the spout was the complete JSON data, but
>>> I am getting one line at a time from the JSON data I sent to the
>>> topic.
>>>
>>> Here is my transform bolt:
>>>
>>>     @Override
>>>     public void execute(Tuple input) {
>>>         String sentence = input.getString(0);
>>>         collector.emit(new Values(sentence));
>>>         System.out.println("emitted " + sentence);
>>>     }
>>>
>>> I was expecting getString(0) to return the complete JSON data, but I
>>> am getting only one line at a time.
>>>
>>> I am also not sure how to emit a CSV file so that SaveBolt can save
>>> it.
>>>
>>> Can you please let me know how to get the complete JSON data in a
>>> single request rather than line by line, and how to emit a CSV file
>>> from a bolt? And if you can help me design this in a better way, that
>>> would be really helpful.
>>>
>>> On Mon, Jul 4, 2016 at 5:59 AM, Navin Ipe <navin....@searchlighthealth.com> wrote:
>>>
>>>> Dear Praveen,
>>>>
>>>> The questions aren't silly, but it is rather tough to understand
>>>> what you are trying to convey. When you say "omit", do you mean
>>>> "emit"? Bolts can emit data even without having to write to disk (I
>>>> think there is a 2 MB limit on the size of the data that can be
>>>> emitted, because Thrift can't handle more than that). If you want
>>>> one bolt to write to disk and then want another bolt to read from
>>>> disk, that is also possible: the first bolt can just send the second
>>>> bolt whatever information is necessary to read the file. As far as I
>>>> know, basic datatypes will automatically get serialized. If you have
>>>> a more complex class, then serialize it with Serializable.
>>>>
>>>> If you could re-phrase your question and make it clearer, people
>>>> here would be able to help you better.
>>>>
>>>> On Sat, Jul 2, 2016 at 7:16 AM, praveen reddy <praveen.onlinecou...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am new to Storm and Kafka and working on a POC.
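The "emit whatever information is necessary to read the file" idea above boils down to passing a path between bolts instead of the file contents. A minimal sketch with java.nio.file (class, directory, and file names are made up):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the "emit the path, not the file" hand-off: the first bolt
// writes the CSV to disk and passes only the path downstream; the
// second bolt reads the file back using that path.
public class PathHandoff {
    static Path tempDir() {
        try {
            return Files.createTempDirectory("handoff");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Path writeCsv(Path dir, String name, String csv) {
        try {
            Path file = dir.resolve(name);
            Files.write(file, csv.getBytes(StandardCharsets.UTF_8));
            return file;               // this path is what the bolt would emit
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readCsv(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = writeCsv(tempDir(), "out.csv", "a,b\n1,2\n");
        System.out.println(readCsv(p));
    }
}
```

A path is a small string, so it serializes through a tuple trivially; the caveat is that both bolts must then see the same filesystem, which in a multi-node cluster means shared storage such as HDFS rather than a local disk.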
>>>>> My requirement is to get a message from Kafka in JSON format, have
>>>>> the spout read that message, have the first bolt convert the JSON
>>>>> message to a different format such as CSV, and have the second bolt
>>>>> save it to Hadoop.
>>>>>
>>>>> I came up with an initial design where KafkaSpout reads the Kafka
>>>>> topic, one bolt converts the message to a CSV file, and the next
>>>>> bolt saves it in Hadoop.
>>>>>
>>>>> I have the following questions:
>>>>> Can the first bolt, which converts the message to a CSV file, emit
>>>>> it? The file would be saved on disk; can a file which is saved on
>>>>> disk be emitted?
>>>>> How does the second bolt read the file which was saved on disk by
>>>>> the first bolt?
>>>>> Do we need to serialize the messages emitted by the spout and/or
>>>>> the bolts?
>>>>>
>>>>> Sorry if the questions sound silly; this is my first topology, with
>>>>> minimal knowledge of Storm.
>>>>>
>>>>> If you can think of a proper design for implementing my
>>>>> requirement, please let me know.
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> -Praveen
>>>>
>>>> --
>>>> Regards,
>>>> Navin
>
> --
> Regards,
> Navin
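For the JSON-to-CSV step itself, here is a toy sketch for a flat, one-level JSON object. The hand-rolled string splitting is for illustration only; it breaks on nested objects, escaped quotes, or commas inside values, and a real bolt should use a JSON library such as Jackson instead.

```java
// Toy sketch of the JSON-to-CSV conversion for a *flat* JSON object
// such as {"id": "1", "name": "praveen"}: keys become the CSV header,
// values become the single data row. Illustration only -- no nesting,
// escaping, or quoting is handled.
public class JsonToCsv {
    // Returns "header\nvalues" for a flat object.
    static String convert(String json) {
        String body = json.trim();
        body = body.substring(1, body.length() - 1);    // strip { and }
        StringBuilder header = new StringBuilder();
        StringBuilder row = new StringBuilder();
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":", 2);
            if (header.length() > 0) {
                header.append(",");
                row.append(",");
            }
            header.append(kv[0].trim().replace("\"", ""));
            row.append(kv[1].trim().replace("\"", ""));
        }
        return header + "\n" + row;
    }

    public static void main(String[] args) {
        System.out.println(convert("{\"id\": \"1\", \"name\": \"praveen\"}"));
        // → id,name
        //   1,praveen
    }
}
```

The bolt would then emit this CSV string (or, per the discussion above, write it to disk and emit the path) rather than trying to emit a "file" as such.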