It should be possible to make this work, but it is not going to be simple. The real issue is the ORC file format: it is not written one record at a time the way CSV and the other supported formats are. Unfortunately, that is currently a baked-in assumption of AbstractHdfsBolt:

https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/format/RecordFormat.java

So to support ORC we would need to make some modifications. Not impossible, just not a drop-in replacement. If this is something you want to tackle and contribute back, I think we would all love it.

You might also run into some issues with the format's metadata being written at the end of the file:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC

I am not totally sure how easy it is to recover an ORC file if that footer is missing because a worker crashed. You could end up with data loss in some cases unless you are extremely careful. To truly fix that, you might need to modify the ORC APIs themselves to support storing/recovering the metadata in an external location, and then store it in ZooKeeper on each flush until the file is rotated.
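To make the mismatch concrete, here is a minimal self-contained sketch of the contract AbstractHdfsBolt relies on. The real interface lives in org.apache.storm.hdfs.bolt.format and takes a Storm Tuple; the toy version below (String[] instead of Tuple, and the DelimitedFormat class name) is illustrative only, but it shows the key assumption: every record can be rendered as an independent byte[] and appended to the file.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Toy stand-in for Storm's RecordFormat contract: each record must be
// expressible as a standalone byte[] that can simply be appended.
interface RecordFormat extends Serializable {
    byte[] format(String[] fields); // hypothetical; the real method takes a Tuple
}

// CSV fits the contract: one tuple -> one self-contained line of bytes.
class DelimitedFormat implements RecordFormat {
    public byte[] format(String[] fields) {
        return (String.join(",", fields) + "\n").getBytes(StandardCharsets.UTF_8);
    }
}

public class RecordFormatDemo {
    public static void main(String[] args) {
        RecordFormat fmt = new DelimitedFormat();
        byte[] rec = fmt.format(new String[]{"1", "alice"});
        // Concatenating such records yields a valid file at any point in time.
        System.out.println(new String(rec, StandardCharsets.UTF_8).trim()); // 1,alice
        // An ORC row has no standalone byte representation: rows are buffered
        // into stripes and the file is only valid once the footer is written
        // at close, so a per-tuple format() -> byte[] cannot model it.
    }
}
```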
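The footer problem can be sketched the same way. The toy writer below (all names hypothetical, not the ORC API) models a footer-at-end format: stripes are flushed as rows arrive, but the metadata that makes the file readable is only written on close(). The checkpointForRecovery method is a sketch of the external-metadata idea, i.e. persisting stripe offsets to something like ZooKeeper on each flush so a recovery path could rebuild the footer after a crash.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model of a footer-at-end format like ORC; names are illustrative.
class FooterFileWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final List<Integer> stripeOffsets = new ArrayList<>();

    void writeStripe(byte[] rows) throws IOException {
        stripeOffsets.add(out.size()); // offset of this stripe in the file
        out.write(rows);
    }

    // A worker crash before close() leaves the file without this footer.
    void close() throws IOException {
        StringBuilder footer = new StringBuilder("FOOTER:");
        for (int off : stripeOffsets) footer.append(off).append(',');
        out.write(footer.toString().getBytes());
    }

    // Sketch of checkpointing the metadata externally (e.g. to ZooKeeper)
    // on each flush, so a reader could reconstruct the footer after a crash.
    List<Integer> checkpointForRecovery() { return new ArrayList<>(stripeOffsets); }
}

public class FooterDemo {
    public static void main(String[] args) throws IOException {
        FooterFileWriter w = new FooterFileWriter();
        w.writeStripe("abc".getBytes());
        w.writeStripe("defg".getBytes());
        // If the worker died here, only the checkpointed offsets would
        // let a recovery path make sense of the partial file.
        System.out.println(w.checkpointForRecovery()); // [0, 3]
        w.close();
    }
}
```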
The Trident HdfsState

https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/trident/HdfsState.java

might be a more appropriate place to start, since updated state is written out in micro-batches, but you still have to deal with the footer issue, because Trident really cares about exactly-once processing.

So overall it is not a simple problem, and relying on an external server like Hive makes it a lot simpler.

- Bobby

On Tuesday, July 25, 2017, 8:38:42 AM CDT, Igor Kuzmenko <f1she...@gmail.com> wrote:

Is there any implementation of a Storm bolt which can write files to HDFS in ORC format, without using the Hive Streaming API? I've found a Java API for writing ORC files <https://github.com/apache/orc> and I'm wondering whether any existing bolts use it, or whether there are plans to create one.