Re: Writing orc files with storm via java API

2017-08-01 Thread Kristopher Kane
For ORC specifically, I would ONLY create an ORC file in HDFS from a tuple batch, and create, flush, and close that file in one go. Adjust the batch size and message timeout to whatever makes sense for your case. Yes, you will likely end up with many small files in HDFS, but since this is ORC, the assumption is

Re: Writing orc files with storm via java API

2017-07-31 Thread Bobby Evans
It should be possible to make this work, but it is not going to be simple. The real issue is the format of the ORC file. It is not written one record at a time, the way CSV and the other supported formats are. Sadly, AbstractHdfsBolt currently assumes record-at-a-time writes.
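
One way around the record-at-a-time assumption is to buffer tuples and write each full batch as its own file, opening and closing the file in a single step. The sketch below shows only that control flow, in plain Java. It is not Storm or ORC code: BatchPerFileSketch, BatchFileWriter, WriterFactory, and the "part-N.orc" naming are hypothetical names made up for illustration, and a real bolt would put an actual ORC writer behind the interface and ack its tuples where this sketch returns the written rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: buffer rows, then write each full batch to its own file.
class BatchPerFileSketch {

    /** Hypothetical stand-in for a file writer (an ORC writer in a real bolt). */
    public interface BatchFileWriter {
        void writeAll(List<String> rows) throws Exception; // write the whole batch
        void close() throws Exception;                     // finalize the file
    }

    /** Hypothetical factory that opens one new file per batch. */
    public interface WriterFactory {
        BatchFileWriter open(String fileName) throws Exception;
    }

    private final WriterFactory factory;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int fileCounter = 0;

    public BatchPerFileSketch(WriterFactory factory, int batchSize) {
        this.factory = factory;
        this.batchSize = batchSize;
    }

    /**
     * Buffers one row. Returns the rows that were just written to a closed
     * file (safe to ack), or an empty list if the batch is not yet full.
     */
    public List<String> add(String row) throws Exception {
        pending.add(row);
        if (pending.size() < batchSize) {
            return Collections.emptyList();
        }
        return flush();
    }

    /** Writes all buffered rows to a new file and closes it in one go. */
    public List<String> flush() throws Exception {
        if (pending.isEmpty()) {
            return Collections.emptyList();
        }
        String fileName = "part-" + (fileCounter++) + ".orc";
        BatchFileWriter writer = factory.open(fileName);
        try {
            writer.writeAll(pending);
        } finally {
            writer.close(); // always release the file handle
        }
        // Reached only if write and close both succeeded; on failure the
        // exception propagates and the rows stay buffered for a retry.
        List<String> written = new ArrayList<>(pending);
        pending.clear();
        return written;
    }
}
```

In a bolt, flush() would also need to be called on a timer (for example from a tick tuple) so that a partly filled batch is written out before the message timeout expires.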

Writing orc files with storm via java API

2017-07-25 Thread Igor Kuzmenko
Is there any implementation of a Storm bolt that can write files to HDFS in ORC format without using the Hive Streaming API? I've found a Java API for writing ORC files, and I'm wondering whether any existing Hive bolts use it, or whether there are any plans to create one.