If every device can send its information as an 'event', you could use a publish-subscribe messaging system like Apache Kafka. (http://kafka.apache.org/) Kafka is designed to self-manage its storage by retaining only the last 'n' events of data, acting like a circular buffer. Each device would publish its binary data to Kafka, and Hadoop would act as a subscriber by consuming those events. If you need a scheduler to make Hadoop process the Kafka events, look at Azkaban, as it supports both scheduling and job dependencies. (http://azkaban.github.io/azkaban2/)
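As a rough sketch of the device side (the broker list, topic name and device id below are made up for illustration), publishing a binary reading with the Kafka 0.8 Java producer could look something like this:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class DevicePublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list; point this at your own cluster.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            // DefaultEncoder passes the payload through as raw bytes.
            props.put("serializer.class", "kafka.serializer.DefaultEncoder");
            props.put("key.serializer.class", "kafka.serializer.StringEncoder");

            Producer<String, byte[]> producer =
                new Producer<String, byte[]>(new ProducerConfig(props));

            // Stand-in for one binary record from a device; keying by device id
            // routes all of a device's events to the same partition.
            byte[] reading = new byte[] { 0x01, 0x02, 0x03 };
            producer.send(new KeyedMessage<String, byte[]>("device-events", "device-42", reading));

            producer.close();
        }
    }

On the Hadoop side a consumer job (for example LinkedIn's Camus tool) can then pull those events into HDFS in batches.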
Remember that Hadoop is batch processing, so reports won't happen in real time. If you need to run reports in real time, watch the Samza project, which uses YARN and Kafka to process real-time streaming data. (http://incubator.apache.org/projects/samza.html)

On Aug 7, 2013, at 9:59 AM, Wukang Lin <vboylin1...@gmail.com> wrote:

> Hi Shekhar,
> Thank you for your replies. So far as I know, Storm is a distributed
> computing framework, but what we need is a storage system; high throughput
> and concurrency are what matter. We have thousands of devices, and each
> device will produce a steady stream of binary data. The space for every
> device is fixed, so it should reuse the space on the disk. So, how can
> Storm or Esper achieve that?
>
> Many Thanks
> Lin Wukang
>
>
> 2013/8/8 Shekhar Sharma <shekhar2...@gmail.com>
> Use a CEP tool like Esper and Storm and you will be able to achieve that
> ... I can give you more input if you can provide me with more details of
> what you are trying to achieve.
> Regards,
> Som Shekhar Sharma
> +91-8197243810
>
>
> On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vboylin1...@gmail.com> wrote:
> Hi Niels and Bertrand,
> Thank you for your great advice.
> In our scenario, we need to store a steady stream of binary data into a
> circular storage; throughput and concurrency are the most important
> indicators. The first way seems to work, but as HDFS is not friendly to
> small files, this approach may not be smooth enough. HBase is good, but not
> appropriate for us, both for throughput and storage. MongoDB is quite good
> for web applications, but likewise not suitable for the scenario we face.
> We need a distributed storage system with high throughput, HA, LB and
> security. Maybe it would act much like HBase, managing a lot of small files
> (HFiles) as one large region; we would manage many small files as a single
> large one. Perhaps we should develop it ourselves.
>
> Thank you.
> Lin Wukang
>
>
> 2013/7/25 Niels Basjes <ni...@basjes.nl>
> A circular file on HDFS is not possible.
>
> Some of the ways around this limitation:
> - Create a series of files and delete the oldest file when you have too many.
> - Put the data into an HBase table and do something similar.
> - Use completely different technology like MongoDB, which has built-in
>   support for a circular buffer (capped collection).
>
> Niels
>
> Hi all,
> Is there any way to use an HDFS file as a circular buffer? I mean, if I set
> a quota on a directory on HDFS and write data to a file in that directory
> continuously, once the quota is exceeded, I can redirect the writer and
> write the data from the beginning of the file automatically.
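P.S. Regarding Niels' first suggestion in the quoted thread (a series of files, deleting the oldest once there are too many), a minimal sketch with the HDFS Java API could look like the following; the directory path and the file cap are made-up values:

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Comparator;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RollingHdfsDir {
        // Made-up cap: keep at most this many files per device directory.
        private static final int MAX_FILES = 100;

        // After writing a new file, call this to drop the oldest files over the cap.
        public static void enforceCap(FileSystem fs, Path dir) throws IOException {
            FileStatus[] files = fs.listStatus(dir);
            if (files.length <= MAX_FILES) {
                return;
            }
            // Sort oldest first by modification time.
            Arrays.sort(files, new Comparator<FileStatus>() {
                public int compare(FileStatus a, FileStatus b) {
                    return Long.compare(a.getModificationTime(), b.getModificationTime());
                }
            });
            for (int i = 0; i < files.length - MAX_FILES; i++) {
                fs.delete(files[i].getPath(), false);
            }
        }

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            enforceCap(fs, new Path("/data/device-42"));  // illustrative path
        }
    }

This keeps each device's footprint on HDFS roughly fixed without needing an actual circular file.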