If every device can send its information as an 'event', you could use a 
publish-subscribe messaging system like Apache Kafka 
(http://kafka.apache.org/).  Kafka is designed to self-manage its storage by 
retaining only the most recent data (a window configured by time or size), so 
it acts like a circular buffer.  Each device would publish its binary data to 
Kafka, and Hadoop would act as a subscriber by consuming those events.  If you 
need a scheduler to make Hadoop process the Kafka events, look at Azkaban, 
which supports both scheduling and job dependencies. 
(http://azkaban.github.io/azkaban2/)  
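
To make that concrete, here is a minimal sketch of the device-side publisher 
using Kafka's Java producer API.  The broker address, topic name and device id 
below are just placeholders, not anything from your setup:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeviceEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address -- point this at your Kafka cluster.
        props.put("bootstrap.servers", "kafka-broker:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            byte[] payload = readNextSampleFromDevice();   // the device's binary blob
            // Keying by device id keeps each device's events in one partition,
            // which preserves their order.
            producer.send(new ProducerRecord<>("device-events", "device-42", payload));
        }
    }

    private static byte[] readNextSampleFromDevice() {
        return new byte[] {0x01, 0x02, 0x03};              // stand-in for real data
    }
}

On the Hadoop side a consumer job would read the same topic and write the 
events into HDFS in batches.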

Remember that Hadoop is batch processing, so reports won't happen in real 
time.  If you need to run reports in real time, watch the Samza project, which 
uses YARN and Kafka to process real-time streaming data. 
(http://incubator.apache.org/projects/samza.html) 
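
If Samza ends up being a fit, the processing side boils down to implementing a 
StreamTask.  A rough sketch (which topic it consumes is wired up in the job's 
config, not in code):

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class DeviceEventTask implements StreamTask {
    // Called once for every event Samza pulls off the Kafka topic this job
    // is configured to consume.
    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        byte[] payload = (byte[]) envelope.getMessage();
        // ... per-event aggregation / reporting logic goes here ...
    }
}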

On Aug 7, 2013, at 9:59 AM, Wukang Lin <vboylin1...@gmail.com> wrote:

> Hi Shekhar,
>     Thank you for your replies. As far as I know, Storm is a distributed 
> computing framework, but what we need is a storage system; high throughput 
> and concurrency are what matter. We have thousands of devices, and each 
> device will produce a steady stream of binary data. The space for every 
> device is fixed, so it should reuse the space on the disk. So, how can Storm 
> or Esper achieve that?
> 
> Many Thanks
> Lin Wukang
> 
> 
> 2013/8/8 Shekhar Sharma <shekhar2...@gmail.com>
> Use a CEP tool like Esper and Storm, and you will be able to achieve that.
> ...I can give you more input if you can provide me more details of what you 
> are trying to achieve.
> Regards,
> Som Shekhar Sharma
> +91-8197243810
> 
> 
> On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vboylin1...@gmail.com> wrote:
> Hi Niels and Bertrand,
>     Thank you for your great advice.
>     In our scenario, we need to store a steady stream of binary data into 
> circular storage; throughput and concurrency are the most important 
> indicators. The first way seems to work, but as HDFS is not friendly to small 
> files, this approach may not be smooth enough. HBase is good, but not 
> appropriate for us, both for throughput and for storage. MongoDB is quite 
> good for web applications, but likewise not suitable for the scenario we 
> face.
>     We need a distributed storage system with high throughput, HA, LB and 
> security. Maybe it would act much like HBase, managing a lot of small files 
> (HFiles) as one large region. Perhaps we should develop it ourselves.
> 
> Thank you.
> Lin Wukang
> 
> 
> 2013/7/25 Niels Basjes <ni...@basjes.nl>
> A circular file on HDFS is not possible.
> 
> Some of the ways around this limitation:
> - Create a series of files and delete the oldest file when you have too many.
> - Put the data into an HBase table and do something similar.
> - Use a completely different technology like MongoDB, which has built-in 
> support for a circular buffer (capped collections).
> 
> Niels
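
For the first option Niels mentions (a rolling series of files), here is a 
minimal sketch against the HDFS FileSystem API; the directory path and the 
file cap are arbitrary placeholders:

import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RollingHdfsWriter {
    private static final int MAX_FILES = 100;                    // arbitrary cap per device
    private static final Path DIR = new Path("/data/device-42"); // placeholder directory

    // Delete the oldest segment files so that, after the next file is added,
    // the directory holds at most MAX_FILES segments.
    static void pruneOldest(FileSystem fs) throws IOException {
        FileStatus[] files = fs.listStatus(DIR);
        Arrays.sort(files, Comparator.comparingLong(FileStatus::getModificationTime));
        for (int i = 0; i <= files.length - MAX_FILES; i++) {
            fs.delete(files[i].getPath(), false);
        }
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        pruneOldest(fs);
        // ... then open the next segment with fs.create(...) and write the
        // incoming binary data to it ...
    }
}

Each segment stays small enough to be written in one pass, and the prune step 
keeps total space roughly constant, which is about as close to a circular 
buffer as HDFS gets.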
> 
> Hi all,
>    Is there any way to use an HDFS file as a circular buffer? I mean, if I 
> set a quota on a directory on HDFS and write data to a file in that 
> directory continuously, once the quota is exceeded, I could redirect the 
> writer and write the data from the beginning of the file automatically.
> 