Hi Lin,

It might be worth checking out Apache Flume, which was built for highly parallel ingest into HDFS.
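As a rough illustration (host, port, and payload below are made up), each device could push events to a Flume agent's Avro source using the Flume client SDK, and an HDFS sink on the agent would handle the parallel writes into HDFS:

    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class DeviceFlumeClient {
        public static void main(String[] args) throws EventDeliveryException {
            // Connects to a Flume agent's Avro source; host/port are illustrative
            RpcClient client = RpcClientFactory.getDefaultInstance("flume-host", 41414);
            try {
                byte[] payload = {0x01, 0x02};  // stands in for one device record
                Event event = EventBuilder.withBody(payload);
                client.append(event);  // the agent's HDFS sink batches and rolls files
            } finally {
                client.close();
            }
        }
    }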
-Sandy

On Thu, Aug 15, 2013 at 11:16 AM, Adam Faris <afa...@linkedin.com> wrote:

> If every device can send its information as an 'event', you could use a
> publish-subscribe messaging system like Apache Kafka
> (http://kafka.apache.org/). Kafka is designed to self-manage its storage
> by saving the last n events of data, acting like a circular buffer. Each
> device would publish its binary data to Kafka, and Hadoop would act as a
> subscriber to Kafka by consuming events (see the producer sketch at the
> end of this thread). If you need a scheduler to make Hadoop process the
> Kafka events, look at Azkaban, as it supports both scheduling and job
> dependencies (http://azkaban.github.io/azkaban2/).
>
> Remember that Hadoop is batch processing, so reports won't happen in
> real time. If you need to run reports in real time, watch the Samza
> project, which uses YARN and Kafka to process real-time streaming data
> (http://incubator.apache.org/projects/samza.html).
>
> On Aug 7, 2013, at 9:59 AM, Wukang Lin <vboylin1...@gmail.com> wrote:
>
> > Hi Shekhar,
> > Thank you for your replies. So far as I know, Storm is a distributed
> > computing framework, but what we need is a storage system; high
> > throughput and concurrency are what matter. We have thousands of
> > devices, and each device produces a steady stream of binary data. The
> > space for every device is fixed, so each device should reuse its space
> > on the disk. So, how can Storm or Esper achieve that?
> >
> > Many thanks,
> > Lin Wukang
> >
> > 2013/8/8 Shekhar Sharma <shekhar2...@gmail.com>
> > Use a CEP tool like Esper together with Storm and you will be able to
> > achieve that... I can give you more inputs if you can provide me more
> > details of what you are trying to achieve.
> > Regards,
> > Som Shekhar Sharma
> > +91-8197243810
> >
> > On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vboylin1...@gmail.com> wrote:
> > Hi Niels and Bertrand,
> > Thank you for your great advice.
> > In our scenario, we need to store a steady stream of binary data into
> > circular storage; throughput and concurrency are the most important
> > indicators. The first way seems workable, but as HDFS is not friendly
> > to small files, this approach may not be smooth enough. HBase is good,
> > but not appropriate for us, in terms of both throughput and storage.
> > MongoDB is quite good for web applications, but likewise not suitable
> > for the scenario we face.
> > We need a distributed storage system with high throughput, HA, load
> > balancing, and security. Maybe it would act much like HBase, managing
> > many small files (HFiles) as one large region. Perhaps we should
> > develop it ourselves.
> >
> > Thank you.
> > Lin Wukang
> >
> > 2013/7/25 Niels Basjes <ni...@basjes.nl>
> > A circular file on HDFS is not possible.
> >
> > Some of the ways around this limitation:
> > - Create a series of files and delete the oldest file when you have
> >   too many (see the first sketch at the end of this thread).
> > - Put the data into an HBase table and do something similar.
> > - Use a completely different technology like MongoDB, which has
> >   built-in support for a circular buffer (capped collections).
> >
> > Niels
> >
> > Hi all,
> > Is there any way to use an HDFS file as a circular buffer? I mean,
> > suppose I set a quota on a directory on HDFS and write data to a file
> > in that directory continuously. Once the quota is exceeded, can I
> > redirect the writer and write the data from the beginning of the file
> > automatically?
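For reference, here is a minimal sketch of Niels's first workaround: a fixed ring of HDFS files where the writer overwrites the oldest slot. Class, directory, and file names are illustrative, not an existing API; it assumes only the standard Hadoop FileSystem client.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Approximates a circular buffer with N fixed file slots on HDFS;
    // writing to slot (i % N) overwrites the oldest data, like a ring.
    public class HdfsFileRing {
        private final FileSystem fs;
        private final Path dir;
        private final int slots;
        private long next = 0;  // monotonically increasing write index

        public HdfsFileRing(Configuration conf, Path dir, int slots) throws IOException {
            this.fs = FileSystem.get(conf);
            this.dir = dir;
            this.slots = slots;
        }

        // Writes one chunk into the next slot, replacing whatever was there
        public void writeChunk(byte[] data) throws IOException {
            Path slot = new Path(dir, "chunk-" + (next % slots));
            // create(path, true) overwrites the existing file, dropping the oldest chunk
            try (FSDataOutputStream out = fs.create(slot, true)) {
                out.write(data);
            }
            next++;
        }
    }

One such ring per device would match the "fixed space per device" requirement, at the cost of one writer per device and chunk-granularity rather than byte-granularity reuse.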
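And a rough sketch of the producer side of Adam's Kafka suggestion, assuming the 0.8-era Java producer API; the broker address, topic name, and device key are made up for illustration:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class DeviceProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");  // illustrative broker
            // DefaultEncoder passes byte[] through untouched -- good for binary payloads
            props.put("serializer.class", "kafka.serializer.DefaultEncoder");
            props.put("key.serializer.class", "kafka.serializer.DefaultEncoder");

            Producer<byte[], byte[]> producer =
                new Producer<byte[], byte[]>(new ProducerConfig(props));
            byte[] payload = {0x01, 0x02};  // stands in for one device record
            // Keying by device id keeps a device's events together in one partition
            producer.send(new KeyedMessage<byte[], byte[]>(
                "device-events", "device-42".getBytes(), payload));
            producer.close();
        }
    }

Kafka's size- and time-based log retention then provides the circular-buffer behavior Adam describes, and the Hadoop side can consume the events in batches.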