Hi Lin,

It might be worth checking out Apache Flume, which was built for highly
parallel ingest into HDFS.
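
As a rough, untested illustration (the agent, source, and port names below
are made up), a Flume agent that receives Avro events from the devices and
writes them into HDFS could be configured along these lines:

    # One agent with a single source -> channel -> sink pipeline
    agent.sources = avroSrc
    agent.channels = memCh
    agent.sinks = hdfsSink

    # Source: listen for Avro events sent by the devices
    agent.sources.avroSrc.type = avro
    agent.sources.avroSrc.bind = 0.0.0.0
    agent.sources.avroSrc.port = 41414
    agent.sources.avroSrc.channels = memCh

    # Channel: buffer events in memory between source and sink
    agent.channels.memCh.type = memory
    agent.channels.memCh.capacity = 10000

    # Sink: append the event bodies to files in HDFS
    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.channel = memCh
    agent.sinks.hdfsSink.hdfs.path = /flume/devices
    agent.sinks.hdfsSink.hdfs.fileType = DataStream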

-Sandy


On Thu, Aug 15, 2013 at 11:16 AM, Adam Faris <afa...@linkedin.com> wrote:

> If every device can send its information as an 'event', you could use a
> publish-subscribe messaging system like Apache Kafka. (
> http://kafka.apache.org/)  Kafka is designed to self-manage its storage
> by saving only the last 'n' events of data, acting like a circular buffer.
> The device would publish its binary data to Kafka, and Hadoop would act as
> a subscriber to Kafka by consuming events.  If you need a scheduler to make
> Hadoop process the Kafka events, look at Azkaban, as it supports both
> scheduling and job dependencies. (http://azkaban.github.io/azkaban2/)
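>
> As a rough, untested sketch (the broker address, topic name, and device id
> below are made up), a device could publish its bytes with the Kafka Java
> producer like this:
>
>     import java.util.Properties;
>     import org.apache.kafka.clients.producer.KafkaProducer;
>     import org.apache.kafka.clients.producer.ProducerRecord;
>
>     // Each device publishes its binary readings as Kafka events.
>     public class DevicePublisher {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             // Assumed broker address; replace with your cluster.
>             props.put("bootstrap.servers", "kafka-broker:9092");
>             props.put("key.serializer",
>                 "org.apache.kafka.common.serialization.StringSerializer");
>             props.put("value.serializer",
>                 "org.apache.kafka.common.serialization.ByteArraySerializer");
>
>             try (KafkaProducer<String, byte[]> producer =
>                      new KafkaProducer<>(props)) {
>                 byte[] payload = {0x01, 0x02, 0x03}; // stand-in for real bytes
>                 producer.send(
>                     new ProducerRecord<>("device-events", "device-42", payload));
>             }
>         }
>     }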
>
> Remember Hadoop is batch processing, so reports won't happen in real time.
> If you need to run reports in real time, watch the Samza project, which
> uses YARN and Kafka to process real-time streaming data. (
> http://incubator.apache.org/projects/samza.html)
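>
> To give a feel for it, a Samza job is mostly a StreamTask; a rough,
> untested sketch (the task below just counts the bytes it has seen and
> assumes a byte-array serde is configured for the input stream):
>
>     import org.apache.samza.system.IncomingMessageEnvelope;
>     import org.apache.samza.task.MessageCollector;
>     import org.apache.samza.task.StreamTask;
>     import org.apache.samza.task.TaskCoordinator;
>
>     // process() is called once per incoming Kafka message.
>     public class DeviceByteCountTask implements StreamTask {
>         private long totalBytes = 0;
>
>         @Override
>         public void process(IncomingMessageEnvelope envelope,
>                             MessageCollector collector,
>                             TaskCoordinator coordinator) {
>             byte[] payload = (byte[]) envelope.getMessage();
>             totalBytes += payload.length;
>         }
>     }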
>
> On Aug 7, 2013, at 9:59 AM, Wukang Lin <vboylin1...@gmail.com> wrote:
>
> > Hi Shekhar,
> >     Thank you for your replies. As far as I know, Storm is a distributed
> > computing framework, but what we need is a storage system, where high
> > throughput and concurrency are what matter. We have thousands of devices,
> > and each device will produce a steady stream of binary data. The space for
> > every device is fixed, so each device should reuse its space on the disk.
> > So, how can Storm or Esper achieve that?
> >
> > Many Thanks
> > Lin Wukang
> >
> >
> > 2013/8/8 Shekhar Sharma <shekhar2...@gmail.com>
> > Use a CEP tool like Esper together with Storm and you will be able to
> > achieve that. I can give you more input if you can provide me with more
> > details of what you are trying to achieve.
> > Regards,
> > Som Shekhar Sharma
> > +91-8197243810
> >
> >
> > On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vboylin1...@gmail.com>
> wrote:
> > Hi Niels and Bertrand,
> >     Thank you for your great advice.
> >     In our scenario, we need to store a steady stream of binary data in a
> > circular storage, and throughput and concurrency are the most important
> > indicators. The first way seems to work, but as HDFS is not friendly to
> > small files, this approach may not be smooth enough. HBase is good, but
> > not appropriate for us, for both throughput and storage. MongoDB is quite
> > good for web applications, but likewise not suitable for our scenario.
> >     We need a distributed storage system with high throughput, HA, LB and
> > security. Maybe it would act much like HBase, managing a lot of small
> > files (like HFiles) as one large region. Perhaps we should develop it
> > ourselves.
> >
> > Thank you.
> > Lin Wukang
> >
> >
> > 2013/7/25 Niels Basjes <ni...@basjes.nl>
> > A circular file on HDFS is not possible.
> >
> > Some of the ways around this limitation:
> > - Create a series of files and delete the oldest file when you have too
> > many (see the sketch below).
> > - Put the data into an HBase table and do something similar.
> > - Use a completely different technology like MongoDB, which has built-in
> > support for a circular buffer (a capped collection).
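> >
> > To make the first option concrete, here is a rough, untested Java sketch
> > (the segment size, segment count, and paths are assumptions):
> >
> >     import java.io.IOException;
> >     import org.apache.hadoop.conf.Configuration;
> >     import org.apache.hadoop.fs.FSDataOutputStream;
> >     import org.apache.hadoop.fs.FileSystem;
> >     import org.apache.hadoop.fs.Path;
> >
> >     // Keep at most MAX_SEGMENTS fixed-size segment files per device;
> >     // when the newest segment fills up, drop the oldest and start anew.
> >     public class RollingHdfsWriter {
> >         private static final int MAX_SEGMENTS = 10;
> >         private static final long SEGMENT_BYTES = 64L * 1024 * 1024; // 64 MB
> >
> >         private final FileSystem fs;
> >         private final Path deviceDir;
> >         private long nextSegmentId = 0;
> >         private long bytesInSegment = 0;
> >         private FSDataOutputStream out;
> >
> >         public RollingHdfsWriter(FileSystem fs, Path deviceDir)
> >                 throws IOException {
> >             this.fs = fs;
> >             this.deviceDir = deviceDir;
> >             fs.mkdirs(deviceDir);
> >             roll();
> >         }
> >
> >         public synchronized void write(byte[] record) throws IOException {
> >             if (bytesInSegment + record.length > SEGMENT_BYTES) {
> >                 roll();
> >             }
> >             out.write(record);
> >             bytesInSegment += record.length;
> >         }
> >
> >         private void roll() throws IOException {
> >             if (out != null) {
> >                 out.close();
> >             }
> >             // Once the "ring" is full, delete the oldest segment.
> >             if (nextSegmentId >= MAX_SEGMENTS) {
> >                 Path oldest = new Path(deviceDir,
> >                     "segment-" + (nextSegmentId - MAX_SEGMENTS));
> >                 if (fs.exists(oldest)) {
> >                     fs.delete(oldest, false);
> >                 }
> >             }
> >             out = fs.create(new Path(deviceDir, "segment-" + nextSegmentId));
> >             nextSegmentId++;
> >             bytesInSegment = 0;
> >         }
> >
> >         public static void main(String[] args) throws IOException {
> >             FileSystem fs = FileSystem.get(new Configuration());
> >             RollingHdfsWriter w =
> >                 new RollingHdfsWriter(fs, new Path("/data/device-42"));
> >             w.write(new byte[]{0x01, 0x02, 0x03}); // stand-in for device bytes
> >         }
> >     }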
> >
> > Niels
> >
> > Hi all,
> >    Is there any way to use an HDFS file as a circular buffer? I mean, if
> > I set a quota on a directory on HDFS and write data to a file in that
> > directory continuously, then once the quota is exceeded, I could redirect
> > the writer and write the data from the beginning of the file automatically.
> >
