Hey Jack,

Thanks for the useful information. By flush size being 15%, do you mean
the memstore flush size? 15% would mean close to 1 GB. Have you seen any
issues with flushes taking too long?

Thanks
Varun

On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin <magn...@gmail.com> wrote:

> That's right, memstore size, not flush size, is increased.  Filesize is
> 10G. Overall write cache is 60% of heap and read cache is 20%.  Flush size
> is 15%.  64 maxlogs at 128MB. One namenode server, one secondary that can
> be promoted.  On the way to HBase, images are written to a queue, so that we
> can take HBase down for maintenance and still do inserts later.  ImageShack
> has ‘perma cache’ servers that allow writes and serving of data even when
> HBase is down for hours; consider it a 4th replica 😉 outside of Hadoop.
>
> Jack
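
For readers mapping those numbers onto concrete knobs, here is a minimal sketch of the standard 0.9x-era HBase properties involved. The values are illustrative only (assuming an 8 GB region server heap), not Jack's actual configuration, and in practice they live in hbase-site.xml on the region servers; Java is used here just to name the keys.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Illustrative values only; normally set in hbase-site.xml.
    public class WriteHeavyTuningSketch {
      public static Configuration sketch() {
        Configuration conf = HBaseConfiguration.create();
        // ~60% of the region server heap for memstores (write cache).
        conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.60f);
        // ~20% of the heap for the block cache (read cache).
        conf.setFloat("hfile.block.cache.size", 0.20f);
        // Per-region memstore flush size; ~15% of an 8 GB heap is ~1.2 GB.
        conf.setLong("hbase.hregion.memstore.flush.size", 1228L * 1024 * 1024);
        // Maximum region file size of 10 GB before splitting.
        conf.setLong("hbase.hregion.max.filesize", 10L * 1024 * 1024 * 1024);
        // Up to 64 WAL files of roughly 128 MB each.
        conf.setInt("hbase.regionserver.maxlogs", 64);
        conf.setLong("hbase.regionserver.hlog.blocksize", 128L * 1024 * 1024);
        return conf;
      }
    }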
>
>  *From:* Mohit Anchlia <mohitanch...@gmail.com>
> *Sent:* January 13, 2013 7:48 AM
> *To:* user@hbase.apache.org
> *Subject:* Re: Storing images in Hbase
>
> Thanks Jack for sharing this information. This definitely makes sense when
> using that type of caching layer. You mentioned increasing the write
> cache; I am assuming you had to increase the following parameters in
> addition to increasing the memstore size:
>
> hbase.hregion.max.filesize
> hbase.hregion.memstore.flush.size
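
As a minimal sketch, those two settings can also be overridden per table at creation time rather than cluster-wide (0.9x-era API; the table and column family names below are made up for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateImageTableSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor images = new HTableDescriptor("images");  // hypothetical table
        // Per-table equivalents of the two cluster-wide properties above.
        images.setMaxFileSize(10L * 1024 * 1024 * 1024);    // hbase.hregion.max.filesize
        images.setMemStoreFlushSize(1024L * 1024 * 1024);   // hbase.hregion.memstore.flush.size
        images.addFamily(new HColumnDescriptor("d"));        // hypothetical family
        admin.createTable(images);
        admin.close();
      }
    }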
>
> On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <magn...@gmail.com> wrote:
>
> > We buffer all accesses to HBase with a Varnish SSD-based caching layer,
> > so the impact for reads is negligible.  We have a 70-node cluster, 8 GB
> > of RAM per node, relatively weak nodes (Intel Core 2 Duo), with
> > 10-12 TB of disk per server.  Inserting 600,000 images per day.  We
> > have relatively little compaction activity as we made our write
> > cache much larger than the read cache, so we don't experience region file
> > fragmentation as much.
> >
> > -Jack
> >
> > On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia <mohitanch...@gmail.com>
> > wrote:
> > > I think it really depends on the volume of traffic, data distribution
> > > per region, how and when file compaction occurs, and the number of
> > > nodes in the cluster. In my experience, when it comes to blob data
> > > where you are serving tens of thousands of requests/sec for writes and
> > > reads, it's very difficult to manage HBase without very hard operations
> > > and maintenance in play. Jack earlier mentioned they have 1 billion
> > > images; it would be interesting to know what they see in terms of
> > > compaction and number of requests per second. I'd be surprised if on a
> > > high-volume site it can be done without any caching layer on top to
> > > alleviate I/O spikes that occur because of GC and compactions.
> > >
> > > On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq <donta...@gmail.com>
> > > wrote:
> > >
> > >> IMHO, if the image files are not too huge, HBase can efficiently serve
> > >> the purpose. You can store some additional info along with the file,
> > >> depending upon your search criteria, to make the search faster. Say if
> > >> you want to fetch images by type, you can store the image in one column
> > >> and its extension in another column (jpg, tiff, etc.).
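
A minimal sketch of that two-column layout, using the 0.9x-era client API; the table, family, and qualifier names here are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class StoreImageSketch {
      public static void store(byte[] imageBytes, String imageId, String ext) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "images");              // hypothetical table name
        Put put = new Put(Bytes.toBytes(imageId));
        // Raw image bytes in one column, its type/extension in another.
        put.add(Bytes.toBytes("d"), Bytes.toBytes("img"), imageBytes);
        put.add(Bytes.toBytes("d"), Bytes.toBytes("ext"), Bytes.toBytes(ext)); // "jpg", "tiff", ...
        table.put(put);
        table.close();
      }
    }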
> > >>
> > >> BTW, what exactly is the problem you are facing? You have written
> > >> "But I still cant do it"?
> > >>
> > >> Warm Regards,
> > >> Tariq
> > >> https://mtariq.jux.com/
> > >>
> > >>
> > >> On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel <michael_se...@hotmail.com>
> > >> wrote:
> > >>
> > >> > That's a viable option.
> > >> > HDFS reads are faster than HBase, but it would require first hitting
> > >> > the index in HBase, which points to the file, and then fetching the
> > >> > file. It could be faster... we found storing binary data in a sequence
> > >> > file and indexing it in HBase to be faster than HBase alone; however,
> > >> > YMMV, and HBase has been improved since we did that project....
> > >> >
> > >> >
> > >> > On Jan 10, 2013, at 10:56 PM, shashwat shriparv <dwivedishash...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Kavish,
> > >> > >
> > >> > > I have a better idea for you: copy your image files into a single
> > >> > > file on HDFS, and when a new image comes, append it to the existing
> > >> > > file, and keep and update the metadata and the offset in HBase.
> > >> > > Because if you put bigger images in HBase it will lead to some
> > >> > > issues.
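
A minimal sketch of that append-and-index idea; the file, table, family, and qualifier names are illustrative, and it assumes HDFS append is enabled on the cluster, which not every deployment of that era allowed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AppendAndIndexSketch {
      public static void add(byte[] imageBytes, String imageId) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        Path container = new Path("/images/blob.dat");      // hypothetical container file
        // Append the image to the big file and remember where it starts.
        FSDataOutputStream out = fs.exists(container) ? fs.append(container) : fs.create(container);
        long offset = out.getPos();
        out.write(imageBytes);
        out.close();
        // Keep only the metadata (offset, length) in HBase.
        Put put = new Put(Bytes.toBytes(imageId));
        put.add(Bytes.toBytes("m"), Bytes.toBytes("offset"), Bytes.toBytes(offset));
        put.add(Bytes.toBytes("m"), Bytes.toBytes("length"), Bytes.toBytes((long) imageBytes.length));
        HTable index = new HTable(conf, "image_index");      // hypothetical index table
        index.put(put);
        index.close();
      }
    }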
> > >> > >
> > >> > >
> > >> > >
> > >> > > ∞
> > >> > > Shashwat Shriparv
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <la...@apache.org>
> > >> > > wrote:
> > >> > >
> > >> > >> Interesting. That's close to a PB if my math is correct.
> > >> > >> Is there a write-up about this somewhere? Something that we could
> > >> > >> link from the HBase homepage?
> > >> > >>
> > >> > >> -- Lars
> > >> > >>
> > >> > >>
> > >> > >> ----- Original Message -----
> > >> > >> From: Jack Levin <magn...@gmail.com>
> > >> > >> To: user@hbase.apache.org
> > >> > >> Cc: Andrew Purtell <apurt...@apache.org>
> > >> > >> Sent: Thursday, January 10, 2013 9:24 AM
> > >> > >> Subject: Re: Storing images in Hbase
> > >> > >>
> > >> > >> We stored about 1 billion images in HBase with file sizes up to
> > >> > >> 10MB. It's been running for close to 2 years without issues and
> > >> > >> serves images for Yfrog and ImageShack.  If you have any
> > >> > >> questions about the setup, I would be glad to answer them.
> > >> > >>
> > >> > >> -Jack
> > >> > >>
> > >> > >> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mohitanch...@gmail.com>
> > >> > >> wrote:
> > >> > >>> I have done extensive testing and have found that blobs don't
> > >> > >>> belong in the database but are best left out on the file system.
> > >> > >>> Andrew outlined the issues that you'll face, not to mention I/O
> > >> > >>> issues when compaction occurs over large files.
> > >> > >>>
> > >> > >>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <apurt...@apache.org>
> > >> > >>> wrote:
> > >> > >>>
> > >> > >>>> I meant this to say "a few really large values"
> > >> > >>>>
> > >> > >>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <apurt...@apache.org>
> > >> > >>>> wrote:
> > >> > >>>>
> > >> > >>>>> Consider if the split threshold is 2 GB but your one row
> > >> > >>>>> contains 10 GB as a really large value.
> > >> > >>>>
> > >> > >>>>
> > >> > >>>>
> > >> > >>>>
> > >> > >>>> --
> > >> > >>>> Best regards,
> > >> > >>>>
> > >> > >>>>   - Andy
> > >> > >>>>
> > >> > >>>> Problems worthy of attack prove their worth by hitting back.
> > >> > >>>> - Piet Hein (via Tom White)
> > >> > >>>>
> > >> > >>
> > >> > >>
> > >> >
> > >> >
> > >>
> >
>
