Hi Jack,

Thank you. I had never heard about HOOP before. I should learn it.

Also, do you store the metadata of each video clip directly in HDFS, or do
you have other storage like memcache?

thanks and regards,

Yiyu


On Sun, Jan 27, 2013 at 11:56 AM, Jack Levin <magn...@gmail.com> wrote:

> We did some experiments; the open source project HOOP works well for
> interfacing with HDFS, exposing a REST API for your file system.
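> For illustration only, a minimal sketch of what a read through such an HTTP
> gateway might look like from Java. The host, port, path, and query parameter
> below are hypothetical; the exact URL layout depends on the HOOP version you
> deploy.
>
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import java.net.HttpURLConnection;
> import java.net.URL;
>
> public class HoopReadSketch {
>     public static void main(String[] args) throws Exception {
>         // Hypothetical HOOP gateway endpoint in front of the namenode.
>         URL url = new URL(
>             "http://hoop-gateway.example.com:14000/user/images/clip-0001.meta?user.name=hdfs");
>         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>         conn.setRequestMethod("GET");
>         // Stream the file contents back over plain HTTP.
>         BufferedReader in = new BufferedReader(
>             new InputStreamReader(conn.getInputStream()));
>         String line;
>         while ((line = in.readLine()) != null) {
>             System.out.println(line);
>         }
>         in.close();
>     }
> }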
>
> -Jack
>
> On Sun, Jan 27, 2013 at 7:37 AM, yiyu jia <jia.y...@gmail.com> wrote:
> > Hi Jack,
> >
> > Thanks so much for sharing! Do you have comments on storing video in
> > HDFS?
> >
> > thanks and regards,
> >
> > Yiyu
> >
> > On Sat, Jan 26, 2013 at 9:56 PM, Jack Levin <magn...@gmail.com> wrote:
> >
> >> AFAIK, namenode would not like tracking 20 billion small files :)
> >>
> >> -jack
> >>
> >> On Sat, Jan 26, 2013 at 6:00 PM, S Ahmed <sahmed1...@gmail.com> wrote:
> >> > That's pretty amazing.
> >> >
> >> > What I am confused about is, why did you go with HBase and not just
> >> > straight into HDFS?
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Jan 25, 2013 at 2:41 AM, Jack Levin <magn...@gmail.com> wrote:
> >> >
> >> >> Two people including myself; it's fairly hands-off. It took about 3
> >> >> months to tune it right, however we did have multiple years of
> >> >> experience with datanodes and Hadoop in general, so that was a good
> >> >> boost.
> >> >>
> >> >> We have 4 HBase clusters today, the image store being the largest.
> >> >> On Jan 24, 2013 2:14 PM, "S Ahmed" <sahmed1...@gmail.com> wrote:
> >> >>
> >> >> > Jack, out of curiosity, how many people manage the HBase-related
> >> >> > servers?
> >> >> >
> >> >> > Does it require constant monitoring or is it fairly hands-off now?
> >> >> > (Or a bit of both: the early days were about getting things
> >> >> > right/learning, and now it's purring along.)
> >> >> >
> >> >> >
> >> >> > On Wed, Jan 23, 2013 at 11:53 PM, Jack Levin <magn...@gmail.com> wrote:
> >> >> >
> >> >> > > It's best to keep some RAM for caching of the filesystem; besides,
> >> >> > > we also run the datanode, which takes heap as well.
> >> >> > > Now, please keep in mind that even if you specify a heap of, say,
> >> >> > > 5GB, if your server opens threads to communicate with other systems
> >> >> > > via RPC (which HBase does a lot), you will indeed use HEAP +
> >> >> > > Nthreads * per-thread stack size.  There is a good Sun Microsystems
> >> >> > > document about it. (I don't have the link handy.)
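> >> >> > > As a rough, purely hypothetical illustration of that formula: a 5GB
> >> >> > > heap plus 2,000 RPC and datanode threads at a 1MB default stack each
> >> >> > > works out to roughly 5GB + 2,000 x 1MB = 7GB of process memory, which
> >> >> > > is why the heap is kept well below the 8GB of physical RAM.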
> >> >> > >
> >> >> > > -Jack
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > On Mon, Jan 21, 2013 at 5:10 PM, Varun Sharma <va...@pinterest.com> wrote:
> >> >> > > > Thanks for the useful information. I wonder why you use only a
> >> >> > > > 5G heap when you have an 8G machine? Is there a reason not to use
> >> >> > > > all of it (the DataNode typically takes about 1G of RAM)?
> >> >> > > >
> >> >> > > > On Sun, Jan 20, 2013 at 11:49 AM, Jack Levin <magn...@gmail.com> wrote:
> >> >> > > >
> >> >> > > >> I forgot to mention that I also have this setup:
> >> >> > > >>
> >> >> > > >> <property>
> >> >> > > >>   <name>hbase.hregion.memstore.flush.size</name>
> >> >> > > >>   <value>33554432</value>
> >> >> > > >>   <description>Flush more often. Default: 67108864</description>
> >> >> > > >> </property>
> >> >> > > >>
> >> >> > > >> This parameter works on a per-region amount, so this means that
> >> >> > > >> if any of my 400 (currently) regions on a regionserver has 30MB+
> >> >> > > >> in its memstore, HBase will flush it to disk.
> >> >> > > >>
> >> >> > > >>
> >> >> > > >> Here are some metrics from a regionserver:
> >> >> > > >>
> >> >> > > >> requests=2, regions=370, stores=370, storefiles=1390,
> >> >> > > >> storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
> >> >> > > >> flushQueueSize=0, usedHeap=3516, maxHeap=4987,
> >> >> > > >> blockCacheSize=790656256, blockCacheFree=255245888,
> >> >> > > >> blockCacheCount=2436, blockCacheHitCount=218015828,
> >> >> > > >> blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
> >> >> > > >> blockCacheHitRatio=94, blockCacheHitCachingRatio=98
> >> >> > > >>
> >> >> > > >> Note that the memstore is only 2G; this particular
> >> >> > > >> regionserver's HEAP is set to 5G.
> >> >> > > >>
> >> >> > > >> And last but not least, it's very important to have a good GC
> >> >> > > >> setup:
> >> >> > > >>
> >> >> > > >> export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms5000m
> >> >> > > >> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
> >> >> > > >> -XX:+PrintGCDateStamps
> >> >> > > >> -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
> >> >> > > >> -XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
> >> >> > > >> -XX:+UseParNewGC \
> >> >> > > >> -XX:NewSize=128m -XX:MaxNewSize=128m \
> >> >> > > >> -XX:-UseAdaptiveSizePolicy \
> >> >> > > >> -XX:+CMSParallelRemarkEnabled \
> >> >> > > >> -XX:-TraceClassUnloading
> >> >> > > >> "
> >> >> > > >>
> >> >> > > >> -Jack
> >> >> > > >>
> >> >> > > >> On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma <va...@pinterest.com> wrote:
> >> >> > > >> > Hey Jack,
> >> >> > > >> >
> >> >> > > >> > Thanks for the useful information. By flush size being 15%, do
> >> >> > > >> > you mean the memstore flush size? 15% would mean close to 1G;
> >> >> > > >> > have you seen any issues with flushes taking too long?
> >> >> > > >> >
> >> >> > > >> > Thanks
> >> >> > > >> > Varun
> >> >> > > >> >
> >> >> > > >> > On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin <magn...@gmail.com> wrote:
> >> >> > > >> >
> >> >> > > >> >> That's right, the memstore size, not the flush size, is
> >> >> > > >> >> increased.  Filesize is 10G. Overall write cache is 60% of heap
> >> >> > > >> >> and read cache is 20%.  Flush size is 15%.  64 maxlogs at 128MB.
> >> >> > > >> >> One namenode server, one secondary that can be promoted.  On the
> >> >> > > >> >> way to HBase, images are written to a queue, so that we can take
> >> >> > > >> >> HBase down for maintenance and still do inserts later.  ImageShack
> >> >> > > >> >> has ‘perma cache’ servers that allow writes and serving of data
> >> >> > > >> >> even when HBase is down for hours; consider it a 4th replica 😉
> >> >> > > >> >> outside of Hadoop.
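> >> >> > > >> >> For illustration only, a hedged sketch of the hbase-site.xml
> >> >> > > >> >> properties that typically control those percentages (verify the
> >> >> > > >> >> names and values against your own release and heap math):
> >> >> > > >> >>
> >> >> > > >> >> <property>
> >> >> > > >> >>   <name>hbase.regionserver.global.memstore.upperLimit</name>
> >> >> > > >> >>   <value>0.6</value>
> >> >> > > >> >>   <description>Write cache: 60% of heap</description>
> >> >> > > >> >> </property>
> >> >> > > >> >> <property>
> >> >> > > >> >>   <name>hfile.block.cache.size</name>
> >> >> > > >> >>   <value>0.2</value>
> >> >> > > >> >>   <description>Read (block) cache: 20% of heap</description>
> >> >> > > >> >> </property>
> >> >> > > >> >> <property>
> >> >> > > >> >>   <name>hbase.regionserver.maxlogs</name>
> >> >> > > >> >>   <value>64</value>
> >> >> > > >> >>   <description>Number of WAL files before forced flushes</description>
> >> >> > > >> >> </property>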
> >> >> > > >> >>
> >> >> > > >> >> Jack
> >> >> > > >> >>
> >> >> > > >> >>  *From:* Mohit Anchlia <mohitanch...@gmail.com>
> >> >> > > >> >> *Sent:* January 13, 2013 7:48 AM
> >> >> > > >> >> *To:* user@hbase.apache.org
> >> >> > > >> >> *Subject:* Re: Storing images in Hbase
> >> >> > > >> >>
> >> >> > > >> >> Thanks Jack for sharing this information. This definitely makes
> >> >> > > >> >> sense when using that type of caching layer. You mentioned
> >> >> > > >> >> increasing the write cache; I am assuming you had to increase the
> >> >> > > >> >> following parameters in addition to increasing the memstore size:
> >> >> > > >> >>
> >> >> > > >> >> hbase.hregion.max.filesize
> >> >> > > >> >> hbase.hregion.memstore.flush.size
> >> >> > > >> >>
> >> >> > > >> >> On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <magn...@gmail.com> wrote:
> >> >> > > >> >>
> >> >> > > >> >> > We buffer all accesses to HBASE with a Varnish SSD-based caching
> >> >> > > >> >> > layer, so the impact for reads is negligible.  We have a 70-node
> >> >> > > >> >> > cluster, 8 GB of RAM per node, relatively weak nodes (Intel Core 2
> >> >> > > >> >> > Duo), with 10-12TB of disks per server.  We are inserting 600,000
> >> >> > > >> >> > images per day.  We have relatively little compaction activity, as
> >> >> > > >> >> > we made our write cache much larger than the read cache - so we
> >> >> > > >> >> > don't experience region file fragmentation as much.
> >> >> > > >> >> >
> >> >> > > >> >> > -Jack
> >> >> > > >> >> >
> >> >> > > >> >> > On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> >> >> > > >> >> > > I think it really depends on the volume of traffic, the data
> >> >> > > >> >> > > distribution per region, how and when file compaction occurs, and
> >> >> > > >> >> > > the number of nodes in the cluster. In my experience, when it
> >> >> > > >> >> > > comes to blob data where you are serving tens of thousands of
> >> >> > > >> >> > > requests/sec of writes and reads, it's very difficult to manage
> >> >> > > >> >> > > HBase without very hard operations and maintenance in play. Jack
> >> >> > > >> >> > > earlier mentioned they have 1 billion images; it would be
> >> >> > > >> >> > > interesting to know what they see in terms of compaction and
> >> >> > > >> >> > > number of requests per sec. I'd be surprised if, on a high-volume
> >> >> > > >> >> > > site, it could be done without any caching layer on top to
> >> >> > > >> >> > > alleviate the IO spikes that occur because of GC and compactions.
> >> >> > > >> >> > >
> >> >> > > >> >> > > On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq <donta...@gmail.com> wrote:
> >> >> > > >> >> > >
> >> >> > > >> >> > >> IMHO, if the image files are not too huge, HBase can efficiently
> >> >> > > >> >> > >> serve the purpose. You can store some additional info along with
> >> >> > > >> >> > >> the file, depending upon your search criteria, to make the search
> >> >> > > >> >> > >> faster. Say you want to fetch images by type: you can store the
> >> >> > > >> >> > >> image in one column and its extension in another column (jpg,
> >> >> > > >> >> > >> tiff, etc.).
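> >> >> > > >> >> > >> For illustration, a minimal sketch with the Java client of that
> >> >> > > >> >> > >> era; the table, family, and qualifier names here are made up:
> >> >> > > >> >> > >>
> >> >> > > >> >> > >> import java.io.File;
> >> >> > > >> >> > >> import java.nio.file.Files;
> >> >> > > >> >> > >> import org.apache.hadoop.conf.Configuration;
> >> >> > > >> >> > >> import org.apache.hadoop.hbase.HBaseConfiguration;
> >> >> > > >> >> > >> import org.apache.hadoop.hbase.client.HTable;
> >> >> > > >> >> > >> import org.apache.hadoop.hbase.client.Put;
> >> >> > > >> >> > >> import org.apache.hadoop.hbase.util.Bytes;
> >> >> > > >> >> > >>
> >> >> > > >> >> > >> public class StoreImage {
> >> >> > > >> >> > >>   public static void main(String[] args) throws Exception {
> >> >> > > >> >> > >>     // args[0] = path to a local image file
> >> >> > > >> >> > >>     byte[] imageBytes = Files.readAllBytes(new File(args[0]).toPath());
> >> >> > > >> >> > >>     Configuration conf = HBaseConfiguration.create();
> >> >> > > >> >> > >>     HTable table = new HTable(conf, "images");      // hypothetical table
> >> >> > > >> >> > >>     Put put = new Put(Bytes.toBytes("img-00042"));  // row key = image id
> >> >> > > >> >> > >>     // Image blob and its type stored side by side in one row.
> >> >> > > >> >> > >>     put.add(Bytes.toBytes("d"), Bytes.toBytes("data"), imageBytes);
> >> >> > > >> >> > >>     put.add(Bytes.toBytes("d"), Bytes.toBytes("ext"), Bytes.toBytes("jpg"));
> >> >> > > >> >> > >>     table.put(put);
> >> >> > > >> >> > >>     table.close();
> >> >> > > >> >> > >>   }
> >> >> > > >> >> > >> }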
> >> >> > > >> >> > >>
> >> >> > > >> >> > >> BTW, what exactly is the problem which you are facing? You have
> >> >> > > >> >> > >> written "But I still cant do it".
> >> >> > > >> >> > >>
> >> >> > > >> >> > >> Warm Regards,
> >> >> > > >> >> > >> Tariq
> >> >> > > >> >> > >> https://mtariq.jux.com/
> >> >> > > >> >> > >>
> >> >> > > >> >> > >>
> >> >> > > >> >> > >> On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel <michael_se...@hotmail.com> wrote:
> >> >> > > >> >> > >>
> >> >> > > >> >> > >> > That's a viable option.
> >> >> > > >> >> > >> > HDFS reads are faster than HBase, but it would require first
> >> >> > > >> >> > >> > hitting the index in HBase, which points to the file, and then
> >> >> > > >> >> > >> > fetching the file.
> >> >> > > >> >> > >> > It could be faster... we found storing binary data in a sequence
> >> >> > > >> >> > >> > file, indexed in HBase, to be faster than HBase alone; however,
> >> >> > > >> >> > >> > YMMV, and HBase has been improved since we did that project....
> >> >> > > >> >> > >> >
> >> >> > > >> >> > >> >
> >> >> > > >> >> > >> > On Jan 10, 2013, at 10:56 PM, shashwat shriparv <dwivedishash...@gmail.com> wrote:
> >> >> > > >> >> > >> >
> >> >> > > >> >> > >> > > Hi Kavish,
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > > I have a better idea for you: copy your image files into a
> >> >> > > >> >> > >> > > single file on HDFS, and when a new image comes in, append it to
> >> >> > > >> >> > >> > > the existing file, and keep and update the metadata and the
> >> >> > > >> >> > >> > > offset in HBase. Because if you put bigger images in HBase it
> >> >> > > >> >> > >> > > will lead to some issues.
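> >> >> > > >> >> > >> > > A hedged sketch of that idea only (the paths, table, and column
> >> >> > > >> >> > >> > > names are invented, and HDFS append support varied by release,
> >> >> > > >> >> > >> > > so test it on your version first):
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > > import org.apache.hadoop.conf.Configuration;
> >> >> > > >> >> > >> > > import org.apache.hadoop.fs.FSDataOutputStream;
> >> >> > > >> >> > >> > > import org.apache.hadoop.fs.FileSystem;
> >> >> > > >> >> > >> > > import org.apache.hadoop.fs.Path;
> >> >> > > >> >> > >> > > import org.apache.hadoop.hbase.HBaseConfiguration;
> >> >> > > >> >> > >> > > import org.apache.hadoop.hbase.client.HTable;
> >> >> > > >> >> > >> > > import org.apache.hadoop.hbase.client.Put;
> >> >> > > >> >> > >> > > import org.apache.hadoop.hbase.util.Bytes;
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > > public class AppendImage {
> >> >> > > >> >> > >> > >   public static void main(String[] args) throws Exception {
> >> >> > > >> >> > >> > >     // args[0] = local image file, args[1] = image id
> >> >> > > >> >> > >> > >     byte[] image = java.nio.file.Files.readAllBytes(
> >> >> > > >> >> > >> > >         new java.io.File(args[0]).toPath());
> >> >> > > >> >> > >> > >     FileSystem fs = FileSystem.get(new Configuration());
> >> >> > > >> >> > >> > >     Path blob = new Path("/images/blob-000001.dat"); // one big container file
> >> >> > > >> >> > >> > >     FSDataOutputStream out =
> >> >> > > >> >> > >> > >         fs.exists(blob) ? fs.append(blob) : fs.create(blob);
> >> >> > > >> >> > >> > >     long offset = out.getPos();                      // where this image starts
> >> >> > > >> >> > >> > >     out.write(image);
> >> >> > > >> >> > >> > >     out.close();
> >> >> > > >> >> > >> > >     // Record container file, offset, and length in HBase for later seeks.
> >> >> > > >> >> > >> > >     HTable index = new HTable(HBaseConfiguration.create(), "image_index");
> >> >> > > >> >> > >> > >     Put put = new Put(Bytes.toBytes(args[1]));
> >> >> > > >> >> > >> > >     put.add(Bytes.toBytes("m"), Bytes.toBytes("file"), Bytes.toBytes(blob.toString()));
> >> >> > > >> >> > >> > >     put.add(Bytes.toBytes("m"), Bytes.toBytes("offset"), Bytes.toBytes(offset));
> >> >> > > >> >> > >> > >     put.add(Bytes.toBytes("m"), Bytes.toBytes("length"), Bytes.toBytes((long) image.length));
> >> >> > > >> >> > >> > >     index.put(put);
> >> >> > > >> >> > >> > >     index.close();
> >> >> > > >> >> > >> > >   }
> >> >> > > >> >> > >> > > }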
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > > ∞
> >> >> > > >> >> > >> > > Shashwat Shriparv
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > > On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <la...@apache.org> wrote:
> >> >> > > >> >> > >> > >
> >> >> > > >> >> > >> > >> Interesting. That's close to a PB if my math is correct.
> >> >> > > >> >> > >> > >> Is there a write-up about this somewhere? Something that we
> >> >> > > >> >> > >> > >> could link from the HBase homepage?
> >> >> > > >> >> > >> > >> from the HBase homepage?
> >> >> > > >> >> > >> > >>
> >> >> > > >> >> > >> > >> -- Lars
> >> >> > > >> >> > >> > >>
> >> >> > > >> >> > >> > >>
> >> >> > > >> >> > >> > >> ----- Original Message -----
> >> >> > > >> >> > >> > >> From: Jack Levin <magn...@gmail.com>
> >> >> > > >> >> > >> > >> To: user@hbase.apache.org
> >> >> > > >> >> > >> > >> Cc: Andrew Purtell <apurt...@apache.org>
> >> >> > > >> >> > >> > >> Sent: Thursday, January 10, 2013 9:24 AM
> >> >> > > >> >> > >> > >> Subject: Re: Storing images in Hbase
> >> >> > > >> >> > >> > >>
> >> >> > > >> >> > >> > >> We stored about 1 billion images into HBase, with file sizes
> >> >> > > >> >> > >> > >> up to 10MB. It has been running for close to 2 years without
> >> >> > > >> >> > >> > >> issues and serves delivery of images for Yfrog and ImageShack.
> >> >> > > >> >> > >> > >> If you have any questions about the setup, I would be glad to
> >> >> > > >> >> > >> > >> answer them.
> >> >> > > >> >> > >> > >>
> >> >> > > >> >> > >> > >> -Jack
> >> >> > > >> >> > >> > >>
> >> >> > > >> >> > >> > >> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> >> >> > > >> >> > >> > >>> I have done extensive testing and have found that blobs don't
> >> >> > > >> >> > >> > >>> belong in databases but are rather best left out on the file
> >> >> > > >> >> > >> > >>> system. Andrew outlined the issues that you'll face, not to
> >> >> > > >> >> > >> > >>> mention the IO issues when compaction occurs over large files.
> >> >> > > >> >> > >> > >>>
> >> >> > > >> >> > >> > >>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <apurt...@apache.org> wrote:
> >> >> > > >> >> > >> > >>>
> >> >> > > >> >> > >> > >>>> I meant this to say "a few really large values"
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <apurt...@apache.org> wrote:
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>>>> Consider if the split threshold is 2 GB but your one row
> >> >> > > >> >> > >> > >>>>> contains 10 GB as a really large value.
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>>> --
> >> >> > > >> >> > >> > >>>> Best regards,
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>>>   - Andy
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>>> Problems worthy of attack prove their worth by hitting
> >> >> > > >> >> > >> > >>>> back. - Piet Hein (via Tom White)
> >> >> > > >> >> > >> > >>>>
> >> >> > > >> >> > >> > >>
> >> >> > > >> >> > >> > >>
> >> >> > > >> >> > >> >
> >> >> > > >> >> > >> >
> >> >> > > >> >> > >>
> >> >> > > >> >> >
> >> >> > > >> >>
> >> >> > > >>
> >> >> > >
> >> >> >
> >> >>
> >>
>
