It's best to keep some RAM free for filesystem caching; besides, we
also run the DataNode, which takes heap as well.
Also, please keep in mind that even if you specify a heap of, say, 5GB,
if your server opens threads to communicate with other systems via RPC
(which HBase does a lot), you will actually use HEAP +
Nthreads * thread_stack_size.  There is a good Sun Microsystems
document about it (I don't have the link handy).
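
A rough back-of-the-envelope sketch (the ~1MB default stack size and the
thread count below are assumptions for illustration, not numbers from our
cluster):

# ~5GB heap + ~1000 RPC/handler threads * ~1MB stack (-Xss default) ~= ~6GB resident
# To plug in a real thread count, grab it from a running regionserver:
RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
jstack "$RS_PID" | grep -c 'java.lang.Thread.State'   # rough live-thread count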

-Jack



On Mon, Jan 21, 2013 at 5:10 PM, Varun Sharma <va...@pinterest.com> wrote:
> Thanks for the useful information. I wonder why you use only a 5G heap
> when you have an 8G machine? Is there a reason not to use all of it?
> (The DataNode typically takes about 1G of RAM.)
>
> On Sun, Jan 20, 2013 at 11:49 AM, Jack Levin <magn...@gmail.com> wrote:
>
>> I forgot to mention that I also have this setup:
>>
>> <property>
>>   <name>hbase.hregion.memstore.flush.size</name>
>>   <value>33554432</value>
>>   <description>Flush more often. Default: 67108864</description>
>> </property>
>>
>> This parameter works on a per-region basis, so if any of my
>> 400 (currently) regions on a regionserver has 30MB+ in its memstore,
>> HBase will flush it to disk.
>>
>>
>> Here are some metrics from a regionserver:
>>
>> requests=2, regions=370, stores=370, storefiles=1390,
>> storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
>> flushQueueSize=0, usedHeap=3516, maxHeap=4987,
>> blockCacheSize=790656256, blockCacheFree=255245888,
>> blockCacheCount=2436, blockCacheHitCount=218015828,
>> blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
>> blockCacheHitRatio=94, blockCacheHitCachingRatio=98
>>
>> Note that the memstore is only about 2G, while this particular
>> regionserver's heap is set to 5G.
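>>
>> (If you want to pull the same per-regionserver numbers yourself, the
>> shell's detailed status dump prints lines like the one above; a minimal
>> sketch, assuming the hbase shell is on your path:)
>>
>> echo "status 'detailed'" | hbase shell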
>>
>> And last but not least, it's very important to have a good GC setup:
>>
>> export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms5000m \
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails \
>> -XX:+PrintGCDateStamps \
>> -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
>> -XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
>> -XX:+UseParNewGC \
>> -XX:NewSize=128m -XX:MaxNewSize=128m \
>> -XX:-UseAdaptiveSizePolicy \
>> -XX:+CMSParallelRemarkEnabled \
>> -XX:-TraceClassUnloading
>> "
>>
>> -Jack
>>
>> On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma <va...@pinterest.com> wrote:
>> > Hey Jack,
>> >
>> > Thanks for the useful information. By flush size being 15%, do you mean
>> > the memstore flush size? 15% would mean close to 1G; have you seen any
>> > issues with flushes taking too long?
>> >
>> > Thanks
>> > Varun
>> >
>> > On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin <magn...@gmail.com> wrote:
>> >
>> >> That's right, the memstore size, not the flush size, is increased.
>> >> Filesize is 10G. Overall write cache is 60% of heap and read cache is
>> >> 20%.  Flush size is 15%.  64 maxlogs at 128MB.  One namenode server,
>> >> one secondary that can be promoted.  On the way to hbase, images are
>> >> written to a queue, so that we can take HBase down for maintenance and
>> >> still do inserts later.  ImageShack has ‘perma cache’ servers that
>> >> allow writes and serving of data even when hbase is down for hours;
>> >> consider it a 4th replica 😉 outside of hadoop.
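>> >>
>> >> For reference, a hedged sketch of how those numbers might look in
>> >> hbase-site.xml (0.90-era property names; the values are inferred from
>> >> the percentages above, not copied from our actual config, and the 15%
>> >> flush figure is left out since it doesn't map to a single property):
>> >>
>> >> <!-- write cache: fraction of heap all memstores may use -->
>> >> <property>
>> >>   <name>hbase.regionserver.global.memstore.upperLimit</name>
>> >>   <value>0.6</value>
>> >> </property>
>> >> <!-- read cache: fraction of heap for the block cache -->
>> >> <property>
>> >>   <name>hfile.block.cache.size</name>
>> >>   <value>0.2</value>
>> >> </property>
>> >> <!-- 10G region size before splitting -->
>> >> <property>
>> >>   <name>hbase.hregion.max.filesize</name>
>> >>   <value>10737418240</value>
>> >> </property>
>> >> <!-- 64 WAL files of 128MB each -->
>> >> <property>
>> >>   <name>hbase.regionserver.maxlogs</name>
>> >>   <value>64</value>
>> >> </property>
>> >> <property>
>> >>   <name>hbase.regionserver.hlog.blocksize</name>
>> >>   <value>134217728</value>
>> >> </property>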
>> >>
>> >> Jack
>> >>
>> >> *From:* Mohit Anchlia <mohitanch...@gmail.com>
>> >> *Sent:* January 13, 2013 7:48 AM
>> >> *To:* user@hbase.apache.org
>> >> *Subject:* Re: Storing images in Hbase
>> >>
>> >> Thanks Jack for sharing this information. This definitely makes sense
>> >> when using that type of caching layer. You mentioned increasing the
>> >> write cache; I am assuming you had to increase the following parameters
>> >> in addition to increasing the memstore size:
>> >>
>> >> hbase.hregion.max.filesize
>> >> hbase.hregion.memstore.flush.size
>> >>
>> >> On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <magn...@gmail.com> wrote:
>> >>
>> >> > We buffer all accesses to HBase with a Varnish SSD-based caching
>> >> > layer, so the impact on reads is negligible.  We have a 70-node
>> >> > cluster, 8 GB of RAM per node, relatively weak nodes (Intel Core 2
>> >> > Duo), with 10-12TB of disk per server.  We insert 600,000 images per
>> >> > day.  We have relatively little compaction activity, as we made our
>> >> > write cache much larger than the read cache - so we don't experience
>> >> > region file fragmentation as much.
>> >> >
>> >> > -Jack
>> >> >
>> >> > On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia <mohitanch...@gmail.com>
>> >> > wrote:
>> >> > > I think it really depends on the volume of the traffic, the data
>> >> > > distribution per region, how and when file compactions occur, and
>> >> > > the number of nodes in the cluster. In my experience, when it comes
>> >> > > to blob data where you are serving tens of thousands of requests/sec
>> >> > > of writes and reads, it's very difficult to manage HBase without
>> >> > > very hard operations and maintenance in play. Jack earlier mentioned
>> >> > > they have 1 billion images; it would be interesting to know what
>> >> > > they see in terms of compactions and number of requests per sec. I'd
>> >> > > be surprised if a high-volume site could do it without any caching
>> >> > > layer on top to alleviate the IO spikes that occur because of GC and
>> >> > > compactions.
>> >> > >
>> >> > > On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq <donta...@gmail.com>
>> >> > > wrote:
>> >> > >
>> >> > >> IMHO, if the image files are not too huge, HBase can efficiently
>> >> > >> serve the purpose. You can store some additional info along with
>> >> > >> the file, depending upon your search criteria, to make the search
>> >> > >> faster. Say you want to fetch images by type: you can store the
>> >> > >> image in one column and its extension in another column (jpg, tiff
>> >> > >> etc.).
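>> >> > >>
>> >> > >> A minimal sketch of that layout from the hbase shell (the table
>> >> > >> name, column family and row key here are made up for illustration):
>> >> > >>
>> >> > >> # one row per image: payload bytes in one column, extension in another,
>> >> > >> # so a scan that only reads 'd:ext' never has to touch the image bytes
>> >> > >> echo "create 'images', 'd'
>> >> > >> put 'images', 'img00001', 'd:data', 'RAW_IMAGE_BYTES_HERE'
>> >> > >> put 'images', 'img00001', 'd:ext', 'jpg'
>> >> > >> get 'images', 'img00001', 'd:ext'" | hbase shell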
>> >> > >>
>> >> > >> BTW, what exactly is the problem which you are facing? You have
>> >> > >> written "But I still cant do it"?
>> >> > >>
>> >> > >> Warm Regards,
>> >> > >> Tariq
>> >> > >> https://mtariq.jux.com/
>> >> > >>
>> >> > >>
>> >> > >> On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel
>> >> > >> <michael_se...@hotmail.com> wrote:
>> >> > >>
>> >> > >> > That's a viable option.
>> >> > >> > HDFS reads are faster than HBase, but it would require first
>> >> > >> > hitting the index in HBase, which points to the file, and then
>> >> > >> > fetching the file.
>> >> > >> > It could be faster... we found storing binary data in a sequence
>> >> > >> > file and indexing it in HBase to be faster than HBase alone;
>> >> > >> > however, YMMV, and HBase has been improved since we did that
>> >> > >> > project....
>> >> > >> >
>> >> > >> >
>> >> > >> > On Jan 10, 2013, at 10:56 PM, shashwat shriparv
>> >> > >> > <dwivedishash...@gmail.com> wrote:
>> >> > >> >
>> >> > >> > > Hi Kavish,
>> >> > >> > >
>> >> > >> > > I have a better idea for you: copy your image files into a
>> >> > >> > > single file on HDFS, and when a new image comes, append it to
>> >> > >> > > the existing file, keeping the metadata and the offset updated
>> >> > >> > > in HBase. Because if you put bigger images in HBase it will
>> >> > >> > > lead to some issues.
>> >> > >> > >
>> >> > >> > >
>> >> > >> > >
>> >> > >> > > ∞
>> >> > >> > > Shashwat Shriparv
>> >> > >> > >
>> >> > >> > >
>> >> > >> > >
>> >> > >> > > On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <la...@apache.org>
>> >> > >> > > wrote:
>> >> > >> > >
>> >> > >> > >> Interesting. That's close to a PB if my math is correct.
>> >> > >> > >> Is there a write up about this somewhere? Something that we
>> >> > >> > >> could link from the HBase homepage?
>> >> > >> > >>
>> >> > >> > >> -- Lars
>> >> > >> > >>
>> >> > >> > >>
>> >> > >> > >> ----- Original Message -----
>> >> > >> > >> From: Jack Levin <magn...@gmail.com>
>> >> > >> > >> To: user@hbase.apache.org
>> >> > >> > >> Cc: Andrew Purtell <apurt...@apache.org>
>> >> > >> > >> Sent: Thursday, January 10, 2013 9:24 AM
>> >> > >> > >> Subject: Re: Storing images in Hbase
>> >> > >> > >>
>> >> > >> > >> We stored about 1 billion images in HBase, with file sizes up
>> >> > >> > >> to 10MB.  It's been running for close to 2 years without
>> >> > >> > >> issues and serves delivery of images for Yfrog and ImageShack.
>> >> > >> > >> If you have any questions about the setup, I would be glad to
>> >> > >> > >> answer them.
>> >> > >> > >>
>> >> > >> > >> -Jack
>> >> > >> > >>
>> >> > >> > >> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mohitanch...@gmail.com>
>> >> > >> > >> wrote:
>> >> > >> > >>> I have done extensive testing and have found that blobs don't
>> >> > >> > >>> belong in the database but are rather best left out on the
>> >> > >> > >>> file system. Andrew outlined the issues that you'll face, not
>> >> > >> > >>> to mention the IO issues when compaction occurs over large
>> >> > >> > >>> files.
>> >> > >> > >>>
>> >> > >> > >>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <apurt...@apache.org>
>> >> > >> > >>> wrote:
>> >> > >> > >>>
>> >> > >> > >>>> I meant this to say "a few really large values"
>> >> > >> > >>>>
>> >> > >> > >>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <apurt...@apache.org>
>> >> > >> > >>>> wrote:
>> >> > >> > >>>>
>> >> > >> > >>>>> Consider if the split threshold is 2 GB but your one row
>> >> > >> > >>>>> contains 10 GB as a really large value.
>> >> > >> > >>>>
>> >> > >> > >>>>
>> >> > >> > >>>>
>> >> > >> > >>>>
>> >> > >> > >>>> --
>> >> > >> > >>>> Best regards,
>> >> > >> > >>>>
>> >> > >> > >>>>   - Andy
>> >> > >> > >>>>
>> >> > >> > >>>> Problems worthy of attack prove their worth by hitting back.
>> >> > >> > >>>>   - Piet Hein (via Tom White)
>> >> > >> > >>>>
>> >> > >> > >>
>> >> > >> > >>
>> >> > >> >
>> >> > >> >
>> >> > >>
>> >> >
>> >>
>>
