Take a look at this: http://wiki.apache.org/hadoop/Hbase/DesignOverview
then read the Bigtable paper.

On Sun, Mar 20, 2011 at 6:39 PM, edward choi <mp2...@gmail.com> wrote:
> Hi,
>
> I'm planning to crawl thousands of news RSS feeds via MapReduce and save
> each news article into HBase directly.
>
> My concern is that Hadoop does not work well with a large number of
> small files,
>
> and if I insert every single news article (which is apparently small)
> into HBase (without separately storing it in HDFS),
>
> I might end up with millions of files that are only several kilobytes in
> size.
>
> Or does HBase somehow automatically append each news article into a single
> file, so that it would end up with only a few large files?
>
> Ed
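For intuition on the small-files concern raised in the question, here is a rough back-of-the-envelope sketch. It assumes the commonly cited approximation of about 150 bytes of NameNode heap per file system object (file inode or block) — that figure is an assumption for illustration, not something stated in this thread:

```python
# Rough estimate of NameNode memory pressure if each news article
# were stored as its own HDFS file. The ~150 bytes/object figure is
# a commonly cited approximation, not an exact number.

BYTES_PER_OBJECT = 150  # approximate NameNode heap per file or block

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode heap consumed by metadata for num_files files."""
    objects = num_files * (1 + blocks_per_file)  # one inode + its blocks
    return objects * BYTES_PER_OBJECT

# Ten million small articles, one block each:
heap = namenode_heap_bytes(10_000_000)
print(f"{heap / 1024**3:.1f} GiB of NameNode heap")  # roughly 2.8 GiB
```

This is why HBase's design matters here: rows are buffered in the MemStore and flushed into a small number of large HFiles per region, so millions of tiny logical records do not become millions of tiny HDFS files.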