Hi there- You probably want to check out these chapters of the HBase ref guide:

http://hbase.apache.org/book.html#datamodel
http://hbase.apache.org/book.html#schema
http://hbase.apache.org/book.html#mapreduce

... and with respect to the "40 minutes per report", a common pattern is to
build summary tables/files as appropriate (rough sketches of what I mean are
at the bottom of this message, below your original mail).

On 2/5/12 3:37 AM, "mete" <[email protected]> wrote:

>Hello,
>
>I am thinking about using HBase for storing web log data. I like the idea
>of having HDFS underneath, so that I won't have to worry much about failure
>cases, and I can benefit from all the cool HBase features.
>
>The thing I could not figure out is how to effectively store and query the
>data. I am planning to split each kind of log record into 10-20 columns and
>then use MR jobs to query the table with full scans. (I guess I could use
>Hive or Pig for this as well, but I am not familiar with those yet.)
>I find this approach simple and easy to implement, but on the other hand it
>is an offline process, and it could take a lot of time to get a single
>report. And of course a business user would be very disappointed to see
>that he/she has to wait another 40 minutes to get the results of the query.
>
>So what I am trying to achieve is to keep this query time as small as
>possible. For this I can sacrifice write speed as well; I don't really have
>to integrate new logs on the fly, and a job that runs overnight is also
>fine.
>
>For this kind of situation, do you find HBase useful?
>
>I read about star-schema design to make queries more efficient, but that
>makes the developer's job a lot harder, because I would need to design a
>different schema for each log type, and adding a new log type would take
>time to gather requirements, develop, etc.
>
>I also thought about creating a very simple HBase schema, just a key and
>the content for each record, and then indexing this content with Lucene.
>But then it seemed like I did not need HBase in the first place, because I
>would not really be benefiting from it except for storage. Also, I could
>not be sure how big my Lucene indexes would get, and whether Lucene could
>cope with that much data. What do you think about Lucene indexes on HBase?
>
>I read about how Rackspace handles this: as far as I understood, they
>generate Lucene indexes while parsing the logs in Hadoop, and then merge
>those indexes into a system that serves the previous indexes.
>(http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data)
>
>Does anyone use a similar approach, or have any ideas about this?
>
>Do you think any of these are suitable? If not, should I try a different
>approach?
>
>Thanks in advance
>Mete
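Regarding the "10-20 columns per record, full scans with MR" plan: that layout
works fine in HBase. Just to make it concrete, here is a very rough sketch of a
write against such a table with the plain Java client. The table name, column
family and row-key layout ("access_log", "d", logtype/day/host/seq) are made up
for illustration, not anything the ref guide mandates; the point is only that a
key prefixed with log type and day keeps related rows contiguous, so a job can
scan a range instead of the whole table when that is enough.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AccessLogWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "access_log");   // hypothetical raw log table

    // Row key: logtype/day/host/sequence, so rows for one log type and day
    // form a contiguous range that a scan can be restricted to.
    Put put = new Put(Bytes.toBytes("apache/20120205/web01/0000001"));

    byte[] cf = Bytes.toBytes("d");                   // one short column family
    put.add(cf, Bytes.toBytes("ip"),     Bytes.toBytes("10.0.0.1"));
    put.add(cf, Bytes.toBytes("url"),    Bytes.toBytes("/index.html"));
    put.add(cf, Bytes.toBytes("status"), Bytes.toBytes("200"));
    put.add(cf, Bytes.toBytes("bytes"),  Bytes.toBytes("5120"));
    // ... the rest of the 10-20 parsed fields as further qualifiers

    table.put(put);
    table.close();
  }
}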

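For the summary-table pattern itself, the usual shape is an overnight MapReduce
job that scans the raw table once, aggregates, and writes the result into a much
smaller summary table that the reports read directly (seconds instead of a
40-minute scan). The sketch below follows the TableMapper/TableReducer pattern
from the #mapreduce chapter linked above and counts hits per (day, url). The
table, family and qualifier names ("access_log", "daily_url_summary", "d:url",
"s:hits") are again invented for the example.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class DailyUrlSummary {

  static class HitMapper extends TableMapper<Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // Row key layout assumed from the ingest sketch: logtype/day/host/seq
      String day = Bytes.toString(value.getRow()).split("/")[1];
      String url = Bytes.toString(
          value.getValue(Bytes.toBytes("d"), Bytes.toBytes("url")));
      ctx.write(new Text(day + "|" + url), ONE);
    }
  }

  static class SumReducer extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
    protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) sum += v.get();
      Put put = new Put(Bytes.toBytes(key.toString()));   // summary row key: day|url
      put.add(Bytes.toBytes("s"), Bytes.toBytes("hits"), Bytes.toBytes(sum));
      ctx.write(null, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "daily-url-summary");
    job.setJarByClass(DailyUrlSummary.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner caching for full-table MR scans
    scan.setCacheBlocks(false);  // don't pollute the block cache from a batch job

    TableMapReduceUtil.initTableMapperJob("access_log", scan,
        HitMapper.class, Text.class, LongWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("daily_url_summary", SumReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}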
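On the "key + content in HBase, index with Lucene" question: people do combine
the two, and HBase still earns its keep, because it holds the full records and
gives you random reads by key while the Lucene index stays comparatively small,
since each document only needs to carry the row key. A very rough sketch against
the Lucene 3.x API of the day (the "rowkey" and "url" field names and the index
path are made up):

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LogIndexer {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "access_log");    // same hypothetical raw table

    IndexWriterConfig iwc = new IndexWriterConfig(
        Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35));
    IndexWriter writer =
        new IndexWriter(FSDirectory.open(new File("/data/log-index")), iwc);

    Scan scan = new Scan();
    scan.setCaching(500);
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      byte[] url = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("url"));
      if (url == null) continue;

      Document doc = new Document();
      // Store only the HBase row key; a search hit becomes a Get back into HBase.
      doc.add(new Field("rowkey", Bytes.toString(r.getRow()),
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
      // Index (but don't store) the searchable content.
      doc.add(new Field("url", Bytes.toString(url),
                        Field.Store.NO, Field.Index.ANALYZED));
      writer.addDocument(doc);
    }
    scanner.close();
    writer.close();
    table.close();
  }
}

How large the index gets, and whether one Lucene index copes with your volume,
is exactly what the Rackspace article you linked is about (they build and merge
index shards in Hadoop), so no promises from me on that part.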