Re: Hbase and Search Integration

Imran M Yousuf Tue, 20 Mar 2012 06:46:50 -0700

Hi Saurabh,

To integrate HBase with Apache Solr (or any other indexing/search
platform) we came up with Smart CMS [1][2]; there is also the Lily
Project [4].

We are on the verge of releasing Smart CMS 0.1, which we have been
testing for an extended period and which will go into production
straight away. Smart CMS was designed and developed with the goal of
uniting the concept of Objects with (HBase + Solr). IOW, we want to
design objects and let Smart CMS take care of persisting them and
making them available for search. Although we initially chose Apache
Solr as the search engine, it is easy to plug in any other search
engine of our choice, since we expose the search integration through
an SPI.
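
To illustrate the SPI idea in general terms, here is a minimal sketch.
It is not the actual Smart CMS API: SearchIndexer, InMemoryIndexer and
ContentStore are hypothetical names made up for this example, and the
in-memory maps merely stand in for Solr and HBase. The point is only
that the persistence side talks to an interface, so another search
engine can be plugged in by supplying a different implementation (in
plain Java such a provider could, for instance, be discovered with
java.util.ServiceLoader).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical SPI: the contract a search back end would implement. */
interface SearchIndexer {
    void index(String id, Map<String, String> fields);

    List<String> search(String field, String term);
}

/** Toy provider; a real one would delegate to Solr or another engine. */
final class InMemoryIndexer implements SearchIndexer {
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    @Override
    public void index(String id, Map<String, String> fields) {
        docs.put(id, new LinkedHashMap<>(fields));
    }

    @Override
    public List<String> search(String field, String term) {
        // Case-insensitive substring match on a single field.
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : docs.entrySet()) {
            String value = e.getValue().get(field);
            if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}

/** Persists objects and forwards them to whichever indexer is plugged in. */
final class ContentStore {
    // Stands in for the HBase table in this sketch.
    private final Map<String, Map<String, String>> table = new LinkedHashMap<>();
    private final SearchIndexer indexer;

    ContentStore(SearchIndexer indexer) {
        this.indexer = indexer;
    }

    void save(String id, Map<String, String> fields) {
        table.put(id, new LinkedHashMap<>(fields)); // persist
        indexer.index(id, fields);                  // make searchable
    }

    List<String> find(String field, String term) {
        return indexer.search(field, term);
    }
}
```

Swapping the search engine then means passing a different
SearchIndexer to the ContentStore constructor; the code that saves
objects does not change.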

A little history on how we came to develop it and what it is currently
used for: we started development because we needed a flexible content
management system for our e-Commerce Platform as a Service. As we dug
into it, we found 'content' to be synonymous with 'Object' in the OOP
paradigm, and we built the system around that idea. As a result, we
now have a system that can be used both as a traditional Content
Management System and as a Content Repository.

We used it as a traditional CMS to manage pages for the partner
websites of our e-Commerce PaaS; i.e. customers can create pages for
products, promotions, stores, etc., manage page contents for the front
page and category pages, and link associated products, related
products, etc. from a UI that is generated dynamically from the
content definitions. We also used the CMS for extensive search
functionality such as full text search, facet search, range search,
auto completion, etc. For this we access the CMS through its Web
Service library, we use Solr directly for advanced searches, and we
use a tag library to access both of them. The flexibility Smart CMS
gave us in fact helped us win 2 big customers.

We also used Smart CMS as a content repository, where it generates the
domain/DTO and data access layer code that API/Service layers use to
persist Java POJOs; i.e. its users define an XML document we call a
'Content Type Definition'. A content type definition is synonymous
with an Object Diagram: in it we define objects, their inheritance and
their compositions. We took this code generation approach to bypass
the Java Reflection API, and it is done by a Maven plugin we have
written. We have another plugin that starts all CMS-related
applications from within Maven so that we can write integration tests
on the fly. An example of repository mode is available in our
application Smart Email Queue [3], which is designed to send emails
from our PaaS. After proving sustainable performance in this mode,
Smart CMS has also been chosen as the database for a 4G Telecom
Application Server project.

[1] Smart CMS - http://smart-cms.org
[2] Smart CMS Source - https://github.com/SmartITEngineering/smart-cms
[3] Smart Email Queue - https://github.com/SmartITEngineering/smart-email-queue
[4] Lily Project - http://www.lilyproject.org/lily/index.html

We would welcome any feedback, criticism, or involvement in Smart CMS.
If you have any further queries, please feel free to ask.

Thank you,

Imran

On Tue, Mar 20, 2012 at 7:38 PM, Agarwal, Saurabh
<saurabh.agar...@citi.com> wrote:
> Hi,
>
> Has anyone integrated search (Lucene, Solr or Elastic) with HBase?
>
> We are implementing log search functionality using HBase. Through Flume, the 
> logs from multiple apps are getting streamed into HBase directly.
>
> A very basic use case is to search a keyword for an application for a certain 
> timeframe ( for example - last hour).
>
> Our row key is app_id:timestamp and all log contents are stored in columns.
> We started with a Regex filter. It worked but does not provide consistent
> results.
>
> Now, we are exploring the index search capability in HBase. Our thought 
> process is that first create an inverted index table with row key - search 
> documents and column - the row key of the content table. The search will 
> return all the row keys.
>
> Additional requirement - We would like to limit the results for certain time 
> frame. Second, we would like to display only limited records in descending 
> time order and come back for more if user want to see more records.
>
> Let me know if someone has integrated the search with HBase.
>
> Thanks,
> Saurabh.
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Monday, March 19, 2012 12:33 PM
> To: user@hbase.apache.org
> Subject: Re: There is no data value information in HLog?
>
> Hi,
> Have you noticed this in HLogPrettyPrinter ?
>    options.addOption("p", "printvals", false, "Print values");
>
> Looks like you should have specified the above option.
>
> On Mon, Mar 19, 2012 at 7:31 AM, yonghu <yongyong...@gmail.com> wrote:
>
>> Hello,
>>
>> I used the $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog
>> --dump command to check the HLog information. But I can not find any
>> data information. The output of my HLog file is looks like follows:
>>
>> Sequence 933 from region 85986149309dff24ecf7be4873136f15 in table test
>>  Action:
>>    row: Udo
>>    column: Course:Computer
>>    at time: Mon Mar 19 14:09:29 CET 2012
>>
>> Sequence 935 from region 85986149309dff24ecf7be4873136f15 in table test
>>  Action:
>>    row: Udo
>>    column: Course:Math
>>    at time: Mon Mar 19 14:09:29 CET 2012
>>
>> The functionality of HLog is for recovery. But without data value
>> information, how can hbase use the information in HLog to do recovery.
>> My hbase version is 0.92.0.
>>
>> Regards!
>>
>> Yong
>>

-- 
Imran M Yousuf
Entrepreneur & CEO
Smart IT Engineering Ltd.
Dhaka, Bangladesh
Twitter: @imyousuf - http://twitter.com/imyousuf
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
