Pig has a log loader in Piggybank. You can use it to generate the columns
for that table and then point the table at the output.

Take a look--
https://github.com/apache/pig/tree/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog

Thanks,
Aniket

On Tue, Dec 6, 2011 at 10:19 AM, Abhishek Pratap Singh
<manu.i...@gmail.com> wrote:

> Hi Sangeetha,
>
> One easier option is to use Flume decorators to insert a delimiter into
> your stream of data and then load the data into a table.
>
> For example:
> The data below can be converted to, say, pipe-delimited data (you can code
> for any delimiter) by using Flume decorators.
>
> [2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0]
> [net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource]
> [Organization: Travelocity] [Client: AA] [Location of device: DFW] [User:
> 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server:
> server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering
> Method = getKey()
>
> Pipe-delimited:
> 2011-10-17 16:30:57,281 |  INFO 
> |33157362@qtp-28456974-0|net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource|Organization:
> Travelocity|Client: AA|Location of device: DFW|User: 550393|user_role:
> |CorelationId: 248|Component: Crossplane|Server: server01|Request:
> seats=5|Response: yes|Status: pass| - Entering Method = getKey()
>
> Once you have this pipe-delimited data, you can create a table with a
> pipe delimiter and load the file.
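>
> As a rough sketch of that last step: the table name, column names, and
> HDFS path below are made up for illustration, and the column list simply
> follows the 16 fields of the sample line above. The Hive side could look
> something like this:
>
> ```sql
> -- Hypothetical table: one STRING column per pipe-separated field.
> CREATE TABLE log4j_pipe (
>   log_time        STRING,
>   log_level       STRING,
>   thread_name     STRING,
>   class_name      STRING,
>   organization    STRING,
>   client          STRING,
>   device_location STRING,
>   user_id         STRING,
>   user_role       STRING,
>   correlation_id  STRING,
>   component       STRING,
>   server_name     STRING,
>   request_text    STRING,
>   response_text   STRING,
>   status_text     STRING,
>   message_text    STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE;
>
> -- Hypothetical HDFS path; LOAD DATA INPATH moves the file into the table.
> LOAD DATA INPATH '/flume/logs/pipe' INTO TABLE log4j_pipe;
> ```
>
> Values such as "Organization: Travelocity" keep their label inside the
> column unless the decorator strips it first.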
>
> You can choose any delimiter, and also drop some of the data in the Flume
> decorator, then load the result into a Hive table with the same schema and
> delimiter. Hope it helps.
>
> ~Abhishek P Singh
>
> On Tue, Dec 6, 2011 at 7:58 AM, alo alt <wget.n...@googlemail.com> wrote:
>
>> Hi Sangeetha,
>>
>> Sorry, I was on the road, so the answer took a while.
>>
>> As Mark wrote, a SerDe will be a good start. If it's useful for you, take a
>> look at http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
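>>
>> For the bracketed format in this thread, a regex-based SerDe may be a
>> closer fit than the JSON one linked above. A minimal sketch, assuming the
>> RegexSerDe class from hive-contrib is available; the jar path, table name,
>> and column names are placeholders, and the regex only splits off the first
>> four bracketed fields, leaving the rest of the line in one column:
>>
>> ```sql
>> -- Placeholder path; point this at your own hive-contrib jar.
>> ADD JAR /path/to/hive-contrib.jar;
>>
>> -- Hypothetical table; the contrib RegexSerDe expects STRING columns,
>> -- one per capture group in input.regex.
>> CREATE TABLE log4j_regex (
>>   log_time    STRING,
>>   log_level   STRING,
>>   thread_name STRING,
>>   class_name  STRING,
>>   remainder   STRING
>> )
>> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>> WITH SERDEPROPERTIES (
>>   "input.regex" = "\\[([^\\]]*)\\] \\[([^\\]]*)\\] \\[([^\\]]*)\\] \\[([^\\]]*)\\] (.*)"
>> )
>> STORED AS TEXTFILE;
>> ```
>>
>> With the sample line, log_level would come out as " INFO" with its leading
>> space, so it may need trimming in queries.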
>>
>> - alex
>>
>>
>> On Tue, Dec 6, 2011 at 10:26 AM, sangeetha k <get2sa...@yahoo.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the response.
>>> Yes, you understood my question.
>>>
>>> An example line from my log is below:
>>>
>>> [2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0]
>>> [net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource]
>>> [Organization: Travelocity] [Client: AA] [Location of device: DFW] [User:
>>> 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server:
>>> server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering
>>> Method = getKey()
>>>
>>> How do I specify the delimiter when describing the table?
>>>
>>> Thanks,
>>> Sangeetha
>>>
>>>   *From:* alo alt <wget.n...@googlemail.com>
>>> *To:* user@hive.apache.org; sangeetha k <get2sa...@yahoo.com>
>>> *Sent:* Tuesday, December 6, 2011 2:01 PM
>>> *Subject:* Re: log4j format logs in Hive table
>>>
>>> Hi,
>>>
>>> I hope I understood your question correctly: did you describe your table?
>>> For example:
>>> "create TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW
>>> FORMAT DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS
>>> TEXTFILE;"
>>>
>>> row* = a column name of your choosing; for the data types, see the
>>> documentation.
>>>
>>> After that, import the data via "insert (overwrite) table YOURTABLE".
>>>
>>> - alex
>>>
>>>
>>> On Tue, Dec 6, 2011 at 8:56 AM, sangeetha k <get2sa...@yahoo.com> wrote:
>>>
>>>  Hi,
>>>
>>> I am new to Hive.
>>>
>>> I am using a Flume agent to collect log4j logs and send them to HDFS.
>>> Now I want to load the log4j-format logs from HDFS into Hive tables.
>>> Each attribute in a log statement (timestamp, level, class name, etc.)
>>> should be loaded into a separate column in the Hive table.
>>>
>>> I tried creating a table in Hive and loaded the entire log into one column,
>>> but I don't know how to load the attributes above into separate columns.
>>>
>>> Please send me any suggestions, links, or tutorials on this.
>>>
>>> Thanks,
>>> Sangeetha
>>>
>>>
>>>
>>>
>>> --
>>> Alexander Lorenz
>>> http://mapredit.blogspot.com
>>>
>>> *Think of the environment: please don't print this email unless you
>>> really need to.*
>


-- 
"...:::Aniket:::... Quetzalco@tl"
