Concatenate adjacent lines with hadoop

2013-02-26 Thread Matthieu Labour

Hi, Please find below the issue I need to solve. Thank you in advance for your help and tips. I have log files where log lines are sometimes split (this happens when a log line exceeds a specific length). Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAGFIELD-0FIELD-MAX

Re: Concatenate adjacent lines with hadoop

2013-02-26 Thread Azuryy Yu

That's easy. In your example, Map output key: FIELD-N; Map output value: just the original value. In the reduce: if there is LOGTAG in the value, then this is the first part of the log entry. If not, it is a continuation of a split log entry; just take the substring and concatenate it with the first part. Am I explaining this clearly?
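The reduce step described above can be sketched roughly as follows. This is a plain-Python illustration rather than Hadoop code, and it rests on assumptions not confirmed in the thread: the function name `reduce_fragments` and the `strip_prefix` parameter are hypothetical, and the marker is taken to be the literal string LOGTAG. Note also that Hadoop does not guarantee the order of values within one reduce call unless a secondary sort is configured, so with more than one continuation fragment the pieces may need an explicit ordering field.

```python
def reduce_fragments(values, marker="LOGTAG", strip_prefix=""):
    """Join the fragments of one log entry that share a map output key.

    values       -- all lines the shuffle grouped under one key
    marker       -- substring identifying the first fragment (assumed: LOGTAG)
    strip_prefix -- hypothetical: prefix repeated on continuation lines,
                    removed before concatenation
    """
    first = None
    continuations = []
    for value in values:
        if marker in value:
            # The fragment carrying the marker starts the entry.
            first = value
        else:
            # A continuation: keep only the payload after the repeated prefix.
            if strip_prefix and value.startswith(strip_prefix):
                value = value[len(strip_prefix):]
            continuations.append(value)
    tail = "".join(continuations)
    # If no fragment carries the marker, return what we have unchanged.
    return tail if first is None else first + tail
```

The fragments are appended in arrival order, which is only safe when an entry is split into at most two lines or when the values have been sorted beforehand.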

Re: Concatenate adjacent lines with hadoop

2013-02-26 Thread Matthieu Labour

Thank you for your answer. I am not sure I fully understand. My email was most likely not very clear. Here is an example of a log line. Please note the beginning of the log line, YSLOGROW. Please note that the second line should be concatenated with the first line. Dec 16 21:47:20 d.14b48e47-abf2-403

Re: Concatenate adjacent lines with hadoop

2013-02-26 Thread Azuryy Yu

I just noticed that both of your lines start with "Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app". Does that prefix differ for other lines? If your answer is yes, then just use this prefix as the map output key. On Wed, Feb 27, 2013 at 1:01 PM, Matthieu Labour wrote: > Thank you for your
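The suggestion to use the shared line prefix as the map output key could look roughly like this. It is a plain-Python simulation of the map and shuffle steps, not Hadoop code; the names `PREFIX_RE`, `map_line`, and `group_by_key` are hypothetical, and the regular expression assumes the prefix is a syslog-style timestamp followed by two whitespace-separated tokens (the host id and the process tag), which is inferred from the single sample prefix quoted in the thread.

```python
import re
from collections import defaultdict

# Assumed prefix layout: "Mon DD HH:MM:SS <host-id> <process-tag>".
PREFIX_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+\S+)")

def map_line(line):
    """Emit (prefix, original line); fall back to the whole line as key."""
    match = PREFIX_RE.match(line)
    key = match.group(1) if match else line
    return key, line

def group_by_key(lines):
    """Simulate the shuffle: collect all values that share a map output key."""
    groups = defaultdict(list)
    for line in lines:
        key, value = map_line(line)
        groups[key].append(value)
    return dict(groups)
```

This only works if the prefix really is different for unrelated lines, which is the question raised above: two separate entries logged by the same process within the same second would end up under one key and be merged by mistake.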