Google "Hadoop WholeFileInputFormat", or look it up in the book "Hadoop: The
Definitive Guide".
Yong
Date: Tue, 31 Dec 2013 09:39:58 +0800
Subject: Re: any suggestions on IIS log storage and analysis?
From: raofeng...@gmail.com
To: user@hadoop.apache.org
Thanks, Yong!
The dependence never crosses files, but since HDFS splits files into blocks,
it may cross blocks, which makes it difficult to write an MR job. I don't
quite understand what you mean by "WholeFileInputFormat". Actually, I have
no idea how to deal with dependence across blocks.
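To make the dependence concrete, here is a minimal Python sketch (not Hadoop code) of a parser that carries the most recent "#Fields:" directive as state while it reads a file. The function name parse_iis_log and the sample lines are invented for illustration. It only shows why a reader that starts in the middle of a file, as a map task on a later block would, cannot interpret data lines by itself.

```python
from typing import Dict, Iterable, Iterator, List


def parse_iis_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data line, keyed by the latest #Fields directive."""
    fields: List[str] = []  # state carried from earlier lines of the same file
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # A new directive replaces the column layout for all following lines.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Software, #Version, #Date) are skipped
        if not fields:
            # This is what a reader starting mid-file runs into.
            raise ValueError("data line seen before any #Fields directive")
        # Assumes values contain no spaces; zip() stops at the shorter sequence.
        yield dict(zip(fields, line.split()))


# Invented sample: the column layout changes partway through the file.
sample = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2013-07-04 20:00:00 GET /index.html 200",
    "#Fields: date time cs-method sc-status",
    "2013-07-04 20:00:01 POST 404",
]
records = list(parse_iis_log(sample))
```

Reading the whole file in one pass keeps the state intact. Calling the same function on a slice that begins at a data line raises the ValueError, which is the cross-block problem in miniature.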
2013/12/3
I don't know of any examples of IIS log files. But from what you described, it
looks like analyzing one line of log data depends on data from some previous
lines. You should be clearer about what this dependence is and what you are
trying to do.
Just based on your questions, you still have different o
What do you mean by joining the data sets?
A fake sample log file:
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-07-04 20:00:00
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port
cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
t
You can first run a MapReduce job that joins these data sets into one data set,
then analyze the joined data set.
On Mon, Dec 30, 2013 at 3:58 PM, Fengyun RAO wrote:
> Hi,
>
> HDFS splits files into blocks, and mapreduce runs a map task for each
> block. However, Fields could be changed in IIS log file