Re: any suggestions on IIS log storage and analysis?

2014-01-02 Thread Fengyun RAO

Re: any suggestions on IIS log storage and analysis?

2013-12-31 Thread Peyman Mohajerian

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO

RE: any suggestions on IIS log storage and analysis?

2013-12-30 Thread java8964
Google "Hadoop WholeFileInputFormat", or look it up in the book "Hadoop: The Definitive Guide".
Yong
Date: Tue, 31 Dec 2013 09:39:58 +0800
Subject: Re: any suggestions on IIS log storage and analysis?
From: raofeng...@gmail.com
To: user@hadoop.apache.org
Thanks, Yong! The dependen

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO
Thanks, Yong! The dependencies never cross files, but since HDFS splits files into blocks, they may cross blocks, which makes it difficult to write an MR job. I don't quite understand what you mean by "WholeFileInputFormat". Actually, I have no idea how to deal with dependencies across blocks. 2013/12/3
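To make the block-boundary concern concrete, here is a minimal Python sketch (plain Python, not Hadoop code). It cuts a small log into fixed-size byte ranges, roughly the way HDFS blocks cut a file, and reports which ranges contain their own "#Fields:" line. The sample log, the helper names and the block size of 64 bytes are made up for illustration.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut data into consecutive byte ranges of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def has_own_header(block: bytes) -> bool:
    """True if some line in this block begins with the '#Fields:' directive."""
    return any(line.startswith(b"#Fields:") for line in block.split(b"\n"))


# Hypothetical sample data: one header line followed by three requests.
SAMPLE_LOG = (
    b"#Fields: date time cs-method cs-uri-stem sc-status\n"
    b"2013-07-04 20:00:00 GET /index.html 200\n"
    b"2013-07-04 20:00:01 GET /about.html 200\n"
    b"2013-07-04 20:00:02 POST /login 302\n"
)

# A tiny block size stands in for the much larger HDFS block size.
blocks = split_into_blocks(SAMPLE_LOG, block_size=64)

for number, block in enumerate(blocks):
    print(number, has_own_header(block))
```

Only the first range carries the header, so a task that sees only a later range cannot tell which column is which. That is the situation a non-splittable input format avoids, since it hands each file to a single task.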

RE: any suggestions on IIS log storage and analysis?

2013-12-30 Thread java8964
I don't know of any example IIS log files. But from what you described, it looks like analyzing one line of log data depends on data from some previous lines. You should be clearer about what this dependence is and what you are trying to do. Just based on your questions, you still have different o

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO
What do you mean by "join the data sets"? A fake sample log file:
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-07-04 20:00:00
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status t
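For a log shaped like the sample above, a single-pass reader can carry the most recent "#Fields:" line forward as state. The following is a rough Python sketch under stated assumptions, not a Hadoop job: the function name parse_w3c_log is hypothetical, the shortened field list and the data lines are invented, and values are assumed to contain no spaces.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data line, keyed by the latest #Fields directive."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only #Fields changes how later lines are read; other
            # directives (#Software, #Version, #Date) are skipped here.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if not fields:
            raise ValueError("data line seen before any #Fields directive")
        values = line.split()
        if len(values) != len(fields):
            raise ValueError("column count does not match #Fields")
        yield dict(zip(fields, values))


# Hypothetical input in which the field list changes partway through.
sample = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Version: 1.0",
    "#Date: 2013-07-04 20:00:00",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2013-07-04 20:00:00 GET /index.html 200",
    "#Fields: date time cs-uri-stem sc-status",
    "2013-07-04 21:00:00 /about.html 404",
]

records = list(parse_w3c_log(sample))
for record in records:
    print(record)
```

Because each data line is interpreted through state set by an earlier line, the reader has to see the file from its beginning. This is the dependence on previous lines discussed in the thread.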

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Azuryy Yu
You can first run a MapReduce job to join these data sets into one data set, and then analyze the joined data set.
On Mon, Dec 30, 2013 at 3:58 PM, Fengyun RAO wrote:
> Hi,
>
> HDFS splits files into blocks, and mapreduce runs a map task for each
> block. However, Fields could be changed in IIS log file