Hi All,

I know my query is not strictly about Flume, but it relates to a Flume-based 
solution, and the answers may help others understand how to design one.
Here is the background: we have 300+ servers running 20+ apps. These apps 
generate five different types of logs, depending on their function. I am 
designing a solution to collect all these logs from the hosts and store them 
in a Hadoop cluster. We want to analyze the logs for monitoring, current 
trends, etc., so the solution has to be robust for both collection and 
analysis. I need your help designing the log storage so that we can analyze 
the data efficiently.
According to my design, I defined the structure of the log store as follows:
<MainDirectory>…<LogType>…<Host>…<Date>…<logfile>  // rolling interval is 1 min.

I think this directory structure is good enough for analysis, since it clearly 
identifies which server, log type and date the data belongs to. But in my PoC 
I ended up with lots of small files, as each host generates 20-50 log events 
per second at about 240 bytes per event. We expect the system to generate more 
in the future, e.g. 100-150 per second. Given these numbers, should I change 
my directory structure by removing the host directory, combining the logs from 
all available sources for a particular log type and storing them by date? In 
that case I don't want to lose the host information associated with each log 
event, so I would store the host as part of the log event itself. The changed 
directory structure would be as follows:
<MainDirectory>…<LogType>…<Date>…<logfile>
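
To put rough numbers on the two layouts, here is a back-of-the-envelope sketch 
in Python. It is only an estimate under assumptions that are not measured: the 
20-50 events per second figure is treated as a per-host total, every host is 
assumed to write all five log types, and the combined layout is assumed to 
have a single writer per log type. The function and constant names are 
illustrative only.

```python
# Rough sizing for the per-host and combined layouts.
# Assumptions (not measured): the event rate is a per-host total, every host
# writes all five log types, and the combined layout has one writer per type.

HOSTS = 300          # "300+ servers", taken as 300
LOG_TYPES = 5
EVENT_BYTES = 240    # approximate size of one log event
ROLL_SECONDS = 60    # rolling interval of 1 minute
ROLLS_PER_DAY = 24 * 60 * 60 // ROLL_SECONDS


def bytes_per_roll(events_per_second, hosts=1):
    """Bytes accumulated in one rolling interval by `hosts` hosts."""
    return events_per_second * EVENT_BYTES * ROLL_SECONDS * hosts


def files_per_day(writers_per_type):
    """Files created per day when each writer rolls one file per interval."""
    return LOG_TYPES * writers_per_type * ROLLS_PER_DAY


if __name__ == "__main__":
    for rate in (20, 50, 150):
        # Upper bound for a per-host file: the rate may be split across types.
        per_host = bytes_per_roll(rate)
        combined = bytes_per_roll(rate, hosts=HOSTS)
        print(f"{rate:>3} events/s: per-host {per_host / 1e3:.0f} kB/min, "
              f"all hosts {combined / 1e6:.1f} MB/min")

    print("files/day, per-host layout:", files_per_day(HOSTS))
    print("files/day, combined layout:", files_per_day(1))
```

Under these assumptions a per-host file holds at most 288-720 kB per minute, 
far below a typical HDFS block size (64 MB or 128 MB by default, depending on 
the Hadoop version), while all hosts together produce about 86-216 MB per 
minute. The per-host layout would create about 2.16 million files per day, 
against 7,200 for the combined layout.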

So, what would be the ideal directory structure for storing log data?

Please share your inputs, or suggest a forum where I could get suggestions 
from experts.

Thanks & Regards,
Ashutosh Sharma


