Hi All,

I know my query is not strictly about Flume, but it is closely related to a Flume-based solution, and the answers may help others understand how to design one.

Here is the background. We have 300+ servers running 20+ apps. These apps generate five different types of logs, depending on their function. I am designing a solution to collect all these logs from the hosts and store them in a Hadoop cluster. We want to analyze the logs for monitoring, current trends, etc.

I want to design the solution from both the collection and the analysis points of view, and it should be robust enough to support the requirements at both ends. So I need your help designing the log storage so that we can analyze the logs efficiently.

In my design, I defined the structure of the log store as follows:

<MainDirectory>…<LogType>…<Host>…<Date>…<logfile> // rolling interval is 1 min.
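
To make the layout concrete, here is a minimal Python sketch of how a directory under this structure could be built, and of how many files a 1-minute roll could produce per day. The "/" separator, the root "/logs", the log type and host names, and the date format are hypothetical placeholders, not part of the actual setup. The count assumes exactly 300 hosts, each writing all five log types in every interval, so treat it as a rough upper-bound estimate.

```python
from datetime import date

SECONDS_PER_DAY = 24 * 60 * 60


def log_dir(root, log_type, host, day):
    """Build <MainDirectory>/<LogType>/<Host>/<Date> for one host and day.

    The "/" separator and the ISO date format are assumptions for illustration.
    """
    return "/".join([root, log_type, host, day.isoformat()])


def files_per_day(hosts, log_types, roll_seconds):
    """Estimate files written per day if every (host, log type) pair
    produces exactly one file per rolling interval."""
    rolls_per_day = SECONDS_PER_DAY // roll_seconds
    return hosts * log_types * rolls_per_day


if __name__ == "__main__":
    # Hypothetical names; only the shape of the path matters here.
    print(log_dir("/logs", "access", "host001", date(2013, 1, 15)))
    # 300 hosts, 5 log types, 1-minute (60 s) rolling interval.
    print(files_per_day(300, 5, 60))
```

Under these assumptions the estimate is 300 × 5 × 1440 = 2,160,000 files per day, which is the kind of number worth keeping in mind when choosing the rolling interval and the depth of the directory tree.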

I think this directory structure is good enough: it keeps the data simple to use for analysis, because it clearly identifies the server, log type, and date each file belongs to. But in my PoC I ended up with lots of small files, as each host generates 20-50 logs per second, with a log size of about 240 bytes. We expect the system to generate more logs in the future, e.g. 100-150.

Given these numbers, should I change the directory structure by removing the host directory, combining the logs from all available sources for a particular log type, and storing them by date? In that case I don't want to lose the host information associated with each log event, so I would store the host information as part of the log event itself. The changed directory structure would be as follows:

<MainDirectory>…<LogType>…<Date>…<logfile>

So, what would be the ideal directory structure for storing log data? Please share your input, or point me to a forum where I could get suggestions from the experts.

Thanks & Regards,
Ashutosh Sharma
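
P.S. For reference, the numbers above can be turned into a rough estimate of how much data lands in one rolling interval. This is only a sketch under stated assumptions: the rates are taken as events per second per host, each event as 240 bytes, the roll as 60 seconds, the host count as exactly 300, and the 128 MiB block size is the default in recent Hadoop releases rather than a value confirmed for this cluster. The totals cover all log types together, since the split between the five types is not known.

```python
EVENT_BYTES = 240          # log size quoted above
ROLL_SECONDS = 60          # 1-minute rolling interval
HOSTS = 300                # assumed exact host count ("300+" in the mail)
BLOCK_BYTES = 128 * 1024 * 1024  # assumed HDFS block size (128 MiB)


def bytes_per_roll(events_per_sec, event_bytes, roll_seconds, sources=1):
    """Bytes written in one rolling interval by `sources` hosts combined.

    This is the total across all log types; per-type files are smaller.
    """
    return events_per_sec * event_bytes * roll_seconds * sources


if __name__ == "__main__":
    # Current rates (20, 50) and expected future rates (100, 150).
    for rate in (20, 50, 100, 150):
        per_host = bytes_per_roll(rate, EVENT_BYTES, ROLL_SECONDS)
        combined = bytes_per_roll(rate, EVENT_BYTES, ROLL_SECONDS, sources=HOSTS)
        print(
            f"{rate:>3} events/s: "
            f"per host {per_host:,} B ({per_host / BLOCK_BYTES:.2%} of a block), "
            f"all hosts {combined:,} B ({combined / BLOCK_BYTES:.2f} blocks)"
        )
```

At 20-50 events per second this gives at most 288,000 to 720,000 bytes per host per minute, a small fraction of one block, which is consistent with the small-files problem seen in the PoC. Combining all hosts per interval brings the total much closer to the block size.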
