Hi Sean,

I may not be able to divulge much information about the business, both because of confidentiality and because I am a new employee here :), but the log data contains:

1. Different types of user requests: the request types and their related data.
2. Different session parameters: when the user session started, how sticky the user is, etc.
3. Different context parameters, such as the user's location, where the user is going, etc.

As I understand Hive, we can use it for statistical reporting, such as the frequency of user sessions, which geographical regions users come from, which device each user uses most, etc.

But we also want to mine this data for predictive capabilities. For example: how likely is a user to use the same device again? If we get sales/marketing data (it is on the roadmap), which regions should get more marketing/sales effort? What is the growth pattern of the user base, and in which geographical regions? What is the pattern of failing user requests? The business has a number of other requirements like these. Does that fit the data mining bill, or am I looking in the wrong place?

Again, thanks for your time and help.

HDev

On Tue, Aug 31, 2010 at 2:40 PM, Sean Owen <[email protected]> wrote:
> I think you'd have to begin to define what you want to do with the
> logs? What do you mean when you say "data mining"?
>
> On Tue, Aug 31, 2010 at 10:21 PM, hdev ml <[email protected]> wrote:
> > Hi all,
> >
> > I am currently trying to find out what frameworks/software/product will
> > support data warehousing/data mining the best.
> >
> > We get around 1.5+ TB of log data every month and we want to do some
> > reporting on top of that and later on move on to data mining.
> >
> > I am a total newbie in this world, coming from an RDBMS background, and wanted
> > to get your opinion on what is the best approach to take in this regard.
> >
> > I looked around the Hadoop movement and the corresponding subprojects.
> >
> > I found that Hive as a framework can support and scale for this much data.
> >
> > So the first phase, reporting, can be done using Hive.
> > But can I reuse the same data for data mining through the Mahout project?
> >
> > Can somebody please guide me regarding this?
> >
> > Thanks for your help.
> >
> > HDev.
