The KNIME program ("nime") from KNIME.org is a great way to get your
feet wet in data mining. It lets you poke around your data and
prototype ways to tease out facts, and it has a bunch of machine
learning tools alongside plain data-shuffling tools. It's a visual,
graph-based programming language, so buy
a very big monitor. And it wraps Weka and R.

On Wed, Sep 1, 2010 at 10:48 AM, hdev ml wrote:
> I agree with you that there is preparation needed for Mahout processing.
>
> I was just trying to save on that effort by re-using the data in hive
> instead of double processing it.
>
> I may have some more questions when I actually dive into the mining part.
> (possibly a couple of months down the line).
>
> Thanks for your inputs.
>
> On Wed, Sep 1, 2010 at 12:58 AM, Sean Owen wrote:
>
>> Hive does something fairly unrelated to Mahout. It's an indexing and
>> query system. Both might start from the same source data, but to do
>> different things. There is no common format, no. Mahout generally
>> operates on text files or "Vectors" in SequenceFiles. So there's some
>> translation there at least.
>>
>> But I think a message here is that there's more preparation and
>> thought necessary to start data mining. It's not like you point a data
>> mining tool at some data and answers start flowing automatically.
>> You'd have to be deliberately extracting and preparing data anyhow.
>>
>> On Tue, Aug 31, 2010 at 11:41 PM, hdev ml wrote:
>> > Thanks Sean for the answers. Thanks to Ted for validation.
>> >
>> > Now my question is, since I want to do both reporting of large data/
>> > datawarehouse, let's assume I choose Hive for that.
>> >
>> > Now can Mahout integrate with Hive to make use of this data for learning,
>> > mining etc.? or do I have to export the hive data into text files which can
>> > be hosted by Hadoop/HDFS which later on Mahout can use for data mining.
>> >
>> > In short, can data warehousing part be done by Hive and then can data mining
>> > part be done by Mahout on this hive data?
>> >
>> > -H
>> >
>> > On Tue, Aug 31, 2010 at 3:03 PM, Sean Owen wrote:
>> >
>> >> On Tue, Aug 31, 2010 at 10:55 PM, hdev ml wrote:
>> >> > Per my understanding of hive, we can do some statistical reporting, like
>> >> > frequency of user sessions, which geographical region, which device he is
>> >> > using the most etc.
>> >>
>> >> Yes that's about what Hive is good for, if you're looking for some
>> >> open-source libraries along those lines.
>> >>
>> >> >
>> >> > But we also want to mine this data to get some predictive capabilities like
>> >> > what is the likelihood that the user will use the same device again or if we
>> >> > get sales/marketing data (on the roadmap for future), we want to possibly
>> >> > predict which region to put more marketing/sales efforts. What is the
>> >> > pattern for growth of user base, in which geographical regions etc. What is
>> >> > the pattern of user requests failing and a number of requirements like these
>> >> > from the business.
>> >>
>> >> This is pretty broad but I can try to give you the names of problems
>> >> this sounds like, to guide your search.
>> >>
>> >> Predicting user usage of device sounds like a classification problem,
>> >> like developing a probabilistic model of behavior.
>> >>
>> >> Deciding where to put marketing dollars sounds like a business
>> >> problem, not machine learning. I don't think a computer can tell you
>> >> that. Some techniques might help you identify trends in sales, but
>> >> this is simple regression, not really machine learning.
>> >>
>> >> Looking for patterns in failure sounds a bit like frequent pattern
>> >> mining -- trying to find events that go together unusually often.
>> >>
>> >

-- 
Lance Norskog
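Sean says there is "some translation" between Hive data and the text files or Vectors that Mahout reads. Below is a minimal sketch of the first half of that step. It assumes the Hive table was dumped in Hive's default text layout, where fields are separated by Ctrl-A (`\x01`) and NULLs are written as `\N`. The function name `hive_rows_to_vectors`, the `numeric_cols` parameter, and the choice to turn NULLs into 0.0 are all hypothetical choices for illustration. Writing the resulting numbers into SequenceFiles of Vectors would be done with Mahout's own Java tooling, and that part is not shown.

```python
# Hive's default text-file layout (assumed here): Ctrl-A between fields,
# and NULL written as the two characters backslash + N.
FIELD_SEP = "\x01"
HIVE_NULL = "\\N"


def hive_rows_to_vectors(lines, numeric_cols):
    """Turn Hive text-export rows into dense lists of floats.

    numeric_cols: indexes of the columns to keep (a hypothetical choice).
    NULL fields become 0.0; that imputation is an assumption, not a rule.
    """
    vectors = []
    for line in lines:
        fields = line.rstrip("\n").split(FIELD_SEP)
        vec = []
        for i in numeric_cols:
            raw = fields[i]
            vec.append(0.0 if raw == HIVE_NULL else float(raw))
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    # Two made-up rows: an id column followed by two numeric columns.
    rows = [
        FIELD_SEP.join(["u1", "3", HIVE_NULL]) + "\n",
        FIELD_SEP.join(["u2", "5", "2.5"]) + "\n",
    ]
    print(hive_rows_to_vectors(rows, [1, 2]))  # [[3.0, 0.0], [5.0, 2.5]]
```

Dropping the id column and picking numeric columns by index keeps the sketch short. Real data would also need categorical fields (region, device) encoded somehow before they could become vector entries.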
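Sean files "trends in sales" under simple regression. The snippet below is a minimal ordinary-least-squares sketch of fitting a straight-line trend y = slope * x + intercept. The function name `fit_line` and the month/sales numbers in the demo are made up for illustration; they are not real data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    months = [1, 2, 3, 4]        # made-up time index
    sales = [10, 12, 14, 16]     # made-up sales figures
    print(fit_line(months, sales))  # (2.0, 8.0)
```

A positive slope says sales are trending up. As Sean notes, deciding what to do about that trend is still a business call.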
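For "events that go together unusually often", here is a brute-force sketch of frequent pair mining. It counts how often each pair of events shares a session (support), then keeps pairs whose lift is at least a threshold. Lift is the support of the pair divided by the product of the two events' individual supports, so a lift above 1 means the pair co-occurs more often than chance. This is the plain counting idea, not FP-Growth or any other scalable algorithm. The names `frequent_pairs`, `min_support`, and `min_lift`, and the event names in the demo, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support, min_lift=1.0):
    """Return (a, b, support, lift) for event pairs that co-occur often.

    sessions: iterable of collections of event names, one per session.
    support is the fraction of sessions containing both events; lift is
    support / (support(a) * support(b)).
    """
    sessions = [sorted(set(s)) for s in sessions]
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in sessions:
        item_counts.update(items)
        # Items are sorted, so each pair is counted in a single order.
        pair_counts.update(combinations(items, 2))
    results = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        if lift >= min_lift:
            results.append((a, b, support, lift))
    # Highest lift first, ties broken by event names for stable output.
    results.sort(key=lambda r: (-r[3], r[0], r[1]))
    return results


if __name__ == "__main__":
    sessions = [  # made-up request sessions
        {"timeout", "device_x"},
        {"timeout", "device_x", "retry"},
        {"ok", "device_y"},
        {"ok", "device_x"},
    ]
    for a, b, support, lift in frequent_pairs(sessions, min_support=0.5):
        print(a, b, support, round(lift, 2))  # device_x timeout 0.5 1.33
```

Counting every pair is quadratic in the number of events per session. That is fine for a prototype but is exactly what dedicated frequent-pattern algorithms are designed to avoid at scale.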
