Adding the hbase user list.

On Wed, Oct 1, 2014 at 11:02 AM, Wilm Schumacher <wilm.schumac...@cawoom.com> wrote:
> Hi,
>
> first: I think hbase is what you are looking for. If I understand
> correctly, you want to show customers their data very quickly and
> let them manipulate it. So you need something like a data
> warehouse system. Thus, hbase is the method of choice for you (and I
> think for your kind of data, hbase is a better choice than cassandra or
> mongoDB). But of course you need a running hadoop system to run hbase,
> so it's not an either/or ;)
>
> (My answers are for hbase, as I think it's what you are looking for. If
> you are not interested, just ignore the following text. Sorry to everyone
> for writing about hbase on this list ;).)
>
> On 01.10.2014 at 17:24, mani kandan wrote:
> > 1) How much web usage data will a typical website like ours collect on a
> > daily basis? (I know I can ask our IT department, but I would like to
> > gather some background idea before talking to them.)
> Well, if you have the option to ask your IT department, you should do
> that, because everyone here would have to guess. You would have to
> explain in great detail what you need before we could even guess. If,
> e.g., you want to track what each user clicks, perhaps to show
> personalized ads, then you have to store more data. So you should ask
> the people who have the data right away instead of guessing.
>
> > 3) How many clusters/nodes would I need to run a web usage analytics
> > system?
> In the book "hbase in action" there are some recommendations based on
> "case studies" (part IV, "deploying hbase"). There are some thoughts on
> the number of nodes, and how to use them, depending on the size of your
> data.
>
> > 4) What are the ways for me to use our data? (One use case I'm thinking
> > of is to analyze the error message log for each page of the quote process
> > to redesign the UI. Is this possible?)
> Sure. And this should be very easy. I would pump the error log into an
> hbase table. That way you could read the messages directly from
> the hbase shell (if there are few enough of them). Or you could use hive
> to query your log in a more "SQL-like" way and compute statistics very
> easily.
>
> > 5) How long would it take for me to set up and start such a system?
> For a novice doing it for the first time: for a standalone
> hbase system, perhaps 2 hours. For a complete distributed test cluster,
> perhaps a day. For the real production system, with all security
> features ... a little longer ;).
>
> > I'm sorry if some/all of these questions are unanswerable. I just want
> > to discuss my thoughts, and get an idea of what things I can achieve by
> > going the way of Hadoop.
> Well, I think (but I could be wrong) that you see hadoop (or hbase) as
> something where you can just switch the "database backend" from "SQL" to
> "hbase/hadoop" and everything would run right away. It will not be
> that easy. You would have to change the code of your web application in
> a very fundamental way. You have to rethink all the table designs etc.,
> so this could be more complicated than you think right now.
>
> However, hbase/hadoop has some advantages which are very interesting for
> you. First, it is distributed, which enables your company to grow
> almost without limit, or to collect more data about your customers so you
> can get more information (and sell more stuff). And map reduce is a
> wonderful tool for computing really fancy "statistics", which is very
> interesting for an insurance company. Your mathematical economists will
> REALLY love it ;).
>
> Hope this helped.
>
> best wishes
>
> Wilm
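A rough sketch of the question-4 idea (counting error messages per page of the quote process) in the map/reduce style mentioned above. This is plain Python that only simulates the map, shuffle and reduce phases in memory; it is not the hadoop, hbase or hive API. The log line format (timestamp, level, page and message separated by spaces) and all function names are hypothetical assumptions, not anything from the original thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical log format (an assumption): "<timestamp> <LEVEL> <page> <message>"
Key = Tuple[str, str]  # (page, message)


def map_phase(line: str) -> Iterator[Tuple[Key, int]]:
    """Emit ((page, message), 1) for every ERROR line; ignore all other lines."""
    parts = line.split(" ", 3)  # timestamp, level, page, rest-of-line message
    if len(parts) == 4 and parts[1] == "ERROR":
        _, _, page, message = parts
        yield (page, message.strip()), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Sum the partial counts for one (page, message) key."""
    return key, sum(values)


def count_errors(lines: Iterable[str]) -> Dict[Key, int]:
    """Run map -> shuffle -> reduce over the log lines and return counts per key."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = [
        "2014-10-01T11:02:00 ERROR /quote/step2 Invalid ZIP code",
        "2014-10-01T11:03:10 INFO /quote/step1 Page loaded",
        "2014-10-01T11:04:45 ERROR /quote/step2 Invalid ZIP code",
        "2014-10-01T11:05:01 ERROR /quote/step3 Date of birth missing",
    ]
    # Print one tab-separated line per (page, message), sorted by page.
    for (page, msg), n in sorted(count_errors(sample).items()):
        print(f"{page}\t{n}\t{msg}")
```

On a real cluster the same three roles would be played by a MapReduce job (or a hive GROUP BY over the log table), with the framework doing the shuffle across nodes; the in-memory version only shows the shape of the computation.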