Analyses include:

- Visitor level
- Session level - visitors could have multiple levels
- Page hits, conversions - popular pages, sequence of pages hit in one session
- Orders purchased - mostly determined by URL and query parameters

How should I go about designing the schema?

Thanks

On Jun 27, 2012, at 2:01 PM, Amandeep Khurana <ama...@gmail.com> wrote:

> Mohit,
>
> What would be your read patterns later on? Are you going to read per
> session, or for a time period, or for a set of users, or process through
> the entire dataset every time? That would play an important role in
> defining your keys and columns.
>
> -Amandeep
>
> On Tue, Jun 26, 2012 at 1:34 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
>> I am starting out with a new application where I need to store users'
>> clickstream data. I'll have a visitor id and a session id along with other
>> page-related data. I am wondering if I should just key off a randomly
>> generated session id and store all the page-related data as columns inside
>> that row, assuming that this would also give good distribution across
>> region servers. In a session a user could send 100s of HTML requests and
>> get responses. If someone is already doing this in HBase, I would like to
>> learn more about how they have designed the schema.
>>
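
For illustration only, the layout proposed in the original message (row key = session id, one column per page hit) could be sketched roughly as below. This is plain Python that models a table as a dict of rows, not HBase client code. The column family name `p`, the qualifier format (timestamp plus request sequence number), the sample session id, and the helper names are all assumptions made for the sketch and do not come from the thread.

```python
from collections import defaultdict

# Hypothetical column family name; not taken from the thread.
FAMILY = "p"


def qualifier(ts_millis: int, seq: int) -> bytes:
    """Build a column qualifier that sorts by time, then by request order."""
    # Zero-padded fixed-width fields so that byte order matches numeric order.
    return f"{FAMILY}:{ts_millis:013d}:{seq:04d}".encode()


def record_hit(table, session_id: str, ts_millis: int, seq: int, url: str) -> None:
    """Store one page hit as a column in the row keyed by the session id."""
    table[session_id.encode()][qualifier(ts_millis, seq)] = url.encode()


def pages_in_order(table, session_id: str) -> list:
    """Return the URLs hit in one session, ordered by qualifier."""
    row = table.get(session_id.encode(), {})
    return [row[q].decode() for q in sorted(row)]


# In-memory stand-in for a table: row key -> {qualifier -> value}.
table = defaultdict(dict)

# Hits are written out of order on purpose; the qualifier restores the order.
record_hit(table, "s-9f3a", 1340744400500, 2, "/cart?item=42")
record_hit(table, "s-9f3a", 1340744400000, 1, "/home")
record_hit(table, "s-9f3a", 1340744401000, 3, "/checkout?order=1")

print(pages_in_order(table, "s-9f3a"))
# ['/home', '/cart?item=42', '/checkout?order=1']
```

With this shape, reading the sequence of pages for one session is a single-row read. Reads per visitor or per time period would need a scan or a different key, which is why the read patterns asked about above matter for the final design.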