That's not much information to base schema recommendations on. However, at a
high level, you should structure your row keys so that you minimize the need
for scans and can fetch the required data directly by row key.
So, putting the user in the row k
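
To illustrate the row key advice above, here is a minimal sketch in plain Python (no HBase client involved). It assumes a composite key of visitor id, session id, and a zero-padded timestamp, which the thread does not confirm. The `|` delimiter, the `row_key` and `prefix_scan` helpers, and the sample ids are all hypothetical. A sorted list stands in for HBase's lexicographically ordered rows, so a prefix lookup touches only one contiguous range of keys.

```python
from bisect import bisect_left

SEP = "|"  # hypothetical delimiter; assumed never to appear inside the ids


def row_key(visitor_id: str, session_id: str, ts_millis: int) -> str:
    # Zero-pad the timestamp so lexicographic order matches numeric order.
    return SEP.join([visitor_id, session_id, f"{ts_millis:013d}"])


def prefix_scan(sorted_keys, prefix):
    """Return the keys that start with prefix, reading one contiguous range."""
    start = bisect_left(sorted_keys, prefix)  # jump to the first candidate
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing later can match
        matches.append(key)
    return matches


# Stand-in for a table: row keys kept in sorted order (sample data, made up).
keys = sorted([
    row_key("v1", "s1", 1340742840000),
    row_key("v1", "s1", 1340742900000),
    row_key("v1", "s2", 1340750000000),
    row_key("v2", "s9", 1340742850000),
])

visitor_hits = prefix_scan(keys, "v1" + SEP)               # all hits for visitor v1
session_hits = prefix_scan(keys, "v1" + SEP + "s1" + SEP)  # one session only
```

With the visitor first in the key, both the visitor-level and the session-level lookups become short prefix reads rather than scans over the whole table.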
Analyses include:
Visitor level
Session level - visitors could have multiple sessions
Page hits, conversions - popular pages, sequence of pages hit in one session
Orders purchased - mostly determined by URL and query parameters
How should I go about designing the schema?
Thanks
Mohit,
What would be your read patterns later on? Are you going to read per
session, or for a time period, or for a set of users, or process through
the entire dataset every time? That would play an important role in
defining your keys and columns.
-Amandeep
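
As a rough illustration of why the read pattern drives the key, the sketch below builds keys for the same hits under two hypothetical layouts and runs a range read against each. The function names, the `|` delimiter, and the sample data are assumptions made for the example, and a sorted Python list stands in for HBase's ordered rows.

```python
from bisect import bisect_left


def visitor_first_key(visitor_id, session_id, ts_millis):
    # Layout A (assumed): groups all hits of a visitor, then of a session.
    return f"{visitor_id}|{session_id}|{ts_millis:013d}"


def time_first_key(visitor_id, session_id, ts_millis):
    # Layout B (assumed): groups hits by time across all visitors.
    return f"{ts_millis:013d}|{visitor_id}|{session_id}"


def range_scan(sorted_keys, start, stop):
    """Return keys in [start, stop), located by binary search."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Made-up hits: (visitor id, session id, timestamp in millis).
hits = [("v1", "s1", 1000), ("v2", "s9", 1500), ("v1", "s1", 2000), ("v1", "s2", 9000)]

by_visitor = sorted(visitor_first_key(*h) for h in hits)
by_time = sorted(time_first_key(*h) for h in hits)

# "Read for a time period": contiguous only in the time-first layout.
period = range_scan(by_time, f"{1000:013d}", f"{2001:013d}")

# "Read per session": contiguous only in the visitor-first layout.
# The stop key is the prefix with its last character bumped ('|' -> '}').
session = range_scan(by_visitor, "v1|s1|", "v1|s1}")
```

Each layout turns one of the read patterns into a single contiguous range and forces the other into a full pass over the data. One caveat worth keeping in mind: row keys that start with an increasing timestamp tend to concentrate writes on a single region in HBase, so a time-first layout usually needs a salt or bucket prefix in practice.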
On Tue, Jun 26, 2012 at 1:34 PM, Mohi
: HBase Schema Design for clickstream data
I am starting out with a new application where I need to store users'
clickstream data. I'll have a visitor id and a session id along with other
page-related data. I am wondering if I should just key off a randomly generated
session id and store all the page-related data as columns inside that row
assuming t