Hi, I am working on expanding the PigMix benchmark. I would like to add queries that match more realistic use cases, such as finding the highest revenue for a page or the bursts of activity for a specific page. Additionally, I would like to add OLTP-like queries, such as finding other users from the same neighborhood who are looking at a specific page.
The current PigMix table does not have an id for a page access (see the details on page_views here <https://cwiki.apache.org/confluence/display/PIG/PigMix>), so I cannot run the queries above. Why was this field omitted from the schema of page_views? It seems a fundamental field for all aggregation queries on page_views. I see two possibilities: either the schema targets another use case (which one?), or the benchmark is not meant to target real use cases and is oriented only towards synthetic performance measurement.

Any ideas?

Thank you,
Keren

PS: I sent this email to both the devs' and users' mailing lists, not to spam anyone :) but because these queries concern both users and developers.

--
Keren Ouaknine
www.kereno.com