Re: [vox-tech] MySQL question: making stats out of orders

2009-12-03 Thread Brian Lavender
What if you just use RRD tool?
http://oss.oetiker.ch/rrdtool/

On Tue, Dec 01, 2009 at 05:32:20PM -0800, Bill Kendrick wrote:
 
 Ok, so I've got a few queries our website does that
 are used to determine which products are 'top sellers'
 in the past 90 days' worth of sales.
 
 Right now, this involves joining the product table
 with the orders table, by way of an items-in-the-order table.
 
 In other words, to figure out what sold the best in the
 last 90 days, it's something like:
 
   SELECT * FROM products
   JOIN orderitem ON products.id = orderitem.productid
   JOIN orders ON orders.id = orderitem.orderid
   WHERE orders.timestamp = { 90 days ago }
 
 As time goes on, our records or orders, order items, and
 actual products all keep growing.  That means this task
 gets more and more intensive.
 
 
 Now, we've got some statistics tables set up for other
 purposes -- how many times a product was viewed,
 as well as when any particular logged-in-users last
 visited particular items, which allows us to do the
 people who recently looked at X also recently looked at Y.
 
 I'm thinking we could do something similar for sales,
 to simplify the top sellers queries.  They'd hit
 a single stats table (which could, on a daily basis,
 be cleared of any 90-days-ago entries), rather than
 having to JOIN two large tables that go back to day 1.
 
 
 My current problem -- no doubt due to lack of brain power
 at the end of the day -- is how to take the CURRENT data
 from the orders and orderitem tables so that I can do an
 initial population of a new stats table.
 
 Our views stats table, for example, is something
 along the lines of:
 
   productid, date, views
 
 where those first two columns act as a key.  (Date is at the
 'day' level of granularity.)
 
 
 And we count a view like so:
 
   INSERT INTO views (productid, date, views) VALUES (###, NOW(), 1)
   ON DUPLICATE KEY UPDATE views=views+1
 
 In other words, the first time a product is viewed on any given day,
 we insert a new row, otherwise we update its existing row.
 
 
 Anyone feel like flexing their MySQL muscles and provide some
 ideas on how to populate a similar table (tracking sales) based
 on the orders/orderitem table combo explained above?
 
 Thanks.  Sorry for asking you to do my (home)work for me.
 
 
 -- 
 -bill!
 Sent from my computer
 ___
 vox-tech mailing list
 vox-tech@lists.lugod.org
 http://lists.lugod.org/mailman/listinfo/vox-tech

-- 
Brian Lavender
http://www.brie.com/brian/
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


[vox-tech] MySQL question: making stats out of orders

2009-12-01 Thread Bill Kendrick

Ok, so I've got a few queries our website does that
are used to determine which products are 'top sellers'
in the past 90 days' worth of sales.

Right now, this involves joining the product table
with the orders table, by way of an items-in-the-order table.

In other words, to figure out what sold the best in the
last 90 days, it's something like:

  SELECT * FROM products
  JOIN orderitem ON products.id = orderitem.productid
  JOIN orders ON orders.id = orderitem.orderid
  WHERE orders.timestamp = { 90 days ago }

As time goes on, our records or orders, order items, and
actual products all keep growing.  That means this task
gets more and more intensive.


Now, we've got some statistics tables set up for other
purposes -- how many times a product was viewed,
as well as when any particular logged-in-users last
visited particular items, which allows us to do the
people who recently looked at X also recently looked at Y.

I'm thinking we could do something similar for sales,
to simplify the top sellers queries.  They'd hit
a single stats table (which could, on a daily basis,
be cleared of any 90-days-ago entries), rather than
having to JOIN two large tables that go back to day 1.


My current problem -- no doubt due to lack of brain power
at the end of the day -- is how to take the CURRENT data
from the orders and orderitem tables so that I can do an
initial population of a new stats table.

Our views stats table, for example, is something
along the lines of:

  productid, date, views

where those first two columns act as a key.  (Date is at the
'day' level of granularity.)


And we count a view like so:

  INSERT INTO views (productid, date, views) VALUES (###, NOW(), 1)
  ON DUPLICATE KEY UPDATE views=views+1

In other words, the first time a product is viewed on any given day,
we insert a new row, otherwise we update its existing row.


Anyone feel like flexing their MySQL muscles and provide some
ideas on how to populate a similar table (tracking sales) based
on the orders/orderitem table combo explained above?

Thanks.  Sorry for asking you to do my (home)work for me.


-- 
-bill!
Sent from my computer
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] MySQL question: making stats out of orders

2009-12-01 Thread Bill Kendrick
On Tue, Dec 01, 2009 at 05:32:20PM -0800, Bill Kendrick wrote:
 Anyone feel like flexing their MySQL muscles and provide some
 ideas on how to populate a similar table (tracking sales) based
 on the orders/orderitem table combo explained above?
 
 Thanks.  Sorry for asking you to do my (home)work for me.

Maybe as simple as something like:

SELECT
  orderitem.productid,
  SUM(orderitem.qty),
  FROM_DAYS(TO_DAYS(orders.ordertime)) AS day
FROM orders
JOIN orderitem ON orders.id = orderitem.orderid
GROUP BY productid,day

???

-bill!
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] MySQL question: making stats out of orders

2009-12-01 Thread Bill Kendrick
On Tue, Dec 01, 2009 at 08:04:31PM -0600, Chanoch (Ken) Bloom wrote:
 Is this task getting more intensive?

I think so, but it's hard to tell, with other things going on
at the same time.


 If is (and the number of
 transactions per day hasn't grown significantly),

Site traffic has been increasing steadily.


 then you probably don't have indices set up correctly.

Highly likely.  Most of these tables were set up prior to my
taking over maintenance of all this, and any similar ones were
no doubt based on the ones the previous developer designed.


 I think that having an an index on orders.timestamp,
 and an index on orderitem.orderid (in addition to the primary keys
 products.id and orders.id) should optimize this query,

Ok, I'll look into that.  Thanks!


 but it always
 pays to determine what's going on with EXPLAIN SELECT.

Good idea.


 If that still doesn't speed things up enough, maybe consider using a
 table with the MERGE storage engine to move old entries from the
 orders table and the orderitem table into archives. Then only query
 the table full of recent order data for queries, and only query the
 other tables in the merge when you need to.

Products are virtual, so a buyer's access to them are determined
by the orders, so I'm not going to change how all that logic works. :)


  Now, we've got some statistics tables set up for other
  purposes -- how many times a product was viewed,
  as well as when any particular logged-in-users last
  visited particular items, which allows us to do the
  people who recently looked at X also recently looked at Y.
  
  I'm thinking we could do something similar for sales,
snip 

 I'm not really clear on what you're looking for here. We need a better
 idea of what stats you're running to know how to structure the table.
 
 It would seem pretty simple to create the analagous table for orders:
 
 orderstats consisting of productid, date, orders
 
 and every time you complete an order, you run the query:
 
 insert into orderstats(productid, date, orders) VALUES (###, NOW(),
 QTY) on duplicate key update orders=orders+QTY

Yes, that's great for today-on, but the moment I switch to using this
new way of recording purchases, there won't be any data.

That is, unless I start collecting data for the next 90 days BEFORE
changing how the top sellers are determined.

What I'm looking for is a way to populate a new 'order stats' table
based on the [last 90 days', or whatever, of] existing orders.


Thanks!

-bill!
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] MySQL question: making stats out of orders

2009-12-01 Thread Bill Kendrick
On Wed, Dec 02, 2009 at 12:20:25AM -0600, Chanoch (Ken) Bloom wrote:
 INSERT into orderstats(productid, date, orders)
 SELECT
  orderitem.productid, orders.timestamp, SUM(orderitem.qty)
 FROM orderitem
   JOIN orders ON orders.id = orderitem.orderid
   WHERE orders.timestamp = { 90 days ago }
 group by orderitem.productid, orders.timestamp
 
 assuming orders.timestamp has one-day granularity

Timestamp is actually a timestamp (down the second).

-bill!
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] MySQL question: making stats out of orders

2009-12-01 Thread Chanoch (Ken) Bloom
On Tue, 2009-12-01 at 22:33 -0800, Bill Kendrick wrote:
 On Wed, Dec 02, 2009 at 12:20:25AM -0600, Chanoch (Ken) Bloom wrote:
  INSERT into orderstats(productid, date, orders)
  SELECT
   orderitem.productid, orders.timestamp, SUM(orderitem.qty)
  FROM orderitem
JOIN orders ON orders.id = orderitem.orderid
WHERE orders.timestamp = { 90 days ago }
  group by orderitem.productid, orders.timestamp
  
  assuming orders.timestamp has one-day granularity
 
 Timestamp is actually a timestamp (down the second).

Then you have to throw in a conversion function that truncates it to the
date in order for the `group by` to work correctly. I'm not sure what
function that is, but you can play around with the functions listed at 
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] MySQL question: making stats out of orders

2009-12-01 Thread Bill Broadley

Hrm, trying to figure out a non-proprietary question ask, that would be useful.

Ah, how about what is the ratio between:
 (unique orderitem.productid today)
 --
 orders today

Assuming it's high you could make it much easier to deal with by building a
table of:
date,orderitem.productid,qty

Then finding the most popular in the last 90 days, or today, or the hot item
that raised in the rankings the most today becomes easy.

I used this approach for handling a few 100k web hits/day and it allowed for
much more complex queries and still got my answers very quickly.  We logged
directly to Mysql and wanted to handle lots of complex queries of years of
logs.  We typically got our answers in a few seconds or less.
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech