Hello all.

We periodically scan HBase tables to aggregate statistical information
and store it in MySQL.

We have 3 kinds of CP (a kind of data source), each of which has one
Channel table and one Article table.
(Channel : Article is a 1:N relation.)

The table schemas differ slightly between CPs, so to aggregate we have to
apply different logic per CP, joining Channel and Article.
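
To illustrate what "different logic per CP" means, here is a rough sketch
of the join, not our real code. It assumes the statistic is a simple
article count per channel and that each CP only differs in how the channel
id is read from an article row; the function name, the row layout, and the
field names in the usage note are all made up for the example.

```python
from collections import defaultdict


def aggregate_per_channel(channels, articles, channel_id_of):
    """Count articles per channel (Channel : Article is 1:N).

    channels      -- dict of channel id -> channel row
    articles      -- iterable of article rows (dicts)
    channel_id_of -- CP-specific function that pulls the channel id out
                     of an article row; this is where schemas differ
    """
    counts = defaultdict(int)
    for article in articles:
        cid = channel_id_of(article)
        if cid in channels:  # skip articles whose channel is unknown
            counts[cid] += 1
    # include channels with no articles so every channel gets a row
    return {cid: counts[cid] for cid in channels}
```

One CP could pass `lambda a: a["ch"]` and another
`lambda a: a["channel_id"]`, so the join itself stays shared while the
schema-specific part is isolated.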

I've thought about a workflow like this, but I wonder whether it makes sense.

1. Run a single process that initializes MySQL (creating tables, deleting
rows, etc.).
2. Run 3 M/R jobs simultaneously to aggregate statistical information for
each CP, and insert rows per Channel into MySQL.
3. Run a single process that finalizes the whole aggregation: it runs an
aggregation query against MySQL to insert new rows into MySQL, rolls
tables, etc.

Steps 1, 2, and 3 definitely have to run in that order, one after another.
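
The ordering I have in mind can be sketched as a small driver like the one
below. This is only an illustration under assumptions: `init`, `cp_jobs`,
and `finalize` are hypothetical callables standing in for the MySQL setup,
the three M/R job launches, and the final roll-up, and a thread pool is
just one possible way to wait for the parallel jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(init, cp_jobs, finalize):
    """Run init, then all per-CP jobs in parallel, then finalize.

    All three arguments are hypothetical callables standing in for the
    real steps (MySQL setup, one M/R job launch per CP, MySQL roll-up).
    """
    init()  # step 1: must finish before any job starts

    # step 2: launch every CP job at once and wait for all of them
    with ThreadPoolExecutor(max_workers=max(1, len(cp_jobs))) as pool:
        futures = [pool.submit(job) for job in cp_jobs]
        # result() re-raises a job's exception, so a failed job
        # prevents the finalize step from running
        results = [f.result() for f in futures]

    finalize(results)  # step 3: runs only after all jobs succeeded
    return results
```

The point of the sketch is the barrier between steps: step 3 is skipped
if any of the three jobs fails, so MySQL is never finalized on partial
data.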

Any help is really appreciated!
Thanks.

Regards.
Jungtaek Lim (HeartSaVioR)
