The design really needs to look at the other stacks as well.
If the visualisation layer is going to use Tableau, then you cannot use
Spark functional programming; only Spark SQL, or anything that works with
SQL such as Hive or Phoenix.
Tableau is not a real-time dashboard, so for analytics it maps tables in
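Since Tableau only speaks SQL over ODBC/JDBC, one way to expose the data to it is a plain Hive view it can map as a data source; a minimal sketch, assuming the externalMarketData table shown further down and a hypothetical view name:

```sql
-- Minimal sketch: Tableau visualises whatever is reachable as a table or view
-- in the SQL layer (Hive here; the view name pricesForTableau is hypothetical).
CREATE VIEW IF NOT EXISTS pricesForTableau AS
SELECT SECURITY,
       TIMECREATED,
       PRICE
FROM   externalMarketData;
```

Tableau would then attach to this view through the Hive ODBC/JDBC driver rather than calling into Spark's functional API.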
Careful: HBase with Phoenix is only faster in certain scenarios, namely when
you are processing a small amount out of a much bigger set of data (it depends
on node memory, the operation, etc.). Hive+Tez+ORC can be rather competitive,
and LLAP makes sense for interactive ad-hoc queries that are rather similar.
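To illustrate the scenario where Phoenix does win (picking a small slice out of a big table by key), here is a sketch; the table, salt bucket count, and column layout are assumptions, not part of the design under discussion:

```sql
-- Hypothetical Phoenix table: the composite primary key becomes the HBase row
-- key, so a query that pins SECURITY and a time range reads only a few regions
-- instead of scanning the whole table.
CREATE TABLE IF NOT EXISTS marketData (
    SECURITY    VARCHAR NOT NULL,
    TIMECREATED VARCHAR NOT NULL,
    PRICE       DECIMAL(31, 20),
    CONSTRAINT pk PRIMARY KEY (SECURITY, TIMECREATED)
) SALT_BUCKETS = 8;  -- salting spreads sequential writes across region servers

-- The "small amount out of a bigger amount" case: a keyed range scan.
SELECT PRICE
FROM   marketData
WHERE  SECURITY = 'S01'
AND    TIMECREATED BETWEEN '2016-10-17T00:00:00' AND '2016-10-17T23:59:59';
```

For a full aggregation over the whole table, by contrast, Hive+Tez+ORC (or LLAP for repeated interactive queries) tends to be the more natural fit.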
Yes, the Hive external table is partitioned on a daily basis (DateStamp below):
CREATE EXTERNAL TABLE IF NOT EXISTS ${DATABASE}.externalMarketData (
  KEY string
, SECURITY string
, TIMECREATED string
, PRICE float
)
COMMENT 'From prices Kafka delivered by Flume, location by day'
PARTITIONED BY (DateStamp string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  -- assumed: comma-delimited text, matching the csv sample below
STORED AS TEXTFILE
I do not see a rationale to have HBase in this scheme of things; maybe I
am missing something?
If data is delivered in HDFS, why not just add a partition to an existing
Hive table?
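For example, once Flume lands a new day's files in HDFS, registering them with the existing table is a single statement (the partition column name follows the daily DateStamp partitioning described above; the date value is illustrative):

```sql
-- Register the newly landed day with the existing external table.
ALTER TABLE ${DATABASE}.externalMarketData
    ADD IF NOT EXISTS PARTITION (DateStamp = '2016-10-18');

-- Or, if directories are written without going through Hive at all,
-- let Hive discover every unregistered partition in one pass:
MSCK REPAIR TABLE ${DATABASE}.externalMarketData;
```

Either way, no HBase layer is needed just to make the new data queryable.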
On Tue, Oct 18, 2016 at 8:23 AM, Mich Talebzadeh wrote:
Thanks Mike,

My test csv data comes as

UUID, ticker, timecreated, price
a2c844ed-137f-4820-aa6e-c49739e46fa6, S01, 2016-10-17T22:02:09, 53.36665625650533484995
a912b65e-b6bc-41d4-9e10-d6a44ea1a2b0, S02, 2016-10-17T22:02:09, 86.31917515824627016510
Mitch,
Short answer… no, it doesn’t scale.
Longer answer…
You are using a UUID as the row key? Why? (My guess is that you want to
avoid hot spotting.)
So you’re going to have to pull in all of the data… meaning a full table scan…
and then perform a sort order transformation, dropping the UUID…
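To make the full-scan point concrete, that sort-order transformation could look like the following in Hive or Spark SQL; this is a sketch against the externalMarketData table above, and the "latest price per security" query shape is an assumption:

```sql
-- With a random UUID key there is no useful ordering to exploit, so answering
-- "latest price per security" means reading everything, then sorting within
-- each security and keeping the newest row.
SELECT SECURITY, TIMECREATED, PRICE
FROM (
    SELECT SECURITY, TIMECREATED, PRICE,
           ROW_NUMBER() OVER (PARTITION BY SECURITY
                              ORDER BY TIMECREATED DESC) AS rn
    FROM   externalMarketData
) t
WHERE rn = 1;
-- If (security, timecreated) drove the key instead, the same answer would be a
-- short keyed range scan per security: no full scan and no re-sort.
```

That is the scaling problem: the cost of every such query grows with the whole table rather than with the slice you actually want.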