[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512080#comment-14512080 ]

Zhijie Shen commented on YARN-3134:
-----------------------------------

I'd like to raise another important issue. According to the schema, 
config/info/metrics values are written as strings. However, the current data 
model assumes they are objects, so they can be Integer, Long, or even nested 
structures that the Jackson ObjectMapper can handle. If we write them as 
strings, how can we read them back and convert them into the corresponding 
objects? Shall we write them as byte[] instead?

On the other hand, I'm not sure we can narrow the config/info/metrics values 
down to String only, as we previously allowed users to put 
Integer/Long/Float/Double and so on directly into the entity instance.
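
For instance, a byte[] round trip with Jackson could look like the sketch 
below (note that this loses the original Java type unless we also store type 
information):
{code}
// Sketch: round-trip an arbitrary config/info/metric value through byte[]
// using Jackson, instead of storing a plain String.
ObjectMapper mapper = new ObjectMapper();
byte[] stored = mapper.writeValueAsBytes(value);           // write path
Object restored = mapper.readValue(stored, Object.class);  // read path
// Caveat: readValue(..., Object.class) reconstructs by JSON shape
// (Integer/Long/Map/...), not the exact original class; preserving the
// exact type would require storing type information in the schema too.
{code}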

For metrics, I'd even argue the value shouldn't be a String at all, but 
usually a numeric type. Otherwise, I'm not sure how we would do aggregation 
over string values.

bq.  I'm concerned about is the TimelineCollector's dependency on 
TimelineCollectorManager just to get the writer.


To answer Sangjin's question: I suggest we change the design so that the 
collector manager sets the writer on the collector. Then the collector doesn't 
need to hold a reference to its manager; it only needs a setWriter method for 
the manager to call.
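
Roughly what I mean (just a sketch; method names are placeholders):
{code}
// Sketch: the manager injects the writer, so the collector no longer
// needs a reference back to TimelineCollectorManager.
public abstract class TimelineCollector extends CompositeService {
  private TimelineWriter writer;

  // Called by TimelineCollectorManager when it registers the collector.
  protected void setWriter(TimelineWriter writer) {
    this.writer = writer;
  }

  protected TimelineWriter getWriter() {
    return writer;
  }
}
{code}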

Some other comments on the patch details:

1. Should we make it configurable?
{code}
static final String CONN_STRING = "jdbc:phoenix:localhost:2181:/hbase";
{code}
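
For example (the property name below is only illustrative):
{code}
// Sketch: read the connection string from configuration, falling back to
// the current hard-coded value. The property name is a placeholder.
static final String CONN_STRING_PROPERTY =
    "yarn.timeline-service.phoenix.connection-string";
static final String DEFAULT_CONN_STRING = "jdbc:phoenix:localhost:2181:/hbase";

String connString = conf.get(CONN_STRING_PROPERTY, DEFAULT_CONN_STRING);
{code}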

2. putEntities is invoked by multiple threads, but HashMap is not thread safe.
{code}
private HashMap<Thread, Connection> connectionMap = null;
{code}

3. When stopping the writer, should we wait until the outstanding writes have 
finished?
{code}
@Override
protected void serviceStop() throws Exception {
  // Close all Phoenix connections
  for (Connection conn : connectionMap.values()) {
    try {
      conn.close();
{code}
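
One way to drain in-flight writes before closing (just a sketch; the lock 
field and write method are hypothetical):
{code}
// Sketch: writers take the read lock, serviceStop takes the write lock,
// so stop blocks until all in-flight writes have finished.
private final ReentrantReadWriteLock stopLock = new ReentrantReadWriteLock();

void writeEntities(...) throws SQLException {
  stopLock.readLock().lock();   // many writers may run concurrently
  try {
    // ... execute the Phoenix upserts ...
  } finally {
    stopLock.readLock().unlock();
  }
}

@Override
protected void serviceStop() throws Exception {
  stopLock.writeLock().lock();  // waits for all outstanding read locks
  try {
    for (Connection conn : connectionMap.values()) {
      conn.close();
    }
  } finally {
    stopLock.writeLock().unlock();
  }
}
{code}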

4. Shall we set auto-commit to false, since we commit the batch ourselves? Or 
is that unnecessary for Phoenix?
{code}
Connection conn = getConnection(Thread.currentThread());
{code}
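
Being explicit would avoid depending on the driver default, e.g.:
{code}
// Sketch: disable auto-commit explicitly since we commit the batch
// ourselves via conn.commit().
Connection conn = getConnection(Thread.currentThread());
conn.setAutoCommit(false);
{code}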

5. So in the backend, we don't put the flow name and version in separate 
columns?
{code}
    + "flow VARCHAR NOT NULL, run UNSIGNED_LONG NOT NULL, "
{code}
{code}
ps.setString(idx++, context.getFlowName() + context.getFlowVersion());
{code}
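
If we want to filter on each independently, a sketch with separate columns 
(the column names are placeholders):
{code}
// Sketch: store flow name and version in their own columns instead of
// concatenating them.
// DDL: ... + "flow_name VARCHAR NOT NULL, flow_version VARCHAR, "
//          + "run UNSIGNED_LONG NOT NULL, " ...
ps.setString(idx++, context.getFlowName());
ps.setString(idx++, context.getFlowVersion());
{code}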

6. The connections don't seem to be closed and removed after the writes 
finish. This may consume a lot of unnecessary resources after running for a 
long time.

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation of reading/writing data from/to HBase, 
> and make it easy to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
