[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

Junping Du (JIRA) Thu, 16 Apr 2015 21:08:29 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498924#comment-14498924
 ]


Junping Du commented on YARN-3134:
----------------------------------

Thanks [~gtCarrera9] for delivering a patch here! 
I've just started looking at the patch. Some initial comments so far:

{code}
+      String sql = "CREATE TABLE IF NOT EXISTS " + ENTITY_TABLE_NAME
+          + "(cluster VARCHAR NOT NULL, user VARCHAR NOT NULL, "
+          + "flow VARCHAR NOT NULL, run UNSIGNED_LONG NOT NULL, "
+          + "appid VARCHAR NOT NULL, type VARCHAR NOT NULL, "
+          + "entityid VARCHAR NOT NULL, "
+          + "creationtime UNSIGNED_LONG, modifiedtime UNSIGNED_LONG, "
...
+      stmt.executeUpdate(sql);
+      stmt.close();
+      conn.commit();
{code}
Embedding raw SQL statements this way looks like a bit of a headache to me, 
since it makes the code difficult to debug and maintain in the future. Given 
that we could have more tables in the pipeline, we may want to refactor this 
to be more maintainable? 
BTW, I don't think HBase supports any atomic operation across multiple tables. 
Here we create 3 tables with only one commit, which means that if creating the 
2nd table fails, the 1st table would still be created and committed 
successfully and won't be rolled back. Such partial success after commit 
doesn't sound like good practice to me.
An additional problem is that we don't close the connection here, but we need 
to.
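
To illustrate the kind of refactoring I have in mind (only a rough sketch, not a 
proposal for the final API): the DDL could be generated from a column map 
rather than concatenated inline. TableDdl, createTableIfNotExists and the 
constraint name pk are hypothetical names of mine, and the columns used in the 
usage below are only a subset of those visible in the snippet above.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of the patch): builds CREATE TABLE DDL. */
final class TableDdl {
  private TableDdl() {}

  /**
   * Builds a CREATE TABLE IF NOT EXISTS statement.
   * columns maps column name to type/constraints, in declaration order.
   * primaryKey lists the row key columns; the constraint name "pk" is assumed.
   */
  static String createTableIfNotExists(
      String table, Map<String, String> columns, String... primaryKey) {
    StringJoiner body = new StringJoiner(", ", "(", ")");
    for (Map.Entry<String, String> column : columns.entrySet()) {
      body.add(column.getKey() + " " + column.getValue());
    }
    if (primaryKey.length > 0) {
      body.add("CONSTRAINT pk PRIMARY KEY("
          + String.join(", ", primaryKey) + ")");
    }
    return "CREATE TABLE IF NOT EXISTS " + table + " " + body;
  }
}
```

With a LinkedHashMap holding cluster => "VARCHAR NOT NULL" and creationtime => 
"UNSIGNED_LONG", each table definition becomes data that can be logged and 
unit-tested on its own, without touching HBase.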

{code}
+  private class TimelineEntityCtxt {
{code}
TimelineEntityCtxt => TimelineEntityContext. Better not to abbreviate words in 
class or method names (except for very obvious cases, like conf for 
configuration). Also, it looks exactly the same as 
TimelineCollectorContext.java. Can we reuse that class instead of creating a 
duplicate?

{code}
+  private <K, V> int setStringsForCf(
{code}
What does Cf mean? As I mentioned above, don't abbreviate words in a method 
name, as it hurts the code's readability.

{code}
+  private int setStringsForPk(PreparedStatement ps, String clusterId, String userId,
{code}
setStringsForPk => setStringsForPrimaryKeys

{code}
+  ResultSet executeQuery(String sql) {
+    ResultSet rs = null;
+    try {
+      Statement stmt = conn.createStatement();
+      rs = stmt.executeQuery(sql);
+    } catch (SQLException se) {
+      LOG.error("SQL exception! " + se.getLocalizedMessage());
+    }
+    return rs;
+  }
{code}
Does getLocalizedMessage contain enough info (at least the SQL statement 
executed)? If not, I would prefer that we add the raw SQL statement to the 
error message when an exception gets thrown. 
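
Something along these lines could work (a minimal sketch; SqlErrors and 
describe are hypothetical names, not something in the patch). It only relies on 
the standard getMessage, getSQLState and getErrorCode accessors of 
java.sql.SQLException:

```java
import java.sql.SQLException;

/** Hypothetical helper (not part of the patch): formats SQL failures. */
final class SqlErrors {
  private SqlErrors() {}

  /** Returns an error message that includes the statement that failed. */
  static String describe(String sql, SQLException se) {
    return "SQL exception while executing [" + sql + "]: " + se.getMessage()
        + " (SQLState=" + se.getSQLState()
        + ", errorCode=" + se.getErrorCode() + ")";
  }
}
```

The catch block could then call LOG.error(SqlErrors.describe(sql, se), se), 
assuming the logger in use has an overload that accepts a Throwable, so that 
the stack trace is kept as well.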

{code}
+    // Execute and close
+    psConfigInfo.execute();
+    psConfigInfo.close();
{code}
In many places like this one, we forget to close closable resources in a 
finally block. We should close them even if an exception gets thrown. 
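
For example (just a sketch, assuming we can rely on Java 7 or later): a 
try-with-resources statement gives the same guarantee as an explicit finally 
block with less code. StatementRunner and executeAndClose are hypothetical 
names used only for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper (not part of the patch). */
final class StatementRunner {
  private StatementRunner() {}

  /** Executes the statement and always closes it, even on failure. */
  static void executeAndClose(Connection conn, String sql)
      throws SQLException {
    // ps.close() runs when the try block exits, whether normally or
    // because execute() threw; the exception still propagates to the caller.
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.execute();
    }
  }
}
```

The same pattern applies to the Connection itself and to any ResultSet we 
open.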

More comments may come later.

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation for reading/writing data from/to HBase, 
> and make it easy to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
