Sometime this month. Thanks,
On Mon, Feb 3, 2014 at 9:14 AM, Justin Workman <[email protected]> wrote:

> Thanks. Is there an ETA on the 3.0 release?
>
> On Mon, Feb 3, 2014 at 9:52 AM, James Taylor <[email protected]> wrote:
>
>> There will be an upgrade step required to go from 2.x to 3.0, as the
>> system table has changed (and probably will a bit more still before we
>> release).
>>
>> For now, you can do the following if you want to test out 3.0.0-SNAPSHOT:
>> - Remove the com.salesforce.* coprocessors on existing tables. If you
>> haven't added any of your own, it's probably easiest to just remove all
>> coprocessors.
>> - Re-issue your DDL commands. If you have existing data in that table,
>> it's best to open a connection at a timestamp earlier than any of your
>> data using the CURRENT_SCN connection property. If you don't care about
>> doing point-in-time queries at an earlier timestamp (or flash-back
>> queries), then you don't need to worry about this, and you can just
>> re-issue the DDL.
>>
>> Thanks,
>> James
>>
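As a rough sketch of the re-issue step described above (not an official recipe): this assumes the Phoenix JDBC driver is on the classpath, that the point-in-time property key is "CurrentSCN" (the property James refers to as CURRENT_SCN), and that the ZooKeeper host, timestamp, and abbreviated DDL are placeholders. The coprocessor removal itself is usually done separately in the HBase shell (alter with METHOD => 'table_att_unset') and is not shown here.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Properties;

    public class ReissueDdlAtEarlierTimestamp {
        public static void main(String[] args) throws Exception {
            // Older drivers may need an explicit registration, e.g.:
            // Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver");
            Properties props = new Properties();
            // Any value earlier than the oldest cell timestamp in the existing data.
            props.setProperty("CurrentSCN", Long.toString(1388534400000L)); // 2014-01-01
            // "zk-host" is a placeholder for your ZooKeeper quorum.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host", props);
                 Statement stmt = conn.createStatement()) {
                // Re-issue the original CREATE TABLE (abbreviated here) so the
                // existing rows stay visible to later point-in-time queries.
                stmt.execute("CREATE TABLE SEO.KEYWORDIDEAS (" +
                        "\"pk\" VARCHAR PRIMARY KEY, " +
                        "\"keyword\".\"parentKeywordText\" VARCHAR" +
                        ") IMMUTABLE_ROWS=true");
            }
        }
    }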
>> On Mon, Feb 3, 2014 at 8:40 AM, Justin Workman <[email protected]> wrote:
>>
>>> We updated to the 3.0.0-SNAPSHOT in an effort to also test the Flume
>>> component, and we are now unable to query any of our existing tables
>>> through sqlline or a Java JDBC connection. However, the Flume component
>>> works fine writing to a new table. Here is the error we are getting when
>>> doing a select count(1) from keywords;
>>>
>>> Error: org.apache.hadoop.hbase.DoNotRetryIOException: keywords: at index 4
>>>   at com.salesforce.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:83)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1034)
>>>   at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.hadoop.hbase.regionserver.HRegion.exec(HRegion.java:5482)
>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.execCoprocessor(HRegionServer.java:3720)
>>>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
>>> Caused by: java.lang.NullPointerException: at index 4
>>>   at com.google.common.collect.ImmutableList.checkElementNotNull(ImmutableList.java:305)
>>>   at com.google.common.collect.ImmutableList.construct(ImmutableList.java:296)
>>>   at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:272)
>>>   at com.salesforce.phoenix.schema.PTableImpl.init(PTableImpl.java:290)
>>>   at com.salesforce.phoenix.schema.PTableImpl.<init>(PTableImpl.java:219)
>>>   at com.salesforce.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:212)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:436)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:254)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1082)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.addIndexToTable(MetaDataEndpointImpl.java:279)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:430)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:254)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1082)
>>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1028)
>>>   ... 10 more (state=08000,code=101)
>>>
>>> On Thu, Jan 30, 2014 at 4:01 PM, Justin Workman <[email protected]> wrote:
>>>
>>>> I will test with the latest master build. When this table goes live we
>>>> will shorten the CF name; that was a mistake. Thanks for all the info.
>>>> I do think going forward we will be creating these tables via Phoenix.
>>>> We are still testing the Flume sink and Pig handlers before completely
>>>> committing.
>>>>
>>>> I'll update the list once I've had a chance to test with the latest
>>>> build and file a JIRA if the problem persists.
>>>>
>>>> Thanks!
>>>> Justin
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Jan 30, 2014, at 1:25 PM, James Taylor <[email protected]> wrote:
>>>>
>>>> Thanks for all the detail, Justin. Based on this, it looks like a bug
>>>> related to using case-sensitive column names. Maryann checked in a fix
>>>> related to this, so it might be fixed in the latest on master.
>>>>
>>>> If it's not fixed, would you mind filing a JIRA?
>>>>
>>>> FWIW, you may want to consider a shorter column family name, like "k"
>>>> or "kw", as that'll make your table smaller. Also, did you know you can
>>>> provide your HBase table and column family config parameters in your
>>>> CREATE TABLE statement and it'll create the HBase table and the column
>>>> families, like below?
>>>>
>>>> CREATE TABLE SEO.KEYWORDIDEAS (
>>>>     "pk" VARCHAR PRIMARY KEY,
>>>>     "keyword"."jobId" VARCHAR,
>>>>     "keyword"."jobName" VARCHAR,
>>>>     "keyword"."jobType" VARCHAR,
>>>>     "keyword"."keywordText" VARCHAR,
>>>>     "keyword"."parentKeywordText" VARCHAR,
>>>>     "keyword"."refinementName" VARCHAR,
>>>>     "keyword"."refinementValue" VARCHAR,
>>>>     "keyword"."relatedKeywordRank" VARCHAR
>>>> ) IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY';
>>>>
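To illustrate what "case-sensitive column names" means here, a hypothetical snippet (placeholder JDBC URL): Phoenix folds unquoted identifiers to upper case, so a column created as "parentKeywordText" can only be referenced with the double quotes.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CaseSensitiveColumns {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 Statement stmt = conn.createStatement()) {
                // Works: the double quotes preserve the mixed-case name used in the DDL.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT count(1) FROM SEO.KEYWORDIDEAS WHERE \"parentKeywordText\" = 'table'")) {
                    while (rs.next()) {
                        System.out.println("count = " + rs.getLong(1));
                    }
                }
                // Fails with a column-not-found error: the unquoted identifier is
                // normalized to PARENTKEYWORDTEXT, which was never defined.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT count(1) FROM SEO.KEYWORDIDEAS WHERE parentKeywordText = 'table'")) {
                    rs.next();
                } catch (SQLException e) {
                    System.out.println("expected failure: " + e.getMessage());
                }
            }
        }
    }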
>>>> On Thu, Jan 30, 2014 at 8:50 AM, Justin Workman <[email protected]> wrote:
>>>>
>>>>> I don't think that is the issue we are hitting. Details are below. The
>>>>> HBase table does have more columns than we are defining in the Phoenix
>>>>> table. We were hoping to just use the dynamic column feature if/when we
>>>>> need to access data in other columns of the underlying table. As you
>>>>> can see from the output of the explain statement below, a simple query
>>>>> does not use the index.
>>>>>
>>>>> However, if I create another identical table using Phoenix, upsert into
>>>>> that new table from the table below, and create the same index on that
>>>>> table, then the same select query does use the index on that table.
>>>>>
>>>>> So I am still very confused as to why the index is not invoked when the
>>>>> table is created on top of an existing HBase table.
>>>>>
>>>>> HBase create table:
>>>>> > create 'SEO.KEYWORDIDEAS', { NAME=>'keyword', COMPRESSION=>'SNAPPY' }
>>>>>
>>>>> Phoenix create table:
>>>>> CREATE TABLE SEO.KEYWORDIDEAS (
>>>>>     "pk" VARCHAR PRIMARY KEY,
>>>>>     "keyword"."jobId" VARCHAR,
>>>>>     "keyword"."jobName" VARCHAR,
>>>>>     "keyword"."jobType" VARCHAR,
>>>>>     "keyword"."keywordText" VARCHAR,
>>>>>     "keyword"."parentKeywordText" VARCHAR,
>>>>>     "keyword"."refinementName" VARCHAR,
>>>>>     "keyword"."refinementValue" VARCHAR,
>>>>>     "keyword"."relatedKeywordRank" VARCHAR
>>>>> ) IMMUTABLE_ROWS=true;
>>>>>
>>>>> Create index:
>>>>> CREATE INDEX KWDIDX ON SEO.KEYWORDIDEAS ("parentKeywordText");
>>>>>
>>>>> Show and count indexes:
>>>>> +-----------+-------------+--------------+------------+-----------------+------------+------+------------------+---------------------------+-------------+-------------+-------+------------------+-----------+-----------+
>>>>> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME   | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME               | ASC_OR_DESC | CARDINALITY | PAGES | FILTER_CONDITION | DATA_TYPE | TYPE_NAME |
>>>>> +-----------+-------------+--------------+------------+-----------------+------------+------+------------------+---------------------------+-------------+-------------+-------+------------------+-----------+-----------+
>>>>> | null      | SEO         | KEYWORDIDEAS | true       | null            | KWDIDX     | 3    | 1                | keyword:parentKeywordText | A           | null        | null  | null             |           |           |
>>>>> | null      | SEO         | KEYWORDIDEAS | true       | null            | KWDIDX     | 3    | 2                | :pk                       | A           | null        | null  | null             | 12        | V         |
>>>>> | null      | SEO         | KEYWORDIDEAS | true       | null            | RA_TEST_ID | 3    | 1                | keyword:jobId             | A           | null        | null  | null             | 12        |           |
>>>>> | null      | SEO         | KEYWORDIDEAS | true       | null            | RA_TEST_ID | 3    | 2                | :pk                       | A           | null        | null  | null             | 12        | V         |
>>>>> +-----------+-------------+--------------+------------+-----------------+------------+------+------------------+---------------------------+-------------+-------------+-------+------------------+-----------+-----------+
>>>>>
>>>>> > select count(1) from seo.keywordideas;
>>>>> +----------+
>>>>> | COUNT(1) |
>>>>> +----------+
>>>>> | 423229 |
>>>>> +----------+
>>>>>
>>>>> > select count(1) from seo.kwdidx;
>>>>> +----------+
>>>>> | COUNT(1) |
>>>>> +----------+
>>>>> | 423229 |
>>>>> +----------+
>>>>>
>>>>> > explain select count(1) from seo.keywordideas where "parentKeywordText" = 'table';
>>>>> +------------+
>>>>> | PLAN |
>>>>> +------------+
>>>>> | CLIENT PARALLEL 18-WAY FULL SCAN OVER SEO.KEYWORDIDEAS |
>>>>> | SERVER FILTER BY keyword.parentKeywordText = 'table' |
>>>>> | SERVER AGGREGATE INTO SINGLE ROW |
>>>>> +------------+
>>>>>
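For reference, a sketch of running the same EXPLAIN check from the Java JDBC side Justin mentions (placeholder JDBC URL; it simply reads the plan rows and looks for the index name):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExplainPlanCheck {
        public static void main(String[] args) throws Exception {
            String sql = "SELECT count(1) FROM SEO.KEYWORDIDEAS WHERE \"parentKeywordText\" = 'table'";
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("EXPLAIN " + sql)) {
                StringBuilder plan = new StringBuilder();
                while (rs.next()) {
                    // Each row of the EXPLAIN result set is one line of the plan.
                    plan.append(rs.getString(1)).append('\n');
                }
                System.out.println(plan);
                // A RANGE SCAN over KWDIDX means the secondary index was chosen;
                // a FULL SCAN over the data table means it was not.
                System.out.println("uses index: " + plan.toString().contains("KWDIDX"));
            }
        }
    }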
>>>>> Now here is where I can get the index to be invoked.
>>>>>
>>>>> > CREATE TABLE SEO.NEW_KEYWORDIDEAS (
>>>>>     PK VARCHAR PRIMARY KEY,
>>>>>     JOB_ID VARCHAR,
>>>>>     JOB_NAME VARCHAR,
>>>>>     JOB_TYPE VARCHAR,
>>>>>     KEYWORD_TEXT VARCHAR,
>>>>>     PARENT_KEYWORD_TEXT VARCHAR,
>>>>>     REFINEMENT_NAME VARCHAR,
>>>>>     REFINEMENT_VALUE VARCHAR,
>>>>>     RELATED_KEYWORD_RANK VARCHAR
>>>>>   ) IMMUTABLE_ROWS=true;
>>>>>
>>>>> > UPSERT INTO SEO.NEW_KEYWORDIDEAS SELECT * FROM SEO.KEYWORDIDEAS;
>>>>>
>>>>> > CREATE INDEX NEW_KWD_IDX ON SEO.NEW_KEYWORDIDEAS (PARENT_KEYWORD_TEXT);
>>>>>
>>>>> > explain select count(1) from seo.new_keywordideas where parent_keyword_text = 'table';
>>>>> +------------+
>>>>> | PLAN |
>>>>> +------------+
>>>>> | CLIENT PARALLEL 1-WAY RANGE SCAN OVER SEO.NEW_KWD_IDX ['table'] |
>>>>> | SERVER AGGREGATE INTO SINGLE ROW |
>>>>> +------------+
>>>>>
>>>>> On Wed, Jan 29, 2014 at 5:21 PM, James Taylor <[email protected]> wrote:
>>>>>
>>>>>> Hi Justin,
>>>>>> Please take a look at this FAQ:
>>>>>> http://phoenix.incubator.apache.org/faq.html#/Why_isnnullt_my_secondary_index_being_used
>>>>>>
>>>>>> If that's not the case for you, can you include your CREATE TABLE,
>>>>>> CREATE INDEX, SELECT statement, and EXPLAIN plan?
>>>>>>
>>>>>> Thanks,
>>>>>> James
>>>>>>
>>>>>> On Wed, Jan 29, 2014 at 4:13 PM, Justin Workman <[email protected]> wrote:
>>>>>>
>>>>>>> I am seeing some odd behavior with indexes and want some clarification
>>>>>>> on how they are used.
>>>>>>>
>>>>>>> When I create a table in Phoenix on top of an existing HBase table,
>>>>>>> and then create an index on this table, I can see the index get built
>>>>>>> and populated properly, but no queries show that they are using this
>>>>>>> index when I run an explain on the query.
>>>>>>>
>>>>>>> However, if I create a separate table in Phoenix, do an upsert from my
>>>>>>> HBase table into the new table, and create the same index as on the
>>>>>>> previous table, then my queries show that they would use the index
>>>>>>> when running them through the explain plan.
>>>>>>>
>>>>>>> Are we not able to create or use an index on a table we create over an
>>>>>>> existing HBase table, or am I doing something wrong?
>>>>>>>
>>>>>>> Thanks in advance for any help.
>>>>>>> Justin
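The FAQ James links earlier in the thread comes down to this: a global index is only considered when every column the query references is available in the index. A hedged sketch of that on Justin's schema, assuming the INCLUDE clause is available in this release (the index name and covered column are made up for illustration):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CoveredIndexExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 Statement stmt = conn.createStatement()) {
                // Cover "keywordText" so queries that select it can still be served
                // entirely from the index.
                stmt.execute("CREATE INDEX KWDIDX_COVERED ON SEO.KEYWORDIDEAS (\"parentKeywordText\") " +
                        "INCLUDE (\"keywordText\")");
                // Every column referenced below exists in the index, so the planner
                // is free to use it.
                try (ResultSet rs = stmt.executeQuery(
                        "EXPLAIN SELECT \"keywordText\" FROM SEO.KEYWORDIDEAS WHERE \"parentKeywordText\" = 'table'")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }

In Justin's count(1) case every referenced column was already in KWDIDX, which is why the full scan pointed at the case-sensitive-name bug discussed above rather than at a missing covered column.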
