I would try an ARRAY for that use case. In my experience with HBase, when 
querying the same data, execution time ranks roughly: more rows > more 
columns > fewer columns. Also note that Phoenix creates a query plan every 
time you run the query, and the number of columns might matter there. The 
sqlline tool can also create performance issues of its own, both with the 
fetch size and with the default output format of the data. Try using CSV 
output and incremental fetching of rows.
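
Something along these lines, as a rough, untested sketch - the table and 
column names are just placeholders, and the exact sqlline settings available 
depend on the sqlline version bundled with your Phoenix:

  -- one FLOAT ARRAY instead of 3600 individual FLOAT columns
  CREATE TABLE metrics (
      host        VARCHAR NOT NULL,
      metric_date DATE NOT NULL,
      sensor      VARCHAR NOT NULL,
      readings    FLOAT ARRAY[3600]
      CONSTRAINT pk PRIMARY KEY (host, metric_date, sensor)
  );

  -- pick out individual seconds by array index instead of by column name
  SELECT readings[1], readings[3600] FROM metrics LIMIT 10;

  -- in sqlline: plain csv output, and stream rows instead of buffering
  !outputformat csv
  !set incremental true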

> On Dec 27, 2016, at 8:53 AM, Josh Elser <[email protected]> wrote:
> 
> Maybe you could separate some of the columns into separate column families so 
> you have some physical partitioning on disk?
> 
> Whether you select one or many columns, you presently have to read through 
> each column on disk.
> 
> AFAIK, there shouldn't really be an upper limit here (in terms of what will 
> execute). The price to pay would be relative to the data that has to be 
> inspected to answer your query.
> 
> Arvind S wrote:
>> Setup ..
>> HBase (1.1.2.2.4) cluster on Azure with 1 region server (8 cores, 28 GB
>> RAM, ~16 GB RS heap)
>> Phoenix 4.4
>> 
>> Observation ..
>> created a table with a 3-column composite PK and 3600 FLOAT columns (1
>> per second).
>> loaded it with <5000 rows of data (<100 MB, Snappy compressed with
>> FAST_DIFF encoding)
>> 
>> On performing "select *", or a select individually naming each of these
>> 3600 columns, the query takes around 2+ minutes just to return a few
>> rows (limit 2, 10, etc.).
>> 
>> Subsequently, selecting a smaller number of columns seems to improve the
>> performance.
>> 
>> Is it an anti-pattern to have a large number of columns in Phoenix tables?
>> 
>> *Cheers !!*
>> Arvind