I would try an ARRAY type for that use case. From my experience with HBase, execution time when querying the same data ranks roughly: more rows > more columns > fewer columns. Also note that Phoenix creates a query plan every time the query runs, and the number of columns might matter there. The sqlline tool can also create performance issues of its own, both with the fetch size and with the default output format of the data. Try csv output and incremental fetching of rows, as in the sketch below.
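
Purely as an illustration (the table and column names are invented; this assumes the Phoenix ARRAY type and sqlline's stock !outputformat / !set commands), the two suggestions look roughly like this:

    -- one ARRAY cell per row instead of 3600 individual FLOAT columns
    CREATE TABLE sensor_readings (
        id1      VARCHAR NOT NULL,
        id2      VARCHAR NOT NULL,
        id3      VARCHAR NOT NULL,
        samples  FLOAT ARRAY,
        CONSTRAINT pk PRIMARY KEY (id1, id2, id3)
    );

    UPSERT INTO sensor_readings VALUES ('a', 'b', 'c', ARRAY[0.1, 0.2, 0.3]);
    SELECT id1, samples[1] FROM sensor_readings LIMIT 10;  -- Phoenix arrays are 1-indexed

    -- in sqlline, before running the wide query:
    !outputformat csv
    !set incremental true

The trade-off with the array approach is that the whole series is stored as a single cell, so it fits best when you usually want the full second-by-second data back anyway.
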
> On Dec 27, 2016, at 8:53 AM, Josh Elser <[email protected]> wrote:
>
> Maybe you could separate some of the columns into separate column families so
> you have some physical partitioning on disk?
>
> Whether you select one or many columns, you presently have to read through
> each column on disk.
>
> AFAIK, there shouldn't really be an upper limit here (in terms of what will
> execute). The price to pay would be relative to the data that has to be
> inspected to answer your query.
>
> Arvind S wrote:
>> Setup ..
>> hbase (1.1.2.2.4) cluster on azure with 1 Region server. (8core 28 gb
>> ram ..~16gb RS heap)
>> phoenix .. 4.4
>>
>> Observation ..
>> created a table with 3 col composite PK and 3600 float type columns (1
>> per sec).
>> loaded with <5000 lines of data (<100 MB compressed snappy & fast diff
>> encoding)
>>
>> On performing "select * " or select with individually naming each of
>> these 3600 columns the query takes around 2+ mins to just return a few
>> lines (limit 2,10 etc).
>>
>> Subsequently on selecting lesser number of columns the performance seems
>> to improve.
>>
>> is it an anti-pattern to have large number of columns in phoenix tables?
>>
>> *Cheers !!*
>> Arvind
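
For what it's worth, the column-family split Josh mentions would look roughly like this in Phoenix DDL (the family names cf1/cf2 and the column names are invented). A column declared with a "family." prefix is stored in that HBase column family, so a query that only touches cf1 columns does not have to read cf2's store files:

    CREATE TABLE wide_readings (
        id1 VARCHAR NOT NULL,
        id2 VARCHAR NOT NULL,
        id3 VARCHAR NOT NULL,
        cf1.s0001 FLOAT,
        cf1.s0002 FLOAT,
        -- ... remaining cf1 columns ...
        cf2.s1801 FLOAT,
        cf2.s1802 FLOAT,
        -- ... remaining cf2 columns ...
        CONSTRAINT pk PRIMARY KEY (id1, id2, id3)
    );
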
