Hi Lars,

Thanks for following up.
Table Size - 120G, from a du on HDFS. We are using Snappy compression on the table.
Column Family - We have one column family for all columns and are using the Phoenix default one.
Regions - Right now we have a ton of regions (250) because we pre-split to help out bulk loads. I haven't collapsed them yet, but in a DEV environment that is configured the same way, we have ~50 regions and experience the same performance issues. I am planning on squaring this away and trying again.
Resource Utilization - Really high CPU usage on the region servers, and I'm noticing a spike in IO too.

Based on your questions and what I know, the number of regions needs to be reduced first, though I am not sure that is going to solve my issue. The data nodes in HDFS have three 1TB disks, so I am not convinced that IO is the bottleneck here.

Thanks,
Abe

On Thu, Oct 9, 2014 at 8:36 PM, lars hofhansl <[email protected]> wrote:
> Hi Abe,
>
> this is interesting.
>
> How big are your rows (i.e., how much data is in the table; you can tell with du in HDFS)? And how many columns do you have? Any column families?
> How many regions are in this table? (You can tell that through the HBase HMaster UI page.)
> When you execute the query, are all HBase region servers busy? Do you see IO, or just high CPU?
>
> Client batching won't help with an aggregate (such as count) where not much data is transferred back to the client.
>
> Thanks.
>
> -- Lars
>
> ------------------------------
> *From:* Abe Weinograd <[email protected]>
> *To:* user <[email protected]>
> *Sent:* Wednesday, October 8, 2014 9:15 AM
> *Subject:* Re: count on large table
>
> Good point. I have to figure out how to do that in a SQL tool like SQuirreL or Workbench.
>
> Is there any obvious thing I can do to help tune this? I know that's a loaded question. My client scanner batches are 1000 (I also tried 10000 with no luck).
>
> Thanks,
> Abe
>
> On Tue, Oct 7, 2014 at 9:09 PM, [email protected] <[email protected]> wrote:
>
> Hi, Abe
> Maybe setting the following property would help:
>
> <property>
>   <name>phoenix.query.timeoutMs</name>
>   <value>3600000</value>
> </property>
>
> Thanks,
> Sun
>
> ------------------------------
> *From:* Abe Weinograd <[email protected]>
> *Date:* 2014-10-08 04:34
> *To:* user <[email protected]>
> *Subject:* count on large table
>
> I have a table with 1B rows. I know this is very specific to my environment, but just doing a SELECT COUNT(1) on the table never finishes.
>
> We have a 10-node cluster with the region servers' heap size at 26 GiB, skewed towards the block cache. In the RS logs, I see a lot of these:
>
> 2014-10-07 16:27:04,942 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow):
> {"processingtimems":22770,"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","client":"10.10.0.10:44791","starttimems":1412713602172,"queuetimems":0,"class":"HRegionServer","responsesize":8,"method":"Scan"}
>
> They stop eventually, but the query times out and the query tool reports:
> org.apache.phoenix.exception.PhoenixIOException: 187541ms passed since the last invocation, timeout is currently set to 60000
>
> Any ideas of where I can start in order to figure this out?
>
> Using Phoenix 4.1 on CDH 5.1 (HBase 0.98.1).
>
> Thanks,
> Abe
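[Editor's note] For readers hitting the same timeout: below is a minimal JDBC sketch of raising both client-side timeouts along the lines of Sun's suggestion above. The table name MY_TABLE and the ZooKeeper quorum "zk-quorum" are placeholders, and it assumes that phoenix.query.timeoutMs is honored when passed as a connection property and that hbase.client.scanner.timeout.period is the knob behind the "timeout is currently set to 60000" message in Abe's error. A GUI tool such as SQuirreL can usually take the same settings through its driver-properties dialog, or from an hbase-site.xml placed on the tool's classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class LongRunningCount {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Phoenix client-side query timeout (the same property Sun set in hbase-site.xml).
        props.setProperty("phoenix.query.timeoutMs", "3600000");
        // HBase client scanner timeout; assumption: this is the 60000 ms limit reported
        // in the PhoenixIOException quoted in the thread.
        props.setProperty("hbase.client.scanner.timeout.period", "3600000");

        // "zk-quorum" is a placeholder for your ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-quorum", props);
             Statement stmt = conn.createStatement();
             // MY_TABLE is a placeholder table name.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(1) FROM MY_TABLE")) {
            if (rs.next()) {
                System.out.println("row count: " + rs.getLong(1));
            }
        }
    }
}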
