RE: Help tuning for bursts of high traffic?

Riesland, Zack Thu, 10 Dec 2015 12:16:21 -0800

Thanks Jain,

The 3800 seconds was just for the executeQUery()

I've seen it as high as 5900
________________________________________
From: Samarth Jain [[email protected]]
Sent: Thursday, December 10, 2015 2:31 PM
To: [email protected]
Subject: Re: Help tuning for bursts of high traffic?

Thanks for the additional information, Zack. Looking at the numbers it looks 
like the bottle-neck is probably not coming from the phoenix thread pool.

For request level metrics:

TASK_QUEUE_WAIT_TIME - represents the length of time (wall clock)  phoenix 
scans had to wait in the thread pool's queue before they were picked up for 
execution. A higher value of this would mean that you likely need to increase 
your thread pool size.

TASK_EXECUTION_TIME - represents the time taken by scans to complete.

TASK_END_TO_END_TIME - is roughly task_queue_wait_time + task_execution_time

Did you get 3800ms for stmt.executeQuery() itself or did that time include time 
spent in retrieving records via resultSet.next() too?

On Thu, Dec 10, 2015 at 7:38 AM, Riesland, Zack 
<[email protected]<mailto:[email protected]>> wrote:
Thanks,

I did some experimenting.

Now, anytime I get a query that lasts longer than 500 ms, I call 
getRequestReadMetrics() and print everything.

The output suggests to me that Phoenix thinks that the query is still 
reasonably fast:

TASK_QUEUE_WAIT_TIME: 0
TASK_EXECUTION_TIME: 80
TASK_END_TO_END_TIME: 80

Should I read this as: “The query did not have to wait at all and took 80 ms to 
execute” ?

Comparing System.currentTimeMillis() immediately before and after calling 
stmt.executeQuery() is about 3,800 ms for this particular example.

The other (non-zero) counters, from getGlobalPhoenixClientMetrics() at this 
time, are:

QUERY_TIME: 347802
NUM_PARALLEL_SCANS: 5638
SCAN_BYTES: 3074163200<tel:3074163200>
TASK_QUEUE_WAIT_TIME: 12129
TASK_END_TO_END_TIME: 851524
TASK_EXECUTION_TIME: 839347
MUTATION_SQL_COUNTER: 93
SELECT_SQL_COUNTER: 5610 (this is the number of  total queries on this 
connection)

Can you please help me understand how to interpret all this and possibly 
understand how to configure this scenario to make it faster?

Right now, I’m only averaging about 5 queries/second, even though I’m querying 
by the primary key.

Before I upgraded, I was getting a lot closer to 100.

Thanks!

From: Samarth Jain [mailto:[email protected]<mailto:[email protected]>]
Sent: Wednesday, December 09, 2015 1:59 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: Help tuning for bursts of high traffic?

Zack,

These stats are collected continuously and at the global client level. So 
collecting them only when the query takes more than 1 second won't work. A 
better alternative for you would be to report stats at a request level. You 
could then conditionally report the metrics for queries that exceed 1 second 
execution. The metric you want to make that decision on is WALL_CLOCK_TIME_MS.

Request level metric collection isn't enabled by default. You can turn it 
on/off for a PhoenixConnection by doing the following:

Properties props = new Properties();
props.put(QueryServices.COLLECT_REQUEST_LEVEL_METRICS, String.valueOf(true));
Connection conn = DriverManager.getConnection(url, props);

Then see PhoenixRuntime.getRequestReadMetrics() on how to get hold of these 
metrics.

I have a JIRA filed - https://issues.apache.org/jira/browse/PHOENIX-2486 for 
adding the documentation for these metrics. Will get to it shortly.

On Wed, Dec 9, 2015 at 10:38 AM, Riesland, Zack 
<[email protected]<mailto:[email protected]>> wrote:
Thanks a lot, James.

And now that I’m using 4.6, I have access to some statistics.

I added code to collect stats whenever a query takes more than 1 second.

The first time this happened was after about 51,000 queries, and the output is 
below, for whatever it’s worth.

Can you point me to any documentation to describe these values, or to utilize 
this information?

I have a hunch that SCAN_BYTES and TASK_QUEUE_WAIT_TIME are the interesting 
numbers here?

PHOENIX STATS:
Batch sizes of mutations | MUTATION_BATCH_SIZE: 0, Number of samples: 0 | 0
Size of mutations in bytes | MUTATION_BYTES: 0, Number of samples: 0 | 0
Time it took to commit mutations | MUTATION_COMMIT_TIME: 0, Number of samples: 
0 | 0
Query times | QUERY_TIME: 2424030, Number of samples: 51555 | 2424030
Number of scans that were executed in parallel | NUM_PARALLEL_SCANS: 51769, 
Number of samples: 51554 | 51769
Number of bytes read by scans | SCAN_BYTES: 28193431552, Number of samples: 
220261184 | 28193431552
Size of spool files created in bytes | SPOOL_FILE_SIZE: 0, Number of samples: 0 
| 0
Number of bytes allocated by the memory manager | MEMORY_CHUNK_BYTES: 0, Number 
of samples: 0 | 0
Number of milliseconds threads needed to wait for memory to be allocated 
through memory manager | MEMORY_WAIT_TIME: 0, Number of samples: 0 | 0
Time in milliseconds tasks had to wait in the queue of the thread pool executor 
| TASK_QUEUE_WAIT_TIME: 114349, Number of samples: 2235211 | 114349
Time in milliseconds spent by tasks from creation to completion | 
TASK_END_TO_END_TIME: 6030797, Number of samples: 2235211 | 6030797
Time in milliseconds tasks took to execute | TASK_EXECUTION_TIME: 5916067, 
Number of samples: 2235211 | 5916067
Counter for number of mutation sql statements | MUTATION_SQL_COUNTER: 93, 
Number of samples: 93 | 93
Counter for number of sql queries | SELECT_SQL_COUNTER: 51555, Number of 
samples: 51555 | 51555
Counter for number of tasks submitted to the thread pool executor | 
TASK_EXECUTED_COUNTER: 2235211, Number of samples: 2235211 | 2235211
Counter for number of tasks that were rejected by the thread pool executor | 
TASK_REJECTED_COUNTER: 0, Number of samples: 0 | 0
Number of times query timed out | QUERY_TIMEOUT_COUNTER: 0, Number of samples: 
0 | 0
Number of times query failed | QUERY_FAILED_COUNTER: 0, Number of samples: 0 | 0
Number of spool files created | SPOOL_FILE_COUNTER: 0, Number of samples: 0 | 0

From: James Taylor 
[mailto:[email protected]<mailto:[email protected]>]
Sent: Tuesday, December 08, 2015 7:10 PM

To: user
Subject: Re: Help tuning for bursts of high traffic?

Thanks for clarifying, Zack. For profiling on your client machine, you can just 
use VisualVM as it's built into the JDK 
(https://dzone.com/articles/best-kept-secret-jdk-visualvm).

    James

On Tue, Dec 8, 2015 at 6:31 AM, Riesland, Zack 
<[email protected]<mailto:[email protected]>> wrote:
Thanks for your reply, James.

I apologize if I’m not being very clear.

To clarify, NO DATA is being modified (upserted or deleted) in HBase/Phoenix 
during this use case. Only queries during the day time.

Throughout the day, different files are consumed by this java process 
(performing the queries), and resulting files are generated.

Then, at night, long after this code is finished running, a series of batch 
processes take these files and eventually ingest their data into Hive. THEN, 
the Hive data is sent to Phoenix via CsvBulkUpload.

These processes are completely independent and do not run at the same time. 
Data in Phoenix is ONLY modified at night.

Finally, yes, the pattern that I’m seeing regarding CPU utilization and queries 
taking minutes instead of milliseconds is on a  CLIENT machine – NOT on a data 
node. My theory is that the Phoenix client is somehow getting behind or doing 
garbage collection in a way that causes the queries to get backed up.

Some comments (replies inline):
- 4.2.2 is a pretty old release at this point - we've up to 4.6.0 as our last 
stable release. I'd recommend upgrading.
--> Agreed. I’m going to follow the instructions here and try to upgrade to 
4.6.0 today: https://phoenix.apache.org/upgrading.html

- If the problem is on the client-side, you should be able to profile the Java 
application during the period of slowness and get us the details we need to 
diagnose this.
--> Do you have a recommendation for a tool that I should go learn about for 
this?

- Phoenix executes a single query using many threads already, so your ~2000 
thread calculations wouldn't be accurate. For a little bit more info on how 
Phoenix parallelized queries, see 
https://phoenix.apache.org/tuning.html#Parallelization. At a minimum, Phoenix 
will execute one thread per region involved in your query. You can control the 
level of parallelization through the phoenix.stats.guidepost.width config 
parameter. If you want to minimize the parallelization that Phoenix does, you 
can set this config parameter to a very larger value in the server side 
hbase-site.xml. The default is 104857600 (or 10MB). If you set it to your 
MAX_FILESIZE (the size you allow a region to grow to before it splits - default 
20GB), then you're essentially disabling it. You could also try increasing it 
somewhere in between to maybe 5 or 10GB.
--> Thank you. This is helpful. I will experiment with this.

- Running 8 simultaneous instances on the same client will overload that 
machine. Run them on separate client machines - but start with a single one on 
one machine.
--> I only have one machine for this (for now), but it is a very beefy machine 
(8 very fast CPUs and almost 1 TB of RAM). I need to perform this input/dedup 
operation for about 500 files each day (one per customer), and each file will 
require 5,000-100,000 queries, depending on the size of the customer. Supposing 
an average of 10K queries/customer, that’s about 5 million queries per day. At 
10ms per query, that’s about 14 hours, which I need to improve considerably, 
especially as our data grows. Based on some experimenting, I’ve found that I 
can get an overall average much closer to 2-3ms / query if I run several 
instances in parallel. However, this is when I start to encounter the thread 
limit issue. So I’ll continue to experiment with this and I appreciate any 
feedback or recommendations this community can provide.

Thanks!

From: James Taylor 
[mailto:[email protected]<mailto:[email protected]>]
Sent: Monday, December 07, 2015 6:34 PM

To: user
Subject: Re: Help tuning for bursts of high traffic?

Thanks for the additional information, Zack. I'd like to confirm I understand 
how you're using Phoenix:
- Data is not being upserted at the same time it is being queried. But I'm 
confused by your statement of "and THEN some of the Hive data is send to 
HBase/Phoenix". How is the data being sent? How much data? Is querying 
occurring while this data is being ingested?
- Querying happens *after* any data ingest. "So, per-widget, I perform a query 
against Phoenix". Are you determining duplicates through these queries and then 
issuing a Phoenix DELETE of the duplicates?
- The CPUs you mention being pegged are the CPUs on the *client* machine: "I 
see that, periodically – maybe every 60 or 90 seconds – all of my CPUs (there 
are 8 on this machine) go from mildly busy to almost totally pegged".

If you could confirm the above, that'd be super helpful. I can't think of any 
period process in Phoenix that runs every 60 or 90 seconds.

Some comments:
- 4.2.2 is a pretty old release at this point - we've up to 4.6.0 as our last 
stable release. I'd recommend upgrading.
- If the problem is on the client-side, you should be able to profile the Java 
application during the period of slowness and get us the details we need to 
diagnose this.
- Phoenix executes a single query using many threads already, so your ~2000 
thread calculations wouldn't be accurate. For a little bit more info on how 
Phoenix parallelized queries, see 
https://phoenix.apache.org/tuning.html#Parallelization. At a minimum, Phoenix 
will execute one thread per region involved in your query. You can control the 
level of parallelization through the phoenix.stats.guidepost.width config 
parameter. If you want to minimize the parallelization that Phoenix does, you 
can set this config parameter to a very larger value in the server side 
hbase-site.xml. The default is 104857600 (or 10MB). If you set it to your 
MAX_FILESIZE (the size you allow a region to grow to before it splits - default 
20GB), then you're essentially disabling it. You could also try increasing it 
somewhere in between to maybe 5 or 10GB.
- Running 8 simultaneous instances on the same client will overload that 
machine. Run them on separate client machines - but start with a single one on 
one machine.

On Mon, Dec 7, 2015 at 8:32 AM, Riesland, Zack 
<[email protected]<mailto:[email protected]>> wrote:
Also, and somewhat related:

I’m trying to running 8 simultaneous instances of this code (on 8 separate 
input files), since I have 8 CPUs on the machine.

When I try this, I get java.lang.RuntimeException: java.lang.OutOfMemory: 
unable to create a new native thread

My phoenix connection has the “phoenix.query.threadPoolSize” set to “256”, 
which should result in 8x256 = ~2,000 threads being spawned by Phoenix.

Is that correct?

I’m running RH Linux 6, with my ulimit set to “unlimited”, so I should be able 
to handle thousands of threads.

Any ideas?

From: Andrew Purtell 
[mailto:[email protected]<mailto:[email protected]>]
Sent: Friday, December 04, 2015 4:24 PM
To: [email protected]<mailto:[email protected]>
Cc: Haisty, Geoffrey
Subject: Re: Help tuning for bursts of high traffic?

Any chance of stack dumps from the debug servlet? Impossible to get anywhere 
with 'pegged the CPU' otherwise. Thanks.

On Dec 4, 2015, at 12:20 PM, Riesland, Zack 
<[email protected]<mailto:[email protected]>> wrote:
James,

2 quick followups, for whatever they’re worth:

1 – There is nothing phoenix-related in /tmp

2 – I added a ton of logging, and played with the properties a bit, and I think 
I see a pattern:

Watching the logging and the system profiler side-by-side, I see that, 
periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8 on 
this machine) go from mildly busy to almost totally pegged.

They USUALLY stay pegged for 5-10 seconds, and then calm down.

However, occasionally, they stay pegged for around a minute. When this happens, 
I get the very slow queries. I added logic so that when I get a very slow 
response (> 1 second), I pause for 30 seconds.

This ‘fixes’ everything, in the sense that I’m usually able to get a couple 
thousand good queries before the whole pattern repeats.

For reference, there’s nothing external that should be causing those CPU 
spikes, so I’m guessing that it’s maybe java GC (?) or perhaps something that 
the phoenix client is doing ?

Can you guess at what Phoenix might do periodically that would peg the CPUs – 
and in such a way that a query has to wait as much as 2 minutes to execute (I’m 
guessing from the pattern that it’s not actually the query that is slow, but a 
very long between when it gets queued and when it actually gets executed).

Oh and the methods you mentioned aren’t in my version of PhoenixRuntime, 
evidently. I’m on 4.2.2.something.

Thanks for any further feedback you can provide on this. Hopefully the 
conversation is helpful to the whole Phoenix community.

From: Riesland, Zack
Sent: Friday, December 04, 2015 1:36 PM
To: [email protected]<mailto:[email protected]>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Help tuning for bursts of high traffic?

Thanks, James

I'll work on gathering more information.

In the meantime, answers to a few of your questions inline below just narrow 
the scope a bit:

________________________________
From: James Taylor [[email protected]<mailto:[email protected]>]
Sent: Friday, December 04, 2015 12:21 PM
To: user
Subject: Re: Help tuning for bursts of high traffic?
Zack,
Thanks for reporting this and for the detailed description. Here's a bunch of 
questions and some things you can try in addition to what Andrew suggested:
1) Is this reproducible in a test environment (perhaps through Pherf: 
https://phoenix.apache.org/pherf.html) so you can experiment more?
-Will check

2) Do you get a sense of whether the bottleneck is on the client or the server? 
CPU, IO, or network? How many clients are you running and have you tried 
increasing this? Do you think your network is saturated by the data being 
returned?
-I'm no expert on this. When I look at the HBase dashboard on Ambari, 
everything looks good. When I look at the stats on the machine running the java 
code, it also looks good. Certainly no bottleneck related to memory or CPU. 
Network wise, the box is on the same rack as the cluster, with 10GB switches 
everywhere, so I'd be surprised if network latency were an issue.

3) From your description, it sounds like you're querying the data as your 
ingesting. When it gets slow, have you tried running a major compaction to see 
if that helps? Perhaps queries are getting slower because of the number of 
HFiles that need to be merged.
-Rereading my original email, I see where you get that. But actually, there is 
nothing being ingested by HBase during this process. At the end of the process, 
I generate a CSV file that is then consumed and altered by Pentaho, then 
consumed by Hive, and THEN some of the Hive data is send to HBase/Phoenix. So 
this is part of the ingest process, but a precursor to the cluster Ingesting 
any data.

4) If you bounce your cluster when it gets slow, does this have any impact?
-Can check. What should I expect to happen if I restart HBase-related services 
while trying to query Phoenix? Will the query just wait until everything is 
back up? Will I get strange exceptions? (Of course I'll go find this out myself)

5) What kinds of queries are running? Aggregation? Joins? Or just plain single 
table selects? Any ORDER BY clauses? Are you using secondary indexes, and if 
so, what kind?
-Very simple query: select x, y, z from my_table where key = 'my_key' and 
sample_point <= upper_range and sample_point >= lower_range. x, y, and z are 
integers.

6) Are you seeing GC pauses on the server during times of slowness (correlate 
time of slowness with your server logs)?
-Can look

7) Sounds like your queries are returning a lot of data. On the client-side, 
Phoenix will keep phoenix.query.spoolThresholdBytes in memory and then spool to 
disk as parallel execution happens. Are you seeing many spool files on the 
client side your /tmp directory (this is where Phoenix puts these by default 
with a name of ResultSpoolerXXX.bin). Try increasing this spool threshold if 
that's the case.
-I'll look into this

8) For the data ingest, are you using UPSERT VALUES? How big of batches are you 
committing? That's one thing to tune, especially if you're using secondary 
indexing.
-Again, nothing ingesting

9) Have you tried tuning the level of parallelization that Phoenix is doing for 
queries? This is controlled by the server-side phoenix.stats.guidepost.width 
parameter (assuming you haven't set the phoenix.stats.guidepost.per.region 
parameter) and defaults to 300MB. Try increasing it (you'll need to run a major 
compaction for this to take effect, and there's 15min lag to when the client 
sees it).
-This sounds interesting. I'll have to learn more about it.

10) If you're doing aggregation or join queries, try increasing the 
phoenix.query.maxGlobalMemorySize property on the server side. Both hash joins 
and aggregation are done in memory, up to this % limit. If the limit is 
reached, then on the server side, Phoenix will wait 
phoenix.query.maxGlobalMemoryWaitMs time for the usage to go below the limit 
(and then throw an exception if it doesn't). You can try tuning this wait time 
down to see if it has an impact.
-Not applicable

11) There a bunch of client-side metrics you can collect (but little 
documentation yet - keep your eye on PHOENIX-2486) that might help you diagnose 
this. See PhoenixRuntime.getGlobalPhoenixClientMetrics(), 
PhoenixRuntime.getOverAllReadRequestMetrics(), and other methods with Metrics 
in the name.
-I'll look into this

12) There's also tracing, which is end-to-end client/server, but it's in a bit 
on the raw side still: 
https://phoenix.apache.org/tracing.htmlhttps://phoenix.apache.org/tracing.html
-OK

There's more information on these tuning parameters here: 
https://phoenix.apache.org/tuning.html and you should take a look at Andrew's 
excellent tuning presentation here: https://phoenix.apache.org/resources.html.

Thanks,
James

On Fri, Dec 4, 2015 at 8:28 AM, Andrew Purtell 
<[email protected]<mailto:[email protected]>> wrote:
Kumar - I believe you mentioned you are seeing this in a cluster of ~20 
regionservers.

Zack - Yours is smaller yet, at 9.

These clusters are small enough to make getting stack dumps through the HBase 
debug servlet during periods of unusually slow response possible. Perhaps you 
can write a script that queries all of the debug servlets (can use curl) and 
dumps the received output into per-regionserver files? Scrape every 10 or so 
seconds during the observed periods of slowness? Then compress them and make 
them available for Phoenix devs up on S3? Consider it a poor man's sampler. I 
don't know what we might find, but this could prove very helpful.

On Fri, Dec 4, 2015 at 8:11 AM, Kumar Palaniappan 
<[email protected]<mailto:[email protected]>> wrote:
I'm in the same exact position as Zack described. Appreciate your feedback.

So far we tried the call queue n the handlers, nope. Planned to try off-heap 
cache.

Kumar Palaniappan<http://about.me/kumar.palaniappan>   
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

On Dec 4, 2015, at 6:45 AM, Riesland, Zack <[email protected]> 
wrote:<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Thanks 
Satish,<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
To clarify: I’m not looking up single rows. I’m looking up the history of each 
widget, which returns hundreds-to-thousands of results per widget (per query). 
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Each query is a range scan, it’s just that I’m performing thousands of 
them.<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
From: Satish Iyengar [mailto:[email protected]]
Sent: Friday, December 04, 2015 9:43 AM
To: [email protected]
Subject: Re: Help tuning for bursts of high 
traffic?<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Hi 
Zack,<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Did you consider avoiding hitting hbase for every single row by doing that step 
in an offline mode? I was thinking if you could have some kind of daily export 
of hbase table and then use pig to perform join (co-group perhaps) to do the 
same. Obviously this would work only when your hbase table is not maintained by 
stream based system. Hbase is really good at range scans and may not be ideal 
for single row (large number 
of).<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Thanks,<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Satish<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
On Fri, Dec 4, 2015 at 9:09 AM, Riesland, Zack <[email protected]> 
wrote:<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
SHORT EXPLANATION: a much higher percentage of queries to phoenix return 
exceptionally slow after querying very heavily for several 
minutes.<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
LONGER 
EXPLANATION:<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
I’ve been using Pheonix for about a year as a data store for web-based 
reporting tools and it works 
well.<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Now, I’m trying to use the data in a different (much more request-intensive) 
way and encountering some 
issues.<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
The scenario is basically 
this:<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Daily, ingest very large CSV files with data for widgets. 
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Each input file has hundreds of rows of data for each widget, and tens of 
thousands of unique 
widgets.<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
As a first step, I want to de-duplicate this data against my Phoenix-based DB 
(I can’t rely on just upserting the data for de-dup because it will go through 
several ETL steps before being stored into 
Phoenix/HBase).<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
So, per-widget, I perform a query against Phoenix (the table is keyed against 
the unique widget ID + sample point). I get all the data for a given widget id, 
within a certain period of time, and then I only ingest rows for that widget 
that are new to 
me.<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
I’m doing this in Java in a single step: I loop through my input file and 
perform one query per widget, using the same Connection object to 
Phoenix.<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
THE 
ISSUE:<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
What I’m finding is that for the first several thousand queries, I almost 
always get a very fast (less than 10 ms) response 
(good).<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
But after 15-20 thousand queries, the response starts to get MUCH slower. Some 
queries respond as expected, but many take as many as 2-3 minutes, pushing the 
total time to prime the data structure into the 12-15 hour range, when it would 
only take 2-3 hours if all the queries were 
fast.<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
The same exact queries, when run manually and not part of this bulk process, 
return in the (expected) < 10 ms. 
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
So it SEEMS like the burst of queries puts Phoenix into some sort of busy state 
that causes it to respond far too slowly. 
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
The connection properties I’m setting are: 
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Phoenix.query.timeoutMs: 
90000<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Phoenix.query.keepAliveMs: 
90000<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Phenix.query.threadPoolSize: 
256<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Our cluster is 9 (beefy) region servers and the table I’m referencing is 511 
regions. We went through a lot of pain to get the data split extremely well, 
and I don’t think Schema design is the issue here. 
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Can anyone help me understand how to make this better? Is there a better 
approach I could take? A better set of configuration parameters? Is our cluster 
just too small for 
this?<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Thanks!<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
-- 
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Satish Iyengar

"Anyone who has never made a mistake has never tried anything new."
Albert 
Einstein<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
-- 
<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom 
White)<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

<https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0>

RE: Help tuning for bursts of high traffic?

Reply via email to