Zack,

These stats are collected continuously and at the global client level, so collecting them only when a query takes more than 1 second won't work. A better alternative for you would be to report stats at the request level. You could then conditionally report the metrics for queries whose execution exceeds 1 second. The metric to base that decision on is WALL_CLOCK_TIME_MS.
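As a rough sketch of what that conditional reporting could look like (it assumes request-level metrics are enabled as shown below, that PhoenixRuntime.getRequestReadMetrics() takes the ResultSet and returns a map of table name to metric name to value, and that the JDBC URL and query are placeholders; the exact signature and return type may differ between Phoenix versions):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.phoenix.query.QueryServices;
    import org.apache.phoenix.util.PhoenixRuntime;

    public class SlowQueryReporter {

        private static final long SLOW_THRESHOLD_MS = 1000L;

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Request-level metrics are off by default; turn them on for this connection.
            props.put(QueryServices.COLLECT_REQUEST_LEVEL_METRICS, String.valueOf(true));

            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host", props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT x, y, z FROM my_table WHERE key = 'my_key'")) {

                while (rs.next()) {
                    // consume the rows as usual
                }

                // Assumed shape: table name -> metric name -> value.
                Map<String, Map<String, Long>> readMetrics = PhoenixRuntime.getRequestReadMetrics(rs);
                for (Map.Entry<String, Map<String, Long>> perTable : readMetrics.entrySet()) {
                    Long wallClockMs = perTable.getValue().get("WALL_CLOCK_TIME_MS");
                    if (wallClockMs != null && wallClockMs > SLOW_THRESHOLD_MS) {
                        System.out.println("Slow query against " + perTable.getKey()
                                + ": " + perTable.getValue());
                    }
                }
            }
        }
    }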
Request level metric collection isn't enabled by default. You can turn it on/off for a PhoenixConnection by doing the following:

    Properties props = new Properties();
    props.put(QueryServices.COLLECT_REQUEST_LEVEL_METRICS, String.valueOf(true));
    Connection conn = DriverManager.getConnection(url, props);

Then see PhoenixRuntime.getRequestReadMetrics() for how to get hold of these metrics. I have filed a JIRA, https://issues.apache.org/jira/browse/PHOENIX-2486, for adding documentation for these metrics and will get to it shortly.

On Wed, Dec 9, 2015 at 10:38 AM, Riesland, Zack <[email protected]> wrote:

> Thanks a lot, James.
>
> And now that I'm using 4.6, I have access to some statistics.
>
> I added code to collect stats whenever a query takes more than 1 second.
>
> The first time this happened was after about 51,000 queries, and the output is below, for whatever it's worth.
>
> Can you point me to any documentation to describe these values, or to utilize this information?
>
> I have a hunch that SCAN_BYTES and TASK_QUEUE_WAIT_TIME are the interesting numbers here?
>
> PHOENIX STATS:
> Batch sizes of mutations | MUTATION_BATCH_SIZE: 0, Number of samples: 0 | 0
> Size of mutations in bytes | MUTATION_BYTES: 0, Number of samples: 0 | 0
> Time it took to commit mutations | MUTATION_COMMIT_TIME: 0, Number of samples: 0 | 0
> Query times | QUERY_TIME: 2424030, Number of samples: 51555 | 2424030
> Number of scans that were executed in parallel | NUM_PARALLEL_SCANS: 51769, Number of samples: 51554 | 51769
> Number of bytes read by scans | SCAN_BYTES: 28193431552, Number of samples: 220261184 | 28193431552
> Size of spool files created in bytes | SPOOL_FILE_SIZE: 0, Number of samples: 0 | 0
> Number of bytes allocated by the memory manager | MEMORY_CHUNK_BYTES: 0, Number of samples: 0 | 0
> Number of milliseconds threads needed to wait for memory to be allocated through memory manager | MEMORY_WAIT_TIME: 0, Number of samples: 0 | 0
> Time in milliseconds tasks had to wait in the queue of the thread pool executor | TASK_QUEUE_WAIT_TIME: 114349, Number of samples: 2235211 | 114349
> Time in milliseconds spent by tasks from creation to completion | TASK_END_TO_END_TIME: 6030797, Number of samples: 2235211 | 6030797
> Time in milliseconds tasks took to execute | TASK_EXECUTION_TIME: 5916067, Number of samples: 2235211 | 5916067
> Counter for number of mutation sql statements | MUTATION_SQL_COUNTER: 93, Number of samples: 93 | 93
> Counter for number of sql queries | SELECT_SQL_COUNTER: 51555, Number of samples: 51555 | 51555
> Counter for number of tasks submitted to the thread pool executor | TASK_EXECUTED_COUNTER: 2235211, Number of samples: 2235211 | 2235211
> Counter for number of tasks that were rejected by the thread pool executor | TASK_REJECTED_COUNTER: 0, Number of samples: 0 | 0
> Number of times query timed out | QUERY_TIMEOUT_COUNTER: 0, Number of samples: 0 | 0
> Number of times query failed | QUERY_FAILED_COUNTER: 0, Number of samples: 0 | 0
> Number of spool files created | SPOOL_FILE_COUNTER: 0, Number of samples: 0 | 0
>
> *From:* James Taylor [mailto:[email protected]]
> *Sent:* Tuesday, December 08, 2015 7:10 PM
> *To:* user
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Thanks for clarifying, Zack. For profiling on your client machine, you can just use VisualVM as it's built into the JDK (https://dzone.com/articles/best-kept-secret-jdk-visualvm).
> James
>
> On Tue, Dec 8, 2015 at 6:31 AM, Riesland, Zack <[email protected]> wrote:
>
> Thanks for your reply, James.
>
> I apologize if I'm not being very clear.
>
> To clarify, NO DATA is being modified (upserted or deleted) in HBase/Phoenix during this use case. Only queries during the daytime.
>
> Throughout the day, different files are consumed by this java process (performing the queries), and resulting files are generated.
>
> Then, at night, long after this code is finished running, a series of batch processes take these files and eventually ingest their data into Hive. THEN, the Hive data is sent to Phoenix via CsvBulkUpload.
>
> These processes are completely independent and do not run at the same time. Data in Phoenix is ONLY modified at night.
>
> Finally, yes, the pattern that I'm seeing regarding CPU utilization and queries taking minutes instead of milliseconds is on a CLIENT machine, NOT on a data node. My theory is that the Phoenix client is somehow getting behind or doing garbage collection in a way that causes the queries to get backed up.
>
> Some comments (replies inline):
>
> - 4.2.2 is a pretty old release at this point - we're up to 4.6.0 as our last stable release. I'd recommend upgrading.
>
> --> Agreed. I'm going to follow the instructions here and try to upgrade to 4.6.0 today: https://phoenix.apache.org/upgrading.html
>
> - If the problem is on the client-side, you should be able to profile the Java application during the period of slowness and get us the details we need to diagnose this.
>
> --> Do you have a recommendation for a tool that I should go learn about for this?
>
> - Phoenix executes a single query using many threads already, so your ~2000 thread calculation wouldn't be accurate. For a little bit more info on how Phoenix parallelizes queries, see https://phoenix.apache.org/tuning.html#Parallelization. At a minimum, Phoenix will execute one thread per region involved in your query. You can control the level of parallelization through the phoenix.stats.guidepost.width config parameter. If you want to minimize the parallelization that Phoenix does, you can set this config parameter to a very large value in the server-side hbase-site.xml. The default is 104857600 (100MB). If you set it to your MAX_FILESIZE (the size you allow a region to grow to before it splits - default 20GB), then you're essentially disabling it. You could also try increasing it to somewhere in between, maybe 5 or 10GB.
>
> --> Thank you. This is helpful. I will experiment with this.
>
> - Running 8 simultaneous instances on the same client will overload that machine. Run them on separate client machines - but start with a single one on one machine.
>
> --> I only have one machine for this (for now), but it is a very beefy machine (8 very fast CPUs and almost 1 TB of RAM). I need to perform this input/dedup operation for about 500 files each day (one per customer), and each file will require 5,000-100,000 queries, depending on the size of the customer. Supposing an average of 10K queries/customer, that's about 5 million queries per day. At 10ms per query, that's about 14 hours, which I need to improve considerably, especially as our data grows. Based on some experimenting, I've found that I can get an overall average much closer to 2-3ms/query if I run several instances in parallel. However, this is when I start to encounter the thread limit issue.
> So I'll continue to experiment with this and I appreciate any feedback or recommendations this community can provide.
>
> Thanks!
>
> *From:* James Taylor [mailto:[email protected]]
> *Sent:* Monday, December 07, 2015 6:34 PM
> *To:* user
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Thanks for the additional information, Zack. I'd like to confirm I understand how you're using Phoenix:
>
> - Data is not being upserted at the same time it is being queried. But I'm confused by your statement of "and THEN some of the Hive data is sent to HBase/Phoenix". How is the data being sent? How much data? Is querying occurring while this data is being ingested?
>
> - Querying happens *after* any data ingest. "So, per-widget, I perform a query against Phoenix". Are you determining duplicates through these queries and then issuing a Phoenix DELETE of the duplicates?
>
> - The CPUs you mention being pegged are the CPUs on the *client* machine: "I see that, periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8 on this machine) go from mildly busy to almost totally pegged".
>
> If you could confirm the above, that'd be super helpful. I can't think of any periodic process in Phoenix that runs every 60 or 90 seconds.
>
> Some comments:
>
> - 4.2.2 is a pretty old release at this point - we're up to 4.6.0 as our last stable release. I'd recommend upgrading.
>
> - If the problem is on the client-side, you should be able to profile the Java application during the period of slowness and get us the details we need to diagnose this.
>
> - Phoenix executes a single query using many threads already, so your ~2000 thread calculation wouldn't be accurate. For a little bit more info on how Phoenix parallelizes queries, see https://phoenix.apache.org/tuning.html#Parallelization. At a minimum, Phoenix will execute one thread per region involved in your query. You can control the level of parallelization through the phoenix.stats.guidepost.width config parameter. If you want to minimize the parallelization that Phoenix does, you can set this config parameter to a very large value in the server-side hbase-site.xml. The default is 104857600 (100MB). If you set it to your MAX_FILESIZE (the size you allow a region to grow to before it splits - default 20GB), then you're essentially disabling it. You could also try increasing it to somewhere in between, maybe 5 or 10GB.
>
> - Running 8 simultaneous instances on the same client will overload that machine. Run them on separate client machines - but start with a single one on one machine.
>
> On Mon, Dec 7, 2015 at 8:32 AM, Riesland, Zack <[email protected]> wrote:
>
> Also, and somewhat related:
>
> I'm trying to run 8 simultaneous instances of this code (on 8 separate input files), since I have 8 CPUs on the machine.
>
> When I try this, I get java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
>
> My phoenix connection has the "phoenix.query.threadPoolSize" set to "256", which should result in 8x256 = ~2,000 threads being spawned by Phoenix.
>
> Is that correct?
>
> I'm running RH Linux 6, with my ulimit set to "unlimited", so I should be able to handle thousands of threads.
>
> Any ideas?
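For reference, the guidepost-width change described above goes in the server-side hbase-site.xml (followed by a major compaction, per question 9 further down the thread). A minimal sketch, with a purely illustrative 5 GB value rather than a recommendation:

    <!-- hbase-site.xml on the region servers -->
    <property>
      <name>phoenix.stats.guidepost.width</name>
      <!-- 5 GB: fewer guideposts, so fewer parallel scans per query -->
      <value>5368709120</value>
    </property>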
> *From:* Andrew Purtell [mailto:[email protected]]
> *Sent:* Friday, December 04, 2015 4:24 PM
> *To:* [email protected]
> *Cc:* Haisty, Geoffrey
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Any chance of stack dumps from the debug servlet? Impossible to get anywhere with 'pegged the CPU' otherwise. Thanks.
>
> On Dec 4, 2015, at 12:20 PM, Riesland, Zack <[email protected]> wrote:
>
> James,
>
> 2 quick followups, for whatever they're worth:
>
> 1 – There is nothing phoenix-related in /tmp
>
> 2 – I added a ton of logging, and played with the properties a bit, and I think I see a pattern:
>
> Watching the logging and the system profiler side-by-side, I see that, periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8 on this machine) go from mildly busy to almost totally pegged.
>
> They USUALLY stay pegged for 5-10 seconds, and then calm down.
>
> However, occasionally, they stay pegged for around a minute. When this happens, I get the very slow queries. I added logic so that when I get a very slow response (> 1 second), I pause for 30 seconds.
>
> This 'fixes' everything, in the sense that I'm usually able to get a couple thousand good queries before the whole pattern repeats.
>
> For reference, there's nothing external that should be causing those CPU spikes, so I'm guessing that it's maybe java GC (?) or perhaps something that the phoenix client is doing?
>
> Can you guess at what Phoenix might do periodically that would peg the CPUs, and in such a way that a query has to wait as much as 2 minutes to execute? (I'm guessing from the pattern that it's not actually the query that is slow, but a very long wait between when it gets queued and when it actually gets executed.)
>
> Oh, and the methods you mentioned aren't in my version of PhoenixRuntime, evidently. I'm on 4.2.2.something.
>
> Thanks for any further feedback you can provide on this. Hopefully the conversation is helpful to the whole Phoenix community.
>
> *From:* Riesland, Zack
> *Sent:* Friday, December 04, 2015 1:36 PM
> *To:* [email protected]
> *Cc:* [email protected]
> *Subject:* RE: Help tuning for bursts of high traffic?
>
> Thanks, James
>
> I'll work on gathering more information.
>
> In the meantime, answers to a few of your questions inline below to just narrow the scope a bit:
>
> ------------------------------
>
> *From:* James Taylor [[email protected]]
> *Sent:* Friday, December 04, 2015 12:21 PM
> *To:* user
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Zack,
>
> Thanks for reporting this and for the detailed description. Here's a bunch of questions and some things you can try in addition to what Andrew suggested:
>
> 1) Is this reproducible in a test environment (perhaps through Pherf: https://phoenix.apache.org/pherf.html) so you can experiment more?
>
> -Will check
>
> 2) Do you get a sense of whether the bottleneck is on the client or the server? CPU, IO, or network? How many clients are you running and have you tried increasing this? Do you think your network is saturated by the data being returned?
>
> -I'm no expert on this. When I look at the HBase dashboard on Ambari, everything looks good. When I look at the stats on the machine running the java code, it also looks good. Certainly no bottleneck related to memory or CPU.
> Network wise, the box is on the same rack as the cluster, with 10GB switches everywhere, so I'd be surprised if network latency were an issue.
>
> 3) From your description, it sounds like you're querying the data as you're ingesting. When it gets slow, have you tried running a major compaction to see if that helps? Perhaps queries are getting slower because of the number of HFiles that need to be merged.
>
> -Rereading my original email, I see where you get that. But actually, there is nothing being ingested by HBase during this process. At the end of the process, I generate a CSV file that is then consumed and altered by Pentaho, then consumed by Hive, and THEN some of the Hive data is sent to HBase/Phoenix. So this is part of the ingest process, but a precursor to the cluster ingesting any data.
>
> 4) If you bounce your cluster when it gets slow, does this have any impact?
>
> -Can check. What should I expect to happen if I restart HBase-related services while trying to query Phoenix? Will the query just wait until everything is back up? Will I get strange exceptions? (Of course I'll go find this out myself)
>
> 5) What kinds of queries are running? Aggregation? Joins? Or just plain single table selects? Any ORDER BY clauses? Are you using secondary indexes, and if so, what kind?
>
> -Very simple query: select x, y, z from my_table where key = 'my_key' and sample_point <= upper_range and sample_point >= lower_range. x, y, and z are integers.
>
> 6) Are you seeing GC pauses on the server during times of slowness (correlate time of slowness with your server logs)?
>
> -Can look
>
> 7) Sounds like your queries are returning a lot of data. On the client-side, Phoenix will keep phoenix.query.spoolThresholdBytes in memory and then spool to disk as parallel execution happens. Are you seeing many spool files on the client side in your /tmp directory (this is where Phoenix puts these by default, with a name of ResultSpoolerXXX.bin)? Try increasing this spool threshold if that's the case.
>
> -I'll look into this
>
> 8) For the data ingest, are you using UPSERT VALUES? How big are the batches you're committing? That's one thing to tune, especially if you're using secondary indexing.
>
> -Again, nothing ingesting
>
> 9) Have you tried tuning the level of parallelization that Phoenix is doing for queries? This is controlled by the server-side phoenix.stats.guidepost.width parameter (assuming you haven't set the phoenix.stats.guidepost.per.region parameter) and defaults to 300MB. Try increasing it (you'll need to run a major compaction for this to take effect, and there's a 15min lag to when the client sees it).
>
> -This sounds interesting. I'll have to learn more about it.
>
> 10) If you're doing aggregation or join queries, try increasing the phoenix.query.maxGlobalMemorySize property on the server side. Both hash joins and aggregation are done in memory, up to this % limit. If the limit is reached, then on the server side, Phoenix will wait phoenix.query.maxGlobalMemoryWaitMs for the usage to go below the limit (and then throw an exception if it doesn't). You can try tuning this wait time down to see if it has an impact.
>
> -Not applicable
>
> 11) There are a bunch of client-side metrics you can collect (but little documentation yet - keep your eye on PHOENIX-2486) that might help you diagnose this.
> See PhoenixRuntime.getGlobalPhoenixClientMetrics(), PhoenixRuntime.getOverAllReadRequestMetrics(), and other methods with Metrics in the name.
>
> -I'll look into this
>
> 12) There's also tracing, which is end-to-end client/server, but it's still a bit on the raw side: https://phoenix.apache.org/tracing.html
>
> -OK
>
> There's more information on these tuning parameters here: https://phoenix.apache.org/tuning.html and you should take a look at Andrew's excellent tuning presentation here: https://phoenix.apache.org/resources.html.
>
> Thanks,
> James
>
> On Fri, Dec 4, 2015 at 8:28 AM, Andrew Purtell <[email protected]> wrote:
>
> Kumar - I believe you mentioned you are seeing this in a cluster of ~20 regionservers.
>
> Zack - Yours is smaller yet, at 9.
>
> These clusters are small enough to make getting stack dumps through the HBase debug servlet during periods of unusually slow response possible. Perhaps you can write a script that queries all of the debug servlets (can use curl) and dumps the received output into per-regionserver files? Scrape every 10 or so seconds during the observed periods of slowness? Then compress them and make them available for Phoenix devs up on S3? Consider it a poor man's sampler. I don't know what we might find, but this could prove very helpful.
>
> On Fri, Dec 4, 2015 at 8:11 AM, Kumar Palaniappan <[email protected]> wrote:
>
> I'm in the same exact position as Zack described. Appreciate your feedback.
>
> So far we tried the call queue and the handlers, nope. Planned to try off-heap cache.
>
> Kumar Palaniappan <http://about.me/kumar.palaniappan>
>
> On Dec 4, 2015, at 6:45 AM, Riesland, Zack <[email protected]> wrote:
>
> Thanks Satish,
>
> To clarify: I'm not looking up single rows. I'm looking up the history of each widget, which returns hundreds-to-thousands of results per widget (per query).
>
> Each query is a range scan, it's just that I'm performing thousands of them.
> *From:* Satish Iyengar [mailto:[email protected]]
> *Sent:* Friday, December 04, 2015 9:43 AM
> *To:* [email protected]
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Hi Zack,
>
> Did you consider avoiding hitting hbase for every single row by doing that step in an offline mode? I was thinking you could have some kind of daily export of the hbase table and then use pig to perform a join (co-group perhaps) to do the same. Obviously this would work only when your hbase table is not maintained by a stream-based system. HBase is really good at range scans and may not be ideal for a large number of single-row lookups.
>
> Thanks,
> Satish
>
> On Fri, Dec 4, 2015 at 9:09 AM, Riesland, Zack <[email protected]> wrote:
>
> SHORT EXPLANATION: a much higher percentage of queries to phoenix return exceptionally slowly after
> querying very heavily for several minutes.
>
> LONGER EXPLANATION:
>
> I've been using Phoenix for about a year as a data store for web-based reporting tools and it works well.
>
> Now, I'm trying to use the data in a different (much more request-intensive) way and encountering some issues.
>
> The scenario is basically this:
>
> Daily, ingest very large CSV files with data for widgets.
>
> Each input file has hundreds of rows of data for each widget, and tens of thousands of unique widgets.
>
> As a first step, I want to de-duplicate this data against my Phoenix-based DB (I can't rely on just upserting the data for de-dup because it will go through several ETL steps before being stored into Phoenix/HBase).
>
> So, per-widget, I perform a query against Phoenix (the table is keyed against the unique widget ID + sample point).
> I get all the data for a given widget id, within a certain period of time, and then I only ingest rows for that widget that are new to me.
>
> I'm doing this in Java in a single step: I loop through my input file and perform one query per widget, using the same Connection object to Phoenix.
>
> THE ISSUE:
>
> What I'm finding is that for the first several thousand queries, I almost always get a very fast (less than 10 ms) response (good).
>
> But after 15-20 thousand queries, the response starts to get MUCH slower. Some queries respond as expected, but many take as long as 2-3 minutes, pushing the total time to prime the data structure into the 12-15 hour range, when it would only take 2-3 hours if all the queries were fast.
>
> The same exact queries, when run manually and not part of this bulk process, return in the (expected) < 10 ms.
>
> So it SEEMS like the burst of queries puts Phoenix into some sort of busy state that causes it to respond far too slowly.
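A rough sketch of the per-widget lookup loop described above, reusing one connection and one prepared statement. The table, columns, and predicate come from the query quoted elsewhere in the thread (select x, y, z from my_table where key = ... and sample_point between a lower and upper range); the JDBC URL, the numeric sample_point type, and the surrounding method are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.List;

    public class WidgetDeduper {

        // One range scan per widget id, all through a single connection and
        // a single reused PreparedStatement.
        public static void dedup(List<String> widgetIds, long lowerRange, long upperRange)
                throws Exception {
            String sql = "SELECT x, y, z FROM my_table"
                    + " WHERE key = ? AND sample_point >= ? AND sample_point <= ?";
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                for (String widgetId : widgetIds) {
                    ps.setString(1, widgetId);
                    ps.setLong(2, lowerRange);
                    ps.setLong(3, upperRange);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // compare against the rows parsed from the input file
                            // and keep only the ones that are new
                        }
                    }
                }
            }
        }
    }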
> The connection properties I'm setting are:
>
> phoenix.query.timeoutMs: 90000
> phoenix.query.keepAliveMs: 90000
> phoenix.query.threadPoolSize: 256
>
> Our cluster is 9 (beefy) region servers and the table I'm referencing is 511 regions. We went through a lot of pain to get the data split extremely well, and I don't think schema design is the issue here.
>
> Can anyone help me understand how to make this better? Is there a better approach I could take? A better set of configuration parameters? Is our cluster just too small for this?
>
> Thanks!
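For completeness, one way client-side properties like the ones listed above are typically passed to the Phoenix JDBC driver (a sketch; the values are the ones from the message and the URL is a placeholder):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class TunedConnectionFactory {
        public static Connection open() throws Exception {
            Properties props = new Properties();
            // Values taken from the message above; tune to taste.
            props.setProperty("phoenix.query.timeoutMs", "90000");
            props.setProperty("phoenix.query.keepAliveMs", "90000");
            props.setProperty("phoenix.query.threadPoolSize", "256");
            return DriverManager.getConnection("jdbc:phoenix:zk-host", props);
        }
    }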
> --
> Satish Iyengar
>
> "Anyone who has never made a mistake has never tried anything new." Albert Einstein
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
