Zack,

These stats are collected continuously and at the global client level, so collecting them only when a query takes more than 1 second won't work. A better alternative for you would be to report stats at the request level. You could then conditionally report the metrics for queries whose execution exceeds 1 second. The metric to base that decision on is WALL_CLOCK_TIME_MS.
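As a rough sketch of what that conditional reporting could look like (it assumes request-level metrics are enabled as shown below, that PhoenixRuntime.getRequestReadMetrics() takes the ResultSet and returns a map of table name to metric name to value, and that the JDBC URL and query are placeholders; the exact signature and return type may differ between Phoenix versions):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.phoenix.query.QueryServices;
    import org.apache.phoenix.util.PhoenixRuntime;

    public class SlowQueryReporter {

        private static final long SLOW_THRESHOLD_MS = 1000L;

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Request-level metrics are off by default; turn them on for this connection.
            props.put(QueryServices.COLLECT_REQUEST_LEVEL_METRICS, String.valueOf(true));

            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host", props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT x, y, z FROM my_table WHERE key = 'my_key'")) {

                while (rs.next()) {
                    // consume the rows as usual
                }

                // Assumed shape: table name -> metric name -> value.
                Map<String, Map<String, Long>> readMetrics = PhoenixRuntime.getRequestReadMetrics(rs);
                for (Map.Entry<String, Map<String, Long>> perTable : readMetrics.entrySet()) {
                    Long wallClockMs = perTable.getValue().get("WALL_CLOCK_TIME_MS");
                    if (wallClockMs != null && wallClockMs > SLOW_THRESHOLD_MS) {
                        System.out.println("Slow query against " + perTable.getKey()
                                + ": " + perTable.getValue());
                    }
                }
            }
        }
    }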
Request level metric collection isn't enabled by default. You can turn it on/off for a PhoenixConnection by doing the following:

    Properties props = new Properties();
    props.put(QueryServices.COLLECT_REQUEST_LEVEL_METRICS, String.valueOf(true));
    Connection conn = DriverManager.getConnection(url, props);

Then see PhoenixRuntime.getRequestReadMetrics() for how to get hold of these metrics. I have filed a JIRA, https://issues.apache.org/jira/browse/PHOENIX-2486, for adding documentation for these metrics and will get to it shortly.

On Wed, Dec 9, 2015 at 10:38 AM, Riesland, Zack <[email protected]> wrote:

> Thanks a lot, James.
>
> And now that I'm using 4.6, I have access to some statistics.
>
> I added code to collect stats whenever a query takes more than 1 second.
>
> The first time this happened was after about 51,000 queries, and the output is below, for whatever it's worth.
>
> Can you point me to any documentation to describe these values, or to utilize this information?
>
> I have a hunch that SCAN_BYTES and TASK_QUEUE_WAIT_TIME are the interesting numbers here?
>
> PHOENIX STATS:
> Batch sizes of mutations | MUTATION_BATCH_SIZE: 0, Number of samples: 0 | 0
> Size of mutations in bytes | MUTATION_BYTES: 0, Number of samples: 0 | 0
> Time it took to commit mutations | MUTATION_COMMIT_TIME: 0, Number of samples: 0 | 0
> Query times | QUERY_TIME: 2424030, Number of samples: 51555 | 2424030
> Number of scans that were executed in parallel | NUM_PARALLEL_SCANS: 51769, Number of samples: 51554 | 51769
> Number of bytes read by scans | SCAN_BYTES: 28193431552, Number of samples: 220261184 | 28193431552
> Size of spool files created in bytes | SPOOL_FILE_SIZE: 0, Number of samples: 0 | 0
> Number of bytes allocated by the memory manager | MEMORY_CHUNK_BYTES: 0, Number of samples: 0 | 0
> Number of milliseconds threads needed to wait for memory to be allocated through memory manager | MEMORY_WAIT_TIME: 0, Number of samples: 0 | 0
> Time in milliseconds tasks had to wait in the queue of the thread pool executor | TASK_QUEUE_WAIT_TIME: 114349, Number of samples: 2235211 | 114349
> Time in milliseconds spent by tasks from creation to completion | TASK_END_TO_END_TIME: 6030797, Number of samples: 2235211 | 6030797
> Time in milliseconds tasks took to execute | TASK_EXECUTION_TIME: 5916067, Number of samples: 2235211 | 5916067
> Counter for number of mutation sql statements | MUTATION_SQL_COUNTER: 93, Number of samples: 93 | 93
> Counter for number of sql queries | SELECT_SQL_COUNTER: 51555, Number of samples: 51555 | 51555
> Counter for number of tasks submitted to the thread pool executor | TASK_EXECUTED_COUNTER: 2235211, Number of samples: 2235211 | 2235211
> Counter for number of tasks that were rejected by the thread pool executor | TASK_REJECTED_COUNTER: 0, Number of samples: 0 | 0
> Number of times query timed out | QUERY_TIMEOUT_COUNTER: 0, Number of samples: 0 | 0
> Number of times query failed | QUERY_FAILED_COUNTER: 0, Number of samples: 0 | 0
> Number of spool files created | SPOOL_FILE_COUNTER: 0, Number of samples: 0 | 0
>
> *From:* James Taylor [mailto:[email protected]]
> *Sent:* Tuesday, December 08, 2015 7:10 PM
> *To:* user
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Thanks for clarifying, Zack. For profiling on your client machine, you can just use VisualVM as it's built into the JDK (https://dzone.com/articles/best-kept-secret-jdk-visualvm).
> James
>
> On Tue, Dec 8, 2015 at 6:31 AM, Riesland, Zack <[email protected]> wrote:
>
> Thanks for your reply, James.
>
> I apologize if I'm not being very clear.
>
> To clarify, NO DATA is being modified (upserted or deleted) in HBase/Phoenix during this use case. Only queries during the daytime.
>
> Throughout the day, different files are consumed by this java process (performing the queries), and resulting files are generated.
>
> Then, at night, long after this code is finished running, a series of batch processes take these files and eventually ingest their data into Hive. THEN, the Hive data is sent to Phoenix via CsvBulkUpload.
>
> These processes are completely independent and do not run at the same time. Data in Phoenix is ONLY modified at night.
>
> Finally, yes, the pattern that I'm seeing regarding CPU utilization and queries taking minutes instead of milliseconds is on a CLIENT machine, NOT on a data node. My theory is that the Phoenix client is somehow getting behind or doing garbage collection in a way that causes the queries to get backed up.
>
> Some comments (replies inline):
>
> - 4.2.2 is a pretty old release at this point - we're up to 4.6.0 as our last stable release. I'd recommend upgrading.
>
> --> Agreed. I'm going to follow the instructions here and try to upgrade to 4.6.0 today: https://phoenix.apache.org/upgrading.html
>
> - If the problem is on the client-side, you should be able to profile the Java application during the period of slowness and get us the details we need to diagnose this.
>
> --> Do you have a recommendation for a tool that I should go learn about for this?
>
> - Phoenix executes a single query using many threads already, so your ~2000 thread calculation wouldn't be accurate. For a little bit more info on how Phoenix parallelizes queries, see https://phoenix.apache.org/tuning.html#Parallelization. At a minimum, Phoenix will execute one thread per region involved in your query. You can control the level of parallelization through the phoenix.stats.guidepost.width config parameter. If you want to minimize the parallelization that Phoenix does, you can set this config parameter to a very large value in the server-side hbase-site.xml. The default is 104857600 (100MB). If you set it to your MAX_FILESIZE (the size you allow a region to grow to before it splits - default 20GB), then you're essentially disabling it. You could also try increasing it to somewhere in between, maybe 5 or 10GB.
>
> --> Thank you. This is helpful. I will experiment with this.
>
> - Running 8 simultaneous instances on the same client will overload that machine. Run them on separate client machines - but start with a single one on one machine.
>
> --> I only have one machine for this (for now), but it is a very beefy machine (8 very fast CPUs and almost 1 TB of RAM). I need to perform this input/dedup operation for about 500 files each day (one per customer), and each file will require 5,000-100,000 queries, depending on the size of the customer. Supposing an average of 10K queries/customer, that's about 5 million queries per day. At 10ms per query, that's about 14 hours, which I need to improve considerably, especially as our data grows. Based on some experimenting, I've found that I can get an overall average much closer to 2-3ms/query if I run several instances in parallel. However, this is when I start to encounter the thread limit issue.
> So I'll continue to experiment with this and I appreciate any feedback or recommendations this community can provide.
>
> Thanks!
>
> *From:* James Taylor [mailto:[email protected]]
> *Sent:* Monday, December 07, 2015 6:34 PM
> *To:* user
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Thanks for the additional information, Zack. I'd like to confirm I understand how you're using Phoenix:
>
> - Data is not being upserted at the same time it is being queried. But I'm confused by your statement of "and THEN some of the Hive data is sent to HBase/Phoenix". How is the data being sent? How much data? Is querying occurring while this data is being ingested?
>
> - Querying happens *after* any data ingest. "So, per-widget, I perform a query against Phoenix". Are you determining duplicates through these queries and then issuing a Phoenix DELETE of the duplicates?
>
> - The CPUs you mention being pegged are the CPUs on the *client* machine: "I see that, periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8 on this machine) go from mildly busy to almost totally pegged".
>
> If you could confirm the above, that'd be super helpful. I can't think of any periodic process in Phoenix that runs every 60 or 90 seconds.
>
> Some comments:
>
> - 4.2.2 is a pretty old release at this point - we're up to 4.6.0 as our last stable release. I'd recommend upgrading.
>
> - If the problem is on the client-side, you should be able to profile the Java application during the period of slowness and get us the details we need to diagnose this.
>
> - Phoenix executes a single query using many threads already, so your ~2000 thread calculation wouldn't be accurate. For a little bit more info on how Phoenix parallelizes queries, see https://phoenix.apache.org/tuning.html#Parallelization. At a minimum, Phoenix will execute one thread per region involved in your query. You can control the level of parallelization through the phoenix.stats.guidepost.width config parameter. If you want to minimize the parallelization that Phoenix does, you can set this config parameter to a very large value in the server-side hbase-site.xml. The default is 104857600 (100MB). If you set it to your MAX_FILESIZE (the size you allow a region to grow to before it splits - default 20GB), then you're essentially disabling it. You could also try increasing it to somewhere in between, maybe 5 or 10GB.
>
> - Running 8 simultaneous instances on the same client will overload that machine. Run them on separate client machines - but start with a single one on one machine.
>
> On Mon, Dec 7, 2015 at 8:32 AM, Riesland, Zack <[email protected]> wrote:
>
> Also, and somewhat related:
>
> I'm trying to run 8 simultaneous instances of this code (on 8 separate input files), since I have 8 CPUs on the machine.
>
> When I try this, I get java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
>
> My phoenix connection has the "phoenix.query.threadPoolSize" set to "256", which should result in 8x256 = ~2,000 threads being spawned by Phoenix.
>
> Is that correct?
>
> I'm running RH Linux 6, with my ulimit set to "unlimited", so I should be able to handle thousands of threads.
>
> Any ideas?
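For reference, the guidepost-width change described above goes in the server-side hbase-site.xml (followed by a major compaction, per question 9 further down the thread). A minimal sketch, with a purely illustrative 5 GB value rather than a recommendation:

    <!-- hbase-site.xml on the region servers -->
    <property>
      <name>phoenix.stats.guidepost.width</name>
      <!-- 5 GB: fewer guideposts, so fewer parallel scans per query -->
      <value>5368709120</value>
    </property>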
> *From:* Andrew Purtell [mailto:[email protected]]
> *Sent:* Friday, December 04, 2015 4:24 PM
> *To:* [email protected]
> *Cc:* Haisty, Geoffrey
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Any chance of stack dumps from the debug servlet? Impossible to get anywhere with 'pegged the CPU' otherwise. Thanks.
>
> On Dec 4, 2015, at 12:20 PM, Riesland, Zack <[email protected]> wrote:
>
> James,
>
> 2 quick followups, for whatever they're worth:
>
> 1 – There is nothing phoenix-related in /tmp
>
> 2 – I added a ton of logging, and played with the properties a bit, and I think I see a pattern:
>
> Watching the logging and the system profiler side-by-side, I see that, periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8 on this machine) go from mildly busy to almost totally pegged.
>
> They USUALLY stay pegged for 5-10 seconds, and then calm down.
>
> However, occasionally, they stay pegged for around a minute. When this happens, I get the very slow queries. I added logic so that when I get a very slow response (> 1 second), I pause for 30 seconds.
>
> This 'fixes' everything, in the sense that I'm usually able to get a couple thousand good queries before the whole pattern repeats.
>
> For reference, there's nothing external that should be causing those CPU spikes, so I'm guessing that it's maybe java GC (?) or perhaps something that the phoenix client is doing?
>
> Can you guess at what Phoenix might do periodically that would peg the CPUs, and in such a way that a query has to wait as much as 2 minutes to execute? (I'm guessing from the pattern that it's not actually the query that is slow, but a very long wait between when it gets queued and when it actually gets executed.)
>
> Oh, and the methods you mentioned aren't in my version of PhoenixRuntime, evidently. I'm on 4.2.2.something.
>
> Thanks for any further feedback you can provide on this. Hopefully the conversation is helpful to the whole Phoenix community.
>
> *From:* Riesland, Zack
> *Sent:* Friday, December 04, 2015 1:36 PM
> *To:* [email protected]
> *Cc:* [email protected]
> *Subject:* RE: Help tuning for bursts of high traffic?
>
> Thanks, James
>
> I'll work on gathering more information.
>
> In the meantime, answers to a few of your questions inline below to just narrow the scope a bit:
>
> ------------------------------
>
> *From:* James Taylor [[email protected]]
> *Sent:* Friday, December 04, 2015 12:21 PM
> *To:* user
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Zack,
>
> Thanks for reporting this and for the detailed description. Here's a bunch of questions and some things you can try in addition to what Andrew suggested:
>
> 1) Is this reproducible in a test environment (perhaps through Pherf: https://phoenix.apache.org/pherf.html) so you can experiment more?
>
> -Will check
>
> 2) Do you get a sense of whether the bottleneck is on the client or the server? CPU, IO, or network? How many clients are you running and have you tried increasing this? Do you think your network is saturated by the data being returned?
>
> -I'm no expert on this. When I look at the HBase dashboard on Ambari, everything looks good. When I look at the stats on the machine running the java code, it also looks good. Certainly no bottleneck related to memory or CPU.
> Network wise, the box is on the same rack as the cluster, with 10GB switches everywhere, so I'd be surprised if network latency were an issue.
>
> 3) From your description, it sounds like you're querying the data as you're ingesting. When it gets slow, have you tried running a major compaction to see if that helps? Perhaps queries are getting slower because of the number of HFiles that need to be merged.
>
> -Rereading my original email, I see where you get that. But actually, there is nothing being ingested by HBase during this process. At the end of the process, I generate a CSV file that is then consumed and altered by Pentaho, then consumed by Hive, and THEN some of the Hive data is sent to HBase/Phoenix. So this is part of the ingest process, but a precursor to the cluster ingesting any data.
>
> 4) If you bounce your cluster when it gets slow, does this have any impact?
>
> -Can check. What should I expect to happen if I restart HBase-related services while trying to query Phoenix? Will the query just wait until everything is back up? Will I get strange exceptions? (Of course I'll go find this out myself)
>
> 5) What kinds of queries are running? Aggregation? Joins? Or just plain single table selects? Any ORDER BY clauses? Are you using secondary indexes, and if so, what kind?
>
> -Very simple query: select x, y, z from my_table where key = 'my_key' and sample_point <= upper_range and sample_point >= lower_range. x, y, and z are integers.
>
> 6) Are you seeing GC pauses on the server during times of slowness (correlate time of slowness with your server logs)?
>
> -Can look
>
> 7) Sounds like your queries are returning a lot of data. On the client-side, Phoenix will keep phoenix.query.spoolThresholdBytes in memory and then spool to disk as parallel execution happens. Are you seeing many spool files on the client side in your /tmp directory (this is where Phoenix puts these by default, with a name of ResultSpoolerXXX.bin)? Try increasing this spool threshold if that's the case.
>
> -I'll look into this
>
> 8) For the data ingest, are you using UPSERT VALUES? How big are the batches you're committing? That's one thing to tune, especially if you're using secondary indexing.
>
> -Again, nothing ingesting
>
> 9) Have you tried tuning the level of parallelization that Phoenix is doing for queries? This is controlled by the server-side phoenix.stats.guidepost.width parameter (assuming you haven't set the phoenix.stats.guidepost.per.region parameter) and defaults to 300MB. Try increasing it (you'll need to run a major compaction for this to take effect, and there's a 15min lag to when the client sees it).
>
> -This sounds interesting. I'll have to learn more about it.
>
> 10) If you're doing aggregation or join queries, try increasing the phoenix.query.maxGlobalMemorySize property on the server side. Both hash joins and aggregation are done in memory, up to this % limit. If the limit is reached, then on the server side, Phoenix will wait phoenix.query.maxGlobalMemoryWaitMs for the usage to go below the limit (and then throw an exception if it doesn't). You can try tuning this wait time down to see if it has an impact.
>
> -Not applicable
>
> 11) There are a bunch of client-side metrics you can collect (but little documentation yet - keep your eye on PHOENIX-2486) that might help you diagnose this.
> See PhoenixRuntime.getGlobalPhoenixClientMetrics(), PhoenixRuntime.getOverAllReadRequestMetrics(), and other methods with Metrics in the name.
>
> -I'll look into this
>
> 12) There's also tracing, which is end-to-end client/server, but it's still a bit on the raw side: https://phoenix.apache.org/tracing.html
>
> -OK
>
> There's more information on these tuning parameters here: https://phoenix.apache.org/tuning.html and you should take a look at Andrew's excellent tuning presentation here: https://phoenix.apache.org/resources.html.
>
> Thanks,
> James
>
> On Fri, Dec 4, 2015 at 8:28 AM, Andrew Purtell <[email protected]> wrote:
>
> Kumar - I believe you mentioned you are seeing this in a cluster of ~20 regionservers.
>
> Zack - Yours is smaller yet, at 9.
>
> These clusters are small enough to make getting stack dumps through the HBase debug servlet during periods of unusually slow response possible. Perhaps you can write a script that queries all of the debug servlets (can use curl) and dumps the received output into per-regionserver files? Scrape every 10 or so seconds during the observed periods of slowness? Then compress them and make them available for Phoenix devs up on S3? Consider it a poor man's sampler. I don't know what we might find, but this could prove very helpful.
>
> On Fri, Dec 4, 2015 at 8:11 AM, Kumar Palaniappan <[email protected]> wrote:
>
> I'm in the same exact position as Zack described. Appreciate your feedback.
>
> So far we tried the call queue and the handlers, nope. Planned to try off-heap cache.
>
> Kumar Palaniappan <http://about.me/kumar.palaniappan>
>
> On Dec 4, 2015, at 6:45 AM, Riesland, Zack <[email protected]> wrote:
>
> Thanks Satish,
>
> To clarify: I'm not looking up single rows. I'm looking up the history of each widget, which returns hundreds-to-thousands of results per widget (per query).
>
> Each query is a range scan, it's just that I'm performing thousands of them.
> *From:* Satish Iyengar [mailto:[email protected]]
> *Sent:* Friday, December 04, 2015 9:43 AM
> *To:* [email protected]
> *Subject:* Re: Help tuning for bursts of high traffic?
>
> Hi Zack,
>
> Did you consider avoiding hitting hbase for every single row by doing that step in an offline mode? I was thinking you could have some kind of daily export of the hbase table and then use pig to perform a join (co-group perhaps) to do the same. Obviously this would work only when your hbase table is not maintained by a stream-based system. HBase is really good at range scans and may not be ideal for a large number of single-row lookups.
>
> Thanks,
> Satish
>
> On Fri, Dec 4, 2015 at 9:09 AM, Riesland, Zack <[email protected]> wrote:
>
> SHORT EXPLANATION: a much higher percentage of queries to phoenix return exceptionally slowly after
> querying very heavily for several minutes.
>
> LONGER EXPLANATION:
>
> I've been using Phoenix for about a year as a data store for web-based reporting tools and it works well.
>
> Now, I'm trying to use the data in a different (much more request-intensive) way and encountering some issues.
>
> The scenario is basically this:
>
> Daily, ingest very large CSV files with data for widgets.
>
> Each input file has hundreds of rows of data for each widget, and tens of thousands of unique widgets.
>
> As a first step, I want to de-duplicate this data against my Phoenix-based DB (I can't rely on just upserting the data for de-dup because it will go through several ETL steps before being stored into Phoenix/HBase).
>
> So, per-widget, I perform a query against Phoenix (the table is keyed against the unique widget ID + sample point).
> I get all the data for a given widget id, within a certain period of time, and then I only ingest rows for that widget that are new to me.
>
> I'm doing this in Java in a single step: I loop through my input file and perform one query per widget, using the same Connection object to Phoenix.
>
> THE ISSUE:
>
> What I'm finding is that for the first several thousand queries, I almost always get a very fast (less than 10 ms) response (good).
>
> But after 15-20 thousand queries, the response starts to get MUCH slower. Some queries respond as expected, but many take as long as 2-3 minutes, pushing the total time to prime the data structure into the 12-15 hour range, when it would only take 2-3 hours if all the queries were fast.
>
> The same exact queries, when run manually and not part of this bulk process, return in the (expected) < 10 ms.
>
> So it SEEMS like the burst of queries puts Phoenix into some sort of busy state that causes it to respond far too slowly.
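A rough sketch of the per-widget lookup loop described above, reusing one connection and one prepared statement. The table, columns, and predicate come from the query quoted elsewhere in the thread (select x, y, z from my_table where key = ... and sample_point between a lower and upper range); the JDBC URL, the numeric sample_point type, and the surrounding method are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.List;

    public class WidgetDeduper {

        // One range scan per widget id, all through a single connection and
        // a single reused PreparedStatement.
        public static void dedup(List<String> widgetIds, long lowerRange, long upperRange)
                throws Exception {
            String sql = "SELECT x, y, z FROM my_table"
                    + " WHERE key = ? AND sample_point >= ? AND sample_point <= ?";
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                for (String widgetId : widgetIds) {
                    ps.setString(1, widgetId);
                    ps.setLong(2, lowerRange);
                    ps.setLong(3, upperRange);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // compare against the rows parsed from the input file
                            // and keep only the ones that are new
                        }
                    }
                }
            }
        }
    }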
> The connection properties I'm setting are:
>
> phoenix.query.timeoutMs: 90000
> phoenix.query.keepAliveMs: 90000
> phoenix.query.threadPoolSize: 256
>
> Our cluster is 9 (beefy) region servers and the table I'm referencing is 511 regions. We went through a lot of pain to get the data split extremely well, and I don't think schema design is the issue here.
>
> Can anyone help me understand how to make this better? Is there a better approach I could take? A better set of configuration parameters? Is our cluster just too small for this?
>
> Thanks!
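For completeness, one way client-side properties like the ones listed above are typically passed to the Phoenix JDBC driver (a sketch; the values are the ones from the message and the URL is a placeholder):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class TunedConnectionFactory {
        public static Connection open() throws Exception {
            Properties props = new Properties();
            // Values taken from the message above; tune to taste.
            props.setProperty("phoenix.query.timeoutMs", "90000");
            props.setProperty("phoenix.query.keepAliveMs", "90000");
            props.setProperty("phoenix.query.threadPoolSize", "256");
            return DriverManager.getConnection("jdbc:phoenix:zk-host", props);
        }
    }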
> --
> Satish Iyengar
>
> "Anyone who has never made a mistake has never tried anything new." Albert Einstein
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
