Sergey, thanks for this tip! Since our client data volume varies a lot from site to site, would splitting only on the first letters of the client_id lead to some regions being much larger than others? Or does Phoenix distribute the data fairly evenly across the different region servers?

Would this continue to work as new client_ids are added to the cluster, or does splitting need to be done again manually at some point? As a follow-up question, is there a way to get HBase to balance the regions based on their size rather than the region count?

Cheers,
Michael

On Fri, Jun 23, 2017 at 12:38 PM, Sergey Soldatov <[email protected]> wrote:

> You may check the "Are there any tips for optimizing Phoenix?" section of the Apache Phoenix FAQ at https://phoenix.apache.org/faq.html. It explains how to pre-split a table. In your case you may split on the first letters of client_id.
>
> When we talk about monotonic data, we usually mean the primary key only. For example, if we have an integer ID as the primary key and write rows with an auto-increment ID, all the data will go to a single region, creating a hot spot there. In this case (and really only in this case) salting may be useful, since it adds an additional pseudo-random byte in front of the primary key, giving us a chance to distribute the write load across the cluster. In all other cases salting creates more work for the cluster, since we can no longer do a single point lookup/range scan by primary key and instead need to do one lookup per salt byte + PK.
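>
> For example (the table name and split points below are invented; in practice you would pick split points from the actual distribution of your client_ids), a pre-split table looks roughly like:
>
>     CREATE TABLE client_metrics (
>         client_id VARCHAR NOT NULL,
>         dt DATE NOT NULL,
>         requests BIGINT,
>         CONSTRAINT pk PRIMARY KEY (client_id, dt)
>     ) SPLIT ON ('d', 'h', 'l', 'p', 't');
>
> while salting is just a table option:
>
>     CREATE TABLE events (
>         id BIGINT NOT NULL PRIMARY KEY,
>         payload VARCHAR
>     ) SALT_BUCKETS = 16;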
>
> Thanks,
> Sergey
>
> On Fri, Jun 23, 2017 at 12:00 PM, Michael Young <[email protected]> wrote:
>
>> > Don't you have any other column which is obligatory in queries during reading but not monotonic with ingestion?
>>
>> We have several columns used in typical query WHERE clauses (like userID='abc', or specific user attributes and data types). However, a number of columns have many rows with the same value.
>>
>> We have tried running UPDATE STATISTICS on the tables; that would be worth investigating again.
>>
>> Can you give me a hint how to pre-split the data?
>>
>> Let's say we have the following PK columns (all varchar except dt, a date):
>> client_id, dt, rule_id, user_id, attribute_1, attribute_2, rule_name, browser_type, device_type, os_type, page, group_name, period
>>
>> and non-PK columns in the same table:
>> requests, connections, queues, queue_time
>>
>> What would be the suggested way to pre-split? I'm not familiar with this technique beyond very simple use cases.
>>
>> Thanks!
>>
>> On Thu, Jun 22, 2017 at 11:31 PM, Ankit Singhal <[email protected]> wrote:
>>
>>> bq. A leading date column is in our schema model
>>> Don't you have any other column which is obligatory in queries during reading but not monotonic with ingestion? Pre-splitting on such a column can help you avoid hotspotting.
>>> For a parallelism/performance comparison, have you tried running a query on a non-salted table after updating the stats and comparing the performance with a salted table?
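>>>
>>> For reference, refreshing the stats is a one-liner, e.g.:
>>>
>>>     UPDATE STATISTICS MYBIGTABLE ALL;
>>>
>>> run against whichever table you use for the comparison, so the optimizer has fresh guideposts when it parallelizes the scan.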
>>>
>>> On Fri, Jun 23, 2017 at 9:49 AM, Michael Young <[email protected]> wrote:
>>>
>>>> We started with no salt buckets, but the performance was terrible in our testing.
>>>>
>>>> A leading date column is in our schema model. We don't seem to be getting hotspotting after salting. Date range scans are very common, as are slice-and-dice queries on many dimension columns.
>>>>
>>>> We tested a range of SALT_BUCKETS values from 0 to 120 for bulk loading, upserts, and selects at different concurrent load levels on a test cluster before moving to production (with some tweaking post-production). However, we had fewer regions per RS on average during the testing. The larger SALT_BUCKETS values definitely gave better overall performance in our predominantly read-heavy environment.
>>>>
>>>> I appreciate any insights to identify bottlenecks.
>>>>
>>>> On Thu, Jun 22, 2017 at 6:26 PM, James Taylor <[email protected]> wrote:
>>>>
>>>>> My recommendation: don't use salt buckets unless you have a monotonically increasing row key, for example one that leads with the current date/time. Otherwise you'll be putting more load (up to #-of-salt-buckets times more load in the worst case) on bread-and-butter small-range-scan Phoenix queries.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>> On Fri, Jun 23, 2017 at 10:06 AM Michael Young <[email protected]> wrote:
>>>>>
>>>>>> The ulimit on open files was only 1024 for the user executing the query. After increasing it, the queries behave better.
>>>>>>
>>>>>> How can we tell if we need to reduce/increase the number of salt buckets?
>>>>>>
>>>>>> Our team set this based on read/write performance testing with our data volume and the queries we expected users to run.
>>>>>>
>>>>>> However, now it seems the performance has degraded. We can recreate the schemas with fewer/more buckets and reload the data, but I haven't seen a hard-and-fast rule for setting the number of buckets.
>>>>>>
>>>>>> We have 12 data nodes, 4 SSDs per node, 128 GB RAM per node, and 24 cores w/ hyperthreading (HDP 2.5; HBase is the primary service), with 800+ regions per RS (which seems high).
>>>>>>
>>>>>> Any orientation on this would be greatly appreciated.
>>>>>>
>>>>>> On Tue, Jun 20, 2017 at 11:54 AM, Josh Elser <[email protected]> wrote:
>>>>>>
>>>>>>> I think this is more of an issue with your 78 salt buckets than with the width of your table. Each chunk, running in parallel, is spilling incremental counts to disk.
>>>>>>>
>>>>>>> I'd check the ulimit settings on the node you run this query from and try increasing the number of open files allowed before going into this one in more depth :)
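>>>>>>>
>>>>>>> (To check it, run "ulimit -n" as the user that launches the query; 1024 is a common default and is easy to exhaust with thousands of parallel spool files, and it can be raised in /etc/security/limits.conf. If memory serves, the spilling itself is governed by the client-side phoenix.query.spoolThresholdBytes property, so raising that keeps more of each chunk in memory at the cost of heap.)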
>>>>>>>
>>>>>>> On 6/16/17 2:31 PM, Michael Young wrote:
>>>>>>>
>>>>>>>> We are running a 13-node HBase cluster. One table uses 78 salt buckets, which seems to work reasonably well for both reads and writes. This table has 130 columns, with a PK over 30 of them (a fairly wide table).
>>>>>>>>
>>>>>>>> However, after adding several new tables we are seeing "too many open files" errors when running a full table scan:
>>>>>>>>
>>>>>>>> Caused by: org.apache.phoenix.exception.PhoenixIOException: Too many open files
>>>>>>>>     at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
>>>>>>>>     at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:152)
>>>>>>>>     at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:84)
>>>>>>>>     at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:63)
>>>>>>>>     at org.apache.phoenix.iterate.SpoolingResultIterator$SpoolingResultIteratorFactory.newIterator(SpoolingResultIterator.java:79)
>>>>>>>>     at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:112)
>>>>>>>>     at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:103)
>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>     at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>>> Caused by: java.io.IOException: Too many open files
>>>>>>>>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>>>>>>>>     at java.io.File.createTempFile(File.java:2024)
>>>>>>>>     at org.apache.phoenix.shaded.org.apache.commons.io.output.DeferredFileOutputStream.thresholdReached(DeferredFileOutputStream.java:176)
>>>>>>>>     at org.apache.phoenix.iterate.SpoolingResultIterator$1.thresholdReached(SpoolingResultIterator.java:116)
>>>>>>>>     at org.apache.phoenix.shaded.org.apache.commons.io.output.ThresholdingOutputStream.checkThreshold(ThresholdingOutputStream.java:224)
>>>>>>>>     at org.apache.phoenix.shaded.org.apache.commons.io.output.ThresholdingOutputStream.write(ThresholdingOutputStream.java:92)
>>>>>>>>     at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>>>>>>>>     at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
>>>>>>>>     at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
>>>>>>>>     at org.apache.phoenix.util.TupleUtil.write(TupleUtil.java:149)
>>>>>>>>     at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:127)
>>>>>>>>     ... 10 more
>>>>>>>>
>>>>>>>> When running an explain plan:
>>>>>>>>
>>>>>>>> explain select count(1) from MYBIGTABLE
>>>>>>>>
>>>>>>>> +------------------------------------------------------------------------------------------------------------------+
>>>>>>>> | PLAN                                                                                                             |
>>>>>>>> +------------------------------------------------------------------------------------------------------------------+
>>>>>>>> | CLIENT 8728-CHUNK 674830174 ROWS 2721056772632 BYTES PARALLEL 78-WAY FULL SCAN OVER ATT.PRE_ENG_CONVERSION_OLAP   |
>>>>>>>> |     ROW TIMESTAMP FILTER [0, 9223372036854775807)                                                                |
>>>>>>>> |     SERVER FILTER BY FIRST KEY ONLY                                                                              |
>>>>>>>> |     SERVER AGGREGATE INTO SINGLE ROW                                                                             |
>>>>>>>> +------------------------------------------------------------------------------------------------------------------+
>>>>>>>>
>>>>>>>> It has a lot of chunks. Normally this query would return at least some result after running for a few minutes. With appropriate filters in the WHERE clause, the queries run fine.
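>>>>>>>>
>>>>>>>> (For concreteness, the kind of filtered query that runs fine looks like the following, with made-up date values and assuming the leading date column is dt:
>>>>>>>>
>>>>>>>>     explain select count(1) from MYBIGTABLE where dt >= TO_DATE('2017-06-01') and dt < TO_DATE('2017-06-08');
>>>>>>>>
>>>>>>>> which the planner turns into a bounded scan per salt bucket rather than an unbounded FULL SCAN.)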
>>>>>>>>
>>>>>>>> Any suggestions on how to avoid this error and get better performance from the table scans? We realize we don't need to run full table scans regularly; we're just trying to better understand best practices for Phoenix/HBase.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Michael