Hi folks,

This is a nice write-up of the round-table meeting at HBaseConAsia. I would like to address the points I have pulled out from the write-up (at the bottom of this message).

Many in the HBase community may not be aware that, besides Apache Phoenix, there is a project called Apache Trafodion, contributed by Hewlett-Packard in 2015, that has been a top-level project for a while. Apache Trafodion is essentially technology from Tandem-Compaq-HP that started its OLTP / operational journey as NonStop SQL in the early 1990s. Granted, it is a C++ project, but it has 170+ patents as part of it that were contributed to Apache. These are capabilities that still don’t exist in other databases. It is a full-fledged SQL relational database engine with broad ANSI SQL support, including the OLAP functions mentioned, and many de facto standard functions from databases like Oracle. You can go to the Apache Trafodion wiki to see the documentation on everything Trafodion supports.

When we introduced Apache Trafodion, we implemented a completely distributed transaction management capability right in the HBase engine using coprocessors, which is completely scalable with no bottlenecks whatsoever. We have made this infrastructure very efficient over time, e.g. by reducing two-phase commit overhead for single-region transactions. We have presented this at HBaseCon.

The engine also supports secondary indexes. However, because of our patented Multi-dimensional Access Method technology, the need to use a secondary index is substantially reduced. All DDL and index updates are completely protected by ACID transactions.

Probably because of our own inability to create excitement about the project, and potentially other reasons, we could not get the community involvement we were expecting. That is why you may see that, while we are maintaining the code base and introducing enhancements to it, much of our focus has shifted to the commercial product based on Apache Trafodion, namely EsgynDB.
But if the community involvement increases, we can certainly refresh Trafodion with some of the additional functionality we have added on the HBase side of the product.

But let me be clear. We are about 150 employees at Esgyn, with 40 or so in the US, mostly in Milpitas, and the rest in Shanghai, Beijing, and Guiyang. We cannot sustain the company on service revenue alone. You have seen that companies that tried to do that have not been successful unless they had a way to leverage the open source project for a different business model: enhanced capabilities, cloud services, etc.

To that end we have added to EsgynDB complete Disaster Recovery, Point-in-Time, fuzzy Backup and Restore, Manageability via a Database Manager, Multi-tenancy, and a large number of other capabilities for High Availability scale-out production deployments. EsgynDB also provides full BI and Analytics capabilities, leveraging Apache ORC and Parquet, again because of our heritage products, which supported EDWs of up to 250TB for HP and customers like Walmart in competition with Teradata. So yes, it can integrate with other storage engines as needed.

However, in spite of all this, the pricing on EsgynDB is very competitive; in other words, it is “cheap” compared to anything else with the same caliber of capabilities. We have demonstrated the capability of the product by running the TPC-C and TPC-DS (all 99 queries) benchmarks, especially at high concurrency, which our product is particularly well suited for based on its architecture and patents. (The TPC-DS benchmarks are run on ORC and Parquet for obvious reasons.)

We just closed a couple of very large Core Banking deals in Guiyang, where we are replacing the banks' entire Core Banking systems, currently implemented on Oracle, which they were having challenges scaling at a reasonable cost. But we have many customers both in the US and China that are using EsgynDB for operational, BI and Analytics needs. And now finally … OLTP.
I know that this is sounding more like a commercial for Esgyn, but that is not my intent. I would like to make you aware of Apache Trafodion as a solution to many of the issues the community is facing. We will provide full support for Trafodion with community involvement, and hope that some of that involvement results in EsgynDB revenue that we can sustain the company on 😊. I would like to encourage the community to look at Trafodion to address many of the concerns cited below.

“Allan Yang said that most of their customers want secondary index, even more than SQL. And for global strong consistent secondary index, we agree that the only safe way is to use transaction. Other 'local' solutions will be in trouble when splitting/merging.”

“We talked about Phoenix, the problem for Phoenix is well known: not stable enough. We even had a user on the mailing-list said he/she will never use Phoenix again.”

“Some guys said that the current feature set for 3.0.0 is not good enough to attract more users, especially for small companies. Only internal improvements, no user-visible features. SQL and secondary index are very important.”

“Then we went back to SQL again. Alibaba said that most of their customers are migrating from old business, so they need 'full' SQL support. That's why they need Phoenix. And lots of small companies want to run OLAP queries directly on the database, they do not want to use ETL. So maybe in the SQL proxy (planned above), we should delegate the OLAP queries to spark SQL or something else, rather than just rejecting them.”

“And a Phoenix committer said that, the Phoenix community are currently re-evaluating the relationship with HBase, because when upgrading to HBase 2.1.x, lots of things are broken. They plan to break the tie between Phoenix and HBase, which means Phoenix plans to also run on other storage systems.
Note: This is not on the meeting but personally, I think this may be good news, since Phoenix is not HBase only, we have more reasons to introduce our own SQL layer.”

Rohit Jain
CTO
Esgyn

-----Original Message-----
From: Stack <st...@duboce.net>
Sent: Friday, July 26, 2019 12:01 PM
To: HBase Dev List <d...@hbase.apache.org>
Cc: hbase-user <user@hbase.apache.org>
Subject: Re: The note of the round table meeting after HBaseConAsia 2019

Thanks for the thorough write-up Duo. Made for a good read....
S

On Fri, Jul 26, 2019 at 6:43 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> The conclusion of the HBaseConAsia 2019 will be available later. And here is the note of the round table meeting after the conference. A bit long...
>
> First we talked about splittable meta. At Xiaomi we have a cluster which has nearly 200k regions and meta is very easy to overload and can not recover. Anoop said we can try read replica, but agreed that read replica can not solve all the problems, finally we still need to split meta.
>
> Then we talked about SQL. Allan Yang said that most of their customers want secondary index, even more than SQL. And for global strong consistent secondary index, we agree that the only safe way is to use transaction. Other 'local' solutions will be in trouble when splitting/merging. Xiaomi has a global secondary index solution, open source it?
>
> Then we went back to SQL. We talked about Phoenix, the problem for Phoenix is well known: not stable enough. We even had a user on the mailing-list said he/she will never use Phoenix again. Alibaba and Huawei both have their in-house SQL solution, and Huawei also talked about it on HBaseConAsia 2019, they will try to open source it. And we could introduce a SQL proxy in hbase-connector repo. No push down support first, all logics are done at the proxy side, can optimize later.
>
> Some guys said that the current feature set for 3.0.0 is not good enough to attract more users, especially for small companies. Only internal improvements, no user-visible features. SQL and secondary index are very important.
>
> Yu Li talked about the CCSMap, we still want it to be released in 3.0.0. One problem is the relationship with in memory compaction. Theoretically they should have no conflicts but actually they have. And Xiaomi guys mentioned that in memory compaction still has some bugs, even for basic mode, the MVCC writePoint may be stuck and hang the region server. And Jieshan Bi asked why not just use CCSMap to replace CSLM. Yu Li said this is for better memory usage, the index and data could be placed together.
>
> Then we started to talk about the HBase on cloud. For now, it is a bit difficult to deploy HBase on cloud as we need to deploy zookeeper and HDFS first. Then we talked about the HBOSS and WAL abstraction (HBASE-20952). Wellington said the HBOSS basically works, it uses s3a and zookeeper to help simulating the operations of HDFS. We could introduce our own 'FileSystem' interface, not the hadoop one, and we could remove the 'atomic renaming' dependency so the 'FileSystem' implementation will be easier. And on the WAL abstraction, Wellington said there are still some guys working on it, but now they focus on patching ratis, rather than abstracting the WAL system first. We agreed that a better way is to abstract WAL system at a level higher than FileSystem, so maybe we could even use Kafka to store the WAL.
>
> Then we talked about the FPGA usage for compaction at Alibaba. Jieshan Bi said that in Huawei they offload the compaction to storage layer. For open source solution, maybe we could offload the compaction to spark, and then use something like bulkload to let region server load the new HFiles. The problem for doing compaction inside region server is the CPU cost and GC pressure.
> We need to scan every cell so the CPU cost is high. Yu Li talked about their page based compaction in flink state store, maybe it could also benefit HBase.
>
> Then it is the time for MOB. Huawei said MOB can not solve their problem. We still need to read the data through RPC, and it will also introduce pressures on the memstore, since the memstore is still a bit small, comparing to MOB cell. And we will also flush a lot although there are only a small number of MOB cells in the memstore, so we still need to compact a lot. So maybe the suitable scenario for using MOB is that, most of your data are still small, and a small amount of the data are a bit larger, where MOB could increase the performance, and users do not need to use another system to store the larger data.
> Huawei said that they implement the logic at client side. If the data is larger than a threshold, the client will go to another storage system rather than HBase.
> Alibaba said that if we want to support large blob, we need to introduce streaming API.
> And Kuaishou said that they do not use MOB, they just store data on HDFS and the index in HBase, typical solution.
>
> Then we talked about which company to host the next year's HBaseConAsia. It will be Tencent or Huawei, or both, probably in Shenzhen. And since there is no HBaseCon in America any more (it is called 'NoSQL Day'), maybe next year we could just call the conference HBaseCon.
>
> Then we went back to SQL again. Alibaba said that most of their customers are migrating from old business, so they need 'full' SQL support. That's why they need Phoenix. And lots of small companies want to run OLAP queries directly on the database, they do not want to use ETL. So maybe in the SQL proxy (planned above), we should delegate the OLAP queries to spark SQL or something else, rather than just rejecting them.
>
> And a Phoenix committer said that, the Phoenix community are currently re-evaluating the relationship with HBase, because when upgrading to HBase 2.1.x, lots of things are broken. They plan to break the tie between Phoenix and HBase, which means Phoenix plans to also run on other storage systems.
> Note: This is not on the meeting but personally, I think this may be good news, since Phoenix is not HBase only, we have more reasons to introduce our own SQL layer.
>
> Then we talked about Kudu. It is faster than HBase on scan. If we want to increase the performance on scan, we should have larger block size, but this will lead to a slower random read, so we need to trade-off. The Kuaishou guys asked whether HBase could support storing HFile in columnar format. The answer is no, as said above, it will slow random read. But we could learn from what Google has done in Bigtable. We could write a copy of the data in parquet format to another FileSystem, and user could just scan the parquet file for better analysis performance. And if they want the newest data, they could ask HBase for the newest data, and it should be small. This is more like a solution, not only HBase is involved. But at least we could introduce some APIs in HBase so users can build the solution in their own environment. And if you do not care about the newest data, you could also use replication to replicate the data to ES or other systems, and search there.
>
> And Didi talked about their problems using HBase. They use kylin so they also have lots of regions, so meta is also a problem for them. And the pressure on zookeeper is also a problem, as the replication queues are stored on zk. And after 2.1, zookeeper is only used as an external storage in replication implementation, so it is possible to switch to other storages, such as etcd.
> But it is still a bit difficult to store the data in a system table, as now we need to start the replication system before WAL system, but if we want to store the replication data in a hbase table, obviously the WAL system must be started before replication system, as we need the region of the system online first, and it will write an open marker to WAL. We need to find a way to break the deadlock.
> And they also mentioned that, the rsgroup feature also makes big znode on zookeeper, as they have lots of tables. We have HBASE-22514 which aims to solve the problem.
> And last, they shared their experience when upgrading from 0.98 to 1.4.x. They should be compatible but actually there are problems. They agreed to post a blog about this.
>
> And the Flipkart guys said they will open source their test-suite, which focuses on consistency (Jepsen?). This is good news, hope we could have another useful tool other than ITBLL.
>
> That's all. Thanks for reading.
>