Regarding your idea of using the snapshot/restore method (restoring under a new name): is it possible to alter the primary key with that approach, e.g. add a PK column, or change a PK column's type? For example, if I wanted to change a PK column from VARCHAR to FLOAT, is this possible?
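For concreteness, the kind of change I have in mind (hypothetical table and column names):

    -- what exists today: VARCHAR primary key
    CREATE TABLE T (K VARCHAR NOT NULL PRIMARY KEY, V VARCHAR);

    -- what I'd like to end up with: same data, FLOAT primary key
    CREATE TABLE T2 (K FLOAT NOT NULL PRIMARY KEY, V VARCHAR);

My (possibly wrong) understanding is that Phoenix encodes the primary key into the HBase row key, so changing the PK type changes the row key bytes, and I'm not sure the restored snapshot would still line up with the new schema.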
On Sun, Jun 18, 2017 at 10:50 AM, Jonathan Leech <jonat...@gmail.com> wrote:

> Also, if you're updating that many values and not doing it in bulk / MapReduce / straight to HFiles, you'll want to give the region servers as much heap as possible, set store files and blocking store files astronomically high, and set the amount of memory the table can hold before HBase flushes to disk as large as possible. This is to avoid compactions slowing you down and causing timeouts. You can also break the UPSERT SELECTs into smaller chunks and manually compact in between to mitigate. The above strategy also applies to other large updates in the regular HBase write path, such as building or rebuilding indexes.
>
> > On Jun 18, 2017, at 11:41 AM, Jonathan Leech <jonat...@gmail.com> wrote:
> >
> > Another thing to consider, but only if your 1:1 mapping keeps the primary keys the same, is to snapshot the table and restore it with the new name and a schema that is the union of the old and new schemas. I would put the new columns in a new column family. Then use UPSERT SELECT, MapReduce, or Spark to transform the data, then drop the columns from the old schema. This strategy could cut the amount of work to be done by half and not send data over the network.
> >
> >> On Jun 17, 2017, at 5:06 PM, Randy Hu <ruw...@gmail.com> wrote:
> >>
> >> If I count the number of trailing zeros correctly, that's 15 billion records; any solution based on the HBase PUT interface (which is what UPSERT SELECT goes through) would probably take way more time than you expect. It would be better to use the MapReduce-based bulk importer provided by Phoenix:
> >>
> >> https://phoenix.apache.org/bulk_dataload.html
> >>
> >> The importer leverages HBase bulk-load mode to convert all the data into HBase storage files (HFiles) and hand them over to HBase in the final stage, thus avoiding the network and disk random-access cost of going through the HBase region servers.
> >>
> >> Randy
> >>
> >> On Fri, Jun 16, 2017 at 9:51 AM, Pedro Boado [via Apache Phoenix User List] <ml+s1124778n3675...@n5.nabble.com> wrote:
> >>
> >>> Hi guys,
> >>>
> >>> We are trying to populate a Phoenix table based on a 1:1 projection of another table with around 15.000.000.000 records via an UPSERT SELECT in the Phoenix client. We've noticed very poor performance (I suspect the client is using a single-threaded approach) and lots of issues with client timeouts.
> >>>
> >>> Is there a better way of approaching this problem?
> >>>
> >>> Cheers!
> >>> Pedro
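Stepping back out of the quote, and in case it helps anyone finding this thread later: this is the sequence I understood from the snapshot/restore suggestion above. All table, column family, and column names are placeholders, and I have not run this end to end, so please read it as a sketch rather than a recipe:

    (hbase shell: snapshot the source table and restore it under the new name)
    snapshot 'OLD_T', 'old_t_snap'
    clone_snapshot 'old_t_snap', 'NEW_T'

    -- Phoenix: declare NEW_T as the union of the old and new schemas, keeping the
    -- same primary key and putting the new column in its own column family (B here)
    CREATE TABLE NEW_T (
        ID   VARCHAR NOT NULL PRIMARY KEY,
        A.V  VARCHAR,   -- old column, already present in the cloned data
        B.V2 VARCHAR    -- new column, empty until the transform runs
    );

    -- transform in place (UPSERT SELECT shown; MapReduce or Spark would also work),
    -- then drop the old column once the new one is populated
    UPSERT INTO NEW_T (ID, V2) SELECT ID, V FROM NEW_T;
    ALTER TABLE NEW_T DROP COLUMN A.V;

If I read it correctly, this only works because the row key is unchanged, which is exactly why I'm unsure about the VARCHAR-to-FLOAT case in my question at the top.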
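Likewise, my reading of the heap/compaction advice as concrete knobs, reusing the placeholder names above. The property names are standard HBase settings, but the values are only illustrative and I haven't benchmarked any of this:

    (hbase-site.xml on the region servers, plus as much region server heap as you can spare)
    hbase.hstore.compactionThreshold   = 100          (compact far less often)
    hbase.hstore.blockingStoreFiles    = 200          (don't block writes on store file count)
    hbase.hregion.memstore.flush.size  = 536870912    (flush less often; 512 MB)

    (or set the flush size per table, from the hbase shell)
    alter 'NEW_T', MEMSTORE_FLUSHSIZE => '536870912'

    -- chunking the UPSERT SELECT by leading PK range, major-compacting in between
    UPSERT INTO NEW_T (ID, V2) SELECT ID, V FROM NEW_T WHERE ID < 'g';
    -- hbase shell: major_compact 'NEW_T'
    UPSERT INTO NEW_T (ID, V2) SELECT ID, V FROM NEW_T WHERE ID >= 'g';
    -- hbase shell: major_compact 'NEW_T'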
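And for completeness, the MapReduce bulk loader Randy linked to is invoked roughly like this (adapted from the bulk_dataload page; the exact jar name and paths depend on your version and distribution). As far as I can tell it loads from CSV files rather than straight from another Phoenix table, so it would need an export step first:

    hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table NEW_T \
        --input /data/new_t.csv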