I think you could add additional PK columns, but not change or remove existing ones.

> On Jun 19, 2017, at 11:58 AM, Michael Young <[email protected]> wrote:
>
> Regarding your idea to use the snapshot/restore method (with a new name). Is
> it possible to add a PK column with that approach? For example, if I wanted
> to change a PK column type from VARCHAR to FLOAT, is this possible?
>
>> On Sun, Jun 18, 2017 at 10:50 AM, Jonathan Leech <[email protected]> wrote:
>> Also, if you're updating that many values and not doing it in bulk /
>> mapreduce / straight to hfiles, you'll want to give the region servers as
>> much heap as possible, set store files and blocking store files
>> astronomically high, and set the memory size for the table before Hbase
>> flushes to disk as large as possible. This is to avoid compactions slowing
>> you down and causing timeouts. You can also break up the upsert selects into
>> smaller chunks and manually compact in between to mitigate. The above
>> strategy also applies for other large updates in the regular Hbase write
>> path, such as building or rebuilding indexes.
>>
>> > On Jun 18, 2017, at 11:41 AM, Jonathan Leech <[email protected]> wrote:
>> >
>> > Another thing to consider, but only if your 1:1 mapping keeps the primary
>> > keys the same, is to snapshot the table and restore it with the new name,
>> > and a schema that is the union of the old and new schemas. I would put the
>> > new columns in a new column family. Then use upsert select, mapreduce, or
>> > Spark to transform the data, then drop the columns from the old schema.
>> > This strategy could cut the amount of work to be done by half and not send
>> > data over the network.
>> >
>> >> On Jun 17, 2017, at 5:06 PM, Randy Hu <[email protected]> wrote:
>> >>
>> >> If I count the number of tailing zeros correctly, it's 15 billion records,
>> >> any solution based on HBase PUT interaction (UPSERT SELECT) would probably
>> >> take way more time than your expectation. It would be better to use the
>> >> map/reduce based bulk importer provided by Phoenix:
>> >>
>> >> https://phoenix.apache.org/bulk_dataload.html
>> >>
>> >> The importer leverages HBase bulk mode to convert all data into HBase
>> >> storage file, then hand it over to HBase in the final stage, thus avoids
>> >> all network and disk random access cost when going through HBase region
>> >> servers.
>> >>
>> >> Randy
>> >>
>> >> On Fri, Jun 16, 2017 at 9:51 AM, Pedro Boado [via Apache Phoenix User
>> >> List]
>> >> <[email protected]> wrote:
>> >>
>> >>> Hi guys,
>> >>>
>> >>> We are trying to populate a Phoenix table based on a 1:1 projection of
>> >>> another table with around 15.000.000.000 records via an UPSERT SELECT in
>> >>> phoenix client. We've noticed a very poor performance ( I suspect the
>> >>> client is using a single-threaded approach ) and lots of issues with
>> >>> client
>> >>> timeouts.
>> >>>
>> >>> Is there a better way of approaching this problem?
>> >>>
>> >>> Cheers!
>> >>> Pedro
>> >>>
>> >>>
>> >>> ------------------------------
>> >>> If you reply to this email, your message will be added to the discussion
>> >>> below:
>> >>> http://apache-phoenix-user-list.1124778.n5.nabble.com/Best-strategy-for-UPSERT-SELECT-in-large-table-tp3675.html
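
For the suggestion quoted above about breaking up the upsert selects into smaller chunks and compacting in between, here is a minimal sketch of how the statements could be generated. It only builds SQL strings and does not connect to anything. The table names, column names and split points are hypothetical, and it assumes the leading PK column is a VARCHAR whose values can be compared against string literals. You would run each statement through whatever Phoenix client you already use, then compact manually (for example with major_compact from the HBase shell) before moving on to the next one.

```python
from typing import List, Optional, Sequence


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def chunked_upsert_selects(
    target: str,
    source: str,
    columns: Sequence[str],
    key_column: str,
    split_points: Sequence[str],
) -> List[str]:
    """Build one UPSERT SELECT per half-open key range [lower, upper).

    split_points are hypothetical boundary values on the leading PK column
    (assumed VARCHAR here); N split points yield N + 1 statements.
    """
    if list(split_points) != sorted(set(split_points)):
        raise ValueError("split_points must be sorted and unique")

    column_list = ", ".join(columns)
    # None marks an open end: the first chunk has no lower bound,
    # the last chunk has no upper bound.
    bounds: List[Optional[str]] = [None, *split_points, None]

    statements = []
    for lower, upper in zip(bounds, bounds[1:]):
        predicates = []
        if lower is not None:
            predicates.append(f"{key_column} >= {quote_literal(lower)}")
        if upper is not None:
            predicates.append(f"{key_column} < {quote_literal(upper)}")
        where = f" WHERE {' AND '.join(predicates)}" if predicates else ""
        statements.append(
            f"UPSERT INTO {target} ({column_list}) "
            f"SELECT {column_list} FROM {source}{where}"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical tables and split points, for illustration only.
    for sql in chunked_upsert_selects(
        "NEW_TABLE", "OLD_TABLE", ["ID", "VAL"], "ID", ["g", "p"]
    ):
        print(sql + ";")
```

Splitting on the leading PK column keeps each chunk a contiguous row key range. Good split points depend on how your keys are distributed, so treat the ones above as placeholders.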
