Regarding your idea to use the snapshot/restore method (restoring under a new
name): is it possible to add a PK column with that approach? For example, if I
wanted to change a PK column's type from VARCHAR to FLOAT, would that be possible?
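
To make the question concrete, this is the kind of change I mean (table and
column names below are made up, purely for illustration):

    -- today: the PK is a VARCHAR
    CREATE TABLE OLD_TABLE (
        ID   VARCHAR NOT NULL PRIMARY KEY,
        VAL  VARCHAR
    );

    -- what I'd like to end up with: same data, PK typed as FLOAT,
    -- possibly with an extra PK column added
    CREATE TABLE NEW_TABLE (
        ID      FLOAT NOT NULL,
        BUCKET  INTEGER NOT NULL,
        VAL     VARCHAR,
        CONSTRAINT PK PRIMARY KEY (ID, BUCKET)
    );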



On Sun, Jun 18, 2017 at 10:50 AM, Jonathan Leech <jonat...@gmail.com> wrote:

> Also, if you're updating that many values and not doing it in bulk /
> MapReduce / straight to HFiles, you'll want to give the region servers as
> much heap as possible, set the store file and blocking store file limits
> astronomically high, and make the memstore size at which HBase flushes to
> disk as large as possible. This is to avoid compactions slowing you down
> and causing timeouts. You can also break the UPSERT SELECTs up into
> smaller chunks and manually compact in between to mitigate. The same
> strategy applies to other large updates in the regular HBase write path,
> such as building or rebuilding indexes.
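>
> For example, the SQL side could be chunked by leading PK range, roughly
> like this (table, column, and range values are placeholders, not from this
> thread):
>
>     UPSERT INTO NEW_TABLE (ID, VAL)
>     SELECT ID, VAL FROM OLD_TABLE WHERE ID >= 'A' AND ID < 'B';
>
>     -- force a major compaction from the HBase shell before the next chunk:
>     --   major_compact 'NEW_TABLE'
>
>     UPSERT INTO NEW_TABLE (ID, VAL)
>     SELECT ID, VAL FROM OLD_TABLE WHERE ID >= 'B' AND ID < 'C';
>
> (The memstore and store file thresholds themselves are hbase-site.xml /
> table-level settings, so the exact values depend on your cluster.)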
>
> > On Jun 18, 2017, at 11:41 AM, Jonathan Leech <jonat...@gmail.com> wrote:
> >
> > Another thing to consider, but only if your 1:1 mapping keeps the
> > primary keys the same, is to snapshot the table and restore it under the
> > new name, with a schema that is the union of the old and new schemas. I
> > would put the new columns in a new column family. Then use UPSERT
> > SELECT, MapReduce, or Spark to transform the data, and finally drop the
> > columns from the old schema. This strategy could cut the amount of work
> > roughly in half and avoid sending the data over the network.
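> >
> > A rough sketch of what that could look like (all names here are
> > illustrative, not from the original thread):
> >
> >     -- from the HBase shell, not SQL:
> >     --   snapshot 'OLD_TABLE', 'old_table_snap'
> >     --   clone_snapshot 'old_table_snap', 'NEW_TABLE'
> >
> >     -- in Phoenix, map the cloned table with the union of the old and new
> >     -- columns, keeping the new column in its own family ("N"):
> >     CREATE TABLE NEW_TABLE (
> >         ID        VARCHAR NOT NULL PRIMARY KEY,
> >         OLD_VAL   VARCHAR,
> >         N.NEW_VAL VARCHAR
> >     );
> >
> >     -- transform in place, then drop the old column:
> >     UPSERT INTO NEW_TABLE (ID, NEW_VAL) SELECT ID, OLD_VAL FROM NEW_TABLE;
> >     ALTER TABLE NEW_TABLE DROP COLUMN OLD_VAL;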
> >
> >> On Jun 17, 2017, at 5:06 PM, Randy Hu <ruw...@gmail.com> wrote:
> >>
> >> If I count the number of trailing zeros correctly, that's 15 billion
> >> records, so any solution based on the HBase PUT path (UPSERT SELECT)
> >> would probably take far longer than you expect. It would be better to
> >> use the MapReduce-based bulk importer provided by Phoenix:
> >>
> >> https://phoenix.apache.org/bulk_dataload.html
> >>
> >> The importer leverages HBase bulk load mode to write all the data into
> >> HBase store files and hand them over to HBase in a final stage, which
> >> avoids the network and random disk access costs of going through the
> >> HBase region servers.
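> >>
> >> An invocation would look roughly like the following (the exact client
> >> jar name and options depend on your Phoenix version; see the page above
> >> for the authoritative syntax):
> >>
> >>     hadoop jar phoenix-<version>-client.jar \
> >>         org.apache.phoenix.mapreduce.CsvBulkLoadTool \
> >>         --table TARGET_TABLE \
> >>         --input /data/export/target_table.csv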
> >>
> >> Randy
> >>
> >> On Fri, Jun 16, 2017 at 9:51 AM, Pedro Boado [via Apache Phoenix User
> >> List] <ml+s1124778n3675...@n5.nabble.com> wrote:
> >>
> >>> Hi guys,
> >>>
> >>> We are trying to populate a Phoenix table from a 1:1 projection of
> >>> another table with around 15.000.000.000 records via an UPSERT SELECT
> >>> in the Phoenix client. We've noticed very poor performance (I suspect
> >>> the client is using a single-threaded approach) and lots of client
> >>> timeouts.
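> >>>
> >>> The statement is essentially of this shape (real table and column
> >>> names omitted):
> >>>
> >>>     UPSERT INTO TARGET_TABLE SELECT * FROM SOURCE_TABLE;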
> >>>
> >>> Is there a better way of approaching this problem?
> >>>
> >>> Cheers!
> >>> Pedro
> >>>
>
