Re: Best strategy for UPSERT SELECT in large table

Jonathan Leech Mon, 19 Jun 2017 13:01:51 -0700

I think you could add additional PK columns, but not change or remove existing 
ones.


> On Jun 19, 2017, at 11:58 AM, Michael Young <[email protected]> wrote:
> 
> Regarding your idea to use the snapshot/restore method (with a new name): is 
> it possible to add a PK column with that approach? For example, if I wanted 
> to change a PK column type from VARCHAR to FLOAT, is this possible?
> 
> 
> 
>> On Sun, Jun 18, 2017 at 10:50 AM, Jonathan Leech <[email protected]> wrote:
>> Also, if you're updating that many values and not doing it in bulk / 
>> mapreduce / straight to HFiles, you'll want to give the region servers as 
>> much heap as possible, set store files and blocking store files 
>> astronomically high, and set the memory size for the table before HBase 
>> flushes to disk as large as possible. This is to avoid compactions slowing 
>> you down and causing timeouts. You can also break up the upsert selects into 
>> smaller chunks and manually compact in between to mitigate. The above 
>> strategy also applies to other large updates in the regular HBase write 
>> path, such as building or rebuilding indexes.
>> 
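One way to read the "break up the upsert selects into smaller chunks" advice above is to split the copy by primary-key range and run one statement per range. The sketch below is only illustrative: the table names, column names, and split points are hypothetical, it assumes the leading PK column is a VARCHAR, and it only builds the SQL strings (running them, and compacting between chunks, is left to whatever client and tooling you use).

```python
from typing import List, Optional, Sequence


def chunked_upserts(
    target: str,
    source: str,
    columns: Sequence[str],
    pk: str,
    boundaries: Sequence[str],
) -> List[str]:
    """Build one UPSERT SELECT statement per primary-key range.

    `boundaries` are sorted split points on the leading PK column
    (assumed to be a VARCHAR here). n split points yield n + 1
    statements; the first and last ranges are open-ended.
    """
    cols = ", ".join(columns)
    edges: List[Optional[str]] = [None, *boundaries, None]
    statements = []
    for lo, hi in zip(edges, edges[1:]):
        predicates = []
        if lo is not None:
            # Double any single quotes so the literal stays well-formed.
            predicates.append("{} >= '{}'".format(pk, lo.replace("'", "''")))
        if hi is not None:
            predicates.append("{} < '{}'".format(pk, hi.replace("'", "''")))
        where = " WHERE " + " AND ".join(predicates) if predicates else ""
        statements.append(
            "UPSERT INTO {} ({}) SELECT {} FROM {}{}".format(
                target, cols, cols, source, where
            )
        )
    return statements


if __name__ == "__main__":
    # Hypothetical tables and split points, for illustration only.
    for sql in chunked_upserts("NEW_TABLE", "OLD_TABLE", ["ID", "VAL"], "ID", ["g", "p"]):
        print(sql)
```

Each returned statement could then be executed in turn, with a manual compaction of the target table between chunks, as suggested above.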
>> > On Jun 18, 2017, at 11:41 AM, Jonathan Leech <[email protected]> wrote:
>> >
>> > Another thing to consider, but only if your 1:1 mapping keeps the primary 
>> > keys the same, is to snapshot the table and restore it with the new name, 
>> > and a schema that is the union of the old and new schemas. I would put the 
>> > new columns in a new column family. Then use upsert select, mapreduce, or 
>> > Spark to transform the data, then drop the columns from the old schema. 
>> > This strategy could cut the amount of work to be done by half and not send 
>> > data over the network.
>> >
>> >> On Jun 17, 2017, at 5:06 PM, Randy Hu <[email protected]> wrote:
>> >>
>> >> If I count the number of trailing zeros correctly, it's 15 billion records.
>> >> Any solution based on HBase PUT interaction (UPSERT SELECT) would probably
>> >> take far longer than you expect. It would be better to use the
>> >> map/reduce based bulk importer provided by Phoenix:
>> >>
>> >> https://phoenix.apache.org/bulk_dataload.html
>> >>
>> >> The importer leverages HBase bulk mode to convert all data into HBase
>> >> storage files, then hands them over to HBase in the final stage, thus
>> >> avoiding all the network and random disk access costs of going through
>> >> the HBase region servers.
>> >>
>> >> Randy
>> >>
>> >> On Fri, Jun 16, 2017 at 9:51 AM, Pedro Boado [via Apache Phoenix User List]
>> >> <[email protected]> wrote:
>> >>
>> >>> Hi guys,
>> >>>
>> >>> We are trying to populate a Phoenix table based on a 1:1 projection of
>> >>> another table with around 15.000.000.000 records via an UPSERT SELECT in
>> >>> the Phoenix client. We've noticed very poor performance (I suspect the
>> >>> client is using a single-threaded approach) and lots of issues with
>> >>> client timeouts.
>> >>>
>> >>> Is there a better way of approaching this problem?
>> >>>
>> >>> Cheers!
>> >>> Pedro
>> >>>
>> >>
>
