Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
If you guys have already implemented the feature in the MR way and the patch is ready to land on master, I'm -0 on it, as I do not want to block development progress. But I strongly suggest that we revisit the design later and see if we can separate the logic from HMaster as much as

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
2016-09-23 12:38 GMT+08:00 Devaraj Das : > Guys, first off apologies for bringing in the topic of MR-based > compactions.. But I was thinking more about the SpliceMachine approach of > managing compactions in Spark where apparently they saw a lot of benefits. > Apologies for

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
Just wanted to add one argument for doing this the Master way: client-based backups/restores are very hard (if at all possible) to make fully fault tolerant. If the client fails abruptly halfway, some system data will be left broken and the cluster will never return to its original state. We disable, for example
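A minimal sketch of the fault-tolerance argument, assuming a hypothetical `ProgressStore` that records the last completed step of an operation (roughly the role a procedure store plays on the Master side). A client-side driver that dies between a "disable" step and an "enable" step leaves the table disabled; a coordinator that persists progress can resume where it stopped. The class and step names are illustrative only and are not HBase's Procedure V2 API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical progress store. In a real system this would be durable
// (e.g. a write-ahead log); here it is an in-memory map that simply outlives
// the runner objects, which is enough to simulate a coordinator restart.
class ProgressStore {
    private final Map<String, Integer> lastCompleted = new HashMap<>();

    // Returns the index of the last completed step, or -1 if none.
    int lastCompletedStep(String opId) {
        return lastCompleted.getOrDefault(opId, -1);
    }

    void markCompleted(String opId, int step) {
        lastCompleted.put(opId, step);
    }
}

// Hypothetical multi-step operation; the step names are illustrative.
class ResumableOperation {
    private final String opId;
    private final List<String> steps;
    private final ProgressStore store;
    final List<String> executed = new ArrayList<>();

    ResumableOperation(String opId, List<String> steps, ProgressStore store) {
        this.opId = opId;
        this.steps = steps;
        this.store = store;
    }

    // Runs every step after the last persisted one. failAt simulates the
    // coordinator dying just before that step index (-1 means no failure).
    // Returns true only if all remaining steps completed.
    boolean run(int failAt) {
        for (int i = store.lastCompletedStep(opId) + 1; i < steps.size(); i++) {
            if (i == failAt) {
                return false; // crash: steps before i are already recorded
            }
            executed.add(steps.get(i));
            store.markCompleted(opId, i);
        }
        return true;
    }
}
```

For example, with steps `disable, snapshot, copy, enable`, a run that fails at index 2 records progress through `snapshot`; a fresh `ResumableOperation` over the same store then executes only `copy` and `enable`.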

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
All the better, Vlad! On Thu, Sep 22, 2016 at 9:53 PM -0700, "Vladimir Rodionov" > wrote: >> If in the future, we find better ways of doing this without using MR, we can certainly consider that Our framework for distributed operations

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> If in the future, we find better ways of doing this without using MR, we can certainly consider that Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide. -Vlad On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das
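To illustrate what a framework that "is abstract and allows different implementations" could look like, here is a sketch, assuming a hypothetical `CopyService` interface that the backup logic depends on, with the implementation chosen by name (for example from a configuration value). The MR-backed class is only a stub; none of these names or signatures are taken from the HBase code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical abstraction: backup code depends only on this interface,
// not on MapReduce. Names and signature are illustrative.
interface CopyService {
    // Copies each source path into the destination directory and
    // returns the destination paths.
    List<String> copy(List<String> sources, String destination);
}

// Illustrative in-process implementation that needs no YARN.
class InProcessCopyService implements CopyService {
    @Override
    public List<String> copy(List<String> sources, String destination) {
        List<String> copied = new ArrayList<>();
        for (String src : sources) {
            // A real implementation would stream file contents here;
            // this sketch only computes the target path.
            copied.add(destination + "/" + src.substring(src.lastIndexOf('/') + 1));
        }
        return copied;
    }
}

// Placeholder for an MR-backed implementation; a real one would submit a
// distributed job, which is outside the scope of this sketch.
class MapReduceCopyServiceStub implements CopyService {
    @Override
    public List<String> copy(List<String> sources, String destination) {
        throw new UnsupportedOperationException("would submit an MR job");
    }
}

final class CopyServices {
    private static final Map<String, Supplier<CopyService>> IMPLS = Map.of(
            "inprocess", InProcessCopyService::new,
            "mapreduce", MapReduceCopyServiceStub::new);

    private CopyServices() {}

    // Picks an implementation by name, e.g. taken from a config value.
    static CopyService forName(String name) {
        Supplier<CopyService> supplier = IMPLS.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("unknown copy service: " + name);
        }
        return supplier.get();
    }
}
```

The design point is that only the MR-backed class would pull in MapReduce; callers that select another implementation never touch it.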

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
Guys, first off apologies for bringing in the topic of MR-based compactions.. But I was thinking more about the SpliceMachine approach of managing compactions in Spark where apparently they saw a lot of benefits. Apologies for giving you that sore throat Andrew; I really didn't mean to :-) So

[jira] [Created] (HBASE-16689) Durability == ASYNC_WAL means no SYNC

2016-09-22 Thread stack (JIRA)
stack created HBASE-16689: - Summary: Durability == ASYNC_WAL means no SYNC Key: HBASE-16689 URL: https://issues.apache.org/jira/browse/HBASE-16689 Project: HBase Issue Type: Bug

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
Stability is one thing; another is the difficulty of configuration and deployment. Configuration is always a pain: I do not want to restart HMaster many times to get things right. A standalone service would be better. For deployment, as chenheng said above, usually we do not

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Ted Yu
You mean a standalone service which runs Procedure V2? Not sure how much work is involved. Is this concerning the stability of the Master where backup / restore procedures run? To my understanding, errors in one procedure are isolated and have no adverse impact on the Master's stability. On Thu, Sep

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
So what about a standalone service other than master? You can use your own procedure store in that service? 2016-09-23 11:28 GMT+08:00 Ted Yu : > An earlier implementation was client driven. > > But with that approach, it is hard to resume if there is error midway. > Using

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
Agreed, this would be interesting to contemplate. On Sep 22, 2016, at 8:03 PM, Vladimir Rodionov wrote: >>> No, never. > > No need for M/R here, just a simple compaction-server colocated with RS on > a same node. > You save a lot on GC in RS. Ideally, it can be IO

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
No, this misses Matteo's finer point, which is "shelling out" from the master directly to run MR is a first. Why not drive this with a utility derived from Tool? On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov wrote: >>> In our production cluster, it is a common case

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> If MR is not strong dependency for Master/RS, it is OK for me. There is no strong MR dependency for Master/RS. They will function as usual, until you try backup, it will fail but Master won't. -Vlad On Thu, Sep 22, 2016 at 8:03 PM, Vladimir Rodionov wrote: > >> No,

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> No, never. No need for M/R here, just a simple compaction server colocated with the RS on the same node. You save a lot on GC in the RS. Ideally, it can be IO "nice" in Linux (by setting IO priority). But off-topic, of course :) -Vlad On Thu, Sep 22, 2016 at 7:57 PM, Vladimir Rodionov

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> In our production cluster, it is common that we have just HDFS and >> HBase deployed. >> If our Master/RS depend on the MR framework (especially for features we >> have not used at all), it introduces another maintenance cost. I >> don't think it is a good idea. So, you are not backup

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Ted Yu
If MR framework is not deployed in the cluster, hbase still functions normally (post merge). In terms of build time dependency, we have long been depending on mapreduce. Take a look at ExportSnapshot. Cheers On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen wrote: > In our

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Heng Chen
In our production cluster, it is common that we have just HDFS and HBase deployed. If our Master/RS depend on the MR framework (especially for features we have not used at all), it introduces another maintenance cost. I don't think it is a good idea. 2016-09-23 10:28 GMT+08:00 张铎

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread 张铎
I'm -1 on letting Master or RS launch MR jobs. It is OK that some of our features depend on MR, but I think the bottom line is that we should launch the jobs from outside, manually or via other services. 2016-09-23 9:47 GMT+08:00 Andrew Purtell : > Ok, got it. Well "shelling

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
(Back with a sore throat.) Also for what it is worth - it may well be that the attempt to bolt containers-as-executors to YARN is too little too late and coordination of container based services and applications (such as distributed map-reduce workflows or more likely Spark) will be handled by

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
> We should also do compactions using MR (just saying :) No, never. It's not a good idea to wed any of our core function to something that independently evolves, that some of us don't have commit rights on (and never will), and has varying degrees of utility depending on deploy. Like JM says

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
Ok, got it. Well "shelling out" is on the line I think, so a fair question. Can this be driven by a utility derived from Tool like our other MR apps? The issue is needing the AccessController to decide if allowed? But nothing prevents the user from running the job manually/independently, right?

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Enis Söztutar
Once you are in the game of coordinating large-scale tasks with distribution, fault tolerance, etc., then short of implementing a similar framework inside HBase, MR will be the way to go. Things like exporting snapshots, distcp, or backups (which use these) must use such a framework. The issue about

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
It's not practical to build those tools without MR, JM. We should be using the right framework for the use cases at hand, and MR fits this really well. JM, when you say "if we can do without MR, then, why not?", do you have a framework in mind that performs/scales as well as MR? Curious.

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
Matteo, the Master won't spawn the job unless someone actually wants to use the backup/restore. So I'd argue we still don't have a 'hard' dependency - it's still much like the other tools that you consider as being outside the core. From: Matteo Bertozzi

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Jean-Marc Spaggiari
Well, I'm just not using those features ;) But I was hoping for the MOBs ;) My point is, if we can do it without MR, then why not? 2016-09-22 19:25 GMT-04:00 Vladimir Rodionov : > Forgot WALPlayer :) > > -Vlad > > On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
Forgot WALPlayer :) -Vlad On Thu, Sep 22, 2016 at 4:21 PM, Vladimir Rodionov wrote: > >> and > >> backups too, but don't want to bother having to install and configure > YARN > >> just for that, as well as removing resources from HBase to give it to > > Any suggestions

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Vladimir Rodionov
>> and >> backups too, but don't want to bother having to install and configure YARN >> just for that, as well as removing resources from HBase to give it to Any suggestions on how to do bulk data move with transformation from/to HBase cluster w/o MapReduce? Opposition to M/R does not make sense

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Jean-Marc Spaggiari
My 2¢: I have a strong preference for NOT having a dependency on MR anywhere :( I run my HBase cluster without YARN, just HBase and HDFS. I like all the features that we built. I would love to be able to use MOBs and backups too, but don't want to bother having to install and configure YARN just for

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Matteo Bertozzi
Just a remark: my query was not about tools using MR (everyone, I think, is OK with those). The topic was: "are we OK with running MR jobs from Master and RS code?", since this would be the first time we do this. Matteo On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Devaraj Das
Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine to be dependent on MR. MR is the right framework for such. We should also do compactions using MR (just saying :) ) From: Ted Yu Sent: Thursday, September

[jira] [Resolved] (HBASE-16687) Remove MaxPermSize from surefire/failsave command line

2016-09-22 Thread Andrew Purtell (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-16687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-16687. Resolution: Duplicate Assignee: (was: Andrew Purtell) Fix Version/s:

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Ted Yu
I agree - backup / restore is in the same category as import / export. On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell wrote: > Backup is extra tooling around core in my opinion. Like import or export. > Or the optional MOB tool. It's fine. > > > On Sep 22, 2016, at

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
Backup is extra tooling around core in my opinion. Like import or export. Or the optional MOB tool. It's fine. > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote: > > What's the latest opinion around running MR jobs from hbase (Master or RS)? > > I remember in the

Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Andrew Purtell
I would be -1 on a requirement for MR for something core to HBase. > On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi wrote: > > What's the latest opinion around running MR jobs from hbase (Master or RS)? > > I remember in the past that there was discussion about not having MR

[DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Matteo Bertozzi
What's the latest opinion around running MR jobs from hbase (Master or RS)? I remember in the past that there was discussion about not having MR as a direct dependency of hbase. I think some of the discussion was around MOB, which had an MR job to compact that was later transformed into a non-MR job to

[jira] [Created] (HBASE-16688) Split TestMasterFailoverWithProcedures

2016-09-22 Thread Matteo Bertozzi (JIRA)
Matteo Bertozzi created HBASE-16688: --- Summary: Split TestMasterFailoverWithProcedures Key: HBASE-16688 URL: https://issues.apache.org/jira/browse/HBASE-16688 Project: HBase Issue Type: Bug

[jira] [Created] (HBASE-16687) Remove MaxPermSize from surefire/failsave command line

2016-09-22 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-16687: -- Summary: Remove MaxPermSize from surefire/failsave command line Key: HBASE-16687 URL: https://issues.apache.org/jira/browse/HBASE-16687 Project: HBase

[jira] [Created] (HBASE-16686) Add latency metrics for REST

2016-09-22 Thread Guang Yang (JIRA)
Guang Yang created HBASE-16686: -- Summary: Add latency metrics for REST Key: HBASE-16686 URL: https://issues.apache.org/jira/browse/HBASE-16686 Project: HBase Issue Type: New Feature

[jira] [Created] (HBASE-16685) Revisit execution of SnapshotCopy in MapReduceBackupCopyService

2016-09-22 Thread Ted Yu (JIRA)
Ted Yu created HBASE-16685: -- Summary: Revisit execution of SnapshotCopy in MapReduceBackupCopyService Key: HBASE-16685 URL: https://issues.apache.org/jira/browse/HBASE-16685 Project: HBase Issue

[jira] [Created] (HBASE-16684) The get() requests does not see locally buffered put() requests when autoflush is disabled

2016-09-22 Thread Haohui Mai (JIRA)
Haohui Mai created HBASE-16684: -- Summary: The get() requests does not see locally buffered put() requests when autoflush is disabled Key: HBASE-16684 URL: https://issues.apache.org/jira/browse/HBASE-16684
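The behavior named in the HBASE-16684 title can be modelled with a toy client, assuming reads go straight to the server while writes sit in a local buffer until flushed. `ToyServer` and `BufferedToyClient` are hypothetical and only mimic the shape of a buffered table client; the issue text above is truncated, so this sketch makes no claim about how HBase itself handles or resolves it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy server side: only flushed data lives here.
class ToyServer {
    final Map<String, String> rows = new HashMap<>();
}

// Toy client with a local write buffer, loosely shaped like a table
// client with autoflush disabled. Not HBase's API.
class BufferedToyClient {
    private final ToyServer server;
    private final boolean autoFlush;
    private final List<String[]> writeBuffer = new ArrayList<>();

    BufferedToyClient(ToyServer server, boolean autoFlush) {
        this.server = server;
        this.autoFlush = autoFlush;
    }

    // Buffers the write; sends it immediately only when autoFlush is on.
    void put(String row, String value) {
        writeBuffer.add(new String[] {row, value});
        if (autoFlush) {
            flush();
        }
    }

    // Reads go straight to the server and never consult the local buffer,
    // so buffered puts stay invisible until flush() is called.
    String get(String row) {
        return server.rows.get(row);
    }

    void flush() {
        for (String[] kv : writeBuffer) {
            server.rows.put(kv[0], kv[1]);
        }
        writeBuffer.clear();
    }
}
```

With autoflush off, `put("r1", "v1")` followed by `get("r1")` returns `null` until `flush()` runs; with autoflush on, the same sequence returns `"v1"`.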

Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

2016-09-22 Thread Sean Busbey
I'd like to see the docs proposed on HBASE-16574 integrated into our project's documentation prior to merge. On Thu, Sep 22, 2016 at 9:02 AM, Ted Yu wrote: > This feature can be marked experimental due to some limitations such as > security. > > Your previous round of

[jira] [Created] (HBASE-16683) Address review comments for backup / restore feature

2016-09-22 Thread Ted Yu (JIRA)
Ted Yu created HBASE-16683: -- Summary: Address review comments for backup / restore feature Key: HBASE-16683 URL: https://issues.apache.org/jira/browse/HBASE-16683 Project: HBase Issue Type: Bug

Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

2016-09-22 Thread Ted Yu
This feature can be marked experimental due to some limitations such as security. Your previous round of comments have been addressed. Command line tool has gone through: HBASE-16620 Fix backup command-line tool usability issues HBASE-16655 hbase backup describe with incorrect backup id results

Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

2016-09-22 Thread Stack
On Wed, Sep 21, 2016 at 7:43 AM, Ted Yu wrote: > Are there more (review) comments ? > > Are outstanding comments addressed? I don't see answer to my 'is this experimental/will it be marked experimental' question. I ran into some issues trying to use the feature and

Successful: HBase Generate Website

2016-09-22 Thread Apache Jenkins Server
Build status: Successful If successful, the website and docs have been generated. To update the live site, follow the instructions below. If failed, skip to the bottom of this email. Use the following commands to download the patch and apply it to a clean branch based on origin/asf-site. If

[jira] [Created] (HBASE-16682) Fix Shell tests failure. NoClassDefFoundError for MiniKdc

2016-09-22 Thread Appy (JIRA)
Appy created HBASE-16682: Summary: Fix Shell tests failure. NoClassDefFoundError for MiniKdc Key: HBASE-16682 URL: https://issues.apache.org/jira/browse/HBASE-16682 Project: HBase Issue Type: Bug

[jira] [Created] (HBASE-16681) Fix flaky TestReplicationSourceManagerZkImpl

2016-09-22 Thread Appy (JIRA)
Appy created HBASE-16681: Summary: Fix flaky TestReplicationSourceManagerZkImpl Key: HBASE-16681 URL: https://issues.apache.org/jira/browse/HBASE-16681 Project: HBase Issue Type: Bug