http://git-wip-us.apache.org/repos/asf/hbase/blob/cb77a925/src/main/docbkx/security.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/security.xml b/src/main/docbkx/security.xml deleted file mode 100644 index 61493cd..0000000 --- a/src/main/docbkx/security.xml +++ /dev/null @@ -1,1925 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<chapter - version="5.0" - xml:id="security" - xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook"> - <!-- -/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> - <title>Securing Apache HBase</title> - <para>HBase provides mechanisms to secure various components and aspects of HBase and how it - relates to the rest of the Hadoop infrastructure, as well as clients and resources outside - Hadoop.</para> - <section> - <title>Using Secure HTTP (HTTPS) for the Web UI</title> - <para>A default HBase install uses insecure HTTP connections for web UIs for the master and - region servers. 
To enable secure HTTP (HTTPS) connections instead, set - <code>hadoop.ssl.enabled</code> to <literal>true</literal> in - <filename>hbase-site.xml</filename>. This does not change the port used by the Web UI. To - change the port for the web UI for a given HBase component, configure that port's setting in - hbase-site.xml. These settings are:</para> - <itemizedlist> - <listitem><para><code>hbase.master.info.port</code></para></listitem> - <listitem><para><code>hbase.regionserver.info.port</code></para></listitem> - </itemizedlist> - <note> - <title>If you enable HTTPS, clients should avoid using the non-secure HTTP connection.</title> - <para>If you enable secure HTTP, clients should connect to HBase using the - <code>https://</code> URL. Clients using the <code>http://</code> URL will receive an HTTP - response of <literal>200</literal>, but will not receive any data. The following exception is logged:</para> - <screen>javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?</screen> - <para>This is because the same port is used for HTTP and HTTPS.</para> - <para>HBase uses Jetty for the Web UI. Without modifying Jetty itself, it does not seem - possible to configure Jetty to redirect one port to another on the same host. See Nick - Dimiduk's contribution on this <link - xlink:href="http://stackoverflow.com/questions/20611815/redirect-from-http-to-https-in-jetty" - >Stack Overflow</link> thread for more information. If you know how to fix this without - opening a second port for HTTPS, patches are appreciated.</para> - </note> - </section> - <section - xml:id="hbase.secure.configuration"> - <title>Secure Client Access to Apache HBase</title> - <para>Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients. 
See also Matteo Bertozzi's article on <link - xlink:href="http://www.cloudera.com/blog/2012/09/understanding-user-authentication-and-authorization-in-apache-hbase/">Understanding - User Authentication and Authorization in Apache HBase</link>.</para> - <para>This describes how to set up Apache HBase and clients for connection to secure HBase - resources.</para> - - <section xml:id="security.prerequisites"> - <title>Prerequisites</title> - <variablelist> - <varlistentry> - <term>Hadoop Authentication Configuration</term> - <listitem> - <para>To run HBase RPC with strong authentication, you must set - <code>hbase.security.authentication</code> to <literal>kerberos</literal>. In this case, - you must also set <code>hadoop.security.authentication</code> to - <literal>kerberos</literal>. Otherwise, you would be using strong authentication for - HBase but not for the underlying HDFS, which would cancel out any benefit.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Kerberos KDC</term> - <listitem> - <para> You need to have a working Kerberos KDC. </para> - <para> An HBase cluster configured for secure client access is expected to be running on top of a - secured HDFS cluster. HBase must be able to authenticate to HDFS services. HBase needs - Kerberos credentials to interact with the Kerberos-enabled HDFS daemons. - Authenticating a service should be done using a keytab file. The procedure for - creating keytabs for the HBase service is the same as for creating keytabs for Hadoop. - Those steps are omitted here. Copy the resulting keytab files to wherever HBase Master - and RegionServer processes are deployed and make them readable only to the user - account under which the HBase daemons will run. </para> - <para> A Kerberos principal has three parts, with the form - <code>username/fully.qualified.domain.n...@your-realm.com</code>. We recommend using - <code>hbase</code> as the username portion. 
</para> - <para> The following is an example of the configuration properties for Kerberos - operation that must be added to the <code>hbase-site.xml</code> file on every server - machine in the cluster. Required for even the most basic interactions with a secure - Hadoop configuration, independent of HBase security. </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.regionserver.kerberos.principal</name> - <value>hbase/_h...@your-realm.com</value> -</property> -<property> - <name>hbase.regionserver.keytab.file</name> - <value>/etc/hbase/conf/keytab.krb5</value> -</property> -<property> - <name>hbase.master.kerberos.principal</name> - <value>hbase/_h...@your-realm.com</value> -</property> -<property> - <name>hbase.master.keytab.file</name> - <value>/etc/hbase/conf/keytab.krb5</value> -</property> - ]]></programlisting> - <para> Each HBase client user should also be given a Kerberos principal. This principal - should have a password assigned to it (as opposed to a keytab file). The client - principal's <code>maxrenewlife</code> should be set so that it can be renewed enough - times for the HBase client process to complete. For example, if a user runs a - long-running HBase client process that takes at most 3 days, we might create this - user's principal within <code>kadmin</code> with: <code>addprinc -maxrenewlife - 3days</code> - </para> - <para> Long running daemons with indefinite lifetimes that require client access to - HBase can instead be configured to log in from a keytab. For each host running such - daemons, create a keytab with <code>kadmin</code> or <code>kadmin.local</code>. The - procedure for creating keytabs for HBase service is the same as for creating keytabs - for Hadoop. Those steps are omitted here. Copy the resulting keytab files to where the - client daemon will execute and make them readable only to the user account under which - the daemon will run. 
</para> - </listitem> - </varlistentry> - </variablelist> - </section> - - <section> - <title>Server-side Configuration for Secure Operation</title> - <para>First, refer to <xref linkend="security.prerequisites" /> and ensure that your - underlying HDFS configuration is secure.</para> - <para> Add the following to the <code>hbase-site.xml</code> file on every server machine in - the cluster: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.security.authentication</name> - <value>kerberos</value> -</property> -<property> - <name>hbase.security.authorization</name> - <value>true</value> -</property> -<property> -<name>hbase.coprocessor.region.classes</name> - <value>org.apache.hadoop.hbase.security.token.TokenProvider</value> -</property> - ]]></programlisting> - <para> A full shutdown and restart of HBase service is required when deploying these - configuration changes. </para> - </section> - - <section> - <title>Client-side Configuration for Secure Operation</title> - <para>First, refer to <xref linkend="security.prerequisites" /> and ensure that your - underlying HDFS configuration is secure.</para> - <para> Add the following to the <code>hbase-site.xml</code> file on every client: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.security.authentication</name> - <value>kerberos</value> -</property> - ]]></programlisting> - <para> The client environment must be logged in to Kerberos from KDC or keytab via the - <code>kinit</code> command before communication with the HBase cluster will be possible. </para> - <para> Be advised that if the <code>hbase.security.authentication</code> in the client- and - server-side site files do not match, the client will not be able to communicate with the - cluster. </para> - <para> Once HBase is configured for secure RPC it is possible to optionally configure - encrypted communication. 
To do so, add the following to the <code>hbase-site.xml</code> file - on every client: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.rpc.protection</name> - <value>privacy</value> -</property> - ]]></programlisting> - <para> This configuration property can also be set on a per connection basis. Set it in the - <code>Configuration</code> supplied to <code>HTable</code>: </para> - <programlisting language="java"> -Configuration conf = HBaseConfiguration.create(); -conf.set("hbase.rpc.protection", "privacy"); -HTable table = new HTable(conf, tablename); - </programlisting> - <para> Expect a ~10% performance penalty for encrypted communication. </para> - </section> - - - <section xml:id="security.client.thrift"> - <title>Client-side Configuration for Secure Operation - Thrift Gateway</title> - <para> Add the following to the <code>hbase-site.xml</code> file for every Thrift gateway: <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.thrift.keytab.file</name> - <value>/etc/hbase/conf/hbase.keytab</value> -</property> -<property> - <name>hbase.thrift.kerberos.principal</name> - <value>$USER/_HOST@HADOOP.LOCALDOMAIN</value> - <!-- TODO: This may need to be HTTP/_HOST@<REALM> and _HOST may not work. - You may have to put the concrete full hostname. - --> -</property> - ]]></programlisting> - </para> - <para> Substitute the appropriate credential and keytab for <replaceable>$USER</replaceable> - and <replaceable>$KEYTAB</replaceable> respectively. </para> - <para>In order to use the Thrift API principal to interact with HBase, it is also necessary to - add the <code>hbase.thrift.kerberos.principal</code> to the <code>_acl_</code> table. 
For - example, to give the Thrift API principal, <code>thrift_server</code>, administrative - access, a command such as this one will suffice: </para> - <programlisting language="sql"><![CDATA[ -grant 'thrift_server', 'RWCA' - ]]></programlisting> - <para>For more information about ACLs, please see the <link - linkend="hbase.accesscontrol.configuration">Access Control</link> section </para> - - <para> The Thrift gateway will authenticate with HBase using the supplied credential. No - authentication will be performed by the Thrift gateway itself. All client access via the - Thrift gateway will use the Thrift gateway's credential and have its privilege. </para> - </section> - <section xml:id="security.gateway.thrift"> - <title>Configure the Thrift Gateway to Authenticate on Behalf of the Client</title> - <para><xref linkend="security.client.thrift"/> describes how to authenticate a Thrift client - to HBase using a fixed user. As an alternative, you can configure the Thrift gateway to - authenticate to HBase on the client's behalf, and to access HBase using a proxy user. 
This - was implemented in <link xlink:href="https://issues.apache.org/jira/browse/HBASE-11349" - >HBASE-11349</link> for Thrift 1, and <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-11474">HBASE-11474</link> for - Thrift 2.</para> - <note> - <title>Limitations with Thrift Framed Transport</title> - <para>If you use framed transport, you cannot yet take advantage of this feature, because - SASL does not work with Thrift framed transport at this time.</para> - </note> - <para>To enable it, do the following.</para> - <procedure> - <step> - <para>Be sure Thrift is running in secure mode, by following the procedure described in - <xref linkend="security.client.thrift"/>.</para> - </step> - <step> - <para>Be sure that HBase is configured to allow proxy users, as described in <xref - linkend="security.rest.gateway"/>.</para> - </step> - <step> - <para>In <filename>hbase-site.xml</filename> for each cluster node running a Thrift - gateway, set the property <code>hbase.thrift.security.qop</code> to one of the following - three values:</para> - <itemizedlist> - <listitem> - <para><literal>auth-conf</literal> - authentication, integrity, and confidentiality - checking</para> - </listitem> - <listitem> - <para><literal>auth-int</literal> - authentication and integrity checking</para> - </listitem> - <listitem> - <para><literal>auth</literal> - authentication checking only</para> - </listitem> - </itemizedlist> - </step> - <step> - <para>Restart the Thrift gateway processes for the changes to take effect. If a node is - running Thrift, the output of the <command>jps</command> command will list a - <code>ThriftServer</code> process. To stop Thrift on a node, run the command - <command>bin/hbase-daemon.sh stop thrift</command>. 
To start Thrift on a node, run the - command <command>bin/hbase-daemon.sh start thrift</command>.</para> - </step> - </procedure> - </section> - - <section> - <title>Client-side Configuration for Secure Operation - REST Gateway</title> - <para> Add the following to the <code>hbase-site.xml</code> file for every REST gateway: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.rest.keytab.file</name> - <value>$KEYTAB</value> -</property> -<property> - <name>hbase.rest.kerberos.principal</name> - <value>$USER/_HOST@HADOOP.LOCALDOMAIN</value> -</property> - ]]></programlisting> - <para> Substitute the appropriate credential and keytab for <replaceable>$USER</replaceable> - and <replaceable>$KEYTAB</replaceable> respectively. </para> - <para> The REST gateway will authenticate with HBase using the supplied credential. No - authentication will be performed by the REST gateway itself. All client access via the REST - gateway will use the REST gateway's credential and have its privilege. </para> - <para>In order to use the REST API principal to interact with HBase, it is also necessary to - add the <code>hbase.rest.kerberos.principal</code> to the <code>_acl_</code> table. For - example, to give the REST API principal, <code>rest_server</code>, administrative access, a - command such as this one will suffice: </para> - <programlisting language="sql"><![CDATA[ -grant 'rest_server', 'RWCA' - ]]></programlisting> - <para>For more information about ACLs, please see the <link - linkend="hbase.accesscontrol.configuration">Access Control</link> section. </para> - <para> It should be possible for clients to authenticate with the HBase cluster through the - REST gateway in a pass-through manner via SPNEGO HTTP authentication. This is future work. - </para> - </section> - - <section xml:id="security.rest.gateway"> - <title>REST Gateway Impersonation Configuration</title> - <para> By default, the REST gateway doesn't support impersonation. 
It accesses HBase on - behalf of clients as the user configured in the previous section. To the HBase server, - all requests come from the REST gateway user, and the actual users are unknown. You can turn on - impersonation support. With impersonation, the REST gateway user is a proxy user, and the - HBase server knows the actual user of each request, so it can apply the proper - authorizations. </para> - <para> To turn on REST gateway impersonation, configure the HBase servers (masters and - region servers) to allow proxy users, and configure the REST gateway to enable impersonation. </para> - <para> To allow proxy users, add the following to the <code>hbase-site.xml</code> file for - every HBase server: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hadoop.security.authorization</name> - <value>true</value> -</property> -<property> - <name>hadoop.proxyuser.$USER.groups</name> - <value>$GROUPS</value> -</property> -<property> - <name>hadoop.proxyuser.$USER.hosts</name> - <value>$GROUPS</value> -</property> - ]]></programlisting> - <para> Substitute the REST gateway proxy user for $USER, and the allowed group list for - $GROUPS. </para> - <para> To enable REST gateway impersonation, add the following to the - <code>hbase-site.xml</code> file for every REST gateway. </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.rest.authentication.type</name> - <value>kerberos</value> -</property> -<property> - <name>hbase.rest.authentication.kerberos.principal</name> - <value>HTTP/_HOST@HADOOP.LOCALDOMAIN</value> -</property> -<property> - <name>hbase.rest.authentication.kerberos.keytab</name> - <value>$KEYTAB</value> -</property> - ]]></programlisting> - <para> Substitute the keytab for the HTTP principal for $KEYTAB. 
</para> - </section> - - </section> - <!-- Secure Client Access to HBase --> - - <section - xml:id="hbase.secure.simpleconfiguration"> - <title>Simple User Access to Apache HBase</title> - <para>Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients. See also Matteo Bertozzi's article on <link - xlink:href="http://www.cloudera.com/blog/2012/09/understanding-user-authentication-and-authorization-in-apache-hbase/">Understanding - User Authentication and Authorization in Apache HBase</link>.</para> - <para>This describes how to set up Apache HBase and clients for simple user access to HBase - resources.</para> - - <section> - <title>Simple Versus Secure Access</title> - <para> The following section shows how to set up simple user access. Simple user access is not - a secure method of operating HBase. This method is used to prevent users from making - mistakes. It can be used to mimic Access Control on a development system without - having to set up Kerberos. </para> - <para> This method does not protect against malicious users or hacking attempts. To make HBase secure - against these types of attacks, you must configure HBase for secure operation. Refer to the - section <link - linkend="hbase.accesscontrol.configuration">Secure Client Access to HBase</link> and - complete all of the steps described there. 
</para> - - <section> - <title>Prerequisites</title> - <para> None </para> - - <section> - <title>Server-side Configuration for Simple User Access Operation</title> - <para> Add the following to the <code>hbase-site.xml</code> file on every server machine - in the cluster: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.security.authentication</name> - <value>simple</value> -</property> -<property> - <name>hbase.security.authorization</name> - <value>true</value> -</property> -<property> - <name>hbase.coprocessor.master.classes</name> - <value>org.apache.hadoop.hbase.security.access.AccessController</value> -</property> -<property> - <name>hbase.coprocessor.region.classes</name> - <value>org.apache.hadoop.hbase.security.access.AccessController</value> -</property> -<property> - <name>hbase.coprocessor.regionserver.classes</name> - <value>org.apache.hadoop.hbase.security.access.AccessController</value> -</property> - ]]></programlisting> - <para> For 0.94, add the following to the <code>hbase-site.xml</code> file on every server - machine in the cluster: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.rpc.engine</name> - <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value> -</property> -<property> - <name>hbase.coprocessor.master.classes</name> - <value>org.apache.hadoop.hbase.security.access.AccessController</value> -</property> -<property> - <name>hbase.coprocessor.region.classes</name> - <value>org.apache.hadoop.hbase.security.access.AccessController</value> -</property> - ]]></programlisting> - <para> A full shutdown and restart of HBase service is required when deploying these - configuration changes. 
</para> - </section> - - <section> - <title>Client-side Configuration for Simple User Access Operation</title> - <para> Add the following to the <code>hbase-site.xml</code> file on every client: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.security.authentication</name> - <value>simple</value> -</property> - ]]></programlisting> - <para> For 0.94, add the following to the <code>hbase-site.xml</code> file on every server - machine in the cluster: </para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.rpc.engine</name> - <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value> -</property> - ]]></programlisting> - <para> Be advised that if the <code>hbase.security.authentication</code> in the client- - and server-side site files do not match, the client will not be able to communicate with - the cluster. </para> - </section> - - <section> - <title>Client-side Configuration for Simple User Access Operation - Thrift Gateway</title> - <para>The Thrift gateway user will need access. For example, to give the Thrift API user, - <code>thrift_server</code>, administrative access, a command such as this one will - suffice: </para> - <programlisting language="sql"><![CDATA[ -grant 'thrift_server', 'RWCA' - ]]></programlisting> - <para>For more information about ACLs, please see the <link - linkend="hbase.accesscontrol.configuration">Access Control</link> section </para> - - <para> The Thrift gateway will authenticate with HBase using the supplied credential. No - authentication will be performed by the Thrift gateway itself. All client access via the - Thrift gateway will use the Thrift gateway's credential and have its privilege. </para> - </section> - - <section> - <title>Client-side Configuration for Simple User Access Operation - REST Gateway</title> - - <para> The REST gateway will authenticate with HBase using the supplied credential. No - authentication will be performed by the REST gateway itself. 
All client access via the - REST gateway will use the REST gateway's credential and have its privilege. </para> - <para>The REST gateway user will need access. For example, to give the REST API user, - <code>rest_server</code>, administrative access, a command such as this one will - suffice: </para> - <programlisting language="sql"><![CDATA[ -grant 'rest_server', 'RWCA' - ]]></programlisting> - <para>For more information about ACLs, please see the <link - linkend="hbase.accesscontrol.configuration">Access Control</link> section. </para> - <para> It should be possible for clients to authenticate with the HBase cluster through - the REST gateway in a pass-through manner via SPNEGO HTTP authentication. This is future - work. </para> - </section> - </section> - </section> - - </section> - <!-- Simple User Access to Apache HBase --> - - <section> - <title>Securing Access To Your Data</title> - <para>After you have configured secure authentication between HBase client and server processes - and gateways, you need to consider the security of your data itself. HBase provides several - strategies for securing your data:</para> - <itemizedlist> - <listitem> - <para>Role-based Access Control (RBAC) controls which users or groups can read and write to - a given HBase resource or execute a coprocessor endpoint, using the familiar paradigm of - roles.</para> - </listitem> - <listitem> - <para>Visibility Labels allow you to label cells and control access to labelled cells, - to further restrict who can read or write to certain subsets of your data. Visibility - labels are stored as tags. See <xref linkend="hbase.tags"/> for more information.</para> - </listitem> - <listitem> - <para>Transparent encryption of data at rest on the underlying filesystem, both in HFiles - and in the WAL. This protects your data at rest from an attacker who has access to the - underlying filesystem, without the need to change the implementation of the client. 
It can - also protect against data leakage from improperly disposed disks, which can be important - for legal and regulatory compliance.</para> - </listitem> - </itemizedlist> - <para>Server-side configuration, administration, and implementation details of each of these - features are discussed below, along with any performance trade-offs. An example security - configuration is given at the end, to show these features all used together, as they might be - in a real-world scenario.</para> - <caution> - <para>All aspects of security in HBase are in active development and evolving rapidly. Any - strategy you employ for security of your data should be thoroughly tested. In addition, some - of these features are still in the experimental stage of development. To take advantage of - many of these features, you must be running HBase 0.98+ and using the HFile v3 file - format.</para> - </caution> - - <warning> - <title>Protecting Sensitive Files</title> - <para>Several procedures in this section require you to copy files between cluster nodes. When - copying keys, configuration files, or other files containing sensitive strings, use a secure - method, such as <code>ssh</code>, to avoid leaking sensitive data.</para> - </warning> - - <procedure xml:id="security.data.basic.server.side"> - <title>Basic Server-Side Configuration</title> - <step> - <para>Enable HFile v3, by setting <option>hfile.format.version </option>to 3 in - <filename>hbase-site.xml</filename>. 
This is the default for HBase 1.0 and - newer.</para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hfile.format.version</name> - <value>3</value> -</property> - ]]></programlisting> - </step> - <step> - <para>Enable SASL and Kerberos authentication for RPC and ZooKeeper, as described in <xref - linkend="security.prerequisites"/> and <xref linkend="zk.sasl.auth"/>.</para> - </step> - </procedure> - - <section xml:id="hbase.tags"> - <title>Tags</title> - <para><firstterm>Tags</firstterm> are a feature of HFile v3. A tag is a piece of metadata - which is part of a cell, separate from the key, value, and version. Tags are an - implementation detail which provides a foundation for other security-related features such - as cell-level ACLs and visibility labels. Tags are stored in the HFiles themselves. It is - possible that in the future, tags will be used to implement other HBase features. You don't - need to know a lot about tags in order to use the security features they enable.</para> - <section> - <title>Implementation Details</title> - <para> Every cell can have zero or more tags. Every tag has a type and the actual tag byte - array.</para> - <para> Just as row keys, column families, qualifiers and values can be encoded (see <xref - linkend="data.block.encoding.types"/>), tags can also be encoded as well. You can enable - or disable tag encoding at the level of the column family, and it is enabled by default. - Use the <code>HColumnDescriptor#setCompressionTags(boolean compressTags)</code> method to - manage encoding settings on a column family. You also need to enable the DataBlockEncoder - for the column family, for encoding of tags to take effect.</para> - <para>You can enable compression of each tag in the WAL, if WAL compression is also enabled, - by setting the value of <option>hbase.regionserver.wal.tags.enablecompression</option> to - <literal>true</literal> in <filename>hbase-site.xml</filename>. 
Tag compression uses - dictionary encoding.</para> - <para>Tag compression is not supported when using WAL encryption.</para> - </section> - </section> - - <section xml:id="hbase.accesscontrol.configuration"> - <title>Access Control Labels (ACLs)</title> - <section> - <title>How It Works</title> - <para>ACLs in HBase are based upon a user's membership in or exclusion from groups, and a - given group's permissions to access a given resource. ACLs are implemented as a - coprocessor called AccessController.</para> - <para>HBase does not maintain a private group mapping, but relies on a <firstterm>Hadoop - group mapper</firstterm>, which maps between entities in a directory such as LDAP or - Active Directory, and HBase users. Any supported Hadoop group mapper will work. Users are - then granted specific permissions (Read, Write, Execute, Create, Admin) against resources - (global, namespaces, tables, cells, or endpoints).</para> - <note> - <para> With Kerberos and Access Control enabled, client access to HBase is authenticated - and user data is private unless access has been explicitly granted.</para> - </note> - <para>HBase has a simpler security model than relational databases, especially in terms of - client operations. 
No distinction is made between an insert (new record) and an update (of an - existing record), for example, as both collapse down into a Put.</para> - <section> - <title>Understanding Access Levels</title> - <para>HBase access levels are granted independently of each other and allow for different - types of operations at a given scope.</para> - <itemizedlist> - <listitem> - <para><command>Read (R)</command> - can read data at the given scope</para> - </listitem> - <listitem> - <para><command>Write (W)</command> - can write data at the given scope</para> - </listitem> - <listitem> - <para><command>Execute (X)</command> - can execute coprocessor endpoints at the given - scope</para> - </listitem> - <listitem> - <para><command>Create (C)</command> - can create tables or drop tables (even those - they did not create) at the given scope</para> - </listitem> - <listitem> - <para><command>Admin (A)</command> - can perform cluster operations such as balancing - the cluster or assigning regions at the given scope</para> - </listitem> - </itemizedlist> - <para>The possible scopes are:</para> - <itemizedlist> - <listitem> - <para><command>Superuser</command> - superusers can perform any operation available in - HBase, to any resource. 
The user who runs HBase on your cluster is a superuser, as - are any principals assigned to the configuration property - <code>hbase.superuser</code> in <filename>hbase-site.xml</filename> on the - HMaster.</para> - </listitem> - <listitem> - <para><command>Global</command> - permissions granted at <filename>global</filename> - scope allow the admin to operate on all tables of the cluster.</para> - </listitem> - <listitem> - <para><command>Namespace</command> - permissions granted at - <filename>namespace</filename> scope apply to all tables within a given - namespace.</para> - </listitem> - <listitem> - <para><command>Table</command> - permissions granted at <filename>table</filename> - scope apply to data or metadata within a given table.</para> - </listitem> - <listitem> - <para><command>ColumnFamily</command> - permissions granted at - <filename>ColumnFamily</filename> scope apply to cells within that - ColumnFamily.</para> - </listitem> - <listitem> - <para><command>Cell</command> - permissions granted at <filename>cell</filename> scope - apply to that exact cell coordinate (key, value, timestamp). This allows for policy - evolution along with data.</para> - <para>To change an ACL on a specific cell, write an updated cell with new ACL to the - precise coordinates of the original.</para> - <para>If you have a multi-versioned schema and want to update ACLs on all visible - versions, you need to write new cells for all visible versions. The application - has complete control over policy evolution.</para> - <para>The exception to the above rule is <code>append</code> and - <code>increment</code> processing. Appends and increments can carry an ACL in the - operation. If one is included in the operation, then it will be applied to the - result of the <code>append</code> or <code>increment</code>. 
Otherwise, the ACL of - the existing cell you are appending to or incrementing is preserved.</para> - </listitem> - </itemizedlist> - <para>The combination of access levels and scopes creates a matrix of possible access - levels that can be granted to a user. In a production environment, it is useful to think - of access levels in terms of what is needed to do a specific job. The following list - describes appropriate access levels for some common types of HBase users. It is - important not to grant more access than is required for a given user to perform their - required tasks.</para> - <itemizedlist> - <listitem> - <para>Superusers - In a production system, only the HBase user should have superuser - access. In a development environment, an administrator may need superuser access in - order to quickly control and manage the cluster. However, this type of administrator - should usually be a Global Admin rather than a superuser.</para> - </listitem> - <listitem> - <para>Global Admins - A global admin can perform tasks and access every table in - HBase. In a typical production environment, an admin should not have Read or Write - permissions to data within tables.</para> - <itemizedlist> - <listitem> - <para>A global admin with Admin permissions can perform cluster-wide operations on - the cluster, such as balancing, assigning or unassigning regions, or calling an - explicit major compaction. This is an operations role.</para> - </listitem> - <listitem> - <para>A global admin with Create permissions can create or drop any table within - HBase. This is more of a DBA-type role.</para> - </listitem> - </itemizedlist> - <para>In a production environment, it is likely that different users will have only - one of Admin and Create permissions.</para> - <warning> - <para>In the current implementation, a Global Admin with <code>Admin</code> - permission can grant himself <code>Read</code> and <code>Write</code> permissions - on a table and gain access to that table's data. 
For this reason, only grant - <code>Global Admin</code> permissions to trusted users who actually need - them.</para> - <para>Also be aware that a <code>Global Admin</code> with <code>Create</code> - permission can perform a <code>Put</code> operation on the ACL table, simulating a - <code>grant</code> or <code>revoke</code> and circumventing the authorization - check for <code>Global Admin</code> permissions.</para> - <para>Due to these issues, be cautious about granting <code>Global Admin</code> - privileges.</para> - </warning> - </listitem> - <listitem> - <para><command>Namespace Admins</command> - A namespace admin with <code>Create</code> - permissions can create or drop tables within that namespace, and take and restore - snapshots. A namespace admin with <code>Admin</code> permissions can perform - operations such as splits or major compactions on tables within that - namespace.</para> - </listitem> - <listitem> - <para><command>Table Admins</command> - A table admin can perform administrative - operations only on that table. A table admin with <code>Create</code> permissions - can create snapshots from that table or restore that table from a snapshot. A table - admin with <code>Admin</code> permissions can perform operations such as splits or - major compactions on that table.</para> - </listitem> - <listitem> - <para><command>Users</command> - Users can read or write data, or both. 
Users can also - execute coprocessor endpoints, if given <code>Execute</code> permissions.</para> - </listitem> - </itemizedlist> - <table> - <title>Real-World Example of Access Levels</title> - <tgroup cols="4"> - <thead> - <row> - <entry>Job Title</entry> - <entry>Scope</entry> - <entry>Permissions</entry> - <entry>Description</entry> - </row> - </thead> - <tbody> - <row> - <entry><para>Senior Administrator</para></entry> - <entry><para>Global</para></entry> - <entry><para>Access, Create</para></entry> - <entry><para>Manages the cluster and gives access to Junior - Administrators.</para></entry> - </row> - <row> - <entry><para>Junior Administrator</para></entry> - <entry><para>Global</para></entry> - <entry><para>Create</para></entry> - <entry><para>Creates tables and gives access to Table - Administrators.</para></entry> - </row> - <row> - <entry><para>Table Administrator</para></entry> - <entry><para>Table</para></entry> - <entry><para>Access</para></entry> - <entry><para>Maintains a table from an operations point of view.</para></entry> - </row> - <row> - <entry><para>Data Analyst</para></entry> - <entry><para>Table</para></entry> - <entry><para>Read</para></entry> - <entry><para>Creates reports from HBase data.</para></entry> - </row> - <row> - <entry><para>Web Application</para></entry> - <entry><para>Table</para></entry> - <entry><para>Read, Write</para></entry> - <entry><para>Puts data into HBase and uses HBase data to perform - operations.</para></entry> - </row> - </tbody> - </tgroup> - <caption><para>This table shows how real-world titles might map to HBase permissions in - a hypothetical company.</para></caption> - - </table> - <formalpara> - <title>ACL Matrix</title> - <para>For more details on how ACLs map to specific HBase operations and tasks, see <xref - linkend="appendix_acl_matrix"/>.</para> - </formalpara> - </section> - <section> - <title>Implementation Details</title> - <para>Cell-level ACLs are implemented using tags (see <xref 
linkend="hbase.tags"/>). In - order to use cell-level ACLs, you must be using HFile v3 and HBase 0.98 or newer.</para> - <orderedlist> - <title>ACL Implementation Caveats</title> - <listitem> - <para>Files created by HBase are owned by the operating system user running the HBase - process. To interact with HBase files, you should use the API or bulk load - facility.</para> - </listitem> - <listitem> - <para>HBase does not model "roles" internally in HBase. Instead, group names can be - granted permissions. This allows external modeling of roles via group membership. - Groups are created and manipulated externally to HBase, via the Hadoop group mapping - service.</para> - </listitem> - </orderedlist> - </section> - <section> - <title>Server-Side Configuration</title> - <procedure> - <step> - <para>As a prerequisite, perform the steps in <xref - linkend="security.data.basic.server.side"/>.</para> - </step> - <step> - <para>Install and configure the AccessController coprocessor, by setting the following - properties in <filename>hbase-site.xml</filename>. These properties take a list of - classes. </para> - <note> - <para>If you use the AccessController along with the VisibilityController, the - AccessController must come first in the list, because with both components active, - the VisibilityController will delegate access control on its system tables to the - AccessController. 
For an example of using both together, see <xref - linkend="security.example.config"/>.</para> - </note> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.coprocessor.region.classes</name> - <value>org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.token.TokenProvider</value> -</property> -<property> - <name>hbase.coprocessor.master.classes</name> - <value>org.apache.hadoop.hbase.security.access.AccessController</value> -</property> -<property> - <name>hbase.coprocessor.regionserver.classes</name> - <value>org.apache.hadoop.hbase.security.access.AccessController</value> -</property> -<property> - <name>hbase.security.exec.permission.checks</name> - <value>true</value> -</property> - ]]></programlisting> - <para>Optionally, you can enable transport security, by setting - <option>hbase.rpc.protection</option> to <literal>auth-conf</literal>. This - requires HBase 0.98.4 or newer.</para> - </step> - <step> - <para>Set up the Hadoop group mapper in the Hadoop namenode's - <filename>core-site.xml</filename>. This is a Hadoop file, not an HBase file. - Customize it to your site's needs. 
Following is an example.</para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hadoop.security.group.mapping</name> - <value>org.apache.hadoop.security.LdapGroupsMapping</value> -</property> - -<property> - <name>hadoop.security.group.mapping.ldap.url</name> - <value>ldap://server</value> -</property> - -<property> - <name>hadoop.security.group.mapping.ldap.bind.user</name> - <value>Administrator@example-ad.local</value> -</property> - -<property> - <name>hadoop.security.group.mapping.ldap.bind.password</name> - <value>****</value> -</property> - -<property> - <name>hadoop.security.group.mapping.ldap.base</name> - <value>dc=example-ad,dc=local</value> -</property> - -<property> - <name>hadoop.security.group.mapping.ldap.search.filter.user</name> - <value>(&amp;(objectClass=user)(sAMAccountName={0}))</value> -</property> - -<property> - <name>hadoop.security.group.mapping.ldap.search.filter.group</name> - <value>(objectClass=group)</value> -</property> - -<property> - <name>hadoop.security.group.mapping.ldap.search.attr.member</name> - <value>member</value> -</property> - -<property> - <name>hadoop.security.group.mapping.ldap.search.attr.group.name</name> - <value>cn</value> -</property>]]> - </programlisting> - </step> - <step> - <para>Optionally, enable the early-out evaluation strategy. Prior to HBase 0.98.0, if - a user was not granted access to a column family, or at least a column qualifier, an - AccessDeniedException would be thrown. HBase 0.98.0 removed this exception in order - to allow cell-level exceptional grants. To restore the old behavior in HBase - 0.98.0-0.98.6, set <option>hbase.security.access.early_out</option> to - <literal>true</literal> in <filename>hbase-site.xml</filename>. 
In HBase 0.98.6, - the default has been returned to <literal>true</literal>.</para> - </step> - <step> - <para>Distribute your configuration and restart your cluster for changes to take - effect.</para> - </step> - <step> - <para>To test your configuration, log into HBase Shell as a given user and use the - <command>whoami</command> command to report the groups your user is part of. In - this example, the user is reported as being a member of the <code>services</code> - group.</para> - <screen> -hbase> <userinput>whoami</userinput> -<computeroutput>service (auth:KERBEROS) - groups: services</computeroutput> - </screen> - </step> - </procedure> - </section> - <section> - <title>Administration</title> - <para>Administration tasks can be performed from HBase Shell or via an API.</para> - <caution> - <title>API Examples</title> - <para>Many of the API examples below are taken from source files - <filename>hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java</filename> - and - <filename>hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java</filename>.</para> - <para>Neither the examples, nor the source files they are taken from, are part of the - public HBase API, and are provided for illustration only. Refer to the official API - for usage instructions.</para> - </caution> - <procedure> - <step> - <title>User and Group Administration</title> - <para>Users and groups are maintained external to HBase, in your directory.</para> - </step> - <step> - <title>Granting Access To A Namespace, Table, Column Family, or Cell</title> - <para>There are a few different types of syntax for grant statements. The first, and - most familiar, is as follows, with the table and column family being - optional:</para> - <screen>grant 'user', 'RWXCA', 'TABLE', 'CF', 'CQ'</screen> - <para>Groups and users are granted access in the same way, but groups are prefixed - with an <literal>@</literal> symbol. 
Similarly, tables and namespaces are - specified in the same way, but namespaces are prefixed with an <literal>@</literal> - symbol.</para> - <para>It is also possible to grant multiple permissions against the same resource in a - single statement, as in this example. The first sub-clause maps users to ACLs and - the second sub-clause specifies the resource.</para> - <note> - <para>HBase Shell support for granting and revoking access at the cell level is for - testing and verification support, and should not be employed for production use - because it won't apply the permissions to cells that don't exist yet. The correct - way to apply cell-level permissions is to do so in the application code when - storing the values.</para> - </note> - <formalpara> - <title>ACL Granularity and Evaluation Order</title> - <para>ACLs are evaluated from least granular to most granular, and when an ACL is - reached that grants permission, evaluation stops. This means that cell ACLs do not - override ACLs at a coarser granularity.</para> - </formalpara> - <example> - <title>HBase Shell</title> - <itemizedlist> - <listitem> - <para>Global:</para> - <screen>hbase> <userinput>grant '@admins', 'RWXCA'</userinput></screen> - </listitem> - <listitem> - <para>Namespace:</para> - <screen>hbase> <userinput>grant 'service', 'RWXCA', '@test-NS'</userinput></screen> - </listitem> - <listitem> - <para>Table:</para> - <screen>hbase> <userinput>grant 'service', 'RWXCA', 'user'</userinput></screen> - </listitem> - <listitem> - <para>Column Family:</para> - <screen>hbase> <userinput>grant '@developers', 'RW', 'user', 'i'</userinput></screen> - </listitem> - <listitem> - <para>Column Qualifier:</para> - <screen>hbase> <userinput>grant 'service', 'RW', 'user', 'i', 'foo'</userinput></screen> - </listitem> - <listitem> - <para>Cell:</para> - <para>Granting cell ACLs uses the following syntax:</para> - <screen>grant <replaceable><table></replaceable>, \ - { 
'<replaceable><user-or-group></replaceable>' => \ - '<replaceable><permissions></replaceable>', ... }, \ - { <replaceable><scanner-specification></replaceable> }</screen> - <itemizedlist> - <listitem> - <para><replaceable><user-or-group></replaceable> is the user or group - name, prefixed with <literal>@</literal> in the case of a group.</para> - </listitem> - <listitem> - <para><replaceable><permissions></replaceable> is a string containing - any or all of "RWXCA", though only R and W are meaningful at cell - scope.</para> - </listitem> - <listitem> - <para><replaceable><scanner-specification></replaceable> is the - scanner specification syntax and conventions used by the 'scan' shell - command. For some examples of scanner specifications, issue the following - HBase Shell command.</para> - <screen>hbase> help "scan"</screen> - </listitem> - </itemizedlist> - <para>This example grants read access to the 'testuser' user and read/write - access to the 'developers' group, on cells in the 'pii' column which match the - filter.</para> - <screen>hbase> grant 'user', \ - { '@developers' => 'RW', 'testuser' => 'R' }, \ - { COLUMNS => 'pii', FILTER => "(PrefixFilter ('test'))" }</screen> - <para>The shell will run a scanner with the given criteria, rewrite the found - cells with new ACLs, and store them back to their exact coordinates.</para> - </listitem> - </itemizedlist> - </example> - <example> - <title>API</title> - <para>The following example shows how to grant access at the table level.</para> - <programlisting language="java"><![CDATA[ -public static void grantOnTable(final HBaseTestingUtility util, final String user, - final TableName table, final byte[] family, final byte[] qualifier, - final Permission.Action... 
actions) throws Exception { - SecureTestUtil.updateACLs(util, new Callable<Void>() { - @Override - public Void call() throws Exception { - HTable acl = new HTable(util.getConfiguration(), AccessControlLists.ACL_TABLE_NAME); - try { - BlockingRpcChannel service = acl.coprocessorService(HConstants.EMPTY_START_ROW); - AccessControlService.BlockingInterface protocol = - AccessControlService.newBlockingStub(service); - ProtobufUtil.grant(protocol, user, table, family, qualifier, actions); - } finally { - acl.close(); - } - return null; - } - }); -} ]]> - </programlisting> - <para>To grant permissions at the cell level, you can use the - <code>Mutation.setACL</code> method:</para> - <programlisting language="java"><![CDATA[ -Mutation.setACL(String user, Permission perms) -Mutation.setACL(Map<String, Permission> perms) - ]]> - </programlisting> - <para>Specifically, this example provides read permission to a user called - <literal>user1</literal> on any cells contained in a particular Put - operation:</para> - <programlisting language="java"><![CDATA[ -put.setACL("user1", new Permission(Permission.Action.READ)); - ]]></programlisting> - </example> - </step> - <step> - <title>Revoking Access Control From a Namespace, Table, Column Family, or Cell</title> - <para>The <command>revoke</command> command and API are twins of the grant command and - API, and the syntax is exactly the same. The only exception is that you cannot - revoke permissions at the cell level. You can only revoke access that has previously - been granted, and a <command>revoke</command> statement is not the same thing as - explicit denial to a resource.</para> - <note> - <para>HBase Shell support for granting and revoking access is for testing and - verification support, and should not be employed for production use because it - won't apply the permissions to cells that don't exist yet. 
The correct way to - apply cell-level permissions is to do so in the application code when storing the - values.</para> - </note> - <example> - <title>Revoking Access To a Table</title> - <programlisting language="java"> -<![CDATA[public static void revokeFromTable(final HBaseTestingUtility util, final String user, - final TableName table, final byte[] family, final byte[] qualifier, - final Permission.Action... actions) throws Exception { - SecureTestUtil.updateACLs(util, new Callable<Void>() { - @Override - public Void call() throws Exception { - HTable acl = new HTable(util.getConfiguration(), AccessControlLists.ACL_TABLE_NAME); - try { - BlockingRpcChannel service = acl.coprocessorService(HConstants.EMPTY_START_ROW); - AccessControlService.BlockingInterface protocol = - AccessControlService.newBlockingStub(service); - ProtobufUtil.revoke(protocol, user, table, family, qualifier, actions); - } finally { - acl.close(); - } - return null; - } - }); -} ]]> - </programlisting> - </example> - </step> - <step> - <title>Showing a User's Effective Permissions</title> - <example> - <title>HBase Shell</title> - <screen>hbase> user_permission 'user'</screen> - <screen>hbase> user_permission '.*'</screen> - <screen>hbase> user_permission <replaceable>JAVA_REGEX</replaceable></screen> - </example> - <example> - <title>API</title> - <programlisting language="java"><![CDATA[ -public static void verifyAllowed(User user, AccessTestAction action, int count) throws Exception { - try { - Object obj = user.runAs(action); - if (obj != null && obj instanceof List<?>) { - List<?> results = (List<?>) obj; - if (results != null && results.isEmpty()) { - fail("Empty non null results from action for user '" + user.getShortName() + "'"); - } - assertEquals(count, results.size()); - } - } catch (AccessDeniedException ade) { - fail("Expected action to pass for user '" + user.getShortName() + "' but was denied"); - } -}]]> - </programlisting> - </example> - </step> - </procedure> - </section> 
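<section>
<title>Sketch: ACL Evaluation Order</title>
<para>The rule stated under "ACL Granularity and Evaluation Order" (evaluate from least granular to most granular, and stop at the first ACL that grants the permission) can be illustrated with a small stand-alone model. This is only a sketch of the rule as described in this chapter, not HBase code: the class <code>AclOrderSketch</code>, the <code>Scope</code> enum, the <code>Grant</code> type, and the method <code>firstGrantingScope</code> are all hypothetical names invented for illustration, and the real AccessController may differ in detail.</para>

```java
// Hypothetical model of the documented evaluation order; none of these names exist in HBase.
class AclOrderSketch {

    // Scopes listed from least granular to most granular.
    enum Scope { GLOBAL, NAMESPACE, TABLE, COLUMN_FAMILY, CELL }

    // One grant: the scope it was made at and the permission letters it carries (for example "RW").
    static final class Grant {
        final Scope scope;
        final String permissions;

        Grant(Scope scope, String permissions) {
            this.scope = scope;
            this.permissions = permissions;
        }
    }

    // Walks the scopes from GLOBAL towards CELL and returns the first scope
    // whose grant contains the requested action letter, or null if none does.
    static Scope firstGrantingScope(Grant[] grants, char action) {
        for (Scope scope : Scope.values()) {
            for (Grant grant : grants) {
                if (grant.scope == scope) {
                    if (grant.permissions.indexOf(action) != -1) {
                        return scope; // evaluation stops at the first grant found
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Grant[] grants = {
            new Grant(Scope.CELL, "R"),
            new Grant(Scope.TABLE, "RW"),
        };
        // The table-level grant is reached before the cell-level grant.
        System.out.println(firstGrantingScope(grants, 'R'));
        System.out.println(firstGrantingScope(grants, 'W'));
        System.out.println(firstGrantingScope(grants, 'X'));
    }
}
```

<para>In this toy model the read request is satisfied by the table-level grant before the cell-level grant is ever consulted, which mirrors the statement that cell ACLs do not override ACLs granted at a less granular scope.</para>
</section>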
- </section> - </section> - - <section> - <title>Visibility Labels</title> - <para>Visibility label control can be used to permit only users or principals associated with - a given label to read or access cells with that label. For instance, you might label a cell - <literal>top-secret</literal>, and only grant access to that label to the - <literal>managers</literal> group. Visibility labels are implemented using Tags, which are - a feature of HFile v3, and allow you to store metadata on a per-cell basis. A label is a - string, and labels can be combined into expressions by using logical operators (&, |, or - !), and using parentheses for grouping. HBase does not do any kind of validation of - expressions beyond basic well-formedness. Visibility labels have no meaning on their own, - and may be used to denote sensitivity level, privilege level, or any other arbitrary - semantic meaning.</para> - <para>If a user's labels do not match a cell's label or expression, the user is - denied access to the cell.</para> - <para>In HBase 0.98.6 and newer, UTF-8 encoding is supported for visibility labels and - expressions. When creating labels using the <code>addLabels(conf, labels)</code> method - provided by the <code>org.apache.hadoop.hbase.security.visibility.VisibilityClient</code> - class and passing labels in Authorizations via Scan or Get, labels can contain UTF-8 - characters, as well as the logical operators normally used in visibility labels, with normal - Java notations, without needing any escaping method. However, when you pass a CellVisibility - expression via a Mutation, you must enclose the expression with the - <code>CellVisibility.quote()</code> method if you use UTF-8 characters or logical - operators. See <code>TestExpressionParser</code> and the source file - <filename>hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestScan.java</filename>. - </para> - <para>A user adds visibility expressions to a cell during a Put operation. 
In the default - configuration, the user does not need access to a label in order to label cells with it. - This behavior is controlled by the configuration option - <option>hbase.security.visibility.mutations.checkauths</option>. If you set this option to - <literal>true</literal>, the labels the user is modifying as part of the mutation must be - associated with the user, or the mutation will fail. Whether a user is authorized to read a - labelled cell is determined during a Get or Scan, and results which the user is not allowed - to read are filtered out. This incurs the same I/O penalty as if the results were returned, - but reduces load on the network.</para> - <para>Visibility labels can also be specified during Delete operations. For details about - visibility labels and Deletes, see <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-10885">HBASE-10885</link>. </para> - <para>The user's effective label set is built in the RPC context when a request is first - received by the RegionServer. The way that users are associated with labels is pluggable. - The default plugin passes through labels specified in Authorizations added to the Get or - Scan and checks those against the calling user's authenticated labels list. When the client - passes labels for which the user is not authenticated, the default plugin drops them. You - can pass a subset of user authenticated labels via the - <code>Get#setAuthorizations(Authorizations(String,...))</code> and - <code>Scan#setAuthorizations(Authorizations(String,...));</code> methods. </para> - <para>Visibility label access checking is performed by the VisibilityController coprocessor. - You can use interface <code>VisibilityLabelService</code> to provide a custom implementation - and/or control the way that visibility labels are stored with cells. 
See the source file - <filename>hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithCustomVisLabService.java</filename> - for one example.</para> - - <para>Visibility labels can be used in conjunction with ACLs.</para> - <table> - <title>Examples of Visibility Expressions</title> - <tgroup cols="2"> - <thead> - <row> - <entry>Expression</entry> - <entry>Interpretation</entry> - </row> - </thead> - <tbody> - <row> - <entry><screen>fulltime</screen></entry> - <entry><para>Allow access to users associated with the - <code>fulltime</code> label.</para></entry> - </row> - <row> - <entry><screen>!public</screen></entry> - <entry><para>Allow access to users not associated with the - <code>public</code> label.</para></entry> - </row> - <row> - <entry><screen>( secret | topsecret ) & !probationary</screen></entry> - <entry><para>Allow access to users associated with either the - <code>secret</code> or <code>topsecret</code> label and not - associated with the <code>probationary</code> label.</para></entry> - </row> - </tbody> - </tgroup> - </table> - <section> - <title>Server-Side Configuration</title> - <procedure> - <step> - <para>As a prerequisite, perform the steps in <xref - linkend="security.data.basic.server.side"/>.</para></step> - <step> - <para>Install and configure the VisibilityController coprocessor by setting the - following properties in <filename>hbase-site.xml</filename>. 
These properties take a - list of class names.</para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.coprocessor.region.classes</name> - <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value> -</property> -<property> - <name>hbase.coprocessor.master.classes</name> - <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value> -</property> - ]]></programlisting> - <note> - <para>If you use the AccessController and VisibilityController coprocessors together, - the AccessController must come first in the list, because with both components - active, the VisibilityController will delegate access control on its system tables - to the AccessController.</para> - </note> - </step> - <step> - <title>Adjust Configuration</title> - <para>By default, users can label cells with any label, including labels they are not - associated with, which means that a user can Put data that he cannot read. For - example, a user could label a cell with the (hypothetical) 'topsecret' label even if - the user is not associated with that label. If you only want users to be able to label - cells with labels they are associated with, set - <property>hbase.security.visibility.mutations.checkauths</property> to - <literal>true</literal>. In that case, the mutation will fail if it makes use of - labels the user is not associated with.</para> - </step> - <step> - <para>Distribute your configuration and restart your cluster for changes to take - effect.</para> - </step> - </procedure> - </section> - <section> - <title>Administration</title> - <para>Administration tasks can be performed using the HBase Shell or the Java API. 
For - defining the list of visibility labels and associating labels with users, the - HBase Shell is probably simpler.</para> - <caution> - <title>API Examples</title> - <para>Many of the Java API examples in this section are taken from the source file - <filename>hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java</filename>. - Refer to that file or the API documentation for more context.</para> - <para>Neither these examples, nor the source file they were taken from, are part of the - public HBase API, and are provided for illustration only. Refer to the official API - for usage instructions.</para> - </caution> - <procedure> - <step> - <title>Define the List of Visibility Labels</title> - <example> - <title>HBase Shell</title> - <screen>hbase> <userinput>add_labels [ 'admin', 'service', 'developer', 'test' ]</userinput></screen> - </example> - <example> - <title>Java API</title> - <programlisting language="java"><![CDATA[ -public static void addLabels() throws Exception { - PrivilegedExceptionAction<VisibilityLabelsResponse> action = - new PrivilegedExceptionAction<VisibilityLabelsResponse>() { - public VisibilityLabelsResponse run() throws Exception { - String[] labels = { SECRET, TOPSECRET, CONFIDENTIAL, PUBLIC, PRIVATE, COPYRIGHT, ACCENT, - UNICODE_VIS_TAG, UC1, UC2 }; - try { - VisibilityClient.addLabels(conf, labels); - } catch (Throwable t) { - throw new IOException(t); - } - return null; - } - }; - SUPERUSER.runAs(action); -} - ]]></programlisting> - </example> - </step> - <step> - <title>Associate Labels with Users</title> - <example> - <title>HBase Shell</title> - <screen>hbase> <userinput>set_auths 'service', [ 'service' ]</userinput></screen> - <screen>hbase> <userinput>set_auths 'testuser', [ 'test' ]</userinput></screen> - <screen>hbase> <userinput>set_auths 'qa', [ 'test', 'developer' ]</userinput></screen> - </example> - <example> - <title>Java API</title> - <programlisting language="java"><![CDATA[ 
-public void testSetAndGetUserAuths() throws Throwable { - final String user = "user1"; - PrivilegedExceptionAction<Void> action = new PrivilegedExceptionAction<Void>() { - public Void run() throws Exception { - String[] auths = { SECRET, CONFIDENTIAL }; - try { - VisibilityClient.setAuths(conf, auths, user); - } catch (Throwable e) { - } - return null; - } - ... - ]]></programlisting> - </example> - </step> - <step> - <title>Clear Labels From Users</title> - <example> - <title>HBase Shell</title> - <screen>hbase> <userinput>clear_auths 'service', [ 'service' ]</userinput></screen> - <screen>hbase> <userinput>clear_auths 'testuser', [ 'test' ]</userinput></screen> - <screen>hbase> <userinput>clear_auths 'qa', [ 'test', 'developer' ]</userinput></screen> - </example> - <example> - <title>Java API</title> - <programlisting language="java"><![CDATA[ -... -auths = new String[] { SECRET, PUBLIC, CONFIDENTIAL }; -VisibilityLabelsResponse response = null; -try { - response = VisibilityClient.clearAuths(conf, auths, user); -} catch (Throwable e) { - fail("Should not have failed"); -... - ]]></programlisting> - </example> - </step> - <step> - <title>Apply a Label or Expression to a Cell</title> - <para>The label is only applied when data is written. The label is associated with a - given version of the cell.</para> - <example> - <title>HBase Shell</title> - <screen>hbase> <userinput>set_visibility 'user', 'admin|service|developer', \ - { COLUMNS => 'i' }</userinput></screen> - <screen>hbase> <userinput>set_visibility 'user', 'admin|service', \ - { COLUMNS => 'pii' }</userinput></screen> - <screen>hbase> <userinput>set_visibility 'user', 'test', \ - { COLUMNS => [ 'i', 'pii' ], \ - FILTER => "(PrefixFilter ('test'))" }</userinput></screen> - </example> - <note> - <para>HBase Shell support for applying labels or permissions to cells is for testing - and verification support, and should not be employed for production use because it - won't apply the labels to cells that don't exist yet. 
The correct way to apply cell - level labels is to do so in the application code when storing the values.</para> - </note> - <example> - <title>Java API</title> - <programlisting language="java"><![CDATA[ -static HTable createTableAndWriteDataWithLabels(TableName tableName, String... labelExps) - throws Exception { - HTable table = null; - try { - table = TEST_UTIL.createTable(tableName, fam); - int i = 1; - List<Put> puts = new ArrayList<Put>(); - for (String labelExp : labelExps) { - Put put = new Put(Bytes.toBytes("row" + i)); - put.add(fam, qual, HConstants.LATEST_TIMESTAMP, value); - put.setCellVisibility(new CellVisibility(labelExp)); - puts.add(put); - i++; - } - table.put(puts); - } finally { - if (table != null) { - table.flushCommits(); - } - } - ]]></programlisting> - </example> - </step> - </procedure> - </section> - <section> - <title>Implementing Your Own Visibility Label Algorithm</title> - <para>Interpreting the labels authenticated for a given get/scan request is a pluggable - algorithm. You can specify a custom plugin by using the property - <code>hbase.regionserver.scan.visibility.label.generator.class</code>. The default - implementation class is - <code>org.apache.hadoop.hbase.security.visibility.DefaultScanLabelGenerator</code>. You - can also configure a set of <code>ScanLabelGenerators</code> to be used by the system, as - a comma-separated list.</para> - </section> - </section> - - <section xml:id="hbase.encryption.server"> - <title>Transparent Encryption of Data At Rest</title> - <para>HBase provides a mechanism for protecting your data at rest, in HFiles and the WAL, which - reside within HDFS or another distributed filesystem. A two-tier architecture is used for - flexible and non-intrusive key rotation. "Transparent" means that no implementation changes - are needed on the client side. When data is written, it is encrypted. 
When it is read, it is - decrypted on demand.</para> - <section> - <title>How It Works</title> - <para>The administrator provisions a master key for the cluster, which is stored in a key - provider accessible to every trusted HBase process, including the HMaster, RegionServers, - and clients (such as HBase Shell) on administrative workstations. The default key provider - is integrated with the Java KeyStore API and any key management systems with support for - it. Other custom key provider implementations are possible. The key retrieval mechanism is - configured in the <filename>hbase-site.xml</filename> configuration file. The master key - may be stored on the cluster servers, protected by a secure KeyStore file, on an - external keyserver, or in a hardware security module. This master key is resolved as - needed by HBase processes through the configured key provider.</para> - <para>Next, encryption can be specified in the schema, per column family, by creating - or modifying a column descriptor to include two additional attributes: the name of the - encryption algorithm to use (currently only "AES" is supported), and optionally, a data - key wrapped (encrypted) with the cluster master key. If a data key is not explicitly - configured for a ColumnFamily, HBase will create a random data key per HFile. This - provides an incremental improvement in security over using a single explicit data key for the - whole ColumnFamily. Unless you need to - supply an explicit data key, such as in a case where you are generating encrypted HFiles - for bulk import with a given data key, only specify the encryption algorithm in the - ColumnFamily schema metadata and let HBase create data keys on demand. Per Column Family - keys facilitate low impact incremental key rotation and reduce the scope of any external - leak of key material. The wrapped data key is stored in the ColumnFamily schema metadata, - and in each HFile for the Column Family, encrypted with the cluster master key. 
After the - Column Family is configured for encryption, any new HFiles will be written encrypted. To - ensure encryption of all HFiles, trigger a major compaction after enabling this - feature.</para> - <para>When the HFile is opened, the data key is extracted from the HFile, decrypted with the - cluster master key, and used for decryption of the remainder of the HFile. The HFile will - be unreadable if the master key is not available. If a remote user somehow acquires access - to the HFile data because of some lapse in HDFS permissions, or from inappropriately - discarded media, it will not be possible to decrypt either the data key or the file - data.</para> - <para>It is also possible to encrypt the WAL. Even though WALs are transient, it is - necessary to encrypt the WALEdits to avoid circumventing HFile protections for encrypted - column families, in the event that the underlying filesystem is compromised. When WAL - encryption is enabled, all WALs are encrypted, regardless of whether the relevant HFiles - are encrypted.</para> - </section> - <section> - <title>Server-Side Configuration</title> - <para>This procedure assumes you are using the default Java keystore implementation. If you - are using a custom implementation, check its documentation and adjust accordingly.</para> - <procedure> - <step> - <title>Create a secret key of appropriate length for AES encryption, using the - <code>keytool</code> utility.</title> - <screen>$ <userinput>keytool -keystore /path/to/hbase/conf/hbase.jks \ - -storetype jceks -storepass **** \ - -genseckey -keyalg AES -keysize 128 \ - -alias <alias></userinput></screen> - <para>Replace <replaceable>****</replaceable> with the password for the keystore file - and <alias> with the username of the HBase service account, or an arbitrary - string. If you use an arbitrary string, you will need to configure HBase to use it, - and that is covered below. Specify a keysize that is appropriate. 
Do not specify a - separate password for the key; press <keycap>Return</keycap> when prompted.</para> - </step> - <step> - <title>Set appropriate permissions on the keyfile and distribute it to all the HBase - servers.</title> - <para>The previous command created a file called <filename>hbase.jks</filename> in the - HBase <filename>conf/</filename> directory. Set the permissions and ownership on this - file such that only the HBase service account user can read the file, and securely - distribute the key to all HBase servers.</para> - </step> - <step> - <title>Configure the HBase daemons.</title> - <para>Set the following properties in <filename>hbase-site.xml</filename> on the region - servers, to configure HBase daemons to use a key provider backed by the KeyStore file - for retrieving the cluster master key. In the example below, replace - <replaceable>****</replaceable> with the password.</para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.crypto.keyprovider</name> - <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value> -</property> -<property> - <name>hbase.crypto.keyprovider.parameters</name> - <value>jceks:///path/to/hbase/conf/hbase.jks?password=****</value> -</property> - ]]></programlisting> - <para>By default, the HBase service account name will be used to resolve the cluster - master key. However, you can store it with an arbitrary alias (in the - <command>keytool</command> command). In that case, set the following property to the - alias you used.</para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.crypto.master.key.name</name> - <value>my-alias</value> -</property>]]> - </programlisting> - <para>You also need to be sure your HFiles use HFile v3, in order to use transparent - encryption. This is the default configuration for HBase 1.0 onward. 
For previous - versions, set the following property in your <filename>hbase-site.xml</filename> - file.</para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hfile.format.version</name> - <value>3</value> -</property>]]> - </programlisting> - <para>Optionally, you can use a different cipher provider, either a Java Cryptography - Extension (JCE) algorithm provider or a custom HBase cipher implementation. </para> - <substeps> - <step> - <title>JCE: </title> - <itemizedlist> - <listitem> - <para>Install a signed JCE provider (supporting "AES/CTR/NoPadding" mode with - 128-bit keys).</para> - </listitem> - <listitem> - <para>Add it with highest preference to the JCE site configuration file - <filename>$JAVA_HOME/lib/security/java.security</filename>.</para> - </listitem> - <listitem> - <para>Update the <option>hbase.crypto.algorithm.aes.provider</option> and - <option>hbase.crypto.algorithm.rng.provider</option> options in - <filename>hbase-site.xml</filename>. </para> - </listitem> - </itemizedlist> - </step> - <step> - <title>Custom HBase Cipher: </title> - <itemizedlist> - <listitem> - <para>Implement - <code>org.apache.hadoop.hbase.io.crypto.CipherProvider</code>.</para> - </listitem> - <listitem> - <para>Add the implementation to the server classpath.</para> - </listitem> - <listitem> - <para>Update <option>hbase.crypto.cipherprovider</option> in - <filename>hbase-site.xml</filename>.</para> - </listitem> - </itemizedlist> - </step> - </substeps> - </step> - <step> - <title>Configure WAL encryption.</title> - <para>Configure WAL encryption in every RegionServer's - <filename>hbase-site.xml</filename>, by setting the following properties. 
You can - include these in the HMaster's <filename>hbase-site.xml</filename> as well, but the - HMaster does not have a WAL and will not use them.</para> - <programlisting language="xml"><![CDATA[ -<property> - <name>hbase.regionserver.hlog.reader.impl</name> - <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value> -</property> -<property> - <name>hbase.regionserver.hlog.writer.impl</name> - <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value> -</property> -<property> - <name>hbase.regionserver.wal.encryption</name> - <value>true</value> -</property> - ]]></programlisting> - </step> - <step> - <title>Configure permissions on the <filename>hbase-site.xml</filename> file.</title> - <para>Because the keystore password is stored in <filename>hbase-site.xml</filename>, you need to ensure - that only the HBase user can read the <filename>hbase-site.xml</filename> file, using - file ownership and permissions.</para> - </step> - <step> - <title>Restart your cluster.</title> - <para>Distribute the new configuration file to all nodes and restart your - cluster.</para> - </step> - </procedure> - </section> - <section> - <title>Administration</title> - <para>Administrative tasks can be performed using HBase Shell or the Java API.</para> - <caution> - <title>Java API</title> - <para>Java API examples in this section are taken from the source file - <filename>hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckEncryption.java</filename>. - </para> - <para>Neither these examples nor the source files they are taken from are part of the - public HBase API; they are provided for illustration only. Refer to the official API - for usage instructions.</para> - </caution> - <variablelist> - <varlistentry> - <term>Enable Encryption on a Column Family</term> - <listitem> - <para>To enable encryption on a column family, you can either use HBase Shell or the - Java API. After enabling encryption, trigger a major compaction. 
When the major - compaction completes, the HFiles will be encrypted.</para> - <example> - <title>HBase Shell</title> - <screen> -hbase> disable 'mytable' -hbase> alter 'mytable', 'mycf', {ENCRYPTION => 'AES'} -hbase> enable 'mytable' - </screen> - </example> - <example> - <title>Java API</title> - <para>You can use the <code>HBaseAdmin#modifyColumn</code> API to modify the - <property>ENCRYPTION</property> attribute on a Column Family. Additionally, you - can specify the specific key to use as the wrapper, by setting the - <property>ENCRYPTION_KEY</property> attribute. This is only possible via the - Java API, and not the HBase Shell. If you do not specify an - <property>ENCRYPTION_KEY</property> for a column family, the default behavior is for a random key to - be generated for each encrypted column family (per HFile). This provides - additional defense in the (unlikely, but theoretically possible) occurrence of - storing the same data in multiple HFiles with exactly the same block layout, the - same data key, and the same randomly-generated initialization vector.</para> - <para>This example shows how to programmatically enable transparent encryption both - in the server configuration and on the column family, as part of a test which uses - the Minicluster configuration.</para> - <programlisting language="java"> -@Before -public void setUp() throws Exception { - conf = TEST_UTIL.getConfiguration(); - conf.setInt("hfile.format.version", 3); - conf.set(HConstants.CRYPTO_KEYPROVIDER_CONF_KEY, KeyProviderForTesting.class.getName()); - conf.set(HConstants.CRYPTO_MASTERKEY_NAME_CONF_KEY, "h
<TRUNCATED>