Tuesday, January 5, 2016

[BigData] Authorization with Apache Sentry

Hi,

Security is a hot topic in big data systems and Oracle integrates security to hadoop ecosystem with Apache Sentry.
Sentry is used in Oracle BigData Appliance in order to satisfy security concerns and authorize SQL access. Sentry supports role-based access control and unified authorization for HUE, impala and hive.

In this post , i will talk about sentry tool and make some authorization settings for OS groups.


Have a look at the documentation: 

Apache Sentry (incubating) is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Cloudera Impala and HDFS (limited to Hive table data).

First, our sentry version is :

mysql> select * from SENTRY_VERSION;
+--------+----------------+-------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT                     |
+--------+----------------+-------------------------------------+
|      1 | 1.4.0-cdh5-2   | Sentry release version 1.4.0-cdh5-2 |
+--------+----------------+-------------------------------------+
1 row in set (0.00 sec)

As you see , we can think sentry as a police officer who checks whether this SQL query allowed to run or not, in brief.

We can integrate sentry with Active Directory and also use kerberos authentication system in our BDA environment.

In this post, i am passing configuration part of sentry which you can follow from here.  But i will talk about sentry database model and how to use sentry with our systems.

Sentry Database Model.

Sentry use a very simple database model. You can access configuration settings from Cloudera Manager -> Sentry service-> Configuration page


Lets connect sentry mysql database in this example.

You will see following tables:

mysql> use sentry;
Database changed
mysql> show tables;
+------------------------------+
| Tables_in_sentry             |
+------------------------------+
| SENTRY_DB_PRIVILEGE          |
| SENTRY_GROUP                 |
| SENTRY_ROLE                  |
| SENTRY_ROLE_DB_PRIVILEGE_MAP |
| SENTRY_ROLE_GROUP_MAP        |
| SENTRY_VERSION               |
| SEQUENCE_TABLE               |
+------------------------------+
7 rows in set (0.00 sec)

The important tables are following and you can check your grants in tables easily.

SENTRY_ROLE                  -> Defined roles 
SENTRY_GROUP                 -> Granted OS groups 
SENTRY_ROLE_GROUP_MAP        -> which roles assigned to which roles.
SENTRY_DB_PRIVELEGE          -> which grants are given
SENTRY_ROLE_DB_PRIVILEGE_MAP -> grants assigned to roles

You can also give select grant on a table or a jar file using file path in sentry.

In table SENTRY_DB_PRIVELEGE, these privileges are also listed.

mysql> SELECT * FROM SENTRY_DB_PRIVILEGE  ;
+-----------------+-----------------+-------------+----------+---------------------+-------------------------------------------------------------------------------+--------+---------------+-------------------+
| DB_PRIVILEGE_ID | PRIVILEGE_SCOPE | SERVER_NAME | DB_NAME  | TABLE_NAME          | URI                                                                           | ACTION | CREATE_TIME   | WITH_GRANT_OPTION |
+-----------------+-----------------+-------------+----------+---------------------+-------------------------------------------------------------------------------+--------+---------------+-------------------+
|               1 | SERVER          | server1     | __NULL__ | __NULL__            | __NULL__                                                                      | *      | 1435230391611 | N                 |
|               6 | URI             | server1     | __NULL__ | __NULL__            | file:///opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.4.0.jar | *      | 1435758914457 | N                 |
|               7 | URI             | server1     | __NULL__ | __NULL__            | hdfs:///opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.4.0.jar | *      | 1435759061151 | N                 |
|              11 | TABLE           | server1     | default  | default             | __NULL__                                                                      | select | 1439543565668 | N                 |
|              12 | TABLE           | server1     | default  | defaul              | __NULL__                                                                      | select | 1439543578705 | N                 |
|              14 | DATABASE        | server1     | default  | __NULL__            | __NULL__                                                                      | select | 1439546951486 | N                 |
|              15 | TABLE           | server1     | default  | default             | __NULL__                                                                      | select | 1439547062374 | Y                 |
|              
+-----------------+-----------------+-------------+----------+---------------------+-------------------------------------------------------------------------------+--------+---------------+-------------------+


For a complete query show you can run following query

mysql > SELECT rolelist.role_name as "ROLE NAME",
               grouplist.group_name AS "GROUP NAME",
               priv.action"ACTION",
               priv.privilege_scope "PRIV SCOPE",
               priv.db_name "DB_NAME",
               priv.table_name "TABLE_NAME",
               priv.uri "URI"
        FROM SENTRY_DB_PRIVILEGE  priv, 
             SENTRY_ROLE_DB_PRIVILEGE_MAP role_priv_map, 
             SENTRY_ROLE_GROUP_MAP role_group_map,
             SENTRY_ROLE  rolelist,SENTRY_GROUP grouplist
        WHERE role_group_map.role_id=rolelist.role_id
              and role_group_map.group_id=grouplist.group_id
              and role_priv_map.role_id=role_group_map.role_id
              and role_priv_map.db_privilege_id=priv.db_privilege_id
              order by 1 ,2 


+------------------------+-------------------------------------+--------+------------+----------+------------+-------------------------------------------------------------------------------+
| ROLE NAME              | GROUP NAME                          | ACTION | PRIV SCOPE | DB_NAME  | TABLE_NAME | URI                                                                           |
+------------------------+-------------------------------------+--------+------------+----------+------------+-------------------------------------------------------------------------------+
| admin_role             | grup1                   | *      | URI        | __NULL__ | __NULL__   | file:///opt/oracle/bigdatasql/bdcell-12.1/jlib/json-serde-1.3.1.jar           |
| admin_role             | grup2           | *      | URI        | __NULL__ | __NULL__   | file:///opt/oracle/bigdatasql/bdcell-12.1/jlib/json-serde-1.3.1.jar           |
...
| role 2             | oracle                              | *      | URI        | __NULL__ | __NULL__   | hdfs:///opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.4.0.jar |
| role 3 | grup3                   | *      | DATABASE   | db1   | __NULL__   | __NULL__                                                                      |
+------------------------+-------------------------------------+--------+------------+----------+------------+-------------------------------------------------------------------------------+

As you see server-wide grants or file-based grants and also table-level grants can be listed easily here , you can get your reports easily.

Lets give grants and permissions on commandline. You can use hive or beeline CLI for enabling sentry grants. 

Listing all roles in db ( you can also query SENTRY_ROLE table from sentry database)

beeline> show roles;
+-----------------------------+--+
|            role             |
+-----------------------------+--+
| role1       |
| role2       |
...
...
| admin_role                  |
| deneme                      |
...
+-----------------------------+--+

You can list grant of a role with this:

beeline> show grant role admin_role;
+----------------------------------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+--+
|                               database                               | table  | partition  | column  | principal_name  | principal_type  | privilege  | grant_option  |    grant_time     | grantor  |
+----------------------------------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+--+
| *                                                                    |        |            |         | admin_role      | ROLE            | *          | false         | 1435230391611000  | --       |
| hdfs:///app/hive_serde/json-serde-1.3.1.jar                          |        |            |         | admin_role      | ROLE            | *          | false         | 1440163479303000  | --       |
| file:///tmp/hive-json-serde/json-serde-aws.jar                       |        |            |         | admin_role      | ROLE            | *          | false         | 1440061254959000  | --       |
| file:///tmp/hive-json-serde/json-serde-1.3.1.jar                     |        |            |         | admin_role      | ROLE            | *          | false         | 1440061245472000  | --       |
| file:///opt/oracle/bigdatasql/bdcell-12.1/jlib/json-serde-1.3.1.jar  |        |            |         | admin_role      | ROLE            | *          | false         | 1440163459313000  | --       |
| hdfs:///tmp/hive-json-serde/json-serde-1.3.1.jar                     |        |            |         | admin_role      | ROLE            | *          | false         | 1440061264598000  | --       |
| hdfs:///tmp/hive-json-serde/json-serde-aws.jar                       |        |            |         | admin_role      | ROLE            | *          | false         | 1440061271254000  | --       |
| file:///opt/oracle/bigdatasql/bdcell-12.1/jlib/json-serde-aws.jar    |        |            |         | admin_role      | ROLE            | *          | false         | 1440163475556000  | --       |
| hdfs:///app/hive_serde/json-serde-aws.jar                            |        |            |         | admin_role      | ROLE            | *          | false         | 1440163482669000  | --       |
+----------------------------------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+--+

You can check grant of a OS level group with this:


beeline> SHOW ROLE GRANT GROUP `OSgroup1`;
+------------------------+---------------+-------------+----------+--+
|          role          | grant_option  | grant_time  | grantor  |
+------------------------+---------------+-------------+----------+--+
| admin_role             | false         | NULL        | --       |
| role 1                 | false         | NULL        | --       |
+------------------------+---------------+-------------+----------+--+

Lastly, lets create a role and assign some grants to that role and grant that role to OS level group.

beeline> create role roleX;
No rows affected (0.072 seconds)
beeline> Grant all on database dbX to role roleX;
No rows affected (0.062 seconds)
beeline>Grant role roleX to group `OSgroupX`;
No rows affected (0.024 seconds)

IMPORTANT!! 
Sentry is a powerful tool when you try to solve  security problems. But here important point is that you can add sentry rules ( grant/revoke) with only allowed users to sentry database.

You can check and adjust this from Cloudera Manager.


As you see ,  OS group of user who is making security settings must be in this configuration. Or you can get following error.

beeline> SHOW ROLES;
ERROR : Error processing Sentry command: Access denied to erkanul. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to erkanul
        at org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.list_sentry_roles_by_group(SentryPolicyStoreProcessor.java:450)
        at org.apache.sentry.provider.db.service.thrift.SentryPolicyService$Processor$list_sentry_roles_by_group.getResult(SentryPolicyService.java:953)
        at org.apache.sentry.provider.db.service.thrift.SentryPolicyService$Processor$list_sentry_roles_by_group.getResult(SentryPolicyService.java:938)
        at sentry.org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at sentry.org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.sentry.provider.db.service.thrift.SentryProcessorWrapper.process(SentryProcessorWrapper.java:48)
        at sentry.org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
        at sentry.org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.SentryGrantRevokeTask. SentryAccessDeniedException: Access denied to erkanul (state=08S01,code=1)

With Apache Senty, we cannot give grants to users . Only roles are supported.

beeline> grant role admin_role to user erkanul;
Error: Error while compiling statement: FAILED: SemanticException Sentry does not allow grant/revoke on: USER (state=42000,code=40000)

Ok, that's all.
Enjoy & Share.
Thanks

Source:

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_sg_sentry_service.html 

http://www.oracle.com/technetwork/database/bigdata-appliance/overview/bigdataappliance-datasheet-1883358.pdf

https://cwiki.apache.org/confluence/display/SENTRY/Sentry+Tutorial


1 comment :

  1. Really Good blog post.provided a helpful information.I hope that you will post more updates like thisHadoop Admin Online Training Bangalore

    ReplyDelete