Monday, December 28, 2015

[BigData] Configuring Beeline and Impala Agents on an EdgeNode

Hi,

In this post, I will configure the beeline and impala-shell clients on an EdgeNode, which you will remember from my previous posts, and point them at Oracle Big Data Appliance (BDA) test and production targets.

Let me talk about the EdgeNode.
You can configure a node outside the BDA as a Hadoop client and enable connections from it to the BDA; such a node is called an EdgeNode. EdgeNodes can be used as access points or as a data landing zone to transport data to the BDA, and many of them can be defined to handle production load.

In my previous posts, I set up my EdgeNode to support hdfs, hive and spark operations, just as if working on the BDA.
Visit http://kamudba.blogspot.com.tr/2015/09/custom-hadoop-client-configuration.html for the Hadoop client and http://kamudba.blogspot.com.tr/2015/09/spark-client-configuration-in-custom.html for the Spark client.

In this post, I configure the beeline and impala command shells to authenticate without the user having to type connection strings.

Beeline CLI

As you know, beeline requires a connection string when authenticating against Hive Server. Here is how beeline is normally used to connect to the default database on Hive Server.

$beeline
Beeline version 1.1.0-cdh5.4.0 by Apache Hive
beeline>  !connect jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST
scan complete in 2ms
Connecting to jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST
Enter username for jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST: 
Enter password for jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST: 
Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
Driver: Hive JDBC (version 1.1.0-cdh5.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://HIVESERVERHOST:10000/default> show databases;
+------------------------+--+
|     database_name      |
+------------------------+--+
| authentication         |
| default                |
...
| test                   |
| tmp1                   |
| tmp2                   |
+------------------------+--+
15 rows selected (0.211 seconds)
0: jdbc:hive2://HIVESERVERHOST:10000/default> 

As you can see, we need the connection string as well as a username and password. So let's log in to the EdgeNode.

On the EdgeNode, we can set up beeline with a pre-supplied connection string. You can list the command-line options as follows:

$beeline -h
Usage: java org.apache.hive.cli.beeline.BeeLine 
   -u <database url>               the JDBC URL to connect to
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which headers are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --help                          display this message
Beeline version 1.1.0-cdh5.4.3 by Apache Hive

You can see that the connection URL can be given with "-u", the username with "-n" and the password with "-p".

NOTE THAT: on a Kerberos-secured BDA, we do not need usernames and passwords at all, because Kerberos tickets are used instead. You do, however, have to obtain your Kerberos ticket first, or you will get the following error.

$beeline
Beeline version 1.1.0-cdh5.4.0 by Apache Hive
beeline>  !connect jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST
scan complete in 2ms
Connecting to jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST
Enter username for jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST: 
Enter password for jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST: 
15/12/28 09:27:09 [main]: ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
        at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:190)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:163)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:208)
        at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:137)
        at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:178)
        at org.apache.hive.beeline.Commands.connect(Commands.java:1087)
        at org.apache.hive.beeline.Commands.connect(Commands.java:1008)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
        at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:962)
        at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:805)
        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
        at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
        at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
        ... 34 more
Error: Could not open client transport with JDBC Uri: jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST: GSS initiate failed (state=08S01,code=0)
0: jdbc:hive2://HIVESERVERHOST:10000/default (closed)> 15/12/28 09:27:11 [main]: ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
        at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:190)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:163)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:208)
        at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:137)
        at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:178)
        at org.apache.hive.beeline.Commands.close(Commands.java:925)
        at org.apache.hive.beeline.Commands.closeall(Commands.java:907)
        at org.apache.hive.beeline.BeeLine.close(BeeLine.java:818)
        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:769)
        at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
        at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
        ... 28 more
Error: Could not open client transport with JDBC Uri: jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST: GSS initiate failed (state=08S01,code=0)
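The missing ticket itself is obtained with kinit. A minimal sketch; the principal 'myuser@FWHOST' and the keytab path below are placeholders for your own environment, not values from this cluster:

```shell
# Obtain a Kerberos ticket-granting ticket (TGT) for the current user.
# 'myuser@FWHOST' and the keytab path are hypothetical placeholders.
kinit myuser@FWHOST                                  # interactive: prompts for the password
# or non-interactively (e.g. from a script), using a keytab:
# kinit -kt /home/myuser/myuser.keytab myuser@FWHOST
klist                                                # verify the cached ticket
```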

After successfully getting a ticket, you can connect with beeline using the pre-supplied connection string as follows.

$beeline -u "jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST"
scan complete in 3ms
Connecting to jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST
Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
Driver: Hive JDBC (version 1.1.0-cdh5.4.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.4.3 by Apache Hive
0: jdbc:hive2://HIVESERVERHOST:10000/default> show databases;
+------------------------+--+
|     database_name      |
+------------------------+--+
| authentication         |
| default                |
...
| test                   |
| tmp1                   |
| tmp2                   |
+------------------------+--+
15 rows selected (0.211 seconds)
0: jdbc:hive2://HIVESERVERHOST:10000/default> 

Note that beeline asked for neither the connection string nor a username and password.

You can define an alias for this, so developers never need to supply the connection string and you can change the configuration later in one place:

alias beeline='beeline -u "jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST"'
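To make this available to every developer logging in to the EdgeNode, the alias can go into a system-wide profile script. A sketch, assuming Bash login shells; the file name /etc/profile.d/bda-clients.sh is my own choice, and the hosts are placeholders:

```shell
# Hypothetical /etc/profile.d/bda-clients.sh -- sourced by login shells,
# so every developer picks up the same pre-built connection string.
# HIVESERVERHOST and FWHOST are placeholders for your own environment.
alias beeline='beeline -u "jdbc:hive2://HIVESERVERHOST:10000/default;principal=hive/HIVESERVERHOST.FWHOST@FWHOST"'
# Inspect the definition; changing the target later means editing one file:
alias beeline
```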

Impala

For the impala-shell client, let me check the help output:

$impala-shell -h
Usage: impala_shell.py [options]

Options:
  -h, --help            show this help message and exit
  -i IMPALAD, --impalad=IMPALAD
                        <host:port> of impalad to connect to
                        [default: IMPALA_DEAMON.FWHOST:21000]
  -q QUERY, --query=QUERY
                        Execute a query without the shell [default: none]
  -f QUERY_FILE, --query_file=QUERY_FILE
                        Execute the queries in the query file, delimited by ;
                        [default: none]
  -k, --kerberos        Connect to a kerberized impalad [default: False]
  -o OUTPUT_FILE, --output_file=OUTPUT_FILE
                        If set, query results are written to the given file.
                        Results from multiple semicolon-terminated queries
                        will be appended to the same file [default: none]
  -B, --delimited       Output rows in delimited mode [default: False]
  --print_header        Print column names in delimited mode when pretty-
                        printed. [default: False]
  --output_delimiter=OUTPUT_DELIMITER
                        Field delimiter to use for output in delimited mode
                        [default: \t]
  -s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME
                        Service name of a kerberized impalad [default: impala]
  -V, --verbose         Verbose output [default: True]
  -p, --show_profiles   Always display query profiles after execution
                        [default: False]
  --quiet               Disable verbose output [default: False]
  -v, --version         Print version information [default: False]
  -c, --ignore_query_failure
                        Continue on query failure [default: False]
  -r, --refresh_after_connect
                        Refresh Impala catalog after connecting
                        [default: False]
  -d DEFAULT_DB, --database=DEFAULT_DB
                        Issues a use database command on startup
                        [default: none]
  -l, --ldap            Use LDAP to authenticate with Impala. Impala must be
                        configured to allow LDAP authentication.
                        [default: False]
  -u USER, --user=USER  User to authenticate with. [default: erkanul]
  --ssl                 Connect to Impala via SSL-secured connection
                        [default: False]
  --ca_cert=CA_CERT     Full path to certificate file used to authenticate
                        Impala's SSL certificate. May either be a copy of
                        Impala's certificate (for self-signed certs) or the
                        certificate of a trusted third-party CA. If not set,
                        but SSL is enabled, the shell will NOT verify Impala's
                        server certificate [default: none]
  --config_file=CONFIG_FILE
                        Specify the configuration file to load options. File
                        must have case-sensitive '[impala]' header. Specifying
                        this option within a config file will have no effect.
                        Only specify this as a option in the commandline.
                        [default: /home/erkanul/.impalarc]

You can see that we have two options here: passing the "-i" command-line option with the address of an impala daemon (impalad) on the BDA, or using a config file that contains the impalad entry. In this post, I will use the command-line option and again create an alias for it.
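For completeness, the config-file route would look roughly like this. This is only a sketch: the [impala] header is required (case-sensitive, per the help output above), but key names for other options, such as enabling Kerberos, vary between impala-shell versions, so check the Cloudera documentation linked in the sources below before relying on it:

```ini
; Hypothetical ~/.impalarc -- loaded automatically, or via --config_file.
[impala]
impalad=IMPALA_DEAMON.FWHOST:21000
```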

Here is the new impala-shell command:

$impala-shell -k -i IMPALA_DEAMON.FWHOST
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
Connected to IMPALA_DEAMON.FWHOST:21000
Server version: impalad version 2.2.0-cdh5 RELEASE (build 2ffd73a4255cefd521362ffe1cfb37463f67f75c)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

(Shell build version: Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue Apr 21 12:09:21 PDT 2015)
[IMPALA_DEAMON.FWHOST:21000] > show databases;
Query: show databases
+-----------------------+
| name                  |
+-----------------------+
| _impala_builtins      |
| authentication        |
| default               |
....
| test                  |
| tmp1                  |
| tmp2                  |
+-----------------------+
Fetched 15 row(s) in 0.30s
[IMPALA_DEAMON.FWHOST:21000] >

You can see that we successfully authenticated to Impala and can query our databases on the BDA.

Here is the alias for impala-shell:

alias impala-shell='impala-shell -k -i IMPALA_DEAMON.FWHOST'
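With the alias in place (or the full command), impala-shell can also run one-off queries non-interactively, which is useful in scripts. A sketch built only from the flags shown in the help output above; the hostname and output path are placeholders:

```shell
# Run a single query without entering the shell (-q), in delimited mode (-B)
# with a comma delimiter, writing the result to a file (-o).
# IMPALA_DEAMON.FWHOST is a placeholder for your impalad host.
impala-shell -k -i IMPALA_DEAMON.FWHOST \
    -B --output_delimiter=',' \
    -q "show databases" \
    -o /tmp/impala_databases.csv
```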

OK, that's all. We have now finished configuring the command-line tools hadoop, hive, spark-shell, beeline and impala-shell on our EdgeNode and addressed some of the security concerns of Oracle BDA. You can start configuring an EdgeNode from scratch with my earlier post.

Visit http://kamudba.blogspot.com.tr/2015/09/custom-hadoop-client-configuration.html

Thanks.
Enjoy & Share

Source :

http://kamudba.blogspot.com.tr/2015/09/custom-hadoop-client-configuration.html
http://kamudba.blogspot.com.tr/2015/09/spark-client-configuration-in-custom.html
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
http://www.cloudera.com/content/www/en-us/documentation/archive/impala/2-x/2-1-x/topics/impala_shell_options.html 





