Wednesday, September 9, 2015

Custom Hadoop Client Configuration

Hi again,

In this post, I will show how to configure a Hadoop client on a user machine and connect it to different Hadoop clusters. I will also set up client configurations on the same machine targeting different Hadoop clusters. I am working on an Oracle Big Data Appliance with Cloudera Manager 5.4.0.

It is very important for developers to be able to access Prod and Test systems without changing development machines, so custom client configurations matter for a Hadoop developer. In this post, I will connect a developer machine to both the Prod and Test Hadoop systems and show you how to switch the connection target from the test system to the prod system within the environment.

Setup Hadoop Client 

  • Download Configuration Files

I am assuming the necessary installations for Hadoop on Exadata are done: RPMs, firewall accesses, a valid Kerberos principal, etc.

You can check "Section :  Installing CDH on Oracle Exadata Database Machine" in this document.

After that, to enable the Hadoop client, we have to download the necessary client configuration files from Cloudera Manager.

When you click "View Client Configuration URLs", Cloudera Manager lists the necessary configuration files served by the CM server.

Download and unzip all the files, and you will see the following.

OK, the files are ready. Now I will configure the custom cluster mapping.

  • Configure Hadoop Client

Here, I will set two environment variables for custom HDFS and Hive access.


For the Prod cluster:

$ export HADOOP_CONF_DIR=/tmp/hadoopprod/conf/
$ export HIVE_CONF_DIR=/tmp/hadoopprod/conf/

For the Test cluster:

$ export HADOOP_CONF_DIR=/tmp/hadooptest/conf/
$ export HIVE_CONF_DIR=/tmp/hadooptest/conf/

Note that for each cluster I pointed both variables at the same directory, but you can of course use separate locations.
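The two exports can also be wrapped in a small helper so switching clusters becomes a single command. This is just a sketch; the function name `use_bda` is my own, and the paths match the directories used above.

```shell
# Hypothetical helper (name is mine): point both conf variables at the
# prod or test client-config directory with one command.
use_bda() {
  local dir
  case "$1" in
    prod) dir=/tmp/hadoopprod/conf/ ;;
    test) dir=/tmp/hadooptest/conf/ ;;
    *)    echo "usage: use_bda prod|test" >&2; return 1 ;;
  esac
  export HADOOP_CONF_DIR="$dir"
  export HIVE_CONF_DIR="$dir"
  echo "Hadoop/Hive client now pointing at $dir"
}
```

Source it from your .bashrc and run "use_bda test" or "use_bda prod" before issuing hadoop, hive, or sqoop commands.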

Now copy the config files from the downloaded folders to the CONF_DIR locations. Do the following:

- Copy all files from hadoop-conf to HADOOP_CONF_DIR
- Copy mapred-site.xml and yarn-site.xml from yarn-conf to HADOOP_CONF_DIR
- Copy hive-site.xml and redaction-rules.json from hive-conf to HIVE_CONF_DIR
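The three copy steps above look roughly like this in the shell. To make the snippet tryable anywhere, it recreates the unzipped layout with empty placeholder files under mktemp; on a real client, SRC would be wherever you unzipped the CM downloads and DST would be your CONF_DIR.

```shell
# Illustrative sketch: SRC stands in for the unzipped CM client-config
# folders, DST for HADOOP_CONF_DIR / HIVE_CONF_DIR (same dir in this post).
SRC=$(mktemp -d)
DST=$(mktemp -d)
mkdir -p "$SRC/hadoop-conf" "$SRC/yarn-conf" "$SRC/hive-conf"
touch "$SRC/hadoop-conf/core-site.xml" "$SRC/hadoop-conf/hdfs-site.xml" \
      "$SRC/yarn-conf/mapred-site.xml" "$SRC/yarn-conf/yarn-site.xml" \
      "$SRC/hive-conf/hive-site.xml"  "$SRC/hive-conf/redaction-rules.json"

cp "$SRC"/hadoop-conf/* "$DST"                                       # everything from hadoop-conf
cp "$SRC"/yarn-conf/mapred-site.xml "$SRC"/yarn-conf/yarn-site.xml "$DST"
cp "$SRC"/hive-conf/hive-site.xml "$SRC"/hive-conf/redaction-rules.json "$DST"
ls "$DST"
```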

  • Check Access to Hadoop Cluster

First you have to get a Kerberos ticket with the following, then try to put a file:

$ kinit erkanul@KERBEROSHOST

$ hadoop fs -put sample.txt
$ hadoop fs -ls
Found 2 items
-rw-r--r--   3 erkanul erkanul          8 2015-09-09 15:46 sample.txt
drwxr-xr-x   - erkanul erkanul          0 2015-09-08 19:59 hcli

Here you see we successfully listed our HDFS files.
So let's do a Sqoop import:

$ sqoop import --connect jdbc:oracle:thin:@//DBHOST:DBPORT/DBNAME --table #TABLE --username #USER --password #PASS -m 1 --target-dir /user/erkanul/hcli2customtest --hive-import --hive-table default.sqooptest

15/09/09 11:34:54 INFO hive.HiveImport: Hive import complete.
15/09/09 11:34:54 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

OK, we also successfully finished the Hive import. Check again and you will see the new directory:

$ hadoop fs -ls
Found 3 items

-rw-r--r--   3 erkanul erkanul          8 2015-09-09 15:46 sample.txt
drwxr-xr-x   - erkanul erkanul          0 2015-09-08 19:59 hcli
drwxr-xr-x   - erkanul erkanul          0 2015-09-09 11:23 hcli2customtest

Also check in the Hive editor in Hue.

  • Switch to Test BDA

As I said above, that configuration is for the Prod BDA, so we have to do the same for the Test BDA:
- Download and unzip config files from TEST BDA CM page.
- Copy config files to /tmp/hadooptest/conf.

After that, switch the configuration with the following:

$ export HADOOP_CONF_DIR=/tmp/hadooptest/conf/
$ export HIVE_CONF_DIR=/tmp/hadooptest/conf/

$ hadoop fs -ls
Found 4 items
drwxr-xr-x   - erkanul erkanul          0 2015-08-07 16:20 SQOOPERKANUL.SQOOPTEST3
-rw-r--r--   3 erkanul erkanul  128589824 2015-08-07 14:23 part-m-00000
-rwxrwxrwx   3 erkanul erkanul       2795 2015-09-01 10:00 test_erkanul.csv
-rwxrwxrwx   3 erkanul erkanul       2840 2015-09-01 10:04 test_erkanul2.csv

As you can see, I am now listing my HDFS files on the Test Hadoop cluster.

Now do the Sqoop import and try a file put :)


When a developer changes the HADOOP_CONF_DIR and HIVE_CONF_DIR variables, the client connects to either the Prod or the Test environment. This is a good practice when you use a gateway machine for accessing your Hadoop clusters, since you can switch targets without changing the cluster configuration.
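Since the only thing distinguishing prod from test is which core-site.xml the client reads, a quick sanity check before running anything destructive is worthwhile. Here is a sketch that pulls fs.defaultFS out of the active config; it assumes the one-tag-per-line XML layout that CM-generated files use, and the demo conf dir and the hdfs://prodbda-ns URI are made up for illustration.

```shell
# Sketch: report which cluster the active conf dir points at by extracting
# fs.defaultFS from core-site.xml (assumes <name>/<value> on separate lines).
current_cluster() {
  sed -n '/fs.defaultFS/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' \
      "$HADOOP_CONF_DIR/core-site.xml"
}

# Demo with a throwaway conf dir; in real use, just call current_cluster
# after exporting HADOOP_CONF_DIR.
export HADOOP_CONF_DIR=$(mktemp -d)
printf '<property>\n<name>fs.defaultFS</name>\n<value>hdfs://prodbda-ns</value>\n</property>\n' \
    > "$HADOOP_CONF_DIR/core-site.xml"
current_cluster   # prints hdfs://prodbda-ns
```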

  • Some Errors 

Here, I will talk about some errors you can get while testing.

First, you may get a valid ticket from the Kerberos server, but when you issue an HDFS command you see the following:

ls: Failed on local exception: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "#CLIENT_HOST/#CLIENT_IP"; destination host is: "#BDA_HOST":#BDA_PORT

That problem is the JVM's security policy jars. :( The default JDK ships with limited-strength cryptography, which breaks Kerberos AES-256. You have to download the unlimited-strength versions of local_policy.jar and US_export_policy.jar (the JCE Unlimited Strength Jurisdiction Policy files) and place them in $JAVA_HOME/jre/lib/security.

Another error: when executing a Sqoop import, you may see

ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver

You have to put ojdbc6.jar on your classpath, or copy it to /var/lib/sqoop.

And finally, when you do a Hive import it is possible to get a create-table error:

15/09/09 15:52:10 INFO hive.HiveImport: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:User erkanul does not have privileges for CREATETABLE)

You need to check whether your user has privileges to write to the target database (default in this example).

That's all.

Thanks for reading

Enjoy  & share.

