

Impala is a data warehouse engine used to manage, query, and structure data stored across different storage systems such as HDFS.

This article explains the basics to start working with Impala in Stambia.

Prerequisites:

You must install the Hadoop connector to be able to work with Impala.

Please refer to the dedicated article, which will guide you through this installation.

 

Metadata

The first step, when you want to work with Impala in Stambia DI, consists of creating and configuring the Impala Metadata.

Here is a summary of the main steps to follow:

  1. Creation of the Impala Metadata
  2. (Optional) Configuration of the Kerberos security
  3. Configuration of the server JDBC properties
  4. Reverse of the schemas and tables
  5. Configuration of HDFS

 

Here is an example of a common Metadata Configuration:

 

metadataOverview

 

Metadata creation

First, create the Impala Metadata, as usual, by selecting the technology in the Metadata Creation Wizard:

newMetadata

 

Click Next, choose a name, and click Finish.

 

Configuration of the Kerberos security

When working with Kerberos-secured Hadoop clusters, connections are protected, so you need to provide Stambia with the credentials and information required to perform the Kerberos connection.

If your cluster is secured with Kerberos, close the server Wizard popup (if it is displayed), and follow the steps below before trying to connect and reverse Impala objects.

Otherwise you can go to the next section.

  1. Create a new Kerberos Metadata (or use an existing one)
  2. Define in it the Kerberos principal to use for Impala
  3. Drag and drop it in the Impala Metadata
  4. Rename the Metadata Link to 'KERBEROS'
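
The principal defined in the Kerberos Metadata encodes most of the values that the Impala JDBC driver will later need (see the Kerberos properties further below). This hypothetical Python helper (not part of Stambia, shown only to make the structure explicit) decomposes a service principal such as impala/quickstart.cloudera@CLOUDERA into those pieces:

```python
# Illustrative helper: split a Kerberos service principal of the form
# service/host@REALM into the values the Impala JDBC properties expect.
def parse_principal(principal: str) -> dict:
    name, _, realm = principal.partition("@")
    service, _, host = name.partition("/")
    return {
        "KrbServiceName": service,  # e.g. "impala"
        "KrbHostFQDN": host,        # e.g. "quickstart.cloudera"
        "KrbRealm": realm,          # e.g. "CLOUDERA"
    }

print(parse_principal("impala/quickstart.cloudera@CLOUDERA"))
```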

 

metadataLinkKerberos

 

Refer to the dedicated article for further information about Kerberos Metadata configuration.

 

Configuration of the server properties

You are now ready to configure the JDBC properties that will be used to connect to Impala.

We're going to use the Server Wizard to configure everything.

If the Server Wizard popup is not displayed (if you closed it for configuring Kerberos, or any other reason), you can open it again with a right click > Actions > Launch wizard on the server node.

Define the JDBC properties used to connect to Impala, then click Connect when it is done.

launchWizard

Specifying a User / Password is not required when using Kerberos

 

About the JDBC URL

Defining the correct JDBC URL and parameters can be tricky, as it depends heavily on the Impala server and network configuration, whether Kerberos is used, which Hadoop distribution is involved, and so on.

We will therefore take a moment here to give advice and example URLs, with explanations of their structure.

First, the Impala JDBC URL must follow the given syntax:

jdbc:stambia:handler1:<Impala JDBC Class>:<JDBC URL>

The first part is specific to Stambia: it refers to a custom handler that manages the Kerberos security seamlessly. It is mandatory.

The 'Impala JDBC Class' will, in most cases, be the following if you are using the standard Cloudera JDBC driver:

jdbc:stambia:handler1:com.cloudera.impala.jdbc4.Driver:<JDBC_URL>

In most cases, you therefore only have to configure the <JDBC_URL> part.
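
As an illustration, this Python sketch simply concatenates the three parts of the URL (handler prefix, driver class, Impala URL), using the values from the examples in this article. It only builds the string and does not open a connection:

```python
# Illustrative only: compose the three parts of the Stambia Impala JDBC URL.
handler = "jdbc:stambia:handler1"
driver_class = "com.cloudera.impala.jdbc4.Driver"
impala_url = (
    "jdbc:impala://quickstart.cloudera:21050/default"
    ";OptimizedInsert=0;UseNativeQuery=0"
)

# <handler>:<Impala JDBC Class>:<JDBC URL>
full_url = f"{handler}:{driver_class}:{impala_url}"
print(full_url)
```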

 

Example of a URL connecting to an Impala server that is not secured with Kerberos:

jdbc:stambia:handler1:com.cloudera.impala.jdbc4.Driver:jdbc:impala://quickstart.cloudera:21050/default;OptimizedInsert=0;UseNativeQuery=0

 

Example of a URL connecting to an Impala server that is secured with Kerberos:

jdbc:stambia:handler1:com.cloudera.impala.jdbc4.Driver:jdbc:impala://quickstart.cloudera:21050/default;OptimizedInsert=0;UseNativeQuery=0;KrbHostFQDN=quickstart.cloudera;KrbServiceName=impala;AuthMech=1;principal=impala/quickstart.cloudera@CLOUDERA

 

Below are the JDBC URL properties that are usually required when using Kerberos:

  - AuthMech (mandatory): the authentication mechanism to use for the connection. For Kerberos, this must be set to 1. Example: AuthMech=1
  - principal (mandatory): the Kerberos principal to connect with. Example: principal=impala/quickstart.cloudera@CLOUDERA
  - KrbServiceName (mandatory): the Impala service's principal name. Example: KrbServiceName=impala
  - KrbHostFQDN (mandatory): the fully qualified domain name of the Impala server host. Example: KrbHostFQDN=quickstart.cloudera
  - KrbRealm (optional): the Kerberos realm used to connect. Example: KrbRealm=CLOUDERA
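
To illustrate how these properties extend the URL, here is a small Python sketch that appends them to the non-Kerberos example URL given earlier; all values are taken from this article's examples:

```python
# Sketch: append the Kerberos-related JDBC properties to a base Impala URL.
base_url = (
    "jdbc:impala://quickstart.cloudera:21050/default"
    ";OptimizedInsert=0;UseNativeQuery=0"
)
kerberos_props = {
    "KrbHostFQDN": "quickstart.cloudera",
    "KrbServiceName": "impala",
    "AuthMech": "1",
    "principal": "impala/quickstart.cloudera@CLOUDERA",
}

# JDBC properties are joined with semicolons.
suffix = ";".join(f"{key}={value}" for key, value in kerberos_props.items())
secured_url = f"{base_url};{suffix}"
print(secured_url)
```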

 

Reversing schemas and tables

Once the connection properties are set, and Kerberos is configured if needed, you can click Connect and reverse your schemas and tables, as usual.

Simply follow the wizard as for any other traditional database:

reverse

 

About HDFS

Most of the Impala Templates use HDFS operations to optimize processing and leverage the native loaders.

The Impala Metadata therefore requires an HDFS connection to create temporary files while processing.

  1. Create an HDFS Metadata or use an existing one
  2. Define in this Metadata the temporary HDFS folder where those operations should be performed (Impala must be able to access it).
  3. Drag and drop the HDFS Folder Metadata in the Impala Metadata
  4. Rename the Metadata Link to HDFS
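
As an illustration only, a load could use a unique working directory under that temporary folder, so that concurrent executions do not collide. The base path /user/stambia/tmp below is an assumption for the example, not a Stambia default:

```python
import posixpath
import uuid

# Hypothetical sketch of a per-load working directory under the temporary
# HDFS folder defined in the HDFS Metadata (base path is an assumption).
base_folder = "/user/stambia/tmp"
work_dir = posixpath.join(base_folder, f"impala_load_{uuid.uuid4().hex}")
print(work_dir)
```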

 

metadataLinkHDFS

 

Creating your first Mappings

With your Metadata ready and your tables reversed, you can now start creating your first Mappings.

The Impala technology in Stambia is used like any other database.

Drag and drop your sources and targets, map the columns as usual, and configure the templates according to your requirements.

Example of Mapping loading data from HSQL into Impala:

mappingHsqlToImpala

 

Example of Mapping loading data from HSQL to Impala with rejects enabled:

mappingHsqlToImpalaRejects

 

Example of Mapping loading a delimited file into Impala:

mappingFileToImpala

 

Example of Mapping loading data from Impala to HSQL, using a filter and performing joins:

mappingImpalaToHsql

 

Note: For further information, please consult the templates' Process and parameter descriptions.

 

Demonstration Project

The Hadoop demonstration project that you can find on the download page contains Impala examples.

Do not hesitate to have a look at this project to find samples and examples.

 
