


Sqoop is a tool used to transfer data between HDFS and relational databases having a JDBC driver.

Stambia provides tools for both the Sqoop1 and Sqoop2 versions, which differ in terms of usage.

This article explains the basics to start using Sqoop in Stambia.

Prerequisites:

You must install the Hadoop connector to be able to work with Sqoop.

Please refer to the following article that will guide you to accomplish this.

Important:

Sqoop uses JDBC drivers to transfer data.

Therefore, the JDBC driver corresponding to the technology you want to import or export data from must be installed on the Sqoop server.

Otherwise, Sqoop will not be able to transfer any data.
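Before configuring anything in Stambia, it can help to confirm that the driver jar is actually present on the Sqoop server. The following is a minimal sketch, assuming the drivers are stored as .jar files in a single folder (Sqoop1 conventionally reads them from the lib folder under its installation directory). The function name, the folder path and the keyword are hypothetical and must be adapted to your installation.

```python
from pathlib import Path


def find_driver_jars(lib_dir, keyword):
    """Return the names of the .jar files in lib_dir whose name contains keyword."""
    directory = Path(lib_dir)
    if not directory.is_dir():
        # A missing folder simply means no driver was found.
        return []
    return sorted(
        jar.name
        for jar in directory.glob("*.jar")
        if keyword.lower() in jar.name.lower()
    )


if __name__ == "__main__":
    # Hypothetical location and keyword: adjust both to your Sqoop server.
    print(find_driver_jars("/usr/lib/sqoop/lib", "postgresql"))
```

An empty list suggests that the driver still has to be copied to the Sqoop server.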

 

Metadata

The first step, when you want to work with Sqoop in Stambia DI, consists of creating and configuring the Sqoop Metadata.

Here is an example of a common Metadata configuration:

[Image: MetadataOverview]

 

Metadata creation

First, create the Sqoop Metadata, as usual, by selecting the technology in the Metadata Creation Wizard:

[Image: MetadataNew]

Click Next, choose a name, and click Finish.

 

Configuration of the server properties

Once the Metadata is created, you can configure the Sqoop server properties according to your requirements and environment:

[Image: MetadataOverview]

The available properties depend on the Sqoop version selected.

Here are the common properties:

Property | Description
Name | Logical label (alias) for the server.
Version | Sqoop version that should be used:
  • Sqoop1 uses command-line tools to perform the operations.
  • Sqoop2 uses REST Web Services to perform the operations.

 

Here are the Sqoop1 properties:

Property | Description | Examples
Default API | The API to use by default with the Sqoop1 version: | CommandLine
  • CommandLine: Sqoop commands are launched locally from the current Runtime executing the delivery.
  • CommandLine over SSH: Sqoop commands are launched on a remote server through SSH.
Sqoop Home | Directory where the Sqoop commands are installed. This must be set to the folder just before the 'bin' folder. For example, if the sqoop-import / sqoop-export utilities are under /usr/bin/, the value should be /usr/ | /usr/
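The "folder just before the 'bin' folder" rule for Sqoop Home can be illustrated with a short sketch. It assumes POSIX-style paths on the Sqoop server; the function name is hypothetical and is not part of Stambia.

```python
from pathlib import PurePosixPath


def sqoop_home_for(binary_path):
    """Return the folder just before 'bin' for the path of a Sqoop utility."""
    path = PurePosixPath(binary_path)
    if path.parent.name != "bin":
        raise ValueError(f"{binary_path} is not inside a 'bin' folder")
    home = str(path.parent.parent)
    # Keep a trailing slash, as in the /usr/ example of the article.
    return home if home.endswith("/") else home + "/"


if __name__ == "__main__":
    print(sqoop_home_for("/usr/bin/sqoop-import"))
```

On the Sqoop server, the path of the utility can typically be found with a command such as `which sqoop-import`.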

 

If you are using the CommandLine over SSH API, you must drag and drop an SSH Metadata Link containing the SSH connection information into the Sqoop Metadata.

Rename it to 'SSH'.

[Image: MetadataSSH]
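To picture the difference between the two Sqoop1 APIs, the sketch below wraps a local command so that it would run on a remote server through the standard ssh client. This only illustrates the idea and is not how Stambia implements the CommandLine over SSH API; the function name, the user and the host are hypothetical.

```python
import shlex


def over_ssh(command, user, host):
    """Wrap an argument list so that it runs on a remote host through ssh."""
    # shlex.join quotes each argument so the remote shell sees it unchanged.
    return ["ssh", f"{user}@{host}", shlex.join(command)]


if __name__ == "__main__":
    # 'sqoop version' prints the version of the installed Sqoop1 client.
    print(over_ssh(["sqoop", "version"], "hadoop", "edge-node"))
```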

 

 

Here are the Sqoop2 properties:

Property | Description | Examples
Default API | The API to use by default with the Sqoop2 version: | REST
  • REST: The Sqoop2 REST API is used to perform the operations.
URL | Sqoop2 REST API base URL | http://<hostname>:12000/sqoop
Hadoop Configuration Directory | Path to a directory containing Hadoop configuration files on the remote server, such as core-site.xml, hdfs-site.xml, ... It is required when creating a Sqoop2 HDFS Link. | /home/cloudera/stambia/conf
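As an illustration of the URL property, the sketch below builds the base URL from a hostname and appends a resource path to it. The port 12000 and the /sqoop context come from the example above; the function names are hypothetical, and the "version" resource is taken from the Sqoop2 REST API documentation, so verify it against the Sqoop2 release you are running.

```python
def sqoop2_base_url(hostname, port=12000):
    """Build the Sqoop2 REST API base URL, following the article's example."""
    return f"http://{hostname}:{port}/sqoop"


def sqoop2_endpoint(base_url, path):
    """Join the base URL and a resource path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    base = sqoop2_base_url("sqoop-host")  # hypothetical hostname
    # Opening this URL in a browser is a quick way to check the server is reachable.
    print(sqoop2_endpoint(base, "version"))
```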

 

Configuration of the Kerberos Security

When working with a Kerberos-secured Hadoop cluster, connections are protected, so you need to provide Stambia with the credentials and other information required to perform the Kerberos connection.

A Kerberos Metadata is available to specify everything required.

  1. Create a new Kerberos Metadata (or use an existing one)
  2. Define in it the Kerberos Principal to use for Sqoop
  3. Drag and drop it into the Sqoop Metadata
  4. Rename the Metadata Link to 'KERBEROS'

 

[Image: MetadataKerberos]

 

Note: Kerberos is only supported for the Sqoop1 version.
Note: Refer to the dedicated article for further information about the Kerberos Metadata configuration.
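When a Kerberos connection fails, it can be useful to check the principal outside Stambia first. The sketch below only builds the argument list of a standard kinit call using a keytab; it assumes the principal has a keytab file, which the article does not state. The function name, the principal and the keytab path are hypothetical.

```python
def kinit_command(principal, keytab):
    """Build a kinit call that obtains a ticket for principal from a keytab file."""
    return ["kinit", "-kt", keytab, principal]


if __name__ == "__main__":
    # Hypothetical principal and keytab: replace with your own values.
    print(" ".join(kinit_command("sqoop@EXAMPLE.COM", "/etc/security/sqoop.keytab")))
```

If the command succeeds on the server where Sqoop runs, `klist` should then list a valid ticket for the principal.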

 

Using Sqoop in Stambia

Sqoop1

Stambia provides two Process tools to work with Sqoop1:

Tool | Description
TOOL Sqoop Export | Export data from HDFS to any database that has a JDBC driver.
TOOL Sqoop Import | Import data from any database that has a JDBC driver to HDFS.

 

To use a tool:

  1. Drag and drop it into a Process
  2. Drag and drop onto it the Metadata Link of the HDFS Folder that data is imported into or exported from
  3. Drag and drop a Sqoop Metadata Link onto it
  4. Drag and drop onto it the Metadata Link of the Database Table that data is imported from or exported to
  5. Execute the Process

 

[Image: ProcessSqoop1]

 

Note: For further information, please consult the tool's Process and parameters description.
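For readers who want to relate the three Metadata Links (HDFS Folder, Sqoop, Database Table) to plain Sqoop1 usage, the sketch below assembles comparable sqoop-import and sqoop-export calls. It is an assumption that the commands generated by the Stambia tools resemble these; the function names and all values are hypothetical, while the options used (--connect, --table, --target-dir, --export-dir, --username) are standard Sqoop1 options.

```python
def _sqoop_binary(sqoop_home, utility):
    """Locate a utility under the 'bin' folder of the Sqoop Home directory."""
    return sqoop_home.rstrip("/") + "/bin/" + utility


def sqoop_import_command(sqoop_home, jdbc_url, table, target_dir, username=None):
    """Database table -> HDFS folder."""
    cmd = [_sqoop_binary(sqoop_home, "sqoop-import"),
           "--connect", jdbc_url, "--table", table, "--target-dir", target_dir]
    if username:
        cmd += ["--username", username]
    return cmd


def sqoop_export_command(sqoop_home, jdbc_url, table, export_dir, username=None):
    """HDFS folder -> database table."""
    cmd = [_sqoop_binary(sqoop_home, "sqoop-export"),
           "--connect", jdbc_url, "--table", table, "--export-dir", export_dir]
    if username:
        cmd += ["--username", username]
    return cmd


if __name__ == "__main__":
    # Hypothetical JDBC URL, table and HDFS folder.
    print(" ".join(sqoop_import_command(
        "/usr/", "jdbc:postgresql://db-host/sales", "customers", "/data/customers")))
```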

 

Sqoop2

Sqoop2 has the same goal as Sqoop1 but is used in a completely different way.

Please refer to the Sqoop2 documentation to understand its concepts and usage.

Stambia provides all the necessary tools to create and manage Sqoop2 Links and Jobs.

Tool | Description
TOOL Sqoop2 Describe Connectors | Retrieve information about the Sqoop driver and the available connectors.
TOOL Sqoop2 Create Link | Create a Sqoop2 Link.
TOOL Sqoop2 Monitor Link | Monitor Sqoop2 Links (enable, disable, delete).
TOOL Sqoop2 Create Job | Create a Sqoop2 Job.
TOOL Sqoop2 Set Job FROM HDFS | Generate the HDFS From part of a Job.
TOOL Sqoop2 Set Job FROM JDBC | Generate the JDBC From part of a Job.
TOOL Sqoop2 Set Job TO HDFS | Generate the HDFS To part of a Job.
TOOL Sqoop2 Set Job TO JDBC | Generate the JDBC To part of a Job.
TOOL Sqoop2 Monitor Job | Monitor Sqoop2 Jobs (enable, disable, delete, start, stop, status).

 

To use a tool:

  1. Drag and drop it into a Process
  2. Set the properties to your needs

 

Note: For further information, please consult the tool's Process and parameters description.
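To give an idea of what the job monitoring actions correspond to on the Sqoop2 side, the sketch below maps each action to an HTTP method and URL. The paths follow the Sqoop2 REST API documentation for the 1.99.x releases that address jobs by name; older releases address jobs by id, and it is an assumption that the Stambia tool issues these exact requests. The function and dictionary names are hypothetical.

```python
from urllib.parse import quote

# Action -> (HTTP method, path suffix), per the Sqoop2 REST API documentation.
JOB_ACTIONS = {
    "enable": ("PUT", "enable"),
    "disable": ("PUT", "disable"),
    "start": ("PUT", "start"),
    "stop": ("PUT", "stop"),
    "status": ("GET", "status"),
    "delete": ("DELETE", ""),
}


def job_request(base_url, job_name, action):
    """Return the (method, url) pair for a monitoring action on a Sqoop2 Job."""
    method, suffix = JOB_ACTIONS[action]
    # Percent-encode the job name so that spaces or slashes cannot break the URL.
    url = f"{base_url.rstrip('/')}/v1/job/{quote(job_name, safe='')}"
    if suffix:
        url += "/" + suffix
    return method, url


if __name__ == "__main__":
    # Hypothetical host and job name.
    print(job_request("http://sqoop-host:12000/sqoop", "daily_load", "start"))
```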

 

Demonstration Project

The Hadoop demonstration project, available on the download page, contains examples for Sqoop.

Have a look at this project for samples showing how to use the Sqoop tools.
