Spark Component

This article describes the major changes to the Spark Component.

If you need further information, please consult the full changelog.

The Component download section can be found on this page.

Note:

Stambia DI is a flexible and agile solution. It can be quickly adapted to your needs.

If you have any questions, feature requests, or issues, do not hesitate to contact us.

Component.Spark.2.1.0

This version contains fixes for several issues, which are listed in the full changelog.

Component.Spark.2.0.5

Handle Spark session configuration when multiple targets are loaded with Spark

Spark always uses the same session within a single Spark program. In previous versions, once the session had been created, it could no longer be configured. As a consequence, all the session configurations must be known at the beginning of the execution.

The Spark Component has been updated to handle Spark session configuration when multiple targets are loaded with Spark.
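To illustrate why every configuration has to be known up front, here is a minimal sketch in Python. It is not the Component's implementation: the function name `merge_session_configs` and the per-target dictionaries are hypothetical, and the only assumption taken from Spark itself is that one shared session holds a single value per configuration key.

```python
from typing import Dict, Iterable


def merge_session_configs(configs: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge per-target Spark session settings into a single dictionary.

    Raises ValueError when two targets request different values for the
    same key, since one shared session can hold only one value per key.
    """
    merged: Dict[str, str] = {}
    for config in configs:
        for key, value in config.items():
            if key in merged and merged[key] != value:
                raise ValueError(
                    f"Conflicting values for {key!r}: {merged[key]!r} vs {value!r}"
                )
            merged[key] = value
    return merged


# Hypothetical settings requested by two targets of the same Spark program.
first_target = {"spark.sql.shuffle.partitions": "200"}
second_target = {
    "spark.sql.shuffle.partitions": "200",
    "spark.sql.warehouse.dir": "/user/hive/warehouse",
}

session_config = merge_session_configs([first_target, second_target])
```

In a PySpark program, a dictionary like `session_config` could then be applied with `SparkSession.builder.config(key, value)` for each entry before calling `getOrCreate()`, so that the session is created once with the full set of options.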

Spark Submit TOOL

When submitting a Spark job, it is possible to specify one or more resource files that will be made available to each Spark worker.

The Spark Submit TOOL has been updated to support resource files through a new dedicated node in Metadata and a new parameter on the Spark Submit TOOL.

Ability to specify deploy mode (cluster or client)

A new field has been added to the Spark Metadata to specify the deploy mode (cluster or client) of the Spark program when using spark-submit.
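On the spark-submit command line, the deploy mode corresponds to the `--deploy-mode` option, and the resource files described for the Spark Submit TOOL correspond to the `--files` option, which takes a comma-separated list. The sketch below shows how such a command line could be assembled. The helper `build_spark_submit_args`, the application name, and the file names are hypothetical; this is not how the Spark Submit TOOL is implemented.

```python
from typing import List, Sequence

VALID_DEPLOY_MODES = ("client", "cluster")


def build_spark_submit_args(
    application: str,
    master: str,
    deploy_mode: str = "client",
    files: Sequence[str] = (),
) -> List[str]:
    """Build a spark-submit argument list for the given options."""
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(f"deploy_mode must be one of {VALID_DEPLOY_MODES}")
    args = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if files:
        # spark-submit expects resource files as one comma-separated value.
        args += ["--files", ",".join(files)]
    # The application file comes after the spark-submit options.
    args.append(application)
    return args


# Hypothetical job submitted in cluster mode with two resource files.
submit_args = build_spark_submit_args(
    "job.py", "yarn", deploy_mode="cluster", files=["lookup.csv", "app.properties"]
)
```

Returning a list rather than a single string means the result can be handed directly to `subprocess.run` without shell quoting concerns.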

Complete changelog

The complete changelog with the list of improvements and fixed issues can be found at the following location.

Component.Spark.2.0.4

Minor improvements and fixed issues

This version contains some minor improvements and fixed issues, which can be found in the complete changelog.

Complete changelog

The complete changelog with the list of improvements and fixed issues can be found at the following location.

Component.Spark.2.0.3

Minor improvements and fixed issues

This version contains some minor improvements and fixed issues, which can be found in the complete changelog.

Complete changelog

The complete changelog with the list of improvements and fixed issues can be found at the following location.

Component.Spark.2.0.2

Change Data Capture (CDC)

Multiple improvements have been made to homogenize the usage of Change Data Capture (CDC) across the various Components.

Parameters have been homogenized, so that all Templates should now have the same CDC Parameters and support the same features.

Multiple fixes have also been made to correct CDC issues. Refer to the changelog for the exact list of changes.

Complete changelog

The complete changelog with the list of improvements and fixed issues can be found at the following location.

Component.Spark.2.0.1

New Templates to load data from and into Elasticsearch

Two new dedicated Templates have been added to load data from Elasticsearch into Spark, and to load data from Spark into Elasticsearch.

New Templates to load data from and into Parquet HDFS files

Two new dedicated Templates have been added to load data from Parquet HDFS files into Spark, and to load data from Spark into Parquet HDFS files.
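For readers unfamiliar with how Spark itself handles Parquet, the sketch below shows the underlying PySpark calls, `DataFrameReader.parquet` and `DataFrameWriter.parquet`. It assumes an existing `SparkSession` is passed in. The function name `copy_parquet` and the paths are hypothetical, and this is not the code generated by the Templates.

```python
def copy_parquet(spark, source_path, target_path, mode="overwrite"):
    """Read a Parquet dataset into a DataFrame and write it back as Parquet.

    `spark` is expected to be a pyspark.sql.SparkSession.
    """
    # Load the Parquet files at source_path into a DataFrame.
    df = spark.read.parquet(source_path)
    # Write the DataFrame as Parquet, using the given save mode.
    df.write.mode(mode).parquet(target_path)
    return df


# Example call (hypothetical HDFS paths, requires a running SparkSession):
# copy_parquet(spark, "hdfs:///data/in/events", "hdfs:///data/out/events")
```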

Improve datatype conversion

Datatype conversion between various systems when working with Spark has been improved to better handle the different datatypes.

Fix Kerberos command under Windows environment

An issue with the Kerberos command launched in Windows environments has been fixed.

The "kinit" command launched to initialize Kerberos security was not formed properly for Windows environments.

Fix issue with partition truncation when loading data to Hive using SCD mode

When loading data from Spark into Hive in SCD mode, there was an issue when the changes to the data led to a partition truncation.

In this situation, Template execution would fail when trying to get the partitions to truncate.

This issue has been fixed.
