What is ETL Testing and its Importance?

What is ETL?

ETL stands for Extract, Transform and Load. It is used for extracting data from different sources and transforming it and loading the data for the end-users to access. These end users can be a database or any downstream users who want to access the data.

Why is ETL Testing?

ETL testing is a process in which we make sure that data from source data is correctly loaded into the destination after business transformation.

It can be either source to target testing for the data types or some kind of validation testing.

Documents and Inputs needed for ETL testing

In order to carry out ETL testing, we need various information that can be provided in several documents. Two of the important documents are given below.

  • ETL Mapping sheets or Source to Target Mapping document

It includes information on the source and target tables. It has information such as column detail, transformation details, and any look-up that is needed in reference tables.

  • Database Schema of Source and Target

An ETL tester needs to be comfortable with SQL queries, and they write big and complex queries with multiple joins to validate the data. They also need to be familiar with various transformations that SQL provides.

  • Test Strategy Document
  • Test Planning Document
  • Sample Test files if Applicable

Types of ETL Tests

There are many types of tests that can be performed during the life cycle of ETL Tests.

Smoke Tests

We perform smoke tests to check the records count between source and target. This testing is considered a success if the record count between source and target matches. If this testing fails, we need to check why there is a sound mismatch.

Primary Key Validation

Primary is a column or combination of columns that identify the unique keys. Some processes rely on the database to generate the primary keys, whereas some generate them themselves by using composite keys and primary keys. Many of the ETL jobs carry the primary keys from upstream sources to target. When testing these jobs, we need to make sure that the primary keys from the upstream system are copied correctly to the downstream.

Attribute Validation

Once the primary key is validated, we need to check the values of each attribute and make sure all the mapping is applied correctly. We also need to make sure the mapping is applied correctly to all the columns. If all the attributes match after this validation, we can move ahead. If we find some error during the validation, we need to find out what caused the error.

Duplicate Checking

This testing is done to check if there are duplicates or not. Every ETL process needs to check for duplicates before moving the data into the target. This is to make sure that no data duplication is carried out while moving the data to the downstream or target system.

Conclusion

In This blog post, we learned about ETL testing and its types. We also learned about the different tests that can be.

Do you use any ETL testing at your work?

Please share this blog post on social media and leave a comment with any questions or suggestions.