Article
Title: "Repairing ETL Processes using Extended Relational Algebra"
Authors: Judith Awiti, Robert Wrembel, Esteban Zimányi
Pages: 157-190
DOI: 10.2478/fcds-2025-0006
Abstract:

In a data warehouse architecture, heterogeneous and distributed data sources (DSs) are integrated by means of an extract-transform-load (ETL) layer, which runs integration processes (a.k.a. ETL processes). This layer is not static, since the DSs being integrated change their schemas over time. A DS schema change impacts ETL processes, which typically stop working and need to be redesigned (i.e., repaired). Our overall goal is to automatically repair ETL processes affected by DS schema changes. In this paper we focus on ETL processes specified in extended relational algebra, since relational data warehouses are among the most popular for business applications. For such processes, we contribute a repair method. The method uses a rule engine that maps each possible DS schema change to: (1) an ETL operation on the changed schema element and (2) a repair rule applicable when that DS schema element is changed. Based on this mapping, when a DS schema change occurs, our solution allows the appropriate ETL rules to be applied to repair the affected ETL processes.
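As a rough illustration of the mapping idea described in the abstract, the following minimal Python sketch pairs a kind of schema change and an ETL operation type with a repair rule. It is not the authors' implementation: the names `SchemaChange`, `Projection`, `RULES`, and `repair`, as well as the two change kinds shown, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaChange:
    """Hypothetical description of a data source schema change."""
    kind: str            # e.g. "rename_column" or "drop_column"
    table: str
    old_name: str
    new_name: str = ""   # only used for renames


@dataclass(frozen=True)
class Projection:
    """Hypothetical ETL operation: project some columns of a table."""
    table: str
    columns: tuple


def repair_projection_on_rename(op, change):
    # Replace the old column name with the new one in the projection list.
    cols = tuple(change.new_name if c == change.old_name else c
                 for c in op.columns)
    return Projection(op.table, cols)


def repair_projection_on_drop(op, change):
    # Remove the dropped column from the projection list.
    cols = tuple(c for c in op.columns if c != change.old_name)
    return Projection(op.table, cols)


# Assumed rule table: (schema change kind, ETL operation type) -> repair rule.
RULES = {
    ("rename_column", Projection): repair_projection_on_rename,
    ("drop_column", Projection): repair_projection_on_drop,
}


def repair(process, change):
    """Return a copy of the process with every affected operation repaired."""
    repaired = []
    for op in process:
        rule = RULES.get((change.kind, type(op)))
        affected = op.table == change.table and change.old_name in op.columns
        repaired.append(rule(op, change) if rule and affected else op)
    return repaired
```

In this sketch an ETL process is simply a list of operations; a real rule engine would cover more operation types (selection, join, aggregation) and more kinds of schema change than the two assumed here.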

Open access to full text at De Gruyter Online