This article discusses the advantages and disadvantages of ETL vs. ELT for creating a data strategy. At Zuar, we advocate using ELT instead of the more traditional ETL because ELT makes it easier to eliminate errors and audit data. It does not require a separate data-staging step because the loading phase begins immediately, and its versatility makes it a modern approach that exceeds the capabilities of ETL.
Extract-Transform-Load (ETL) is the most common process for building a performant data model out of differing data sources and has been the main building block for data strategy for almost 50 years. Data is taken from one or more source data systems, merged and converted to a central model, then written to a final data store. The target data store is generally just referred to as a ‘database’ but may be a data warehouse, data mart, data lake or any other system where the resulting data can be efficiently queried.
The resulting data model from an ETL process is designed to be efficiently queried, with a structure often very unlike that of the source systems. These data models were designed to answer a fixed set of queries rather than the more ad-hoc queries common today.
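The ETL flow described above can be sketched in a few lines of Python. This is a hypothetical, minimal illustration (the record fields and target list are invented for the example): each record is extracted, transformed in memory, and only the final shape is written to the target store.

```python
# Hypothetical source records, as they might arrive from a source system.
orders = [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "5.00"}]

warehouse = []  # stands in for the target data store

for record in orders:  # extract
    # Transform record by record: nothing intermediate is kept.
    row = {
        "order_id": record["id"],
        "amount_cents": round(float(record["amount"]) * 100),
    }
    warehouse.append(row)  # load only the final model
```

Note that once the loop has run, the original string amounts are gone; if a transform was wrong, the only evidence is the finished model.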
Many implementations add steps to the ETL process – for example, cataloging or validation of the data – but the overall process continues to be known as ETL. The process of creating a data model, by any method, is frequently referred to as a pipeline.
ETL has been around for a long time, dating back to the 1970s – nearly as long as the concept of a database itself. The ETL process being this old means it was designed for radically different computer resources than are available today. Memory and storage were far more limited – and far more expensive – than those resources are today.
ETL was designed for efficient use of memory and storage at the cost of overall complexity. Transforms were done on a record-by-record basis before the results were stored in the final model, meaning a user could not see any of the data until the final model was built. As a result, failures in the process are time-consuming to diagnose and fix.
Today storage is comparatively cheap – especially cold storage, where the data is not being actively used.
Extract-Load-Transform (ELT) trades storage and memory for simplicity.
ELT brings all source data into a central data store – either the target database or a staging system – and then performs a set of simple transforms keeping the result at each stage until a final data model is achieved.
In an ELT process, the steps are not just reordered; the fundamental concept is changed. The results of each transform are kept, so the entire process is auditable. When something goes wrong, having the results of every transform – including a copy of the original source data – allows the problem to be quickly diagnosed and fixed. At first glance, ELT may seem more complicated than ETL, but in practice it's far easier to see what's happening at each stage of the process.
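The same pipeline written ELT-style keeps every intermediate result. This sketch is hypothetical (the stage names and record fields are invented), but it shows the key difference: the raw extract and each transform's output are retained, so a bad record can be inspected after the fact instead of silently breaking the load.

```python
# Every stage of the pipeline is persisted; a dict stands in for staging tables.
stages = {}

# Load: the raw extract is kept verbatim, including a malformed record.
stages["raw"] = [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "bad"}]

# Transform 1: parse amounts, setting failures aside for later review.
stages["parsed"] = []
stages["rejected"] = []
for record in stages["raw"]:
    try:
        stages["parsed"].append({
            "order_id": record["id"],
            "amount_cents": round(float(record["amount"]) * 100),
        })
    except ValueError:
        stages["rejected"].append(record)

# Transform 2: the final model is built only from clean rows.
stages["final"] = list(stages["parsed"])
```

Because `stages["raw"]` and `stages["rejected"]` survive, the bad record can be diagnosed without re-extracting anything from the source system.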
ELT also allows additional data models to be easily added. In a traditional ETL process, creating another data model required an entirely different ETL pipeline. With ELT, a new data model can be created as a branch of an existing ELT pipeline.
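Branching works because the intermediate stages already exist. In this hypothetical sketch, a second data model is built from the same persisted stage that feeds the first, with no new extract or reload required.

```python
# An already-persisted intermediate stage from an existing ELT pipeline
# (field names are invented for the example).
parsed = [{"order_id": 1, "amount_cents": 1999},
          {"order_id": 2, "amount_cents": 500}]

# Existing model: a per-order view.
orders_model = list(parsed)

# New branch: a summary model built from the same stage, not a new pipeline.
totals_model = {"order_count": len(parsed),
                "total_cents": sum(r["amount_cents"] for r in parsed)}
```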
What are the pros of ETL?
- Efficient use of system resources. It requires less storage space and processes structured data with minimal resource usage.
- Proven history of use. ETL has been in use for a long time, making it ideal for legacy data.
- Many tools are designed for this process.
What are the cons of ETL?
- More difficult to debug and fix errors. Because ETL transforms data early in the process and discards the intermediate results, errors that surface later are difficult to trace back to their source. The rigidity of ETL makes problems in the later stages hard to diagnose.
- Process changes are notoriously slow to implement. While ETL has been tested by time, it can also feel outdated: it remains efficient only when working with the rigid structure of older data formats, and changing an established pipeline is slow.
What are the pros of ELT?
- Entire process can be easily audited. ELT retains the raw data, making it easy to review each stage of the data pipeline, and diagnosing a problem is easier when the results of each step are available.
- Errors are easier to fix. After an error is fixed, the process can be restarted at the last successful step rather than re-run from the beginning.
- Quickly extensible. Using raw data enables ELT to be highly flexible. This, combined with the fact that data is readily available, makes development easier.
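The restart-from-the-last-successful-step idea can be sketched with a simple checkpointing helper. This is a hypothetical illustration (the `run_stage` helper and stage names are invented): each stage's result is written to disk, and a rerun skips any stage whose output already exists.

```python
import json
import os
import tempfile

stage_dir = tempfile.mkdtemp()  # stands in for persistent stage storage

def run_stage(name, build):
    """Run a stage only if its persisted result does not already exist."""
    path = os.path.join(stage_dir, name + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # reuse the checkpointed result
    result = build()
    with open(path, "w") as f:
        json.dump(result, f)  # persist so a rerun can skip this stage
    return result

raw = run_stage("raw", lambda: [1, 2, 3])
doubled = run_stage("doubled", lambda: [x * 2 for x in raw])
```

If the `doubled` stage fails, fixing it and rerunning the script reuses the persisted `raw` result instead of extracting again from the source.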
What are the cons of ELT?
- More system resources required. ELT needs more storage to persist each stage of the data, but the benefits outweigh the costs if flexibility and expanded capabilities are what you need.
- May be slower due to additional writes. ELT might be slower due to persisting each step of the process.
Choosing a data strategy suitable for an organization depends on its needs and the resources available. But for a methodology that provides not only immediate results but a foundation that will grow with the organization, it's hard to go wrong with ELT. Loading happens once, up front, and transforms can be rerun without re-extracting the data. ELT is low-maintenance and scalable, and while it requires a little more skill, it increases efficiency and productivity in the long run because it allows you to work with raw data.
What's for sure, however, is that all organizations, both large and small, have a crucial need for quality and practical data processing tools. If you are interested in finding out how Zuar can help you with your data processing needs, talk with our team to find out more.
Get started with Zuar's business intelligence platform today.