Introduction to Data Engineering Design Patterns (DEDP)

Having looked at the history and current state of data engineering and its challenges, we can now turn to design patterns, which should help us find more clarity in the data engineering ecosystem.

In this chapter, we explore the core concepts of this book, starting with convergent evolution: what it is and why it is essential for this book. We then define what a pattern is and how it differs from a design pattern, and establish what both terms mean in data engineering (DE patterns and DE design patterns).

What is Convergent Evolution?

So, what is convergent evolution? In short, it's when two distinct evolutionary paths arrive at the same outcome. The most famous example is flying. Both a bird and a bee can fly, but in different ways. The bird developed sophisticated feathers, while the bee developed wings that extend from its exoskeleton. Their evolutions didn't talk to each other. The first feathered bird [1] is known to have lived about 150 million years ago [2], and the first bee a little later, about 120 million years ago [3].

Why is this relevant? Because that is what we will explore in this book: convergent evolutions within data engineering, meaning approaches that were developed independently and at different times but arrive at the same outcome, a pattern.

What is a (DE) Pattern?

As the section on convergent evolution suggests, a pattern emerges from two or more similar convergent evolutions (CEs). Let's clarify the difference between a pattern, a design pattern, and a data engineering design pattern as we use these terms in this book, each of which is covered in its own chapter.

Generally, a "pattern" refers to a repeated, identifiable design, procedure, or practice across different contexts. This could be anything from natural patterns (like the stripes on a zebra) to patterns in software code to behavioral patterns in sociology. Patterns are recurring structures, shapes, events, or other observable phenomena that offer some predictability and repeatability. In this book, a pattern is the outcome of a deep analysis and comparison of similar CEs.

What is a (DE) Design Pattern?

In contrast, a "design pattern," particularly in the context of (software) engineering, is a specific term that refers to best practices. Design patterns are solutions to general problems developers face during development. These solutions are reusable and flexible enough to fit into many different designs, making them well suited for tackling recurring design problems.

In software, the general idea of a design pattern was popularized by the legendary book Design Patterns: Elements of Reusable Object-Oriented Software. The book explains 23 design patterns and clusters them into creational, structural, and behavioral categories. It shows how to solve commonly occurring problems with general, repeatable solutions, typically in object-oriented software development.

Think of a design pattern (DP) as a higher-level practice: a way of systematically applying patterns. A design pattern describes that systematic approach in enough detail that it can be applied to other use cases.

What is a Data Engineering Design Pattern (DEDP)?

To clarify, the only thing that distinguishes a data engineering design pattern from a general design pattern is the domain. When I talk about design patterns in this book, I always mean the data engineering ones. Also, the search for a DE pattern or DE design pattern is done systematically through convergent evolutions, which may differ from how patterns are traditionally identified.

In the upcoming chapters, you'll find more on the foundational terms of data engineering, building up to the Convergent Evolutions (CE) of data engineering in Ch. 3. Later in the book, each of the two concepts, DEP and DEDP, gets a dedicated chapter:

  • Data Engineering Patterns (DEP), Ch. 4: Repeated or standard practices and procedures used in data engineering, as found so far through CEs. These include things like ETL (extract, transform, load) processes, data pipelining, or specific ways of handling data streaming. They can also refer to common data structures or to dedicated data architectures in data organizations.
  • Data Engineering Design Patterns (DEDP), Ch. 5: Best-practice solutions to common problems encountered in data engineering. These are established, tested, and optimized approaches to tackling recurring challenges. They may include architectural decisions, data modeling approaches, or data storage and retrieval strategies.
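
To make the ETL example from the list above more tangible, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a reference implementation: the in-memory RAW_ROWS source, the list used as a "warehouse", and all function names are hypothetical and chosen for this example. The point is the recurring shape (extract, then transform, then load), which shows up across many tools regardless of how each step is implemented.

```python
from typing import Iterable, Iterator

# Hypothetical in-memory source; a real pipeline would read from a file,
# an API, or a database instead.
RAW_ROWS = [
    {"user": " Alice ", "amount": "10.50"},
    {"user": "bob", "amount": "3"},
    {"user": "", "amount": "7.25"},  # no user name: dropped during transform
]


def extract() -> Iterator[dict]:
    """Extract: yield raw records from the source, unchanged."""
    yield from RAW_ROWS


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Transform: clean names, cast amounts, and skip incomplete rows."""
    for row in rows:
        user = row["user"].strip().lower()
        if not user:
            continue
        yield {"user": user, "amount": float(row["amount"])}


def load(rows: Iterable[dict], target: list) -> int:
    """Load: append rows to the target and return how many were written."""
    before = len(target)
    target.extend(rows)
    return len(target) - before


def run_pipeline(target: list) -> int:
    """Chain the three steps in the order that gives ETL its name."""
    return load(transform(extract()), target)


warehouse: list = []  # stand-in for a real storage layer
loaded = run_pipeline(warehouse)
```

Because each step only depends on the shape of the records passed to it, any one of them can be swapped out (a different source, a different cleaning rule, a different target) without touching the others. That separation is what makes the practice recognizable as a pattern.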

Why Design Patterns in Data Engineering?

Solving specific parts of the Data Engineering Lifecycle is generally hard while data engineering is still maturing. Everyone has their own ways and techniques. Design patterns aim to provide standardized ways of solving particular problems.

There will always be alternatives to any design pattern, but the goal is to give anyone starting in the field a go-to approach, and to establish standards that can be integrated into open tools.