Welcome to my blog! Here you’ll find articles about data engineering, cloud computing, and software development, based on my professional experience and research.
Migrating ETL Workflows to Azure Databricks: A Case Study
In this post, I’ll share my experience leading the migration of ETL workflows from legacy systems to Azure Databricks at Zürich Insurance. This project presented unique challenges and opportunities for modernizing our data infrastructure.
Project Overview
The goal was to migrate existing ETL workflows from legacy systems to Azure Databricks, improving scalability, maintainability, and performance. The migration involved multiple data sources and complex transformations.
...
Building Scalable Data Pipelines with Apache Airflow
Introduction
Building scalable data pipelines is crucial for modern data engineering. In this post, I’ll share my experience and best practices for creating maintainable and efficient data pipelines using Apache Airflow.
Why Apache Airflow?
Apache Airflow has become the de facto standard for workflow orchestration in data engineering. Here’s why:
Declarative DAGs: Write your workflows in Python
Rich Ecosystem: Extensive collection of operators and hooks
Scalability: Can handle complex workflows with thousands of tasks
Monitoring: Built-in UI and logging capabilities
Community: Large, active community and regular updates

Best Practices
1. Modular DAG Design
Keep your DAGs modular and reusable:
...
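To make the idea of modular, reusable DAG pieces concrete, here is a minimal sketch in plain Python. It is not Airflow code and not the example from the original post: it deliberately avoids Airflow imports so it runs anywhere, and every name in it (`extract`, `transform`, `load`, `build_pipeline`, `run_pipeline`) is hypothetical. The point is only the shape: small single-purpose steps, plus a factory that wires them into a dependency graph for any given source.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Small, single-purpose steps (hypothetical stand-ins for real ETL logic).
def extract(source: str) -> list:
    """Pretend to read two rows from the named source."""
    return [{"source": source, "value": 1}, {"source": source, "value": 2}]


def transform(rows: list) -> list:
    """Scale each value; a placeholder for a real transformation."""
    return [{**row, "value": row["value"] * 10} for row in rows]


def load(rows: list, sink: list) -> int:
    """Append rows to an in-memory sink and report how many were written."""
    sink.extend(rows)
    return len(rows)


def build_pipeline(source: str, sink: list) -> tuple:
    """Factory: build the tasks and dependencies for one source.

    Returns (tasks, deps), where tasks maps a task name to a callable and
    deps maps a task name to the set of task names it depends on.
    """
    state = {}  # intermediate results shared between this pipeline's tasks

    def do_extract():
        state["raw"] = extract(source)

    def do_transform():
        state["clean"] = transform(state["raw"])

    def do_load():
        state["loaded"] = load(state["clean"], sink)

    tasks = {
        f"extract_{source}": do_extract,
        f"transform_{source}": do_transform,
        f"load_{source}": do_load,
    }
    deps = {
        f"transform_{source}": {f"extract_{source}"},
        f"load_{source}": {f"transform_{source}"},
    }
    return tasks, deps


def run_pipeline(tasks: dict, deps: dict) -> list:
    """Run tasks in dependency order and return the order that was used."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order


if __name__ == "__main__":
    warehouse = []
    # The same factory is reused for each source instead of copy-pasting steps.
    for src in ("orders", "customers"):
        tasks, deps = build_pipeline(src, warehouse)
        print(run_pipeline(tasks, deps))
```

In an Airflow project, the same structure would typically translate into a function that creates the tasks for one source inside a DAG, with the scheduler handling the ordering that `run_pipeline` does by hand here.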