Airflow Architecture: Airflow Core, Cloud Composer, MWAA, Astronomer
Hello All
Welcome back! This is the second blog post in our “Learn Airflow” series.
In this blog post, we’ll explore the core architecture of Airflow, and then dive into how this architecture slightly changes depending on the mode of deployment — be it on VMs, Kubernetes, or managed platforms like Cloud Composer and MWAA.
Whether you’re a beginner trying to understand how Airflow fits into your tech stack or an experienced engineer evaluating deployment options, this guide will help you understand the practical implications of each approach.
Core Airflow Architecture
At its core, Airflow consists of the following components:
- Scheduler: Triggers task instances based on DAG schedules.
- Webserver: UI to monitor and manage DAGs and tasks.
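- Metadata Database: Stores DAG runs, task instance state, connections, variables, etc. Task logs are normally written to local disk or remote storage rather than to this database.
- Worker(s): Execute tasks. Where they run depends on the executor: Celery workers (CeleryExecutor), one pod per task (KubernetesExecutor), or local subprocesses alongside the scheduler (LocalExecutor).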
- DAGs Folder: Where workflow definitions are stored.
This base architecture remains consistent — but how you deploy these components can vary dramatically depending on your infrastructure strategy.
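To make the division of responsibilities concrete, here is a toy model of how the Scheduler, Workers, and Webserver coordinate through the Metadata Database. It is only a sketch: the `task_instance` table, the state names, and all function names are simplified assumptions for illustration, not Airflow's real schema or code, and task dependencies are ignored.

```python
import sqlite3


def create_metadata_db() -> sqlite3.Connection:
    """Stand-in for the metadata database (real Airflow uses Postgres/MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "dag_id TEXT, task_id TEXT, state TEXT, "
        "PRIMARY KEY (dag_id, task_id))"
    )
    return conn


def scheduler_tick(conn, dag_id, task_ids):
    """Scheduler role: record each task of a DAG run as 'queued'."""
    conn.executemany(
        "INSERT OR IGNORE INTO task_instance VALUES (?, ?, 'queued')",
        [(dag_id, task_id) for task_id in task_ids],
    )
    conn.commit()


def worker_run_once(conn, tasks):
    """Worker role: pick one queued task, run it, and record the outcome."""
    row = conn.execute(
        "SELECT dag_id, task_id FROM task_instance "
        "WHERE state = 'queued' ORDER BY rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing left to do
    dag_id, task_id = row
    try:
        tasks[task_id]()  # the actual work, supplied as a plain callable
        state = "success"
    except Exception:
        state = "failed"
    conn.execute(
        "UPDATE task_instance SET state = ? WHERE dag_id = ? AND task_id = ?",
        (state, dag_id, task_id),
    )
    conn.commit()
    return task_id


def webserver_view(conn, dag_id):
    """Webserver role: read task states from the database for display."""
    rows = conn.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? ORDER BY rowid",
        (dag_id,),
    )
    return dict(rows)
```

Calling `scheduler_tick`, then `worker_run_once` in a loop until it returns `None`, then `webserver_view` walks through the same hand-off in miniature. The point of the sketch is that the components never talk to each other directly; they share state through the database, which is why each one can be deployed and scaled separately.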
On Kubernetes
Airflow can be containerized and deployed on a Kubernetes cluster using either the KubernetesExecutor or the CeleryExecutor with its workers running as pods.
- Architecture: Airflow components run as Kubernetes pods. Each task can spin up its own pod dynamically (KubernetesExecutor).
- Benefits: Scalability, isolation, modern DevOps practices.
- Best for: Teams already using Kubernetes, looking for scalability and fine-grained control.
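As a rough illustration of what "one pod per task" means, the sketch below builds a Kubernetes Pod manifest for a single task instance as a plain Python dict. The helper names, the image tag, the resource values, and the container command are assumptions chosen for the example; the pod that Airflow actually generates comes from its own pod template and will look different.

```python
import re


def pod_name_for(dag_id: str, task_id: str) -> str:
    """Build a DNS-safe pod name: lowercase alphanumerics and '-', max 63 chars."""
    raw = f"{dag_id}-{task_id}".lower()
    name = re.sub(r"[^a-z0-9-]+", "-", raw).strip("-")
    return name[:63].rstrip("-")


def build_task_pod(
    dag_id: str,
    task_id: str,
    run_id: str,
    image: str = "apache/airflow:2.9.3",  # example tag, an assumption
    cpu: str = "500m",
    memory: str = "512Mi",
) -> dict:
    """Return a hypothetical Pod manifest that runs exactly one task."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name_for(dag_id, task_id),
            "labels": {"dag_id": dag_id, "task_id": task_id},
        },
        "spec": {
            # The pod exists for one task only, so it is never restarted.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "base",
                    "image": image,
                    # Illustrative Airflow 2-style command, not the exact one
                    # the executor issues.
                    "args": ["airflow", "tasks", "run", dag_id, task_id, run_id],
                    "resources": {"requests": {"cpu": cpu, "memory": memory}},
                }
            ],
        },
    }
```

Because each task gets its own pod, resource requests and even the container image can differ per task, which is where the isolation and fine-grained control mentioned above come from.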
Cloud Composer (Google Cloud Platform)
Cloud Composer is Google Cloud’s fully managed Airflow offering. Google handles the underlying infrastructure so you can focus on writing DAGs.
- Architecture: Managed Scheduler, Webserver, and Workers. You just upload your DAGs to a Cloud Storage bucket.
- Integration: Tightly coupled with BigQuery, GCS, Cloud Functions, etc.
- Best for: GCP-native teams wanting zero infrastructure overhead.
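Deploying a DAG to Composer amounts to copying a file into the `dags/` folder of the environment's bucket. Here is a minimal sketch using the `google-cloud-storage` client library; the function names and the bucket name in the usage note are placeholders, and it assumes credentials are already configured in your environment.

```python
from pathlib import Path


def dag_object_name(local_path: str) -> str:
    """Object name for a DAG file inside the environment bucket's dags/ folder."""
    return f"dags/{Path(local_path).name}"


def upload_dag(bucket_name: str, local_path: str, client=None) -> str:
    """Upload one DAG file and return its gs:// URI.

    `client` can be injected (for example in tests); otherwise a real
    google.cloud.storage.Client is created.
    """
    if client is None:
        # Imported lazily so the helper above works without the library installed.
        from google.cloud import storage

        client = storage.Client()
    object_name = dag_object_name(local_path)
    client.bucket(bucket_name).blob(object_name).upload_from_filename(local_path)
    return f"gs://{bucket_name}/{object_name}"
```

A call such as `upload_dag("my-composer-bucket", "my_dag.py")` would place the file where the managed Scheduler looks for DAGs. You can find the real bucket name on the environment's details page.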
The Cloud Composer 2 and Cloud Composer 3 architectures differ slightly in which components are deployed in the customer project and which are deployed in the Google-managed tenant project.
MWAA (Amazon Web Services)
MWAA is AWS’s managed service for Apache Airflow. Like Cloud Composer, it abstracts away infrastructure management.
All of the components contained in the outer box (in the image below) appear as a single Amazon MWAA environment in your account. The Apache Airflow Scheduler and Workers are AWS Fargate containers that connect to the private subnets in the Amazon VPC for your environment. Each environment has its own Apache Airflow metadatabase managed by AWS that is accessible to the Scheduler and Workers Fargate containers via a privately-secured VPC endpoint.
- Architecture: Managed environment with automatic scaling. DAGs stored in S3, logs in CloudWatch, IAM-based permissions.
- Integration: Seamless with AWS services like S3, Redshift, Lambda.
- Best for: AWS-first organizations looking to simplify Airflow operations.
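Since an MWAA environment reads its DAGs from S3, one way to deploy a DAG is to ask the environment where its DAG folder lives and upload the file there. The sketch below uses boto3's `get_environment` call on the `mwaa` client and `upload_file` on the `s3` client; the function names are hypothetical, and it assumes your IAM identity is allowed to read the environment and write to the bucket.

```python
from pathlib import Path


def dag_destination(environment: dict, filename: str) -> tuple[str, str]:
    """Derive (bucket, key) from the 'Environment' part of a get_environment response."""
    # SourceBucketArn looks like "arn:aws:s3:::bucket-name".
    bucket = environment["SourceBucketArn"].split(":::", 1)[1]
    prefix = environment["DagS3Path"].strip("/")
    return bucket, f"{prefix}/{filename}"


def upload_dag_to_mwaa(env_name: str, local_path: str, mwaa=None, s3=None) -> str:
    """Upload one DAG file to the environment's DAG folder and return its s3:// URI.

    `mwaa` and `s3` can be injected (for example in tests); otherwise real
    boto3 clients are created.
    """
    if mwaa is None or s3 is None:
        import boto3  # imported lazily so dag_destination works without it

        mwaa = mwaa or boto3.client("mwaa")
        s3 = s3 or boto3.client("s3")
    environment = mwaa.get_environment(Name=env_name)["Environment"]
    bucket, key = dag_destination(environment, Path(local_path).name)
    s3.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

Looking up the bucket and path from the environment avoids hard-coding them, which is handy when Dev and Prod environments point at different buckets.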
Astronomer
Astronomer is a commercial Airflow platform offering enterprise features like observability, RBAC, CI/CD pipelines, and multi-environment support.
- Architecture: Built on Kubernetes under the hood, but with a layer of tooling and automation for enterprise needs.
- Benefits: Prebuilt Dev/Stage/Prod environments, dashboards, analytics.
- Best for: Companies needing robust, scalable Airflow with enterprise support.
About Me
I am a fully certified (11x) Google Cloud Architect and Google Developer Expert (GDE) with more than 9 years of experience in Google Cloud networking, data, DevOps, security, and ML. I am passionate about technology and innovation. As a Champion Innovator, I am always exploring new ways to use cloud technologies to deliver solutions that make a difference.
If you have any questions or would like to get in touch, you can email me at vishal.bulbule@techtrapture.com or connect with me on LinkedIn at https://www.linkedin.com/in/vishal-bulbule/. You can also find me on Instagram at https://www.instagram.com/vishal_bulbule/?hl=en.
Additionally, please check out my YouTube Channel at https://www.youtube.com/@techtrapture for tutorials and demos on Google Cloud & Data Engineering.
