This QuickStart AMI allows you to quickly deploy and integrate open source Apache Airflow, enabling orchestration capabilities for AWS-based Big Data and Machine Learning workloads, so that Data Scientists can focus on SageMaker- and EMR-related development rather than infrastructure preparation. It is ideal for data scientists getting familiar with enterprise orchestration concepts and experimenting with Apache Airflow in development environments using an EC2 instance. This image includes all up-to-date modules and prerequisites of the Apache Airflow v2 releases.
More comprehensive Apache Airflow deployment guidance and relevant AWS CloudFormation templates for production deployments can be found under the open source project Turbine: https://github.com/villasv/aws-airflow-stack.
More information regarding Apache Airflow can be found at https://airflow.apache.org/start.html
Instructions
After the deployment, you can access the Airflow web interface via a browser at http://public_dns_name:8080. The default administrator username is "airflow", and the default password is your EC2 instance ID. Please allow 3 to 5 minutes after provisioning before trying to log in for the first time, so that the default user can be created. Please don't forget to update the default password after setup, and/or change the default authentication method.
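Since the default password is the instance ID, you can look it up from the shell on the instance itself. A minimal sketch, assuming the EC2 instance metadata service is reachable (it only is from inside the instance; newer instances configured for IMDSv2-only additionally require a session token):

```shell
# Look up this instance's ID (the default Airflow admin password) from the
# EC2 instance metadata service. Outside EC2 the lookup fails, so fall back
# to a placeholder rather than hanging.
INSTANCE_ID=$(curl -s --max-time 2 http://169.254.169.254/latest/meta-data/instance-id || echo "i-unavailable")
echo "Default Airflow password: ${INSTANCE_ID}"
```

On an EC2 instance this prints the real instance ID (e.g. a value starting with "i-"); elsewhere it prints the placeholder.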
To connect to the operating system, use SSH with the username ec2-user.
Airflow configuration files are stored under the /airflow directory. For instance, to modify the Apache Airflow web server settings, you may update the [webserver] section of the configuration file /airflow/airflow.cfg.
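As a sketch, a [webserver] section in /airflow/airflow.cfg could look like the fragment below (port 8080 is this AMI's default noted above; the base_url value is a placeholder you should replace with your own address):

```ini
[webserver]
# Port the Airflow web server listens on (8080 is this AMI's default).
web_server_port = 8080
# Placeholder; set to the address users will reach the UI at.
base_url = http://public_dns_name:8080
```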
More configuration guidance can be found at Apache Airflow documentation.
After the configuration updates, you may restart the relevant services via:
sudo /bin/systemctl restart airflow-webserver
sudo /bin/systemctl restart airflow-scheduler
After adding or updating users, or changing the default authentication provider, please disable the default user:
sudo /bin/systemctl disable airflow-defaultpass
sudo /bin/systemctl stop airflow-defaultpass
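Before disabling the default user, make sure you have a working replacement admin account. A minimal sketch using the Airflow 2 CLI, where the username, names, and email are all placeholder values (you will be prompted for a password, or you can pass --password); the snippet guards for environments where the airflow CLI is not on PATH:

```shell
# Create a replacement admin account with the Airflow 2 CLI.
# All user details below are placeholders; replace with your own.
if command -v airflow >/dev/null 2>&1; then
  airflow users create \
      --username admin \
      --firstname Admin --lastname User \
      --role Admin \
      --email admin@example.com
else
  echo "airflow CLI not on PATH; run this on the instance"
fi
```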
Upgrades
Please use an Amazon RDS-based metadata database configuration for your Airflow deployment to ensure smooth migrations later on, and please make sure you apply the same configuration options when you migrate. Airflow configuration files are kept under the /airflow directory.
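A minimal sketch of what an RDS-backed metadata database setting could look like in /airflow/airflow.cfg, assuming a PostgreSQL RDS instance; the endpoint, database name, and credentials below are placeholders. Note that in Airflow 2.0-2.2 the option lives under [core], while from 2.3 onward it lives under [database]:

```ini
[database]
# Placeholder RDS endpoint, database, and credentials; replace with your own.
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@my-airflow-db.example.us-east-1.rds.amazonaws.com:5432/airflow
```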