We are seeking a highly skilled and motivated Software Engineer (Data) to join our data team. You will be responsible for designing, developing, and maintaining robust, scalable data processing systems exclusively on the AWS platform. The ideal candidate has a strong background in building data pipelines with services such as AWS Glue and Lambda, extensive experience with PySpark and ETL frameworks, and deep expertise in orchestrating complex workflows with Apache Airflow. You will work in a fast-paced environment to deliver cutting-edge, cloud-native data solutions.
Responsibilities
Design, implement, and maintain scalable, reliable data pipelines and ETL processes using AWS Glue and Python/PySpark.
Develop and deploy serverless data processing jobs using AWS Lambda, triggered by S3 events, SNS notifications, or scheduled EventBridge rules.
Orchestrate complex data workflows and dependencies using Apache Airflow.
Monitor, troubleshoot, and optimize the performance and cost-efficiency of AWS data pipelines and database queries.
Collaborate with cross-functional teams to understand data requirements and deliver end-to-end, data-driven solutions.
Perform data cleansing, validation, and preparation to ensure high data quality and integrity.
Create and maintain technical documentation and participate in code reviews to ensure high standards.
Required Skills & Experience
Education: Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
Experience: 4+ years of professional experience in data engineering, with a strong focus on the AWS ecosystem.
Core Technical Skills:
Advanced proficiency in Python and SQL.
Strong, hands-on experience with Apache Spark (PySpark) for large-scale data processing.
Hands-on experience with Apache Airflow for workflow orchestration.
AWS Services:
Proven experience building ETL pipelines with AWS Glue.
Proficiency in developing and deploying AWS Lambda functions for data processing.
Hands-on expertise with core AWS data services such as S3, EMR, Redshift, and Athena.
Good understanding of AWS IAM, CloudWatch, and CloudFormation for managing and monitoring resources is a plus.
Data Principles: Solid experience in data modeling, data warehousing concepts, and ETL/ELT design patterns.
Soft Skills:
Strong analytical and troubleshooting skills for complex data issues.
Self-motivated with a strong sense of ownership and the ability to work independently.
Preferred Skills & Qualifications
Streaming Data: Experience with real-time data streaming technologies like Amazon Kinesis or Apache Kafka.
Containerization: Familiarity with containerization tools like Docker for packaging and deploying applications.