We are seeking an experienced Data Engineer to design, build, and maintain our scalable data infrastructure. The ideal candidate will architect reliable data pipelines, optimize data storage solutions, and enable advanced analytics across the organization. You'll collaborate with data scientists, analysts, and business teams to transform raw data into actionable insights while ensuring data quality, security, and accessibility.
Key Responsibilities:
Design and implement scalable data pipelines and ETL/ELT processes
Build and maintain data warehouses/lakes for optimal storage and retrieval
Develop data models and schemas for analytical and operational use cases
Optimize data processing performance and implement monitoring solutions
Ensure data quality through validation, cleansing, and governance practices
Collaborate with analytics teams to enable self-service data access
Implement data security measures and compliance controls
Automate data workflows and infrastructure provisioning
Troubleshoot and resolve data pipeline issues
Stay current with emerging data technologies and best practices
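For a concrete sense of the pipeline work above, the sketch below shows a minimal Airflow DAG wiring three ETL steps together. It is illustrative only: the DAG name, schedule, and task bodies are hypothetical placeholders, not a description of our actual pipelines.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Pull raw records from a source system (placeholder body).
    print("extracting")

def transform():
    # Clean and reshape the extracted records (placeholder body).
    print("transforming")

def load():
    # Write the transformed records to the warehouse (placeholder body).
    print("loading")

with DAG(
    dag_id="daily_sales_etl",        # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task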
Technical Skills & Competencies:
Data Processing:
Apache Spark (Databricks, PySpark)
Apache Kafka/Flume
Airflow/Luigi for workflow orchestration
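As an illustration of the processing stack above, a minimal PySpark batch aggregation might look like the sketch below; the S3 paths and column names are hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("event_rollup").getOrCreate()

# Read raw events (hypothetical source path).
events = spark.read.parquet("s3://raw-bucket/events/")

# Roll up to daily event counts per user.
daily = (
    events
    .withColumn("day", F.to_date("event_ts"))
    .groupBy("user_id", "day")
    .agg(F.count("*").alias("event_count"))
)

# Write the curated result (hypothetical destination path).
daily.write.mode("overwrite").parquet("s3://curated-bucket/daily_counts/")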
Data Storage:
SQL (PostgreSQL, MySQL, SQL Server)
NoSQL (MongoDB, Cassandra)
Data Warehousing (Snowflake, Redshift, BigQuery)
Data Lakes (Delta Lake, Iceberg)
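On the lake side, a minimal Delta Lake round trip (write, then read back) could look like the following sketch. It assumes the delta-spark package is installed; the table path and schema are hypothetical.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta_demo")
    # Standard configuration to enable Delta Lake on a Spark session.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Append one toy row to a Delta table at a hypothetical path.
df = spark.createDataFrame([(1, "2024-01-01", 42.0)],
                           ["order_id", "order_date", "amount"])
df.write.format("delta").mode("append").save("/tmp/lake/orders")

# Read it back to confirm the table is queryable.
spark.read.format("delta").load("/tmp/lake/orders").show()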
Cloud Platforms:
AWS (Glue, EMR, Athena, Kinesis)
Azure (Synapse, Data Factory, HDInsight)
GCP (Dataflow, Bigtable, Pub/Sub)
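On the streaming-ingest side of these platforms, pushing an event onto an AWS Kinesis stream with boto3 is a one-call sketch; the region, stream name, and payload below are hypothetical.

import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Publish a single event; PartitionKey controls shard assignment.
kinesis.put_record(
    StreamName="clickstream-events",  # hypothetical stream
    Data=json.dumps({"user_id": 123, "action": "page_view"}).encode("utf-8"),
    PartitionKey="123",
)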
Programming & Scripting:
Python (Pandas, NumPy)
Scala/Java
SQL (advanced query optimization)
Bash scripting
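A typical day-to-day Python task in this role is a small cleansing step like the sketch below; the file and column names are hypothetical.

import pandas as pd

# Load a raw extract and normalize types.
df = pd.read_csv("raw_orders.csv", parse_dates=["order_date"])
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Drop rows missing required keys and de-duplicate on the business key.
clean = df.dropna(subset=["order_id", "amount"]).drop_duplicates("order_id")

# Persist in a columnar format for downstream analytics.
clean.to_parquet("orders_clean.parquet", index=False)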
Data Modeling:
Dimensional modeling
Star/snowflake schemas
Data vault modeling
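To make the dimensional-modeling expectation concrete, the sketch below creates a toy star schema (one fact table keyed to two dimensions) in SQLite; the table and column names are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT,
    month      INTEGER,
    year       INTEGER
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT,
    region       TEXT
);
-- The fact table holds measures plus foreign keys into each dimension.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    amount       REAL
);
""")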
DevOps & Infrastructure:
Docker/Kubernetes
Terraform/CloudFormation
CI/CD pipelines
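As a taste of the CI/CD expectation, our pipelines run unit tests like the sketch below on every commit; the function under test is a hypothetical example.

# test_transforms.py -- the kind of check a CI pipeline executes via pytest.

def parse_amount(raw: str) -> float:
    """Strip currency formatting and return a float (hypothetical helper)."""
    return float(raw.replace("$", "").replace(",", ""))

def test_parse_amount():
    assert parse_amount("$1,234.50") == 1234.50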
Data Governance:
Metadata management
Data lineage tracking
GDPR/CCPA compliance
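One common building block in this compliance work is pseudonymizing PII fields so datasets stay joinable without exposing raw values. The sketch below is deliberately simplified; in practice the salt would come from a secrets store, not a literal.

import hashlib

def pseudonymize(email: str, salt: str) -> str:
    # One-way hash keeps the field joinable across datasets
    # while hiding the raw value.
    return hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()

print(pseudonymize("jane@example.com", salt="demo-only-salt"))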
Qualifications:
Education:
Bachelor's/Master's in Computer Science, Data Engineering, or related field