Data Engineer
2+ years
Job Description
About the role
We’re looking for a hands-on Data Engineer with 2–5 years of experience to build reliable data pipelines, optimize data models, and support analytics and product use cases. You’ll work across batch and streaming workloads in the cloud, ensuring data is accurate, timely, and cost-efficient.
Key Responsibilities
- Build Pipelines: Develop, test, and deploy scalable ETL/ELT pipelines for batch and streaming use cases.
- Model Data: Design clean, query-optimized data models (star schema, slowly changing dimensions (SCD) as needed).
- SQL Excellence: Author performant SQL for transformations, materializations, and reports.
- Orchestrate Workflows: Implement DAGs/workflows with Airflow/Prefect; configure retries and meet SLAs.
- Data Quality: Add validation checks, schema enforcement, and alerting (e.g., Great Expectations).
- Performance & Cost: Tune Spark/warehouse queries, optimize storage formats/partitions, and control costs.
- Collaboration: Work with Analytics, Data Science, and Product to translate requirements into data models.
- Ops & Reliability: Monitor pipelines, debug failures, and improve observability and documentation.
- Security & Compliance: Handle sensitive data (PII) responsibly, follow RBAC/least privilege, and use proper secrets management.
Technical Requirements
Must-Have
- Programming: Solid Python (pandas, PySpark, or similar data frameworks); modular, testable code.
- SQL: Strong SQL across analytical databases/warehouses (e.g., Snowflake/BigQuery/Redshift/Azure Synapse).
- ETL/ELT: Experience building production-grade pipelines and transformations.
- Cloud: Exposure to at least one cloud platform (AWS/Azure/GCP/Databricks) for data storage and compute.
- Big Data/Compute: Hands-on with Spark (PySpark) or equivalent distributed processing.
- Orchestration: Airflow or Prefect (DAGs, schedules, sensors, retries, SLAs).
- Version Control & CI/CD: Git workflows; basic CI for data jobs.
- Data Formats: Good understanding of Parquet/ORC/Avro, partitioning, and file layout.
- Analytics/BI: Familiarity with Looker/Power BI/Tableau and semantic modeling.
Nice-to-Have
- Data Virtualization: Familiarity with data virtualization tools such as Denodo is a strong plus.
- Streaming: Kafka/Kinesis/Event Hubs; basics of stream processing (Flink/Spark Structured Streaming).
- dbt: Experience with dbt for SQL transformations, testing, and documentation.
- Governance, Quality & Lineage: Collibra, Alation, Ataccama, Great Expectations, Soda, OpenLineage/Marquez.
- Containers: Docker basics; Kubernetes exposure is a plus.
- Infra as Code: Terraform/CloudFormation for data infra provisioning.
What We Offer
- A dynamic and collaborative work environment.
- Opportunities for professional growth and development.
- Competitive compensation and benefits.
- The chance to shape impactful products that solve real-world problems.
- Exposure to cutting-edge technologies and tools, with opportunities to innovate and explore new business solutions.