Job Title: Data Engineer / Databricks Developer

Location: Hyderabad, India

Experience Required: 5+ Years

Technical Knowledge:
Databricks (PySpark, SQL, Notebooks), Delta Lake, Apache Spark, Python, Azure Data Factory (ADF), Azure Synapse Analytics, AWS S3, Glue, Lambda, CI/CD Pipelines, GitLab, Data Quality Tools, Power BI/Tableau, Data Cataloging, Data Governance, Unity Catalog, MLflow, Job Clusters

Role Summary:
We are seeking a Databricks Developer with over 5 years of experience in building and optimizing scalable data pipelines using Databricks and Apache Spark. This role requires strong expertise in PySpark, Delta Lake, and cloud-native data engineering across Azure and/or AWS. You will lead the development of real-time and batch data solutions, contribute to data architecture, and ensure high performance, cost efficiency, and reliability across platforms. The ideal candidate is passionate about distributed computing, clean code, and modern data stack automation.

Key Responsibilities:

Design, build, and optimize robust ETL/ELT pipelines on Databricks using PySpark, SQL, and Delta Lake (see the pipeline sketch after this list).
Lead the development of reusable and modular Notebooks, Jobs, and Workflows for both batch and streaming workloads.
Collaborate with data architects and engineers to build scalable data lakes and data warehouse layers on cloud platforms.
Implement best practices for coding standards, version control, testing, and performance tuning in Databricks.
Use Unity Catalog to manage metadata, lineage, access policies, and data governance across workspaces (a short access-grant example also follows this list).
Integrate Databricks with orchestration tools like ADF, AWS Glue, or Airflow, depending on the environment.
Build and manage CI/CD pipelines using GitLab, GitHub Actions, or other DevOps tools for automated deployments.
Work with business users and data scientists to develop high-performance pipelines for analytics and ML models.
Monitor and troubleshoot Databricks Jobs, Job Clusters, and compute usage, optimizing for cost and speed.
Ensure security and compliance across data assets, leveraging row-level security, tokenization, and audit logging.
Maintain comprehensive documentation of pipelines, job configs, and data processing logic.
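
For illustration, a minimal PySpark + Delta Lake batch pipeline of the kind described in the first responsibility might look like the sketch below. This is a hedged sketch only; the paths, table, and column names are hypothetical placeholders, not details of this role.

```python
# Minimal batch ETL sketch for a Databricks notebook: read raw CSV files,
# clean them, and write an ACID Delta table. All paths and column names
# are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in Databricks notebooks

raw_df = (
    spark.read
    .option("header", "true")
    .csv("/mnt/landing/orders/")                       # hypothetical landing path
)

clean_df = (
    raw_df
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .dropDuplicates(["order_id"])
)

(
    clean_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")                         # partitioning depends on query patterns
    .save("/mnt/silver/orders/")                       # hypothetical silver-layer path
)
```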
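
Similarly, Unity Catalog access policies are typically expressed as SQL grants. A minimal sketch, run from a notebook, is shown below; the catalog, schema, table, and group names are assumptions for illustration.

```python
# Sketch of Unity Catalog access grants issued from a notebook via spark.sql().
# `spark` is the session provided by the Databricks notebook runtime; the
# catalog, schema, table, and group names below are hypothetical.
grants = [
    "GRANT USE CATALOG ON CATALOG main TO `data_analysts`",
    "GRANT USE SCHEMA ON SCHEMA main.silver TO `data_analysts`",
    "GRANT SELECT ON TABLE main.silver.orders TO `data_analysts`",
]
for statement in grants:
    spark.sql(statement)
```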

Requirements:

5+ years of hands-on experience working with Databricks, building scalable, distributed data pipelines using PySpark and SQL.
Strong understanding of Apache Spark internals, cluster tuning, job performance optimization, and partitioning strategies.
Expertise with Delta Lake, including schema evolution and ACID-compliant data lake design (a brief upsert sketch follows this list).
Proficiency in Python for scripting and data transformation, and in SQL for data wrangling and analysis.
Experience integrating Databricks with Azure (ADF, Synapse) and/or AWS (S3, Glue, Lambda) environments.
Knowledge of Unity Catalog or similar governance tools for managing access, lineage, and policies.
Solid understanding of data modeling principles, especially for lakehouse and medallion architecture.
Experience implementing CI/CD pipelines and DevOps practices in a Databricks environment.
Strong communication skills to work with stakeholders, analysts, and engineers across teams.
Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
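
As a rough illustration of the Delta Lake points above, a hedged sketch of schema evolution plus an ACID upsert (MERGE) might look like the following; the table paths and join key are assumptions, not details of this role.

```python
# Sketch of Delta Lake schema evolution and an ACID upsert (MERGE).
# Table paths and the customer_id key are illustrative assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates_df = spark.read.format("delta").load("/mnt/bronze/customer_updates/")  # hypothetical path

# Schema evolution: allow new columns in the incoming batch to be added
# to the target table instead of failing the write.
(
    updates_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("/mnt/silver/customers_staging/")
)

# ACID upsert into the curated table, keyed on customer_id.
target = DeltaTable.forPath(spark, "/mnt/silver/customers/")
(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```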

Preferred Skills:

Experience with Databricks Workflows, MLflow, and Job Clusters for advanced data science and automation use cases.
Familiarity with data observability tools and performance monitoring dashboards.
Knowledge of Power BI/Tableau and experience building semantic layers on top of Databricks SQL endpoints (see the sketch below).
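
For that BI item, a minimal sketch of querying a Databricks SQL warehouse (endpoint) from Python with the databricks-sql-connector package is shown below; the hostname, HTTP path, and table name are placeholders, and in practice the access token would come from a secret scope rather than being hard-coded.

```python
# Sketch: query a Databricks SQL warehouse (endpoint) from Python.
# Requires the databricks-sql-connector package; all connection details
# and the table name below are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",            # use a secret scope in practice
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT order_date, SUM(amount) AS revenue "
            "FROM main.silver.orders GROUP BY order_date"
        )
        for row in cursor.fetchall():
            print(row)
```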