Brief Description
Reporting to the DataOps Engineering Lead, the job holder will play a pivotal role in designing, implementing, and maintaining cloud-based data pipelines within our data ecosystem. S/he will work closely with data platform engineers, machine learning engineers, data scientists, data analytics engineers, and software engineers to ensure seamless data integration, processing, and storage. This role requires a strong understanding of data engineering principles and software engineering best practices, and the ability to build scalable and efficient data pipelines.
Key Responsibilities:
Platform Architecture: Design and develop a scalable and extensible data engineering platform to support the organization's data-driven initiatives. Architect data pipelines, storage solutions, and frameworks to handle large volumes of data efficiently.
Data Integration and Processing: Implement data ingestion pipelines to integrate data from various sources, including databases, data warehouses, APIs, and streaming platforms. Develop ETL (Extract, Transform, Load) processes to preprocess and clean raw data for analysis.
Analytics Tools and Technologies: Evaluate, select, and integrate analytics tools and technologies to support data exploration, visualization, and modeling. Implement and optimize databases, data warehouses, and analytics frameworks such as SQL, Hadoop, Spark, and Elasticsearch.
Scalability and Performance: Optimize data processing pipelines and analytics workflows for scalability, performance, and efficiency. Implement parallel processing, distributed computing, and caching mechanisms to handle large-scale data engineering workloads.
Data Governance and Security: Ensure compliance with data governance policies, regulatory requirements, and security best practices. Implement access controls, encryption, and auditing mechanisms to protect sensitive data and ensure data privacy and confidentiality.
Monitoring and Maintenance: Develop monitoring and alerting systems to track platform performance, data quality, and system health. Proactively identify and resolve issues to minimize downtime and ensure uninterrupted data analytics operations.
Automation and DevOps: Implement automation pipelines for infrastructure provisioning, configuration management, and deployment. Establish continuous integration and continuous deployment (CI/CD) processes to streamline platform development and operations.
Documentation and Training: Document platform architecture, data pipelines, and analytics workflows. Provide training and support to data analysts and data scientists to ensure effective use of the data analytics platform.
Qualifications
Degree in Science, Computer Science, Engineering, or any other related field.
Solid understanding of data engineering principles, techniques, and methodologies.
Solid understanding of distributed systems.
Over 5 years' experience building cloud-based data engineering solutions.
At least 2 years of experience in software development in any of the following programming languages: Python, Java, or Scala.
At least 2 years of experience working with stream processing tools such as Amazon Kinesis, Apache Kafka, Apache Flink, or Apache Pulsar.
At least 3 years of experience working with data processing frameworks such as Amazon EMR, Apache Spark, Apache Hadoop, or Apache NiFi.
Familiarity with database systems such as SQL databases, NoSQL databases, and distributed file systems, both cloud-based and on-premises.
Experience with cloud platforms such as AWS, GCP, or Azure.
Strong problem-solving skills and attention to detail.
Proficient understanding of distributed computing principles.
Experience in collecting, storing, processing and analyzing large volumes of data.