Big Data Engineer / Developer – focus on Hadoop / Spark
€750-1500 EUR
Paid on delivery
Key Responsibilities
As (Senior) Big Data Engineer / Developer, you will work closely with IT architects to elicit requirements, optimize system performance, and advance the platform's technological foundation.
- Manage a very large-scale, multi-tenant, secure and highly available Hadoop infrastructure supporting rapid data growth for a wide spectrum of innovative internal customers
- Provide architectural guidance, plan and estimate cluster capacity, and create roadmaps for Hadoop cluster deployments
- Install Hadoop distributions, updates, patches and version upgrades
- Design, implement and maintain enterprise-level security (Kerberos, LDAP/AD, Sentry, etc.)
- Develop business-relevant applications in Spark, Spark Streaming and Kafka using functional programming methods in Scala (see the sketch after this list)
- Implement statistical methods and machine learning algorithms to be executed in Spark applications that are automatically scheduled and run on top of the Big Data platform
- Identify new components, functions and features and drive them from exploration to implementation
- Create run books for troubleshooting, cluster recovery and routine cluster maintenance
- Troubleshoot Hadoop-related applications, components and infrastructure issues at large scale
- Design, configure and manage the strategy and execution for backup and disaster recovery of big data
- Provide 3rd-level support (DevOps) for business-critical applications and use cases
- Evaluate and propose new tools and technologies to meet the needs of the global organization
- Work closely with infrastructure, network, database, application, business intelligence and data science units
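To make the Spark/Kafka item above concrete, here is a minimal sketch of the kind of application the role involves, written with Spark Structured Streaming's Kafka source. The broker address, topic name and object name are placeholders invented for this illustration, not details from the posting.

    import org.apache.spark.sql.SparkSession

    // Minimal Spark application that reads a Kafka topic and keeps a
    // running count of records per key. Names are illustrative only.
    object ClickCounter {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ClickCounter")
          .getOrCreate()
        import spark.implicits._

        // Treat the Kafka topic as an unbounded table of key/value pairs.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092") // placeholder
          .option("subscribe", "clicks")                     // placeholder
          .load()
          .selectExpr("CAST(key AS STRING) AS k")

        // Pure DataFrame transformations: group and count per key.
        val counts = events.groupBy($"k").count()

        // Emit the running counts to the console for demonstration.
        counts.writeStream
          .outputMode("complete")
          .format("console")
          .start()
          .awaitTermination()
      }
    }

Running this requires the spark-sql-kafka connector on the classpath, e.g. via the --packages option of spark-submit.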
Key Requirements, Skills and Experience
- University degree in computer science, mathematics, business informatics or another technical field of study
- Deep expertise in distributed computing and the factors determining and affecting distributed system performance
- Experience implementing Hadoop clusters in a large-scale environment, preferably including multi-tenancy and security with Kerberos
- At least 2 years of excellent hands-on experience with the Hadoop ecosystem, including Apache Spark, Spark Streaming, Kafka, ZooKeeper, JobTracker, HDFS, MapReduce, Impala, Hive, Oozie, Flume and Sentry, but also with Oracle, MySQL and PSQL
- Strong expertise in functional programming, object-oriented programming and scripting, e.g. in Scala, Java, Ruby, Groovy, Python, R (a short functional-style Scala sketch follows this list)
- Proficiency with IDEs (IntelliJ IDEA, Eclipse, etc.), build automation (Maven, etc.) and continuous integration tools (Jenkins, etc.)
- Strong Linux skills; hands-on experience with enterprise-level Linux deployments as well as shell scripting (bash, tcsh, zsh)
- Well versed in installing, upgrading and managing Hadoop distributions (CDH 5.x, Cloudera Manager, MapR, etc.)
- Hadoop cluster design, cluster configuration, server requirements, capacity scheduling, and installation of services: NameNode, DataNode, ZooKeeper, JobTracker, YARN, etc.
- Hands-on experience with automation, virtualization, provisioning, configuration and deployment technologies (Chef, Puppet, Ansible, OpenStack, VMware, Docker, etc.)
- Experience working in an agile and international environment; excellent time-management skills
- Excellent communication skills and a high level of motivation (self-starter)
- Strong sense of ownership to independently drive a topic to resolution
- Ability and willingness to go the extra mile and support the overall team
- Business-fluent English, spoken and written; German is a plus
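The functional-programming requirement above is about style as much as language. As a hedged illustration (the object, case class and data below are invented for this sketch), the snippet aggregates measurements per host using only immutable collections and higher-order functions:

    // Plain Scala, no external dependencies.
    object LatencyStats {
      final case class Sample(host: String, millis: Long)

      // Mean latency per host, computed without mutable state:
      // groupBy and map are the higher-order building blocks.
      def meanByHost(samples: Seq[Sample]): Map[String, Double] =
        samples
          .groupBy(_.host)
          .map { case (host, xs) =>
            host -> xs.map(_.millis).sum.toDouble / xs.size
          }

      def main(args: Array[String]): Unit = {
        val data = Seq(
          Sample("node1", 120), Sample("node1", 80), Sample("node2", 200)
        )
        meanByHost(data).foreach { case (h, m) => println(f"$h: $m%.1f ms") }
      }
    }

The same shape, grouping a collection and then mapping over the groups, carries over directly to Spark's DataFrame and Dataset APIs.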
Project ID: #16310248