 Hadoop Site Reliability Engineer (SRE) - Cupertino, California, United States

   
Job information
Posted by: Apple 
Hiring entity type: Retail 
Work authorization: Not Specified for United States
Position type: Direct Hire, Full-Time 
Compensation: ******
Benefits: See below
Relocation: Not specified 
Position functions: Computers - IT Management
 
Travel: Unspecified 
Accept candidates: from anywhere 
Languages: English - Fluent
 
Minimum education: See below 
Minimum years experience: See below 
Resumes accepted in: English
Cover letter: No cover letter requested
Job code: 200159616 / Latpro-3741879 
Date posted: May-22-2020
State, Zip: California, 95014

Description

Hadoop Site Reliability Engineer (SRE)

Santa Clara Valley (Cupertino), California, United States

Software and Services

Summary

Posted: May 21, 2020

Weekly Hours: 40

Role Number: 200159616

This position can be located in Santa Clara Valley (CA) or Austin (TX). Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Apple's Applied Machine Learning team has built systems for a number of large-scale data science applications. We work on many high-impact projects that serve various Apple lines of business. We use the latest in open source technology, and as committers on some of these projects, we are pushing the envelope. Working with multiple lines of business, we manage many streams of Apple-scale data. We bring it all together and extract the value. We do all this with an exceptional group of software engineers, data scientists, SRE/DevOps engineers, and managers.

Key Qualifications

  • SRE experience with Hadoop-based technologies: HDFS/YARN cluster administration, Hive, Spark
  • Experience managing Hadoop/YARN clusters with thousands of nodes and tens of petabytes of data running tens of thousands of jobs
  • A passion for automation, creating tools using Python, Java, or other JVM languages
  • Strong expertise in troubleshooting complex production issues
  • Expert understanding of Unix/Linux-based operating systems
  • Excellent problem-solving, critical-thinking, and communication skills
  • Experience deploying and managing CI/CD pipelines
  • Experience with Solr cluster administration
  • Expertise in configuration management tools (such as Ansible or Salt) for deploying, configuring, and managing servers and systems
  • Adept at prioritizing multiple issues in a high-pressure environment
  • Able to understand complex architectures and comfortable working with multiple teams
  • Ability to conduct performance analysis and troubleshoot large-scale distributed systems
  • Highly proactive, with a keen focus on improving the uptime and availability of our mission-critical services
  • Comfortable working in a fast-paced environment while continuously evaluating emerging technologies
  • Proficient in Unix, command-line tools, and general system debugging
  • Solid knowledge of secure coding practices and experience with open source technologies

Description

Monitor production, staging, test, and development environments for many Hadoop/YARN clusters spanning thousands of nodes, in an agile and dynamic organization. You like to automate everything you do and document it for the benefit of others. You are an independent, self-directed problem-solver, capable of deftly handling multiple simultaneous competing priorities and delivering solutions in a timely manner. Provide incident resolution for all technical production issues. Create and maintain accurate, up-to-date documentation reflecting configuration; you will be responsible for writing justifications, training users in complex topics, writing status reports, documenting procedures, and interacting with other Apple staff and management. Provide guidance to improve the stability, security, efficiency, and scalability of systems. Determine future capacity needs and investigate new products and/or features. Strong troubleshooting ability will be used daily; you will take steps on your own to isolate issues and resolve root causes through investigative analysis in environments where you have little knowledge, experience, or documentation. Administer and ensure the proper execution of the backup systems. Provide 24x7 on-call support to handle urgent critical issues.

Education & Experience

BS in Computer Science with 7+ years of experience, MS with 5+ years of experience, or related experience.

Additional Requirements

  • Experience with Kubernetes, Docker Swarm, or another container orchestration framework
  • Experience building and operating large-scale Hadoop/Spark data infrastructure used for machine learning in a production environment
  • Experience tuning complex Hive and Spark queries
  • Expertise in debugging Hadoop/Spark/Hive issues using NameNode, DataNode, NodeManager, and Spark executor logs
  • Experience in capacity management on multi-tenant Hadoop clusters
  • Experience with workflow and data pipeline orchestration (Airflow, Oozie, Jenkins, etc.)
  • Experience with Jupyter-based notebook infrastructure

Requirements

See job description

 

Apple requires you to fill in their online application form, which will open in a different window.
