Santa Clara Valley (Cupertino), California, United States
Software and Services
Posted: Oct 1, 2020
Role Number: 200196768
We live in a mobile- and device-driven world where knowledge of the physical world around us is needed. We rely on this knowledge to get around, to learn about our environment, and to enable spectacular new features for custom applications. Apple is meeting those needs as robustly and as creatively as possible and is interested in people who want to help meet that commitment. Success will be the result of very skilled people working in an environment that cultivates creativity, partnership, and thinking about old problems in new ways. If this sounds like the kind of environment that you find intriguing, then let's talk. We are looking for a Manager of Site Reliability Engineers to build and run new monitoring solutions that draw on a wide range of technologies and tools to achieve global monitoring of applications and services.
- Experience managing an engineering team on large-scale projects with technical deep-dives into code, networking, operating systems and/or storage
- Experience operating business-critical systems and successfully addressing operational failures, performing Root Cause Analysis, scheduling maintenance downtime, and implementing system-wide corrections to prevent recurrence of issues
- Production-level expertise in networking, OS, security, and application monitoring
- Experience aligning resources to meet strategic goals, and mentoring and coaching employees
- Experience in one of the following languages: Java, Python, Go
- Strong communication skills to work well with multi-functional engineering teams addressing conflicts and critical issues
- Strong organizational skills: ability to organize and complete multiple projects of varying lengths
- Excellent verbal and written communication skills
- Ability to explain technical concepts in clear, non-technical language
As the demand for our services continues to grow rapidly, we are forming a team to deliver service at the highest level, as expected at Apple. We monitor a very large number of sites across the globe and support a wide variety of business units, each with unique needs that we can address. Lead a team of software/systems engineers and be directly responsible for monitoring solution design and implementation globally. Ensure the highest level of uptime and Quality of Service (QoS) for Apple's customers through operational excellence. Design and maintain globally distributed production monitoring systems. Take ownership of performance stability issues using a wide variety of tools. Identify areas to improve service resiliency through techniques such as chaos engineering and performance/load testing.
Education & Experience
B.S. or M.S. in Computer Science, or equivalent experience