 AI/ML - Search Data Engineer, AI/ML Data - Seattle, Washington, United States

Job information
Posted by: Apple 
Hiring entity type: Retail 
Work authorization: Not Specified for United States
Position type: Direct Hire, Full-Time 
Compensation: ******
Benefits: See below
Relocation: Not specified 
Position functions: Computers - Programming Languages
Computers - Platforms
Computers - Networks
Computers - Software Engineer
Travel: Unspecified 
Accept candidates: from anywhere 
Languages: English - Fluent
Minimum education: See below 
Minimum years experience: See below 
Resumes accepted in: English
Cover letter: No cover letter requested
Job code: 200188037 / Latpro-3824615 
Date posted: Oct-17-2021
State, Zip: Washington, 98199


AI/ML - Search Data Engineer, AI/ML Data

Seattle, Washington, United States

Machine Learning and AI


Posted: Aug 25, 2020

Role Number: 200188037

Siri's universal search engine powers search features across a variety of Apple products, including Siri Assistant, Spotlight, Safari, Messages, and News. The Siri Data organization seeks to improve Siri by using data as the voice of our customers. Within this organization, the Search Data Engineering team builds systems that process data reliably at scale to generate scalable, high-quality datasets that support confident, data-driven decision making for Siri Search. We're looking for exceptional data engineers who are passionate about our product and values, who love working with data at scale, and who are committed to the hard work necessary to continuously improve. As a part of this group, you will work with petabytes of data daily using diverse technologies like Spark, Flink, Kafka, Hadoop, and others. You will be expected to partner effectively with upstream engineering teams and downstream consumers, including analysts and product engineers. In this role you will build datasets to support analytics, experimentation, and machine learning. Specifically, you will build out stream processing applications powering real-time metrics, and you will help drive our self-serve reporting strategy on behalf of data scientists and product engineers as we collectively make Siri better.

Key Qualifications

  • You have excellent written and verbal communication skills
  • You are curious and have excellent analytical and problem-solving skills
  • You are excited about digging into massive petabyte-scale semi-structured datasets
  • 1+ years of industry experience working with distributed data technologies (e.g. Hadoop, MapReduce, Spark, etc.)
  • Proficiency in at least one high-level programming language (Python, Go, Java, Scala, or equivalent)
  • Experience with large, complex, highly dimensional data sets; hands-on experience with SQL
  • You are pragmatic, not letting "the perfect" be the enemy of "the good"
  • You are self-directed and capable of operating amidst ambiguity
  • You are humble, continually growing in self-awareness and possessing a growth mindset

Extras we'd be excited about:

  • Experience building stream-processing applications using Apache Flink, Spark Streaming, Apache Storm, Kafka Streams, or others
  • Experience with data engineering in support of ML: anomaly detection in time series data, engineering work to productionize models developed by data scientists, etc.


  • Developing data pipelines and/or software libraries to process, transform, and analyze data to identify signals from the billions of events we collect every day
  • Designing and building abstractions that hide the complexity of the underlying big data stack (HDFS, Hadoop, Hive, Impala, Spark, Kafka, Parquet, etc.) and that allow partners to focus on their strengths: product, data modeling, data analysis, search, information retrieval, and machine learning
  • Defining and implementing the "source of truth" for our most fundamental data, such as search activity and content, as well as our core metrics across a variety of products
  • Optimizing end-to-end workflows of data users (crafting libraries, providing abstractions to define jobs, scheduling data pipelines, managing access to datasets, etc.)
  • Building internal services and tools to help in-house partners implement, deploy, and analyze datasets with a high level of autonomy and limited friction
  • Surfacing datasets in near-real-time to mission-critical products and business applications throughout the company, providing the signal that feeds our machine learning algorithms as well as our daily product-defining decisions
  • Automating and handling the lifecycle of datasets (schema evolution, metadata store, backfill management, deprecation, migration)
  • Improving the quality and reliability of our pipelines (monitoring, retry, failure detection)

Education & Experience

Surprise us! Many will have an MS or BS in CS, Engineering, Math, Statistics, or a related field or equivalent practical experience in data engineering.

