Senior Applied Scientist, AI Data Platform (CoreAI)
Company: Microsoft
Location: Redmond
Posted on: October 31, 2025
Job Description:
Join Microsoft’s CoreAI team to build the AI Data Platform, the
foundation for secure, scalable, reusable datasets that power model
development. The AI Data Platform team’s mission is to build a
central AI data platform that breaks down Microsoft’s data silos
and manages the full lifecycle of first-party, third-party,
synthetic, and human-labeled data, accelerating AI model
development with secure, reusable, and compliant datasets. The AI
Data Platform team is responsible for large-scale data
infrastructure, automation tools, and intelligence services to
transform how Microsoft collects, generates, manages, and shares AI
training data. We are seeking Applied Scientists to drive
scientific innovation in data generation, validation, evaluation,
and automation. You will set the vision for intelligent, ML-driven
services that manage the end-to-end data lifecycle, and partner
with leaders across Microsoft to ensure Microsoft’s data
investments deliver maximum AI impact.
Qualifications:
Required Qualifications:
Bachelor’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4 years related experience (e.g., statistics, predictive analytics, research)
OR Master’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3 years related experience (e.g., statistics, predictive analytics, research)
OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1 year related experience (e.g., statistics, predictive analytics, research)
OR equivalent experience.
2 years of experience applying machine learning or data science in practical settings.
Programming skills in Python and ML frameworks (e.g., PyTorch, TensorFlow, Scikit-learn).
Experience with data analysis, dataset design, or evaluation methodologies.
Other Requirements:
The ability to meet Microsoft, customer, and/or government security
screening requirements is required for this role. These
requirements include, but are not limited to, the following
specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years.
Preferred Qualifications:
Master’s degree or PhD in Computer Science, Machine Learning, Statistics, or related field, or equivalent experience.
4 years of experience applying machine learning or data science in practical settings.
Experience with LLM training pipelines, synthetic data generation, or data-centric AI approaches.
Knowledge of PII detection, data privacy, fairness, or compliance in AI systems.
Familiarity with distributed data systems (e.g., Spark, Databricks, Azure Data Lake).
Strong collaboration skills with engineers, TPMs, and product partners across multiple orgs.
Applied Sciences IC4 - The typical base pay range for this role
across the U.S. is USD $119,800 - $234,700 per year. A different
range applies to specific work locations within the San
Francisco Bay area and New York City metropolitan area; the
base pay range for this role in those locations is USD $158,400 -
$258,000 per year. Microsoft posts positions for a minimum of 5
days, with applications accepted on an ongoing basis until the
position is filled.
Responsibilities:
Advancing machine learning and
data science to improve data quality, automate dataset generation,
and design intelligent agent-driven services that manage the
end-to-end data lifecycle.
Develop ML-based pipelines for data generation, validation, augmentation, and discovery (e.g., synthetic data, human-in-the-loop workflows).
Design and train intelligent agents to automate key parts of the dataset lifecycle, including ingestion, validation, PII detection and handling, governance, discovery, and feedback loops.
Build evaluation methods to measure dataset quality, coverage, and usefulness for large-scale model training.
Leverage AI/ML techniques (e.g., classification, clustering, anomaly detection, embeddings, LLM-based evaluation) to improve data discovery, curation, and governance.
Collaborate with engineers to integrate scientific methods and models into scalable pipelines and platform services.
Partner with AI product and research teams (CoreAI, MAI, M365, GitHub, MSR, and more) to align datasets with model training needs and identify new opportunities.
Contribute thought leadership by publishing or sharing insights internally and externally to shape Microsoft’s data-centric AI practices.
Keywords: Microsoft, Redmond, Senior Applied Scientist, AI Data Platform (CoreAI), IT / Software / Systems, Redmond, Washington