Amazon Machine Learning Conference (AMLC 2025)
I’ve spent the last few days in Seattle at Amazon’s internal Machine Learning Conference (AMLC). If last year was defined by the frontier of GenAI capabilities, this year the focus shifted decisively toward agents, reliability, and real-world deployment. The conversation has moved from “Can we do X?” to “How do we evaluate, govern, and safely operationalize X at scale?” It felt like a distinctly Amazonian event: pragmatic, execution-oriented, and full of hallway discussions about shipping real systems and delivering customer impact.
I participated in the Machine Learning for Healthcare Roundtable and gave a talk on LLM trustworthiness in medical product question answering, drawing on my earlier work on Rufus. But the real highlight was learning from the impressive work happening across teams. Below are a few themes that stood out to me.

Pattie Maes on human flourishing with AI
One of my favorite moments at AMLC was seeing Pattie Maes give a keynote. Pattie is a professor at the MIT Media Lab, where I did my PhD, and she has been thinking about agents and augmentation for decades, long before today’s wave of foundation models. She pioneered the concept of “software agents” in the 1990s.
Her keynote traced an arc from early work on robotic and software agents to modern research on contextual, multimodal assistants that help with memory, daily functioning, and decision-making. But the core of her talk was a clear-eyed warning: the impact of AI on people is not uniformly beneficial. While always-on, agentic AI promises to improve productivity, current evidence suggests the impact is “mixed at best”. Pattie emphasized the risk of deskilling, where reliance on AI causes human capability to atrophy over time, and pointed to a growing body of studies showing that pervasive AI assistance can influence or weaken fundamental human capacities.
For example, she cited a striking Lancet study in which oncologists who used AI assistance for three months improved their diagnostic accuracy during that period. However, when the AI was taken away, they were 20% less effective at identifying cancerous lesions than they had been before the experiment began. She also pointed to AI’s tendency toward sycophancy, prioritizing what users want to hear over the truth, which homogenizes thought and reduces critical thinking. Finally, she noted that in her group’s recent work, students using ChatGPT to write essays showed significantly less prefrontal cortex activity (a region associated with higher-order thinking) than those writing with standard tools.
This creates a paradox for us as builders: How do we build systems that assist without automating away the user’s agency? To tackle this, Pattie described new initiatives at the MIT Media Lab aimed at building a science of human–AI interaction, such as the Advancing Humans with AI (AHA) program.
Among other efforts, the AHA team is working on benchmarks for the human impact of AI, analogous to today’s technical benchmarks (e.g., for accuracy and latency), and an “atlas” mapping how design choices influence understanding, critical thinking, social connection, and well-being.
Pattie’s keynote was a powerful reminder that as we push toward more capable and autonomous AI systems, we must stay grounded in the question of how they reshape human skills. AI will not only transform workflows; it will transform the people inside them.
Chronos-2: the “foundation model” paradigm comes to time series forecasting
While LLMs and agents dominated many of the talks, one of the most significant technical unlocks I saw at AMLC was in the domain of time series forecasting, which is often the invisible backbone of logistics, retail, and healthcare.
Amazon’s new model, Chronos-2 (available via the open-source chronos-forecasting library), moves beyond traditional univariate setups and treats forecasting as a universal, multivariate, zero-shot problem, bringing the foundation model paradigm to time series forecasting.
Chronos-2 is built as an encoder-only transformer with two key ingredients:
- Time attention, which captures patterns within an individual series over time.
- Group attention, which lets the model share information across related series in-context.
This second feature is what unlocks the “foundation model” behavior: groups of time series and external covariates provide structure the model can leverage without explicit fine-tuning. Instead of retraining or adjusting models for every target, Chronos-2 learns from relationships between series, not just within them.
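To make the two attention axes concrete, here is a toy NumPy sketch of the idea. It is not Chronos-2’s actual implementation. It assumes inputs that are already embedded as patches with shape (series, patches, d_model), and it uses a single head with no learned projections, normalization, or feed-forward layers. Group membership is encoded in a hypothetical integer array `group_ids`, and the function names (`time_attention`, `group_attention`, `encoder_block`) are illustrative. Time attention mixes information along each series’ own patches. Group attention mixes information across series at the same patch position, masked so that only series sharing a group id (for example, a target and its covariates) can see each other.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    # Scaled dot-product attention over the second-to-last axis.
    # q, k, v: (..., n, d); mask: boolean, broadcastable to (..., n, n),
    # where True means "may attend".
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ v


def time_attention(x):
    # x: (num_series, num_patches, d). Each series attends over its own
    # patches only, capturing temporal patterns within that series.
    return attention(x, x, x)


def group_attention(x, group_ids):
    # Swap axes so that, at each patch index, series attend to each other.
    xt = np.swapaxes(x, 0, 1)  # (num_patches, num_series, d)
    # Series may only attend to series in the same group (hypothetical ids).
    same_group = group_ids[:, None] == group_ids[None, :]
    out = attention(xt, xt, xt, mask=same_group)  # mask broadcasts over patches
    return np.swapaxes(out, 0, 1)  # back to (num_series, num_patches, d)


def encoder_block(x, group_ids):
    # One simplified block: time attention, then group attention, each
    # with a residual connection.
    h = x + time_attention(x)
    return h + group_attention(h, group_ids)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 8))   # 4 series, 6 patches, 8-dim embeddings
groups = np.array([0, 0, 1, 1])  # two groups of two related series
y = encoder_block(x, groups)
print(y.shape)  # (4, 6, 8)
```

In this sketch, cross-series interaction is governed by the group mask rather than by weights tuned to a particular dataset. Perturbing a series in group 1 therefore leaves the outputs for group 0 untouched, while adding a related series or covariate to a group changes that group’s context without any retraining. That is the in-context behavior described above.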
