Link: The last six months in LLMs in five minutes via Simon Willison

Drawing on his ongoing research into LLMs, Simon Willison summarized the last six months of LLM developments at PyCon US 2026. He found that in November coding agents crossed a quality threshold where they could be used to get real work done, and he started new projects to see how far he could push them. He also noted that the OpenClaw project started that same month. Over the past month, he found that local models, while weaker than frontier models, had started to exceed his expectations.

It took a little while for this to become clear, but the real news from November was that the coding agents got good. OpenAI and Anthropic had spent most of 2025 running Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models, especially when paired up with their Codex and Claude Code agent harnesses. In November the results of this work became apparent. Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done, without needing to spend most of your time fixing their stupid mistakes.

A lot of stuff happened just in the past month. Google released the Gemma 4 series of models, which are the most capable open weight models I’ve seen from a US company. Also last month, Chinese AI lab GLM came out with GLM-5.1—an open weight 1.5TB monster! This is a very effective model… if you can afford the hardware to run it. The other neat Chinese open weight models in April came from Qwen. Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7. That’s a 20.9GB open weights model that runs on my laptop! (I think this mainly demonstrates that the pelican on the bicycle has firmly exceeded its limits as a useful benchmark.)

Via: https://simonwillison.net/2026/May/19/5-minute-llms/