Open Challenges for AI Engineering: Simon Willison

8.9K views · Jul 17, 2024 · 18:49 min

Takeaway

The GPT-4 moat has fallen, but using these models well remains a hard, undocumented power-user skill that AI engineers must help others navigate.

Summary

GPT-4 monopoly is over: Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3 70B, Command R+ and DeepSeek now compete in the GPT-4 class, while Claude 3 Haiku and Gemini 1.5 Flash dominate the cheap tier
MMLU is essentially bar trivia and a poor proxy for how useful a model actually is; LMSYS Chatbot Arena Elo is currently the best vibes-based comparison
ChatGPT is a power-user tool: even the question 'when is it effective to upload a PDF?' has nontrivial answers (whether the PDF has searchable text, Code Interpreter quirks, screenshotting tables as images as a workaround)
Free access to frontier models (GPT-4o, Claude 3.5 Sonnet) means a wave of users will hit the same usability/capability discovery curve AI engineers went through 12 months earlier
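Chatbot Arena turns crowdsourced head-to-head votes into a single rating per model. As a rough illustration of how pairwise votes become an Elo score, here is a minimal sketch of the classic online Elo update. The K-factor of 32, the starting rating of 1000 and the model names are illustrative assumptions, and the live leaderboard may use a different fitting procedure that this sketch does not reproduce.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def compute_elo(battles, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Replay pairwise votes in order and return a rating per model.

    Each battle is (model_a, model_b, score_a), where score_a is
    1.0 if A won, 0.0 if B won, and 0.5 for a tie.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, score_a in battles:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (score_a - e_a)
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Hypothetical votes between three placeholder models.
battles = [
    ("model-x", "model-y", 1.0),
    ("model-x", "model-z", 1.0),
    ("model-y", "model-z", 0.5),
]
ratings = compute_elo(battles)
for name in sorted(ratings, key=ratings.get, reverse=True):
    print(f"{name}: {ratings[name]:.1f}")
```

Because each update only looks at the current pair, the result depends on vote order; that sensitivity is one reason a leaderboard might prefer fitting all votes at once instead of replaying them sequentially.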

llm-models · evals · usability

Original description

About Simon
Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite.

Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator-funded company he co-founded in 2010.

He is a co-creator of the Django web framework, and has been blogging about web development and programming since 2002 at simonwillison.net