
Open Challenges for AI Engineering: Simon Willison

8.9K views · Jul 17, 2024 · 18:49 min
Takeaway

The GPT-4 moat has fallen, but using these models well remains a hard, undocumented power-user skill that AI engineers must help others navigate.

Summary

  • GPT-4 monopoly is over: Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3 70B, Command R+, DeepSeek now compete in the GPT-4 class, while Haiku/Flash dominate the cheap tier
  • MMLU is closer to bar trivia than a capability test, which makes it a poor proxy for how useful a model is in practice; LMSYS Chatbot Arena Elo is currently the best vibes-based comparison
  • ChatGPT is a power-user tool — even the question 'when is it effective to upload a PDF?' has nontrivial answers (searchable text, code interpreter quirks, image-screenshot workaround for tables)
  • Free access to frontier models (GPT-4o, Claude 3.5 Sonnet) means a wave of users will hit the same usability/capability discovery curve AI engineers went through 12 months earlier
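
For readers unfamiliar with Elo, the sketch below shows the classic online Elo update applied to a single head-to-head vote between two models. It is only an illustration of the general idea: the starting ratings, the K-factor of 32, and the function names are assumptions made for this example, and the Chatbot Arena leaderboard's actual computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an illustrative K-factor (assumed, not taken from the talk).
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level; a voter prefers model A.
    a, b = update(1200.0, 1200.0, score_a=1.0)
    print(a, b)
```

The point of the rating is that it is built from pairwise human preferences rather than a fixed answer key, which is why it captures "vibes" that a multiple-choice benchmark cannot.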
llm-models · evals · usability
Original description
About Simon
Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite.

Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010.

He is a co-creator of the Django web framework and has been blogging about web development and programming since 2002 at simonwillison.net.