
AI Engineering at Jane Street - John Crepezzi

70.5K views · Mar 28, 2025 · 16:57 min
Takeaway

For obscure internal languages, custom fine-tuning on workspace snapshots + build-state transitions beats off-the-shelf coding models.

Summary

  • John Crepezzi describes the challenge facing Jane Street's AI Assistant team: the firm runs on OCaml (web via js_of_ocaml, Vim plugins via vcaml, FPGAs via hardcaml), a low-resource language on which off-the-shelf coding models perform poorly.
  • Inspired by Meta's CodeCompose paper for Hack, they fine-tune a custom model to produce multi-file diffs from natural-language prompts, targeting up to 100-line changes that apply cleanly and type-check.
  • Training data comes from neither features (their PRs are too large) nor commits (used as checkpoints, with no descriptions); instead, they snapshot developer workstations every ~20 seconds along with build status, and mine red→green transitions as isolated edits.
  • The prompt for each diff is generated by having an LLM write a detailed description, which is then filtered down to roughly the length a human would write.
  • The talk also covers editor integrations for Emacs (used by 67% of the firm), VS Code, and Neovim, plus eval infrastructure for picking the best models.
fine-tuning · ocaml · code-generation
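
The snapshot-mining idea in the summary can be illustrated with a minimal Python sketch. It is not Jane Street's implementation: the `Snapshot` type, its fields, the function names, and the choice to pair the last red snapshot with the first green one are all assumptions made for illustration. The 100-line cap mirrors the "up to 100-line changes" target mentioned above.

```python
import difflib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One periodic capture of a workspace (hypothetical shape)."""
    timestamp: int          # seconds since the session started
    files: Dict[str, str]   # path -> file contents
    build_ok: bool          # True = green build, False = red build


def red_to_green_pairs(snapshots: List[Snapshot]) -> List[Tuple[Snapshot, Snapshot]]:
    """Return (red, green) pairs of consecutive snapshots where the build recovered."""
    ordered = sorted(snapshots, key=lambda s: s.timestamp)
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if not prev.build_ok and curr.build_ok
    ]


def multi_file_diff(before: Snapshot, after: Snapshot) -> str:
    """Build a unified diff covering every file that differs between two snapshots."""
    lines: List[str] = []
    for path in sorted(set(before.files) | set(after.files)):
        old = before.files.get(path, "")
        new = after.files.get(path, "")
        if old == new:
            continue
        lines.extend(
            difflib.unified_diff(
                old.splitlines(),
                new.splitlines(),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
                lineterm="",
            )
        )
    return "\n".join(lines)


def changed_line_count(diff: str) -> int:
    """Approximate count of added/removed lines, skipping the ---/+++ file headers."""
    return sum(
        1
        for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def mine_examples(snapshots: List[Snapshot], max_changed_lines: int = 100) -> List[str]:
    """Turn red->green transitions into candidate training diffs under a size cap."""
    examples: List[str] = []
    for red, green in red_to_green_pairs(snapshots):
        diff = multi_file_diff(red, green)
        if diff and changed_line_count(diff) <= max_changed_lines:
            examples.append(diff)
    return examples
```

Calling `mine_examples` on a session's snapshots returns one diff string per red→green transition that fits under the cap. In a real pipeline each diff would still need a generated description, as the summary notes, and the build error from the red snapshot could be kept alongside it.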
Original description
Programmers using mainstream languages enjoy a wealth of intelligent coding assistants and tools. At Jane Street, where we primarily use OCaml, we faced the challenge of building these tools for a powerful but low-resource functional programming language. This talk explores our journey in creating custom assistants and editor tooling for OCaml, tackling everything from data collection and model training to seamless editor integrations. We'll dive into our end-to-end process: gathering quality training data, developing meaningful evaluations for custom-trained models, building out underlying infrastructure, and creating tools that fit how we work.

Recorded live at the Agent Engineering Session Day from the AI Engineer Summit 2025 in New York. Learn more at https://ai.engineer and purchase tickets to our next event, the AI Engineer World's Fair, in SF June 3 - 5 here: https://ti.to/software-3/ai-engineer-worlds-fair-2025

About John

John Crepezzi is an engineer at Jane Street, where he works on building LLM-powered coding assistants and the infrastructure to enable others to create applications leveraging large language models. His work focuses on enhancing developer productivity, particularly in Jane Street's OCaml-centric environment.

Before joining Jane Street, John was a Principal Software Engineer at GitHub, where he contributed to several impactful projects in the developer productivity space, including Codespaces, Merge Queues, and Contribution Graphs. With a career dedicated to improving how developers work, John is passionate about creating tools that empower engineers to solve complex problems more effectively.