[Full Workshop] Llama 3 at 1,000 tok/s on the SambaNova AI Platform

819 views · Feb 07, 2025 · 60:57 min

Takeaway

SambaNova's custom-chip full stack delivers 1,000+ tok/s Llama 3 inference and pitches expert ensembles as the enterprise AI path.

Summary

SambaNova workshop demonstrates Llama 3 running at 1,000+ tokens/sec, a record speed per Artificial Analysis benchmarks, on the company's custom RDU chips.
Full-stack, vertically integrated AI platform in development since 2017; the company was founded out of Stanford by Kunle Olukotun, Chris Ré, and Rodrigo Liang and has raised over $1B.
Targets enterprise and government customers with sovereign AI, trillion-parameter scale, and fine-tuning, pre-training, and inference all on one stack.
Composition of experts thesis: smaller fine-tuned open-source models orchestrated together beat large monolithic closed models for enterprise use cases.
Hands-on workshop builds against the Samba-1 Turbo demo via Python notebooks and API keys distributed in Discord.
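
To make the composition of experts idea concrete, here is a minimal sketch in Python. It is not SambaNova's router or the Samba-1 API: the three expert functions, the keyword lists, and the `route` and `answer` helpers are all hypothetical stand-ins, and the keyword match replaces what would in practice be a learned routing model. The only point illustrated is the shape of the pattern: a router picks one small specialist per request instead of sending everything to one large model.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for fine-tuned expert models.
# Each one only tags the prompt so the routing decision is visible.
def code_expert(prompt: str) -> str:
    return f"[code-expert] {prompt}"

def legal_expert(prompt: str) -> str:
    return f"[legal-expert] {prompt}"

def general_expert(prompt: str) -> str:
    return f"[general-expert] {prompt}"

EXPERTS: Dict[str, Callable[[str], str]] = {
    "code": code_expert,
    "legal": legal_expert,
    "general": general_expert,
}

# Illustrative keyword lists (an assumption for this sketch);
# a production router would be a trained classifier, not a lookup.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "code": ("python", "function", "bug", "compile"),
    "legal": ("contract", "clause", "liability"),
}

def route(prompt: str) -> str:
    """Return the name of the expert that should handle the prompt."""
    words = set(prompt.lower().split())
    for name, keywords in KEYWORDS.items():
        if words & set(keywords):
            return name
    return "general"  # fallback when no specialist matches

def answer(prompt: str) -> str:
    """Dispatch the prompt to the routed expert and return its reply."""
    return EXPERTS[route(prompt)](prompt)

if __name__ == "__main__":
    for p in ("Fix this Python bug", "Summarize the liability clause", "Hello there"):
        print(route(p), "->", answer(p))
```

Adding a new specialist in this sketch means registering one more entry in `EXPERTS` and `KEYWORDS`; the callers of `answer` do not change.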

inference · hardware · sambanova

Original description

In this workshop, you will learn how to build LLM-based apps, such as a question-answering system with RAG, in LangChain using Llama-3 at 1,000 tokens per second on the SambaNova AI Platform.

Level: Intermediate

Abstract: SambaNova delivers generative AI capabilities to the enterprise. In this workshop, you will learn:

● About SambaNova’s full-stack generative AI platform, powered by the SN40L AI chip and delivering unparalleled performance for training and inference

● Samba-1, a trillion parameter composition of experts (CoE) model, and how it can be used in enterprise settings

● How to build and deploy a question-answering app end-to-end with retrieval augmented generation (RAG) for enterprise search using the following suite: LangChain as framework, Unstructured for pre-processing text documents, E5-large-v2 embedding, ChromaDB vector store, and Llama-3-8B-Instruct running at a record speed of 1,000 tokens per second via SambaNova.
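
The RAG flow named in the last bullet (embed documents, store vectors, retrieve the closest chunks, prompt the model) can be sketched with the Python standard library alone. This is not the workshop's code and uses none of the listed libraries: `embed` is a toy bag-of-words counter standing in for E5-large-v2, `ToyVectorStore` stands in for ChromaDB, `fake_llm` is a placeholder for the Llama-3-8B-Instruct endpoint, and the sample chunks are made up for illustration. The notebooks in the linked repo show the real stack.

```python
import math
import re
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for E5-large-v2)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyVectorStore:
    """In-memory stand-in for a vector store such as ChromaDB."""

    def __init__(self, chunks: List[str]) -> None:
        # Embed every chunk once at indexing time.
        self.items = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 2) -> List[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str, context: List[str]) -> str:
    """Assemble a grounded prompt from the retrieved chunks."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"

def fake_llm(prompt: str) -> str:
    """Placeholder for a call to a hosted Llama-3-8B-Instruct endpoint."""
    return "[stub completion]"

def answer(question: str, store: ToyVectorStore, k: int = 2) -> str:
    """Retrieve, build the prompt, then generate."""
    return fake_llm(build_prompt(question, store.search(question, k)))

if __name__ == "__main__":
    # Made-up sample chunks; a real app would load and split documents first.
    store = ToyVectorStore([
        "The SN40L is the chip behind the platform.",
        "ChromaDB stores embeddings for retrieval.",
        "Streamlit renders the demo app.",
    ])
    print(build_prompt("Which chip powers the platform?",
                       store.search("Which chip powers the platform?", k=1)))
```

In this sketch, swapping any stand-in for the real component changes only that one function or class, which is the main reason to keep the retrieve, prompt, and generate steps separate.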

This workshop is designed for tech professionals, engineers, and anyone interested in enterprise generative AI applications.

Prerequisites: Programming experience, ideally in Python; a GitHub account; and a laptop

Assets: We will provide a link to the GitHub repo with step-by-step instructions on how to install the required libraries and how to run the Jupyter notebooks and Streamlit apps. We will also provide SambaNova API keys for the CoE and Llama-3 endpoints.

GitHub Repo: https://github.com/sambanova/ai-starter-kit/tree/main/workshops/ai_engineer_2024/

Dev Setup for Exercise 1: https://github.com/sambanova/ai-starter-kit/blob/main/workshops/ai_engineer_2024/basic_examples/README.md

Dev Setup for Exercise 2: https://github.com/sambanova/ai-starter-kit/blob/main/workshops/ai_engineer_2024/ekr_rag/README.md