Navigating Challenges and Technical Debt in LLMs Deployment: Ahmed Menshawy

2.0K views · Dec 31, 2024 · 16:14 min

Takeaway

Enterprise LLM deployment lives or dies on RAG plus scalable infrastructure; autoregressive models won't be AGI, so focus on managing today's real risks and putting unstructured data to work.

Summary

Ahmed Menshawy (Mastercard) notes 80%+ of enterprise data is unstructured and 71% of orgs struggle to secure it — LLMs finally make this usable as contextual memory.
Reuses the familiar RAG vs closed-book/fine-tuning framing: closed-book models suffer from hallucination, stale knowledge, hard-to-edit weights, and difficult customization, while RAG decouples the model from enterprise memory.
Cites Mastercard press release showing generative AI boosted fraud detection by up to 300%.
References NeurIPS 2015 'ML code is <5% of an ML system' chart to argue AI engineering is plumbing + infra + evals, not just API calls.
Warns that because LLMs are autoregressive, mistakes compound as generation proceeds; AGI doomerism distracts from regulating current real-world risks (echoing a Nature article).
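
To make the "RAG decouples model from enterprise memory" point concrete, here is a minimal sketch, not anything shown in the talk. It assumes a toy in-memory document store and bag-of-words cosine similarity in place of a real vector index; the document IDs, policy texts, and the helper names `retrieve` and `build_prompt` are all hypothetical. The prompt it builds would then be sent to whatever LLM is in use; that call is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical "enterprise memory". The IDs and texts are invented for
# illustration; a real system would use a document store or vector index.
DOCUMENTS = {
    "policy-001": "Chargeback disputes must be filed within 120 days of the transaction.",
    "policy-002": "Card tokens replace the primary account number during online checkout.",
    "policy-003": "Fraud alerts are escalated to an analyst when the risk score exceeds the threshold.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=1):
    """Return the k (doc_id, text) pairs most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        DOCUMENTS.items(),
        key=lambda item: cosine(query_vec, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, k=1):
    """Assemble a grounded prompt; the model itself is never modified."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("When must chargeback disputes be filed?"))
```

Updating what the system "knows" in this sketch means editing `DOCUMENTS`, not retraining or editing the model, which is the decoupling the summary refers to.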

enterprise · rag · technical-debt

Original description

Large Language Models (LLMs) have become essential in advancing AI, enabling remarkable capabilities in natural language processing and understanding. However, the efficient deployment of LLMs in production environments reveals a landscape of challenges and technical debt.

Ethically, LLMs face issues such as bias amplification, where they might perpetuate existing stereotypes in their outputs. Misinformation is another concern, with the potential misuse of LLMs to create convincing yet false narratives. Privacy risks emerge from LLMs possibly memorizing and revealing personal data. Moreover, societal challenges include the impact on employment, as LLMs could automate tasks but also lead to job displacement. These challenges highlight the need for careful management and ethical considerations in the deployment of LLMs.

In this talk, Ahmed will highlight the key challenges and technical debt associated with LLMs' deployment, which demands customization and sophisticated engineering solutions not readily available in broad-use machine learning libraries or inference engines.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Ahmed
Ahmed Menshawy is the Vice President of AI Engineering at Mastercard's Cyber and Intelligence division. In this role, he leads the AI Engineering team, driving the development and operationalization of AI products and addressing the broad range of challenges and technical debts surrounding ML pipelines deployment. Ahmed also leads a team dedicated to creating a number of AI accelerators and capabilities, including Serving engines and Feature stores, aimed at enhancing various aspects of AI engineering.

He is also the co-author of the famous paper in ACM EuroMLSys, titled "Navigating Challenges and Technical Debt in Large Language Models Deployment," which mainly focuses on the complexities involved in deploying large language models efficiently.

In addition, Ahmed is the co-author of "Deep Learning with TensorFlow" and the author of "Deep Learning by Example," focusing on advanced topics in deep learning. He is also collaborating on an upcoming O'Reilly book, "Graph Learning for the Enterprise," which aims to guide enterprises in efficiently training and deploying graph learning pipelines at scale.