"Long-term Memory for AI Agents: Why Vector Databases Are Not Sufficient for Memory Management of Agentic AI Systems" by Debmalya Biswas, in AI Advances (Dec 8, 2024)
"Deploying and Managing Ollama Models on Kubernetes: A Comprehensive Guide" by Isuru Lakshan Ekanayaka, in AI Advances (Nov 9, 2024). Deploying machine learning models can be challenging, especially when aiming for scalable and maintainable deployments. Kubernetes (K8s)…
"Nvidia MIG with GPU Optimization in Kubernetes" by Vidyasagar Machupalli, in vmacwrites (Dec 7, 2024). Multi-Instance GPU (MIG) is a technology that allows partitioning of a single GPU into multiple smaller, isolated GPU instances. This…
"Finetune Llama3.2:1b for Free with Unsloth and Use in Ollama Locally" by Weining Mai, in Level Up Coding (Nov 15, 2024). A few months ago, when I needed to finetune a multi-modal model, I had to rent an Nvidia GPU from Runpod and set up a virtual environment…
"Introducing Layer Enhanced Classification (LEC)" by Tula Masterman, in TDS Archive (Dec 20, 2024). A novel approach for lightweight safety classification using pruned language models.
"Serving Large Models (Part Two): Ollama and TGI" by kirouane Ayoub, in GoPenAI (Aug 19, 2024). In the first part of our "Serving Large Models" series, we explored powerful tools like VLLM, LLAMA CPP Server, and SGLang, each offering…
"From RAG to TAG: Leveraging the Power of Table-Augmented Generation (TAG): A Leap Beyond…" by Sreedevi Gogusetty, in Artificial Intelligence in Plain English (Dec 6, 2024). As artificial intelligence (AI) continues to evolve, so do the methods it uses to interact with and leverage data. Two significant…
"Boost Your Python Code with CUDA" by Thomas Reid, in TDS Archive (Nov 20, 2024). Target your GPU easily with Numba's CUDA JIT.
"How to Easily Deploy a Local Generative Search Engine Using VerifAI" by Nikola Milosevic (Data Warrior), in TDS Archive (Nov 21, 2024). An open-source initiative to help you deploy generative search based on your local files and self-hosted (Mistral, Llama 3.x) or commercial…
"Agentic Mesh: Principles for an Autonomous Agent Ecosystem" by Eric Broda, in TDS Archive (Nov 19, 2024). Foundational principles that let autonomous agents find each other, collaborate, interact, and transact in a growing Agentic Mesh…
"From Local to Cloud: Estimating GPU Resources for Open-Source LLMs" by Maxime Jabarian, in TDS Archive (Nov 18, 2024). Estimating GPU memory for deploying the latest open-source LLMs.
"Implementing AgentOps for Observability in Foundation Model-Based Agents" by Tim Urista, in Dev Genius (Nov 15, 2024). As the capabilities of Large Language Models (LLMs) continue to advance, the development of foundation model-based autonomous agents has…
"Tracing the Transformer in Diagrams" by Eric Silberstein, in TDS Archive (Nov 7, 2024). What exactly do you put in, what exactly do you get out, and how do you generate text with it?
"Using SHAP Values to Explain How Your Machine Learning Model Works" by Vinícius Trevisan, in TDS Archive (Jan 17, 2022). Learn to use a tool that shows how each feature affects every prediction of the model.
"What is 1-bit LLM? Bitnet.cpp May Eliminate GPUs" by Don Lim (Oct 19, 2024). Microsoft introduces Bitnet.cpp, a lightweight AI model that can run efficiently on a portable device.
"Paper Walkthrough: Attention Is All You Need" by Muhammad Ardi, in TDS Archive (Nov 3, 2024). The complete guide to implementing a Transformer from scratch.
"Meta LayerSkip Llama3.2 1B: Achieving Fast LLM Inference with Self-Speculative Decoding Locally" by Md Monsur ali, in Level Up Coding (Oct 31, 2024). A comprehensive guide to LayerSkip technology, its advantages, evaluation, and a practical Meta LayerSkip tutorial on a local machine.
"Leveraging Smaller LLMs for Enhanced Retrieval-Augmented Generation (RAG)" by Alex Punnen, in TDS Archive (Oct 18, 2024). Llama-3.2-1B-Instruct and LanceDB.
"I Fine-Tuned the Tiny Llama 3.2 1B to Replace GPT-4o" by Thuwarakesh Murallie, in TDS Archive (Oct 15, 2024). Is the fine-tuning effort worth more than few-shot prompting?
"How Much Stress Can Your Server Handle When Self-Hosting LLMs?" by Thuwarakesh Murallie, in TDS Archive (Oct 19, 2024). Do you need more GPUs or a modern GPU? How do you make infrastructure decisions?