
AI Infrastructure Architect & Enterprise Solution Architect


Tag: vLLM

Why a 7B Parameter Model Won’t Run Comfortably on a 14 GB GPU (And Why Most Engineers Get This Wrong)

If you’ve recently started working with Large Language Models (LLMs), you’ve probably seen a calculation like this:

7 billion parameters × 2 bytes (FP16) ≈ 14 GB

At first glance, it seems perfectly reasonable to conclude: “A GPU with 14 GB of VRAM should be enough.” Unfortunately, that’s one of the most…
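The weights-only arithmetic quoted in this excerpt can be reproduced in a few lines. This is a minimal sketch, not the article's own code: the helper name `weight_memory_bytes` is hypothetical, and it counts only the model weights. Anything else an inference server holds in VRAM (KV cache, activations, runtime overhead) is deliberately left out, so the figure is a lower bound rather than a sizing recommendation.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Memory needed for the weights alone.

    bytes_per_param defaults to 2 (FP16). KV cache, activations, and
    runtime overhead are NOT included (hypothetical helper for illustration).
    """
    return num_params * bytes_per_param


params = 7_000_000_000          # the "7B" from the excerpt
total = weight_memory_bytes(params)

# Decimal gigabytes (10^9 bytes) versus binary gibibytes (2^30 bytes).
print(f"{total / 1e9:.1f} GB")
print(f"{total / 2**30:.2f} GiB")
```

Note that the decimal and binary figures differ, and GPU memory is often reported in binary units, so it helps to be explicit about which unit a "14 GB" figure refers to.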

Amit Patriwala

June 30, 2026

The Hidden Context Window Problem in RAG Systems: A Real Production Incident with vLLM and Qwen3

When Your 32K Context LLM Fails at 4K Tokens: A Production vLLM Troubleshooting Guide

One of the most common misconceptions in Generative AI systems is: “The model supports 32K context, so my application automatically supports 32K context.” In production, that assumption can lead to unexpected failures. Recently, we encountered a production issue…

Amit Patriwala

June 22, 2026




