Architecting a Resilient AI API Gateway: Deep Dive into Distributed Rate Limiting

In the modern era of Generative AI, computing power is the ultimate currency, and backend GPUs are fundamentally fragile. If you have ever integrated with an LLM provider, you are intimately familiar with the dreaded 429 Too Many Requests response. Providers enforce these limits to protect their infrastructure from malicious abuse (or poorly written while(true) loops) and to enforce tier-based monetization. If you are a platform engineer exposing an AI model to the world, a robust API Gateway isn’t optional—it is your primary line of defense. ...

February 20, 2026 · 6 min · 1278 words · Aaron Wu

Designing for the Surge: Architectural Trade-offs in Building a High-Concurrency Ticketing System

Executive Summary: What You Will Learn

Building a system to handle massive, instantaneous traffic spikes (flash sales, sneaker drops, or limited-quota registrations) is rarely about writing clever algorithms. It is an exercise in systematic resource management and architectural trade-offs. In this article, I will unpack the engineering decisions behind FlashForm, a high-performance backend engine designed to survive the “Thundering Herd” problem. Whether you are a founder anticipating hyper-growth or an engineer scaling a bottlenecked system, this breakdown will provide a concrete framework for thinking about asynchronous load leveling, multi-tier inventory control, cache reliability, database tuning, infrastructure isolation, and testing asynchronous flows. We will explore not just how the system was built, but why specific conveniences were sacrificed to guarantee extreme performance and absolute data integrity. ...

February 20, 2026 · 7 min · 1299 words · Aaron Wu

[System Design] Why Is Your App 'Overselling'? A Founder's First Lesson in Concurrency

You’re Bob, a bootstrapped founder who can’t afford to hire a software engineer. In this era of “Vibe Coding,” you believe you can build a Minimum Viable Product (MVP) just by relying on AI. 🤖 Today, you finally figured out how to set up an API, handle frontend rendering, and connect a SQL database. You actually built an e-commerce app! The tests passed, and it’s finally time for the soft launch. 🚀 ...

January 9, 2026 · 4 min · 833 words · Aaron

[System Design] Why Is Your App 'Overselling'? A Founder's First Lesson in Concurrency

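You're Bob, a founder who can't afford to hire a software engineer. In the era of Vibe Coding, you're convinced you can build an MVP with AI alone. Today you learned how to set up an API, render a frontend, and connect a SQL database, and you finally have an e-commerce app. The tests pass, and it's time for the soft launch! But the moment you go live and run a promotion, the users you worked so hard to attract start complaining. "Bob! I clearly saw stock available, so why was I charged when nothing shipped?!" You check the admin panel: there were only 10 in stock, yet 12 were sold!? Where are you supposed to find another two pairs of limited-edition Jordans? You break into a sweat. You don't want anyone to discover that you're just a penny-pinching founder, and the brand you've only just polished is about to be wrecked! You rush over to ask Aaron, a backend engineer. "Aaron, Aaron, what's going on? Is fate out to get me? Did the computer deliberately count two extra orders?" Aaron pats you on the back. "This isn't bad luck. It's a classic Race Condition." "Race Condition?" 🛑 Why does your code lie? Bob, your logic probably looks something like this: Read: check the database, is there any stock left? (Select count...) Check: if > 0, sell it. Write: decrement the stock by 1 and write it back to the database. (Update...) This logic is flawless when you are the only one testing. But when User A and User B press the buy button in the same millisecond, tragedy strikes: A sees that the stock is 1. B also sees that the stock is 1 at the same time (because A hasn't had a chance to decrement it yet). A gets the item, and the stock becomes 0. B also believes the purchase went through, and the stock becomes -1. That is a Race Condition. ...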

January 9, 2026 · 1 min · 164 words · Aaron
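
The read, check, write sequence described in the overselling excerpts above can be replayed deterministically. The following is a minimal sketch, not the article's own code: the `items` table, the helper names, and the use of an in-memory SQLite database are all assumptions made for illustration. It first forces the A/B interleaving by hand, then shows one common remedy, folding the check into the `UPDATE` itself so the database evaluates it atomically. The truncated excerpt does not say which fix the article recommends.

```python
import sqlite3


def make_db(stock: int) -> sqlite3.Connection:
    """Create a throwaway in-memory database with one item (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, stock INTEGER NOT NULL)")
    conn.execute("INSERT INTO items (id, stock) VALUES (1, ?)", (stock,))
    conn.commit()
    return conn


def read_stock(conn: sqlite3.Connection) -> int:
    """Step 1 of the naive flow: read the current stock."""
    return conn.execute("SELECT stock FROM items WHERE id = 1").fetchone()[0]


def naive_interleaved(conn: sqlite3.Connection) -> int:
    """Replay the race by hand: both buyers read before either one writes.

    Returns the number of sales that were accepted.
    """
    seen_by_a = read_stock(conn)  # A sees the current stock
    seen_by_b = read_stock(conn)  # B sees the same value; A has not written yet
    sold = 0
    for seen in (seen_by_a, seen_by_b):
        if seen > 0:  # the check relies on a stale read
            conn.execute("UPDATE items SET stock = stock - 1 WHERE id = 1")
            sold += 1
    conn.commit()
    return sold


def atomic_buy(conn: sqlite3.Connection) -> bool:
    """Check and decrement in a single statement; succeeds only if stock remained."""
    cur = conn.execute(
        "UPDATE items SET stock = stock - 1 WHERE id = 1 AND stock > 0"
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means the item was sold out


if __name__ == "__main__":
    naive = make_db(stock=1)
    print(naive_interleaved(naive), read_stock(naive))  # 2 -1 (oversold)

    safe = make_db(stock=1)
    print([atomic_buy(safe), atomic_buy(safe)], read_stock(safe))  # [True, False] 0
```

With the conditional `UPDATE`, the second buyer's statement matches no rows, so the application can report "sold out" instead of letting the stock go negative.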