Architect Scenarios
Architects develop judgment through reps. Each scenario is one rep — a real situation with no obvious right answer, where you have to make a decision and explain why.
Networking
Handle a sudden 10x traffic spike with no warning
Your SaaS product was just featured in a major newsletter.
Design a DR strategy for a fintech app with a 4-hour RTO
You're the principal architect at a fintech startup.
Design a VPC layout for a three-tier web app
You're setting up a new production environment for a web application with three tiers: a load balancer that accepts public traffic, application servers running your API, and a database.
Private vs public subnets: where does each service live?
You're deploying a three-service architecture: a web frontend (React SPA served from S3/CDN), an API server, and a PostgreSQL database.
Choose a load balancer for a WebSocket-heavy real-time app
You're building a real-time collaborative whiteboard app.
Global traffic routing: latency vs cost vs failover
Your SaaS is growing globally.
Zero-trust network architecture for a remote-first company
Your 80-person engineering team is fully remote.
Security
Store patient records for a healthcare startup in the EU
You're the founding engineer at a healthcare startup.
Secrets management for a 3-service microservices app
You've split your monolith into 3 microservices: an API, a worker, and a scheduler.
Encrypt data at rest: KMS vs application-level encryption
Your B2B SaaS stores customer data in an RDS PostgreSQL database and S3.
WAF placement for a public API with 10M requests/day
Your developer-facing public API processes 10 million requests per day.
DDoS mitigation strategy for a B2C platform
Your consumer marketplace was hit with a volumetric DDoS attack last month — 50Gbps of UDP flood traffic that took the site down for 2 hours.
PCI-DSS compliance: isolate the card data environment
Your e-commerce platform is preparing for PCI-DSS Level 1 compliance.
Zero-trust access for an internal engineering tools platform
Your engineering team of 60 uses a suite of internal tools: GitHub Enterprise, Jira, internal CI/CD dashboards, and a private npm registry.
Storage
Choose a database for a growing blog
You're building a blog platform that currently has 10,000 registered users.
Design storage for user-uploaded photos at consumer scale
You're building a consumer app where users upload profile photos and share image posts.
Choose a backup strategy for 50TB of archival data
Your company generates 50TB of log and audit data annually that must be retained for 7 years for compliance.
Design a file-sharing layer for a document collaboration app
You're building a document collaboration SaaS — think Google Docs for legal teams.
Replicate a 10TB PostgreSQL database across two regions
Your SaaS platform is expanding from US to EU.
Choose an object storage strategy for a multi-tenant SaaS
Your B2B SaaS serves 200 enterprise tenants.
Cold storage for compliance: 7-year WORM retention
You're the architect at a financial services firm.
Containers
Monolith or microservices at Series A with 4 engineers
You just closed your Series A.
Package a Node.js API into containers for the first time
Your team has built a Node.js API.
Choose between ECS and EKS for a 5-service microservices app
Your team is planning to deploy 5 microservices in containers.
Design a blue-green deployment strategy for containerized services
Your team deploys to production 3–5 times per week.
Service mesh: do you need one with 8 microservices?
Your platform has grown to 8 microservices communicating over HTTP.
Container image security: supply chain hardening strategy
After the Log4Shell incident, your security team has flagged container supply chain risk as a priority.
Multi-tenant Kubernetes: namespace isolation vs separate clusters
Your platform-as-a-service serves 30 enterprise tenants, each running containerized workloads on your shared Kubernetes infrastructure.
Compute
Pick compute for a personal project with a tight budget
You're building a side project: a REST API that powers a mobile app you're making for fun.
Choose a VM type for a memory-intensive analytics workload
Your data team runs nightly analytics jobs that join multiple large datasets in-memory.
Auto-scaling strategy for a flash-sale e-commerce site
You run an e-commerce site that holds daily flash sales.
Spot/preemptible instances: when is the risk worth it?
Your data pipeline runs nightly batch jobs that process 200GB of raw event data into analytics-ready tables.
Migrate a legacy on-prem Windows app to cloud compute
Your company runs a 10-year-old Windows Server 2012 application that handles internal HR workflows.
Multi-region active-active compute for a global gaming leaderboard
You're building the backend for a mobile game with 5M daily active users across North America, Europe, and Asia.
Right-size compute for a batch ML training job
Your data science team has trained a sentiment analysis model using PyTorch.
Serverless
Choose your cloud runtime for a first serverless function
You're building a lightweight backend for a contact form: receive a POST request, validate input, send an email via SendGrid, and log the event.
Choose a trigger type for an event-driven file processing pipeline
Users upload CSV files to your app for bulk data import.
Cold start mitigation for a latency-sensitive serverless API
Your Lambda-powered REST API serves a mobile app.
Serverless vs containers: finding the cost crossover point
Your API handles 2 million requests/day today, growing 20% month-over-month.
Orchestrate a 5-step data pipeline: Step Functions vs alternatives
You're building a data processing pipeline: (1) validate input, (2) enrich with third-party data, (3) run ML scoring, (4) write to data warehouse, (5) notify downstream systems.
Serverless fan-out architecture for 1M events per hour
Your platform processes user activity events — clicks, purchases, page views — at 1 million events per hour.
Stateful workflows in serverless: when do durable functions pay off?
You're building an order fulfillment system.
Identity
Add authentication to a new SaaS app: build vs buy
You're building a new B2B SaaS product.
OAuth scopes and least-privilege for a third-party integration
You're integrating with a customer's CRM via OAuth.
SSO for a B2B app: SAML vs OIDC vs federation
Three enterprise customers are ready to sign contracts — but each requires SSO.
Machine-to-machine auth: API keys vs service accounts vs mTLS
Your platform has 12 microservices communicating with each other.
Multi-tenant identity: shared vs isolated user pools
Your B2B SaaS currently stores all users in a single AWS Cognito User Pool with a tenant_id custom attribute.
Privileged access management for cloud infrastructure
All 20 engineers on your team have broad AWS admin access via IAM users.
Token strategy: short-lived JWTs vs opaque tokens vs session cookies
You're redesigning the auth token strategy for a mobile app + web app + public API.
Web
Is multi-cloud worth the complexity for this B2B SaaS?
You run a B2B analytics SaaS with 40 enterprise customers.
CDN strategy for a global news site with breaking-news spikes
You run a news website.
Choose a caching layer for a high-read REST API
Your REST API serves a product catalog: 500,000 SKUs with prices, descriptions, and availability.
Design A/B testing infrastructure at 5M daily users
Your product team wants to run 5 concurrent A/B experiments: different checkout flows, homepage layouts, and pricing page copy.
API gateway vs direct service exposure: when does it matter?
Your company is building a developer platform with a public API.
Edge compute vs origin: where does personalization logic live?
Your e-commerce homepage must show personalized product recommendations based on a user's purchase history, geographic location, and active promotions.
Rate limiting and throttling for a developer-facing public API
You run a developer-facing public API with 3,000 registered API consumers.
AI-ML
Choose a managed ML platform for your first model deployment
Your data science team has trained a classification model in Python/scikit-learn.
Batch vs real-time inference: which serving pattern fits?
Your ML team has built a churn prediction model.
Vector database selection for a RAG-based document Q&A app
You're building a RAG (Retrieval-Augmented Generation) app that lets users query a corpus of 2 million internal documents.
Fine-tuning vs RAG vs prompt engineering: when to use each
Your company wants to build an AI assistant for customer support that knows your product inside-out.
GPU instance strategy for training a 7B parameter model
Your ML team wants to fine-tune an open-source LLM in the 7B to 8B parameter range (Llama-3-8B) on 50GB of proprietary text data.
Feature store design for a real-time recommendation engine
You're building a real-time product recommendation engine.
LLM inference at scale: latency vs throughput vs cost
Your product has an AI writing assistant powered by an LLM.
Messaging
First message queue: choose your cloud runtime
Your web app sends welcome emails, processes image thumbnails, and syncs data to a third-party CRM after user signup — all currently happening synchronously in the API response.
At-least-once vs exactly-once delivery: when does it matter?
You're processing payment events from a webhook.
Event streaming vs message queuing for order processing
You're designing the messaging layer for an e-commerce order processing system.
Dead letter queue strategy for a payment processing pipeline
Your payment processing pipeline uses SQS → Lambda.
Fan-out pattern: send one event to 20 independent consumers
When a user places an order, 20 downstream systems need to react: inventory deduction, shipping calculation, loyalty points, fraud scoring, analytics, 5 regional notification services, 3 ERP integrations, and 7 partner APIs.
Event sourcing + CQRS: when does the complexity pay off?
You're the architect for a financial trading platform.
Bridge events between AWS and Azure without tight coupling
After an acquisition, your company now has two engineering teams: one building on AWS (SQS, Lambda), one building on Azure (Service Bus, Azure Functions).