S3

Amazon S3 (Simple Storage Service) is AWS's object storage service. It provides virtually unlimited, highly durable storage for any type of data: images, videos, logs, backups, static website assets, and data lake files. Unlike a traditional file system, S3 organizes data as objects inside buckets, where each object consists of the data itself, its metadata, and a key that is unique within the bucket. By default, S3 stores objects redundantly across at least three Availability Zones (the One Zone storage classes are the exception) and is designed for 99.999999999% (eleven nines) durability. You can control access with bucket policies and IAM permissions, enable versioning to recover accidentally deleted or overwritten files, set lifecycle rules to automatically archive old data to S3 Glacier, and configure event notifications to invoke Lambda functions when new files arrive. The Azure equivalent is Azure Blob Storage; GCP's equivalent is Cloud Storage; OCI offers Object Storage.

When would you use S3? Use S3 for any file that needs to be stored durably and accessed on demand: user-uploaded content (images, videos, documents), application logs and audit trails, static website hosting, machine learning training datasets, backups, and as the landing zone for data lake pipelines. It's also the default destination for logs from services like CloudTrail, AWS Config, and ELB.

Common mistakes: making buckets publicly accessible by accident (always set Block Public Access at the account level), not enabling versioning on important buckets (without versioning or a separate backup, a deleted or overwritten object cannot be restored), and designing around the outdated assumption that S3 is eventually consistent for overwrites and deletes. Since December 2020, S3 has provided strong read-after-write consistency for object reads, writes, deletes, and listings in all regions. Bucket configuration changes, such as enabling versioning, can still take time to propagate, and concurrent writes to the same key resolve as last writer wins, so plan for those cases instead.
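The guardrails described above (Block Public Access, versioning, and a lifecycle rule that archives to S3 Glacier) could be applied with boto3 roughly as follows. This is a minimal sketch, not a definitive setup: the helper names, the rule ID, and the 90-day threshold are illustrative assumptions, and it applies Block Public Access to a single bucket, whereas the account-level setting goes through the separate `s3control` client.

```python
"""Sketch: baseline settings for one S3 bucket (all names and values are illustrative)."""


def public_access_block() -> dict:
    # All four Block Public Access flags switched on.
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def glacier_lifecycle(days: int = 90, prefix: str = "") -> dict:
    # One rule: move current object versions under `prefix` to the
    # GLACIER storage class (S3 Glacier Flexible Retrieval) after `days` days.
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_baseline(bucket: str) -> None:
    # boto3 is imported lazily so the helpers above can be inspected
    # without the SDK installed or AWS credentials configured.
    import boto3

    s3 = boto3.client("s3")
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=public_access_block(),
    )
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle(),
    )
```

Calling `apply_baseline("my-example-bucket")` (a placeholder bucket name) would issue the three configuration requests. On a versioned bucket, the `Transitions` entry only affects current versions; archiving older versions would need a `NoncurrentVersionTransitions` entry as well.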

Example: A photo-sharing app stores all user photos in S3, serves them globally through the CloudFront CDN for fast delivery, uses lifecycle rules to move photos older than 90 days to S3 Glacier for cheaper storage, and triggers a Lambda function on upload to generate thumbnails automatically.

Architecture use case: an analytics pipeline lands raw CSV files in an S3 'raw' bucket, a Lambda function transforms them and writes the results to a 'processed' bucket, and Athena queries the processed files directly using SQL. The result is a serverless analytics stack with no infrastructure to manage.
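The upload-triggered Lambda function from the photo-sharing example might start out like the sketch below. It only shows how a handler could read the bucket and key out of an S3 event notification and decide where a thumbnail would go. The `thumbnails/` prefix and the function names are hypothetical, and the actual image resizing (which would need an imaging library) is left out.

```python
"""Sketch: reading an S3 upload event inside a Lambda handler (names are illustrative)."""
from typing import List, Tuple
from urllib.parse import unquote_plus

THUMBNAIL_PREFIX = "thumbnails/"  # hypothetical output location


def uploaded_objects(event: dict) -> List[Tuple[str, str]]:
    # Each record in an S3 event notification carries the bucket name and
    # the object key; the key arrives URL-encoded, so decode it first.
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def thumbnail_key(key: str) -> str:
    # Keep the original path under a separate prefix.
    return THUMBNAIL_PREFIX + key


def handler(event, context):
    # Returns the planned work instead of resizing images, so the
    # event-handling logic can be tested without AWS access.
    planned = []
    for bucket, key in uploaded_objects(event):
        if key.startswith(THUMBNAIL_PREFIX):
            # Skip our own output; otherwise writing a thumbnail to the
            # same bucket would trigger this function again.
            continue
        planned.append(
            {"bucket": bucket, "source": key, "thumbnail": thumbnail_key(key)}
        )
    return planned
```

Writing thumbnails under a separate prefix (or to a separate bucket) and filtering on it is one way to avoid the function triggering itself in a loop. Scoping the event notification to an upload prefix would achieve the same thing at the configuration level.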

Category: data

Difficulty: basic