Glue

Definition

AWS Glue is a fully managed extract, transform, and load (ETL) service from AWS for preparing data for analytics. Think of it as a data processing factory that automatically cleans and organizes raw data.

Use Cases

Provider Equivalents

Frequently Asked Questions

What's the difference between AWS Glue and Amazon Athena?
AWS Glue prepares and organizes data (ETL) and stores table definitions in the Glue Data Catalog. Amazon Athena is a query service that runs SQL directly on data in Amazon S3. In practice, Glue often creates and maintains the tables and partitions, and Athena queries them through the catalog.
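As a rough sketch of this division of labor, the Python below starts a Glue crawler and then submits an Athena query against a table the crawler is assumed to have registered. The client objects are passed in as arguments; in real use they would typically come from `boto3.client("glue")` and `boto3.client("athena")`. The function names, and any crawler, database, or bucket names you pass in, are hypothetical and chosen for illustration only.

```python
def refresh_catalog(glue_client, crawler_name):
    """Ask Glue to crawl the data so catalog tables and partitions are current.

    The crawler runs asynchronously; this call only starts it.
    """
    glue_client.start_crawler(Name=crawler_name)


def query_catalog_table(athena_client, database, sql, output_location):
    """Submit SQL to Athena against a Glue Data Catalog database.

    Returns the query execution ID, which can be used later to fetch results.
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location (assumed to exist).
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Because the crawler run is asynchronous, a real pipeline would likely wait for the crawler to return to a ready state (for example by polling `get_crawler`) before querying, so that Athena sees the updated tables and partitions.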
When should I use AWS Glue?
Use AWS Glue when you need to discover data (crawlers), maintain a central catalog of tables, and run managed ETL to clean, join, and transform data for analytics or machine learning. It’s a good fit for data lakes on S3, recurring batch pipelines, and situations where you don’t want to manage Spark clusters.
How much does AWS Glue cost?
Pricing is usage-based. Common cost drivers include: (1) ETL job run time and the amount of compute allocated (measured in DPUs, or data processing units, for many Glue job types), (2) number of crawler runs and their duration, (3) Data Catalog object storage (tables and partitions) and requests, and (4) any additional features you use (for example, development endpoints in older workflows). Exact costs depend on how long jobs run, how often crawlers scan, and how much data is processed, since larger volumes generally lengthen job run time.
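To make the first cost driver concrete, here is a minimal back-of-the-envelope sketch that multiplies DPUs by run time, run count, and a per-DPU-hour rate. The function name, the workload figures, and the rate are all assumptions for illustration; the rate in particular is a placeholder rather than a quoted price, and the sketch ignores minimum billing durations, crawler charges, and Data Catalog charges.

```python
def estimate_job_cost(dpus, runtime_minutes, runs_per_month, rate_per_dpu_hour):
    """Estimate monthly ETL job cost as DPUs x hours per run x runs x rate."""
    dpu_hours_per_run = dpus * (runtime_minutes / 60)
    return dpu_hours_per_run * runs_per_month * rate_per_dpu_hour


# Hypothetical workload: 10 DPUs, 15 minutes per run, one run a day for 30 days.
# ASSUMED_RATE is a placeholder; check the AWS Glue pricing page for your region.
ASSUMED_RATE = 0.44  # USD per DPU-hour (assumption, not a quoted price)

monthly = estimate_job_cost(
    dpus=10,
    runtime_minutes=15,
    runs_per_month=30,
    rate_per_dpu_hour=ASSUMED_RATE,
)
print(f"Estimated monthly job cost: ${monthly:.2f}")
```

In this example each run consumes 2.5 DPU-hours, or 75 DPU-hours over the month, so halving either the DPU allocation or the run time would halve the estimate.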

Category: data

Difficulty: advanced

Related Terms

See Also