Question 1

What's the difference between AWS Glue and Amazon Athena?

Accepted Answer

AWS Glue is a managed ETL service: it prepares and organizes data and stores table definitions in the Glue Data Catalog. Amazon Athena is a serverless query service that runs SQL directly on data in Amazon S3. In practice, Glue often creates and maintains the tables and partitions, and Athena queries them through the catalog.
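
As a rough illustration of that split, the sketch below assumes Glue has already cataloged a table and shows how Athena might be asked to query it. The database name sales_db, the table name orders, and the results bucket are hypothetical placeholders, and run_query needs boto3 plus valid AWS credentials, so only the request-building step runs offline.

```python
def build_athena_request(database, table, output_location, limit=10):
    """Build keyword arguments for Athena's StartQueryExecution call.

    The database and table are assumed to exist already in the
    Glue Data Catalog (for example, created by a Glue crawler).
    """
    query = f'SELECT * FROM "{table}" LIMIT {int(limit)}'
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query to Athena and return its execution ID.

    boto3 is imported lazily so the rest of the sketch runs without it.
    """
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_athena_request(
    database="sales_db",
    table="orders",
    output_location="s3://example-athena-results/",
)
print(request["QueryString"])  # SELECT * FROM "orders" LIMIT 10
```

Glue does not appear in the call at all: Athena resolves the table through the catalog entry that Glue maintains, which is why the two services are usually used together.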

Question 2

When should I use AWS Glue?

Accepted Answer

Use AWS Glue when you need to discover data (crawlers), maintain a central catalog of tables, and run managed ETL to clean, join, and transform data for analytics or machine learning. It’s a good fit for data lakes on S3, recurring batch pipelines, and situations where you don’t want to manage Spark clusters.

Question 3

How much does AWS Glue cost?

Accepted Answer

Pricing is usage-based. Common cost drivers include: (1) ETL job run time and the amount of compute allocated (measured in DPUs for many Glue job types), (2) the number of crawler runs and their duration, (3) Data Catalog object storage (tables and partitions) and requests, and (4) any additional features you use (for example, development endpoints in older workflows). Exact costs depend on how long jobs run, how often crawlers scan, and how much data is processed, since larger inputs generally mean longer run times.
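
The first cost driver can be sketched as simple arithmetic: DPUs multiplied by billed hours multiplied by a per-DPU-hour rate. The function below is a minimal estimate under stated assumptions, not an official calculator. The rate of 0.44 and the one-minute minimum are illustrative placeholders; actual rates and minimum billing durations vary by region, job type, and Glue version, so check the current pricing page before relying on any number.

```python
def estimate_glue_job_cost(dpus, runtime_minutes, rate_per_dpu_hour,
                           minimum_billed_minutes=1.0):
    """Estimate the compute cost of one Glue ETL job run.

    rate_per_dpu_hour and minimum_billed_minutes are caller-supplied
    assumptions; this sketch does not look up real AWS prices.
    """
    if dpus <= 0 or runtime_minutes < 0 or rate_per_dpu_hour < 0:
        raise ValueError("dpus must be positive; other inputs non-negative")
    # Runs shorter than the minimum are billed as if they hit the minimum.
    billed_minutes = max(runtime_minutes, minimum_billed_minutes)
    return dpus * (billed_minutes / 60.0) * rate_per_dpu_hour


# Illustrative only: 10 DPUs for 30 minutes at an assumed rate of 0.44.
cost = estimate_glue_job_cost(dpus=10, runtime_minutes=30,
                              rate_per_dpu_hour=0.44)
print(round(cost, 2))  # 2.2
```

Crawler and Data Catalog charges are not covered here; they would need to be added separately to approximate a full monthly bill.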
