Question 1

What's the difference between a TPU and a GPU?

Accepted Answer

A GPU is a general-purpose parallel processor used for graphics and many kinds of compute, including AI. A TPU is a specialized accelerator designed primarily for neural network math (especially large matrix multiplications). TPUs can be very efficient for supported deep learning workloads, while GPUs are more flexible across a wider range of models, libraries, and custom operations.

Question 2

When should I use a TPU?

Accepted Answer

Use a TPU when you are training or serving neural networks that are well-supported by TPU software stacks (commonly TensorFlow or JAX with XLA) and your workload is dominated by dense linear algebra (matrix multiplies). TPUs are often a good fit for large-scale training, high-throughput inference, or when you want to scale out using TPU pods. If you rely on niche CUDA-only libraries, custom GPU kernels, or unsupported ops, a GPU may be a better choice.
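One rough way to check whether a workload is "dominated by dense linear algebra" is to compare the floating-point operations spent in matrix multiplies against everything else. The sketch below does this for a single dense layer, y = activation(x @ W + b). The function names and the assumption that the bias add and the activation each cost one operation per output element are illustrative simplifications, not a profiler-accurate model.

```python
def dense_layer_flops(batch, d_in, d_out):
    """Rough FLOP counts for one dense layer: y = activation(x @ W + b).

    Assumption (illustrative): the bias add and the activation each cost
    one operation per output element.
    """
    # Each of the batch * d_out outputs needs d_in multiplies and d_in adds.
    matmul = 2 * batch * d_in * d_out
    # One bias add plus one activation op per output element.
    elementwise = 2 * batch * d_out
    return matmul, elementwise


def matmul_share(batch, d_in, d_out):
    """Fraction of the layer's FLOPs spent in the matrix multiply."""
    matmul, elementwise = dense_layer_flops(batch, d_in, d_out)
    return matmul / (matmul + elementwise)


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    share = matmul_share(batch=256, d_in=4096, d_out=4096)
    print(f"matmul share of FLOPs: {share:.4%}")
```

Under this model the share simplifies to d_in / (d_in + 1), so wide layers are almost entirely matrix multiplies. Workloads with many small layers, heavy control flow, or custom ops will look much less favorable, which is where a GPU may be the better fit.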

Question 3

How much does a TPU cost?

Accepted Answer

TPU cost depends on the TPU generation (e.g., v4 vs v5e), the number of chips, how long you run them, and the region and availability. Pricing is typically per TPU chip (or per TPU VM configuration) per hour, with additional costs for attached storage, networking, and any supporting services. For accurate numbers, check the current Google Cloud TPU pricing page for your region and TPU type, and consider committed use discounts or reservations if available.
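The cost factors above can be combined into a back-of-the-envelope estimate. The sketch below is a minimal illustration: the function name, its parameters, and every price in the example are hypothetical placeholders, not real Cloud TPU rates. It also assumes that any discount applies only to the compute portion, which may not match your actual billing terms.

```python
def estimate_tpu_cost(chips, hours, price_per_chip_hour,
                      extras_per_hour=0.0, discount=0.0):
    """Estimate total cost for a TPU job.

    chips               -- number of TPU chips
    hours               -- wall-clock hours the chips are held
    price_per_chip_hour -- hourly rate per chip (look this up for your
                           region and TPU type; placeholder here)
    extras_per_hour     -- storage, networking, and other hourly costs
    discount            -- fractional discount on compute only (assumption),
                           e.g. 0.2 for 20% off
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    compute = chips * hours * price_per_chip_hour * (1.0 - discount)
    extras = hours * extras_per_hour
    return compute + extras


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    total = estimate_tpu_cost(chips=8, hours=24, price_per_chip_hour=1.50,
                              extras_per_hour=0.25)
    print(f"estimated cost: {total:.2f}")
```

Replace the placeholder rate with the figure from the pricing page for your region and TPU type before relying on the result.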
