Data Fusion

Definition

Google Cloud's fully managed, cloud-native data integration service for building and managing ETL/ELT pipelines with a visual interface.

Use Cases

Provider Equivalents

Frequently Asked Questions

What’s the difference between Cloud Data Fusion and Dataflow?
Cloud Data Fusion is a visual data integration tool for building and managing ETL/ELT pipelines using connectors and a drag-and-drop interface. Dataflow is a managed stream and batch processing service, based on Apache Beam, for large-scale pipelines written as code. Use Data Fusion when you want faster integration with minimal code; use Dataflow when you need custom, high-scale processing logic (especially streaming) and fine-grained control.
When should I use Cloud Data Fusion?
Use Cloud Data Fusion when you need to integrate data from multiple systems (databases, files, and some SaaS sources), standardize/clean it, and load it into targets like BigQuery—especially when a visual designer, reusable templates, and managed operations (scheduling, monitoring) will speed delivery. It’s a good fit for teams that want to reduce custom ETL code and rely on managed connectors and pipeline patterns.
How much does Cloud Data Fusion cost?
Pricing has three main parts: the Data Fusion edition/instance type and how long the instance runs; the underlying resources used by pipeline execution (for example, Dataproc clusters and their Compute Engine resources); and any data processing/storage costs in services like BigQuery and Cloud Storage. Costs typically increase with higher availability/throughput configurations, more concurrent pipelines, and heavier transformations.
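As a rough illustration of how those cost components add up, here is a minimal Python sketch. The class name, field names, and every rate in it are hypothetical placeholders, not published Google Cloud prices; substitute figures from the current pricing pages for your edition and region.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Hypothetical inputs for a back-of-the-envelope estimate."""
    instance_hourly_rate: float        # placeholder rate for the chosen edition, per hour
    instance_hours: float              # hours the instance exists in the billing period
    execution_hourly_rate: float       # placeholder blended rate for pipeline execution clusters
    execution_hours: float             # total cluster hours consumed by pipeline runs
    other_service_costs: float = 0.0   # e.g. BigQuery and Cloud Storage charges


def estimate_monthly_cost(c: CostInputs) -> float:
    """Sum instance time, pipeline execution, and other service costs."""
    instance = c.instance_hourly_rate * c.instance_hours
    execution = c.execution_hourly_rate * c.execution_hours
    return round(instance + execution + c.other_service_costs, 2)


# Made-up numbers: instance up all month (730 h), 40 cluster hours of pipeline runs.
example = CostInputs(
    instance_hourly_rate=1.00,
    instance_hours=730,
    execution_hourly_rate=0.50,
    execution_hours=40,
    other_service_costs=25.0,
)
print(estimate_monthly_cost(example))  # 775.0
```

The sketch treats instance time as a separate term from pipeline runs because the instance is billed for how long it runs, so an idle instance still contributes to the total.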

Category: data

Difficulty: intermediate

Related Terms

See Also