Data Fusion
Definition
Google Cloud's fully managed, cloud-native data integration service, built on the open-source CDAP project, for building and managing ETL/ELT pipelines through a visual, low-code interface.
Use Cases
- Google Cloud: Integrating operational data into BigQuery for analytics and reporting — Teams use Cloud Data Fusion’s visual studio and prebuilt connectors to ingest data from databases and SaaS sources, apply transformations, and load curated datasets into BigQuery; pipelines are scheduled and monitored from the managed service. (Faster pipeline development with less custom code, improved reliability through managed operations, and quicker time-to-insight for analytics users.)
- Sainsbury’s: Retail data integration to support analytics and decision-making — Uses Google Cloud data services (including BigQuery) and managed integration patterns to move and transform data from multiple sources into analytics-ready datasets; visual pipeline tooling reduces engineering effort for common ingestion and transformation tasks. (More timely analytics and improved ability to standardize and operationalize data pipelines across teams.)
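The scheduling and monitoring described in the use cases above can also be driven from code, because a Cloud Data Fusion instance exposes the CDAP REST API. The following is a minimal Python sketch, not a definitive client: it only builds (and does not send) the request that starts a deployed batch pipeline. The endpoint host, namespace, pipeline name, and token are hypothetical placeholders, and the `DataPipelineWorkflow` path follows the pattern documented for batch pipelines; check it against the current API reference before relying on it.

```python
import urllib.request
from urllib.parse import quote


def build_start_request(api_endpoint: str, namespace: str,
                        pipeline: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a batch pipeline.

    api_endpoint is the instance's API endpoint; the value used in the
    example below is a made-up placeholder, not a real instance.
    """
    url = (
        f"{api_endpoint.rstrip('/')}"
        f"/v3/namespaces/{quote(namespace, safe='')}"
        f"/apps/{quote(pipeline, safe='')}"
        "/workflows/DataPipelineWorkflow/start"
    )
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # All four arguments are illustrative assumptions.
    req = build_start_request(
        "https://example-instance.example.com/api",
        "default",
        "orders_to_bigquery",
        "REPLACE_WITH_ACCESS_TOKEN",
    )
    print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) requires a valid OAuth access token for an identity that has permission on the instance.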
Provider Equivalents
- AWS: AWS Glue
- Azure: Azure Data Factory
- GCP: Cloud Data Fusion
- OCI: OCI Data Integration
Frequently Asked Questions
- What’s the difference between Cloud Data Fusion and Dataflow?
- Cloud Data Fusion is a visual data integration tool for building and managing ETL/ELT pipelines with prebuilt connectors and a drag-and-drop interface; by default, its pipelines execute on Dataproc clusters. Dataflow is a managed service that runs Apache Beam pipelines, written in code, for large-scale batch and stream processing. Use Data Fusion when you want faster integration with minimal code; use Dataflow when you need custom, high-scale processing logic (especially streaming) and fine-grained control.
- When should I use Cloud Data Fusion?
- Use Cloud Data Fusion when you need to integrate data from multiple systems (databases, files, and some SaaS sources), standardize and clean it, and load it into targets like BigQuery, especially when a visual designer, reusable templates, and managed operations (scheduling, monitoring) will speed delivery. It is a good fit for teams that want to reduce custom ETL code and rely on managed connectors and pipeline patterns.
- How much does Cloud Data Fusion cost?
- Pricing has two main parts: the Data Fusion instance, billed according to its edition (Developer, Basic, or Enterprise) for the time the instance runs, and the resources used to execute pipelines (by default, Dataproc clusters and their underlying Compute Engine resources). Data processing and storage in services such as BigQuery and Cloud Storage are billed separately. Costs typically increase with higher-tier editions, more concurrent pipelines, and heavier transformations.
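To make the pricing components in the last answer concrete, here is a rough Python sketch that adds up instance time, pipeline execution time, and other service charges for one month. Every rate and quantity in it is a hypothetical placeholder, not a published price; substitute figures from the current pricing pages before using it for planning.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    instance_hourly_rate: float   # hypothetical edition rate, USD per instance-hour
    instance_hours: float         # hours the instance runs in the month
    pipeline_runs: int            # number of pipeline runs in the month
    avg_run_hours: float          # average duration of one run, in hours
    cluster_hourly_rate: float    # hypothetical execution-cluster cost, USD per hour
    other_service_costs: float = 0.0  # e.g. BigQuery and Cloud Storage charges


def estimate_monthly_cost(c: CostInputs) -> dict:
    """Return a per-component breakdown and the total, in USD."""
    instance = c.instance_hourly_rate * c.instance_hours
    execution = c.pipeline_runs * c.avg_run_hours * c.cluster_hourly_rate
    total = instance + execution + c.other_service_costs
    return {
        "instance": instance,
        "execution": execution,
        "other": c.other_service_costs,
        "total": total,
    }


if __name__ == "__main__":
    # Illustrative numbers only; none of these are real prices.
    example = CostInputs(
        instance_hourly_rate=1.0,
        instance_hours=730.0,
        pipeline_runs=30,
        avg_run_hours=0.5,
        cluster_hourly_rate=2.0,
        other_service_costs=10.0,
    )
    for name, value in estimate_monthly_cost(example).items():
        print(f"{name}: ${value:,.2f}")
```

The breakdown shows why an idle instance can dominate the bill: the instance charge accrues for every hour the instance exists, while execution costs accrue only while pipelines run.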
Category: data
Difficulty: intermediate
Related Terms
See Also