The Parent-Child design pattern in Azure Data Factory (ADF) is a powerful approach to orchestrating data pipelines dynamically. It simplifies scenarios where multiple tasks need to process data based on runtime input. This blog post outlines how to design a Parent pipeline that orchestrates a Child pipeline to create files dynamically in an Azure Data Lake Storage Gen2 container, using an example in which schema and table names are fetched from a database.
Use Case
We want to generate files in Azure Data Lake Gen2 with filenames that are dynamically created using a combination of the database schema and table names. This design leverages the Parent pipeline for control and the Child pipeline for execution.
Parent Pipeline
The Parent pipeline handles:
Fetching Table Metadata:
A `Lookup` activity runs a query that fetches table schema and table names dynamically from the source database. Here's the query used:

SELECT table_schema, table_name FROM information_schema.tables WHERE table_name LIKE 'Customer%'

This query returns a list of tables whose names start with "Customer".
Iterating Over the Results:
- A `ForEach` activity iterates through the array returned by the `Lookup` activity. Each iteration corresponds to one row in the table list (one table-schema and table-name pair).
Calling the Child Pipeline:
- Inside the `ForEach` activity, an `Execute Pipeline` activity invokes the Child pipeline. The schema and table name from the current iteration are passed as parameters to the Child pipeline. A trimmed JSON sketch of the Parent pipeline follows this list.
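To make the orchestration concrete, here is a trimmed JSON sketch of the Parent pipeline's two activities. The names used (`ParentPipeline`, `LookupTableList`, `ForEachTable`, `InvokeChildPipeline`, `SqlSourceDataset`, `ChildPipeline`) and the Azure SQL source type are placeholder assumptions; adapt them to your environment. Note that `firstRowOnly` is set to `false` so the Lookup returns every matching row rather than just the first.

```json
{
  "name": "ParentPipeline",
  "properties": {
    "activities": [
      {
        "name": "LookupTableList",
        "type": "Lookup",
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT table_schema, table_name FROM information_schema.tables WHERE table_name LIKE 'Customer%'"
          },
          "dataset": { "referenceName": "SqlSourceDataset", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachTable",
        "type": "ForEach",
        "dependsOn": [ { "activity": "LookupTableList", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('LookupTableList').output.value", "type": "Expression" },
          "activities": [
            {
              "name": "InvokeChildPipeline",
              "type": "ExecutePipeline",
              "typeProperties": {
                "pipeline": { "referenceName": "ChildPipeline", "type": "PipelineReference" },
                "waitOnCompletion": true,
                "parameters": {
                  "schema": { "value": "@item().table_schema", "type": "Expression" },
                  "table_name": { "value": "@item().table_name", "type": "Expression" }
                }
              }
            }
          ]
        }
      }
    ]
  }
}
```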
Child Pipeline
The Child pipeline is responsible for:
Copying Data to ADLS Gen2:
A `Copy Data` activity transfers data from the source database to Azure Data Lake Storage Gen2. The destination filename is defined dynamically from the parameters passed by the Parent pipeline (`schema` and `table_name`). For instance, if the table is in the `sales` schema and is named `CustomerInfo`, the output file is named `sales.CustomerInfo`. This is achieved through the Sink dataset of the `Copy Data` activity, where the filename is configured with the following expression:

@concat(pipeline().parameters.schema, '.', pipeline().parameters.table_name)

This expression concatenates the schema and table name with a dot separator. A minimal JSON sketch of the Child pipeline is shown below.
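Below is a minimal JSON sketch of the Child pipeline, under the same naming assumptions as the Parent sketch. The dataset names, the DelimitedText sink type, and the dynamically built source query are illustrative; the essential pieces are the two pipeline parameters and the `fileName` dataset parameter populated by the `@concat` expression.

```json
{
  "name": "ChildPipeline",
  "properties": {
    "parameters": {
      "schema": { "type": "string" },
      "table_name": { "type": "string" }
    },
    "activities": [
      {
        "name": "CopyTableToAdls",
        "type": "Copy",
        "inputs": [ { "referenceName": "SqlSourceDataset", "type": "DatasetReference" } ],
        "outputs": [
          {
            "referenceName": "AdlsSinkDataset",
            "type": "DatasetReference",
            "parameters": {
              "fileName": {
                "value": "@concat(pipeline().parameters.schema, '.', pipeline().parameters.table_name)",
                "type": "Expression"
              }
            }
          }
        ],
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": {
              "value": "@concat('SELECT * FROM ', pipeline().parameters.schema, '.', pipeline().parameters.table_name)",
              "type": "Expression"
            }
          },
          "sink": { "type": "DelimitedTextSink" }
        }
      }
    ]
  }
}
```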
Steps to Create the Pipelines
1. Create the Parent Pipeline
Add a `Lookup` activity:
- Configure it with the query that fetches `schema` and `table_name` from the source database.
Add a `ForEach` activity:
- Set the `Items` property to the output of the `Lookup` activity.
- Inside the `ForEach` activity, add an `Execute Pipeline` activity and pass `schema` and `table_name` dynamically. A sample of the Lookup output, and the item references that match it, is shown after this list.
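For reference, the `ForEach` iterates over the `value` array returned by the Lookup. With the query above, the output has roughly the shape below (the table names are illustrative); the `Items` property is then set to `@activity('LookupTableList').output.value`, and the Child pipeline parameters to `@item().table_schema` and `@item().table_name`. If your source exposes the column names with different casing, adjust the `@item()` references to match.

```json
{
  "count": 2,
  "value": [
    { "table_schema": "sales", "table_name": "CustomerInfo" },
    { "table_schema": "sales", "table_name": "CustomerOrders" }
  ]
}
```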
2. Create the Child Pipeline
Add parameters to the pipeline: `schema` and `table_name`.
Add a `Copy Data` activity:
- Configure the Source dataset with the dynamic table (`schema.table_name`) if needed.
- Configure the Sink dataset: set the filename property to use the expression `@concat(pipeline().parameters.schema, '.', pipeline().parameters.table_name)`. A sketch of a parameterized sink dataset follows this list.
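For the dynamic filename to work, the Sink dataset itself needs a `fileName` parameter that the `Copy Data` activity can populate. Here is a sketch of a parameterized ADLS Gen2 DelimitedText dataset; the linked service name, file system (container), and folder path are placeholders.

```json
{
  "name": "AdlsSinkDataset",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "AdlsGen2LinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "fileName": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "datalake",
        "folderPath": "exports",
        "fileName": { "value": "@dataset().fileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```

With this setup, the resolved filename is something like `sales.CustomerInfo`; if your downstream consumers expect an extension, append one (for example, '.csv') in the filename expression.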
Benefits of this Design
Reusability: The Child pipeline can be reused for different tasks since it is driven by parameters.
Scalability: The Parent pipeline handles multiple table schemas and table names dynamically without manual configuration.
Flexibility: You can extend this design for additional transformations or use cases.
Conclusion
The Parent-Child design pattern in Azure Data Factory is ideal for dynamic and scalable data orchestration. In this example, you’ve learned how to:
Fetch dynamic metadata from a database.
Use `ForEach` to iterate through the metadata.
Invoke a parameterized Child pipeline to handle dynamic data movement and transformations.
This design can be a robust foundation for automating data workflows in Azure.