The objective is to copy files of various types, including JSON, CSV, text, and Excel files, using a single Copy Data activity in an Azure Data Factory (ADF) pipeline.
Step 1: Utilize the “Get Metadata” activity to retrieve the list of files stored in the blob storage. Select the “Child Items” option from the field list to include all items within the specified folder.
Dataset: This scenario uses a Binary dataset that points to the source folder. A delimited text dataset could also be used to fetch the list of files. The dataset type does not affect the file list that Get Metadata returns.
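As a rough sketch, the Get Metadata step could look like this in the pipeline's JSON definition, built here as a Python dictionary so that it can be printed and inspected. The activity name "Get Metadata1", the dataset name "SourceFolderBinary", and the sample file names are hypothetical placeholders, not values taken from the original pipeline.

```python
import json

# Assumption: "Get Metadata1" and "SourceFolderBinary" are placeholder names.
get_metadata_activity = {
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "SourceFolderBinary",
            "type": "DatasetReference",
        },
        # "childItems" is the JSON name of the "Child Items" field list option.
        "fieldList": ["childItems"],
    },
}

# Illustrative shape of the activity output: one entry per item in the folder.
# The file names below are made up for the example.
sample_output = {
    "childItems": [
        {"name": "customers.csv", "type": "File"},
        {"name": "orders.json", "type": "File"},
    ]
}

print(json.dumps(get_metadata_activity, indent=2))
```

Each entry in childItems carries a name and a type, and the name is what the later steps pass on to the Copy Data activity.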
Step 2: After obtaining the list of files, iterate through them using a For Each loop activity, and copy each file individually using a single Copy Data activity.
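The loop described in Step 2 might be wired up roughly as follows. This is a minimal sketch that assumes the Get Metadata activity is named "Get Metadata1" and the loop is named "ForEach1"; both names are hypothetical, and the helper function exists only for this example.

```python
def build_for_each(inner_activities, source_activity="Get Metadata1"):
    """Return a ForEach activity definition that iterates over childItems."""
    return {
        "name": "ForEach1",  # placeholder name
        "type": "ForEach",
        # Run only after the metadata lookup has succeeded.
        "dependsOn": [
            {"activity": source_activity, "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            # The Items setting: the child item list returned by Get Metadata.
            "items": {
                "value": f"@activity('{source_activity}').output.childItems",
                "type": "Expression",
            },
            # Activities executed once per item (here: the Copy Data activity).
            "activities": inner_activities,
        },
    }


for_each = build_for_each(inner_activities=[])
print(for_each["typeProperties"]["items"]["value"])
```

The Items expression is the only link between the two activities: it hands the childItems array to the loop, which then exposes one entry at a time to the activities inside it.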
Step 3: Inside the For Each loop, use a single Copy Data activity to copy files from the source to the destination. On each iteration, take the current item of the For Each loop (@item().name) and pass that file name to a parameter on the source dataset so that the dataset identifies the file in blob storage.
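A Copy Data activity for this step could be sketched as below. The dataset names "SourceBinary" and "SinkBinary" and the parameter name "fileName" are assumptions made for the example; the source and sink types shown are the ones ADF uses for Binary datasets on Azure Blob Storage.

```python
# Assumption: "SourceBinary", "SinkBinary" and "fileName" are placeholder names.
copy_activity = {
    "name": "Copy data1",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "SourceBinary",
            "type": "DatasetReference",
            "parameters": {
                # Current loop item: the file name returned by Get Metadata.
                "fileName": {"value": "@item().name", "type": "Expression"}
            },
        }
    ],
    # The sink dataset is referenced without parameters.
    "outputs": [{"referenceName": "SinkBinary", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {"type": "AzureBlobStorageReadSettings"},
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
        },
    },
}

print(copy_activity["inputs"][0]["parameters"]["fileName"]["value"])
```

Because both sides are Binary, the activity moves each file as-is without parsing it, which is why one activity can serve every file format.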
Source Dataset:
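A parameterized source dataset along these lines would receive the file name from the Copy Data activity. This is a sketch under assumptions: the linked service name "AzureBlobStorage1", the container "source-container", the folder "input", and the parameter name "fileName" are all hypothetical.

```python
# Assumption: all names below are placeholders for illustration.
source_dataset = {
    "name": "SourceBinary",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference",
        },
        # Dataset parameter filled in by the Copy Data activity on each iteration.
        "parameters": {"fileName": {"type": "string"}},
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "source-container",
                "folderPath": "input",
                # The file to read is resolved from the parameter at run time.
                "fileName": {
                    "value": "@dataset().fileName",
                    "type": "Expression",
                },
            }
        },
    },
}

location = source_dataset["properties"]["typeProperties"]["location"]
print(location["fileName"]["value"])
```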
Sink: The sink dataset needs no parameters, because files can be copied without specifying their names. In that case, each file is written to the destination under its original name. If needed, a file name can be passed to the sink dataset through a parameter, which allows a dynamic expression to change the name of the file.
Sink Dataset:
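The two sink options could be sketched like this: a dataset that names only the destination folder, and an optional variant that accepts a file name parameter. The linked service, container, folder, and parameter names are hypothetical, and the helper function exists only for this example.

```python
def build_sink_dataset(with_file_name_parameter=False):
    """Return a Binary sink dataset definition; all names are placeholders."""
    location = {
        "type": "AzureBlobStorageLocation",
        "container": "sink-container",
        "folderPath": "output",
    }
    properties = {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference",
        },
        "type": "Binary",
        "typeProperties": {"location": location},
    }
    if with_file_name_parameter:
        # Optional: lets the pipeline decide the destination file name.
        properties["parameters"] = {"fileName": {"type": "string"}}
        location["fileName"] = {
            "value": "@dataset().fileName",
            "type": "Expression",
        }
    return {"name": "SinkBinary", "properties": properties}


plain_sink = build_sink_dataset()
renaming_sink = build_sink_dataset(with_file_name_parameter=True)
print("fileName" in plain_sink["properties"]["typeProperties"]["location"])
```

With the parameterized variant, the Copy Data activity could pass an expression such as @concat('copy_', item().name) to write each file under a new name; the 'copy_' prefix is only an illustration.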
Pipeline Run: After completing the above steps, execute the pipeline. It retrieves the list of files, whatever their type, and copies each one to the destination in its original format.
Azure Blob Storage: In this case, there are four files in different formats, so the Copy Data activity is executed four times, once per file.
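The run can be pictured with a small local simulation. It does not call ADF at all: plain dictionaries stand in for the input and output folders, and the four file names and their contents are invented for the example.

```python
def run_pipeline(input_folder):
    """Simulate Get Metadata -> For Each -> Copy Data on in-memory folders."""
    # Get Metadata: list the child items of the input folder.
    child_items = [{"name": name, "type": "File"} for name in sorted(input_folder)]

    output_folder = {}
    copy_runs = 0
    # For Each: one Copy Data run per child item.
    for item in child_items:
        # Binary copy: the bytes are moved unchanged under the same name.
        output_folder[item["name"]] = input_folder[item["name"]]
        copy_runs += 1
    return output_folder, copy_runs


# Hypothetical input folder with one file per format.
input_folder = {
    "data.json": b'{"id": 1}',
    "data.csv": b"id,name\n1,a\n",
    "notes.txt": b"hello",
    "report.xlsx": b"PK\x03\x04",
}

output_folder, copy_runs = run_pipeline(input_folder)
print(copy_runs)  # 4
```

The output folder ends up with the same four names and the same bytes as the input folder, which mirrors what the input and output containers show after the real run.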
Input Folder:
Output Folder:
The demonstrated approach provides a simplified method for copying files of any format using a single source dataset, a single destination dataset, and a single copy data activity. This method eliminates the need to create multiple datasets for different file formats, streamlining the data copying process in Azure Data Factory (ADF).
© All Rights Reserved. Inkey IT Solutions Pvt. Ltd. 2024