Client Region – USA
Company Size – 1000 – 5000 people
Domain – Civil Engineering
Challenges
Our client faced challenges in consolidating data from diverse sources, including APIs and Azure Excel sheets. To address this, they adopted Microsoft Fabric and implemented a layered data transformation approach—Bronze, Silver, and Gold layers—ensuring structured, efficient, and seamless data management:
- Bronze Layer: Raw data was loaded and stored as-is, with notebooks fetching data from the sources.
- Silver Layer: Specific columns were selected from the raw data for analysis, and calculations were performed to prepare the data for final use.
- Gold Layer: The final layer was designed for housing semantic models and final tables for analysis.
Despite this architecture, they faced significant challenges:
- Role Assignment and Access Issues: The misalignment of notebooks with their respective layers caused role assignment problems. For instance, the Silver Layer contained notebooks meant for both Silver and Gold layers. This forced administrators to grant access to the entire Silver Layer even for users who only required access to Gold processes, violating proper security and architectural standards.
- Inconsistent API Data: The data from APIs lacked uniformity, which disrupted calculations and workflows.
- Static Notebooks: The names of workspaces and notebooks were hardcoded, reducing flexibility.
- Data Calculation Issues: API inconsistencies resulted in misaligned calculations, requiring manual adjustments.
The client needed a solution to make their processes dynamic, address API inconsistencies, and optimize their data pipeline for accuracy and efficiency.
Key Features of the Solution
We analysed their existing architecture and implemented the following enhancements:
- Dynamic Workspace and Notebooks: We modified their notebooks to replace static workspace and notebook names with dynamic configurations. This improved flexibility and reusability across their environment.
- Data Consistency Improvements: To address API inconsistencies, we integrated an Excel file as a supplementary data source. This ensured that calculations in the Gold Layer were accurate and aligned with business requirements.
- Refinement of Layered Workflow:
Bronze Layer: Maintained raw data integrity for reference, ensuring only notebooks responsible for fetching data from sources were present in this environment.
Silver Layer: Streamlined column selection for focused datasets, ensuring only notebooks required for refining the data were present in this environment.
Gold Layer: Moved the notebook responsible for performing final calculations and loading data into tables from the Silver Layer to the Gold Layer, aligning with the purpose of this layer where semantic models and final data reside.
- Security Improvements: By aligning notebooks to their proper layers, role assignment became seamless. This resolved the issue of granting broader-than-necessary access to users, enhancing the security of each layer.
Qualitative Impact:
- Traceability & Auditability: Enhanced data lineage tracking and simplified debugging with raw data in the bronze layer.
- Governance: Improved data governance through a structured, well-defined approach.
- Collaboration & Maintainability: Clear layer separation made the pipeline easier to maintain and adapt.
- Reduced Reporting Complexity: Clean gold layer dataset reduced complexity in Power BI reports.
- Business Decision-Making: Timely, accurate insights supported better organizational decision-making.
Quantitative Impact:
- Data Quality: Reduced data discrepancies by 15%, improving the reliability of Power BI reports.
- Performance: Faster data processing with a 30% improvement in processing time and reduced query time for reports by 25%.
- Scalability: Enabled scalable pipelines, supporting a 50% year-over-year growth in data volume.
Conclusion
By leveraging Microsoft Fabric and implementing a dynamic, structured approach, the client successfully addressed challenges in their data pipeline. Dynamic configurations improved flexibility, while supplementing API data with an Excel source ensured accurate calculations. Refining the layered workflow and realigning notebooks enhanced data processing, security, and compliance.
As a result, the client’s data architecture became more efficient and adaptable, providing accurate insights and establishing a strong foundation for future data management.