Architecture Type | Definition | Suitable Data Types | Use Cases |
---|---|---|---|
Data Warehouse | Centralized repository that stores structured data from multiple sources for analysis and reporting | Structured data from operational systems, business applications, and databases | Business intelligence, historical analysis, enterprise reporting |
Data Lake | Storage repository that holds raw data in its native format until needed | Structured, semi-structured, and unstructured data (logs, images, videos, social media, IoT data) | Big data analytics, machine learning, data discovery |
Data Lakehouse | Hybrid architecture combining data lake storage with data warehouse capabilities | Both structured and unstructured data with schema enforcement when needed | Real-time analytics, unified data platform, combined ML and BI workloads |
Data Mesh | Decentralized architecture treating data as a product owned by domain teams | Domain-specific data products with standardized interfaces | Large organizations with diverse data domains, distributed ownership |
Data Fabric | Integrated layer of technologies and services providing consistent capabilities across environments | Enterprise-wide data from multiple sources requiring integration | Cross-platform data integration, metadata management, governance |
Lambda Architecture | Data processing architecture with batch and stream processing paths | High-velocity data requiring both real-time and batch processing | Real-time analytics with historical context, IoT applications |
Inmon: Source > Staging (where raw data lands) > Enterprise Data Warehouse (where we normalize and structure our data in third normal form, 3NF) > Data Marts (subsets of the data transformed to be consumed for reporting) > BI Tools
Kimball: Source > Staging > directly to Data Marts > BI Tools (faster than the Inmon approach, but at the cost of widespread data redundancy)
Data Vault: Source > Staging > Raw Vault (where data is still raw) > Business Vault (where we apply business logic and transformations) > Data Marts > BI Tools
Medallion Architecture: Source > Bronze Layer (where we keep the data raw, which helps us find issues) > Silver Layer (where we apply transformations and data cleansing, but no business rules yet) > Gold Layer (where we can build objects not only for reporting but also for machine learning and AI) > BI Tools
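To make the Medallion flow concrete, here is a minimal sketch in plain Python. It is only an illustration: the sample rows, the function names (`load_bronze`, `build_silver`, `build_gold`) and the "total per customer" aggregate are assumptions made up for this example, not part of the project itself.

```python
from collections import defaultdict

# Hypothetical raw rows as they might arrive from a source system (made-up data).
RAW_ROWS = [
    {"order_id": "1", "customer": "  alice ", "amount": "10.50"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00"},   # duplicate row
    {"order_id": "3", "customer": "alice", "amount": "n/a"},  # unparseable amount
]


def load_bronze(rows):
    """Bronze: keep the data exactly as received so issues can be traced back."""
    return [dict(row) for row in rows]


def build_silver(bronze):
    """Silver: cleanse and standardize, but apply no business rules."""
    seen_ids = set()
    silver = []
    for row in bronze:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen_ids:
            continue  # drop duplicate orders
        seen_ids.add(order_id)
        silver.append(
            {
                "order_id": order_id,
                "customer": row["customer"].strip().lower(),
                "amount": amount,
            }
        )
    return silver


def build_gold(silver):
    """Gold: a business-ready object, here total sales per customer."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


bronze = load_bronze(RAW_ROWS)
silver = build_silver(bronze)
gold = build_gold(silver)
print(gold)  # {'alice': 10.5, 'bob': 5.0}
```

Bronze still holds all four raw rows, including the duplicate and the bad amount, so a problem spotted in gold can be traced back to what the source actually delivered.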
Since we chose the Medallion Architecture, here is a diagram summarizing each layer and its characteristics:
Separation of concerns means that we design every layer to be fully independent and to take full charge of the tasks assigned to it. For example, if we clean data in the silver layer, we handle every aspect of cleaning there and do not pass data to another layer to clean it, whether that is bronze or gold.
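One way to keep this rule honest is to write down which layer owns which kind of task and check a pipeline plan against it. The sketch below is hypothetical: the task names, the `LAYER_RESPONSIBILITIES` mapping and the `find_violations` helper are all assumptions for illustration, not an existing tool.

```python
# Hypothetical mapping: each task category is owned by exactly one layer.
LAYER_RESPONSIBILITIES = {
    "bronze": {"ingest_raw"},
    "silver": {"clean", "standardize"},
    "gold": {"apply_business_rules", "aggregate"},
}


def find_violations(pipeline_plan):
    """Return (layer, task) pairs where a layer performs a task it does not own."""
    violations = []
    for layer, tasks in pipeline_plan.items():
        allowed = LAYER_RESPONSIBILITIES.get(layer, set())
        for task in tasks:
            if task not in allowed:
                violations.append((layer, task))
    return violations


# All cleaning stays in silver: no violations.
good_plan = {
    "bronze": ["ingest_raw"],
    "silver": ["clean", "standardize"],
    "gold": ["aggregate"],
}

# Cleaning leaks into gold: this breaks separation of concerns.
leaky_plan = {
    "bronze": ["ingest_raw"],
    "silver": ["clean"],
    "gold": ["clean", "aggregate"],
}

print(find_violations(good_plan))   # []
print(find_violations(leaky_plan))  # [('gold', 'clean')]
```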
Here, draw.io is used to draw the diagram, giving us a visual cue for the whole design: what our goals are, and how and on what basis we initialize each layer.