Open Access
https://doi.org/10.55640/82gchj60
DATA LAKEHOUSE ARCHITECTURES AND CLOUD DATA WAREHOUSING: A UNIFIED THEORETICAL AND EMPIRICAL ANALYSIS OF MODERN BIG DATA ECOSYSTEMS
Abstract
Data-intensive organizations have adopted a succession of architectural paradigms intended to support scalable, flexible, and robust analytics. This study examines three such architectures, namely data lakes, data lakehouses, and traditional data warehouses, through an interdisciplinary lens that combines theoretical constructs, empirical findings, and strategic frameworks. Drawing on existing literature, including practitioner guidance on cloud-based warehousing (Worlikar, Patel, & Challa, 2025) and surveys of data lake concepts, functions, and systems published by MDPI, IEEE, and other venues, the article analyzes the conceptual and functional dimensions of these architectures. It traces the historical development of data management solutions, compares the competing models, and explains their implications for data governance, metadata management, and analytical performance. By engaging with scholarly debates and technical innovations such as ACID transactions over cloud object stores, object store optimization, and adaptive metadata handling, the analysis identifies both the established and the emerging characteristics of modern data ecosystems. The findings contribute integrative conceptual frameworks that can inform academic research as well as the practical implementation of enterprise-level data platforms.
Keywords
data lakes, data lakehouse, data warehousing, metadata management
References
Worlikar, S., Patel, H., & Challa, A. (2025). Amazon Redshift Cookbook: Recipes for building modern data warehousing solutions. Packt Publishing Ltd.
Armbrust, M., et al. (2020). Delta Lake: High-performance ACID table storage over cloud object stores.
Data Lakes: A Survey of Concepts and Architectures. MDPI.
Architecture of Data Lake. ResearchGate.
On data lake architectures and metadata management. HAL.
Data Lakes: A Survey of Functions and Systems. IEEE.
Data Lakehouse: A survey and experimental study. ScienceDirect.
Spatial big data architecture: From Data Warehouses and Data Lakes to the LakeHouse. ScienceDirect.
Toward data lakes as central building blocks for data management and analysis. PMC.
Data Lake Strategy: Its Benefits, Challenges, and Implementation. DataVersity.
Apache Hudi. https://hudi.apache.org
Apache Iceberg. https://iceberg.apache.org
Apache Parquet. https://parquet.apache.org
Apache ORC. https://orc.apache.org
Apache Hadoop. https://hadoop.apache.org
Amazon Athena. https://aws.amazon.com/athena/
Azure Synapse: Create external file format. https://docs.microsoft.com/en-us/sql/
BigQuery: Creating a table definition file for an external data source. https://cloud.google.com
Armbrust, M., et al. (2015). Spark SQL: Relational data processing in Spark.
Ananthanarayanan, G., et al. (2012). PACMan: Coordinated memory caching for parallel jobs.
Alagiannis, I., Idreos, S., & Ailamaki, A. (2014). H2O: A hands-free adaptive store.
Bailis, P., Ghodsi, A., Hellerstein, J., & Stoica, I. (2013). Bolt-on causal consistency.
Breck, E., Zinkevich, M., Polyzotis, N., Whang, S., & Roy, S. (2019). Data validation for machine learning.
Brantner, M., Florescu, D., Graf, D., Kossmann, D., & Kraska, T. (2008). Building a database on S3.
Dageville, B., et al. (2016). The Snowflake elastic data warehouse.
Boncz, P., Neumann, T., & Leis, V. (2020). FSST: Fast random access string compression.
The concept of an intelligent data lake management system: machine consciousness and a universal data model. ScienceDirect.
Framework architecture of a secure big data lake. ScienceDirect.
The next information architecture evolution: the data lake wave. ACM.
Copyright License
Copyright (c) 2025 Dr. Clara Meinhardt (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.