October 2022: This post was reviewed for accuracy.

Organizations can gain deeper and richer insights when they bring together all their relevant data, of all structures and types and from all sources, to analyze. To analyze these vast amounts of data, they are taking all their data from various silos and aggregating it in one location, what many call a data lake, to do analytics and ML directly on top of that data. At other times, they store other data in purpose-built data stores, such as a data warehouse to get quick results for complex queries on structured data, or a search service to quickly search and analyze log data to monitor the health of production systems. To get the best insights from all of their data, these organizations need to move data between their data lakes and these purpose-built stores easily. As data in these systems continues to grow, it becomes harder to move all of it around. To overcome this data gravity issue and easily move their data around to get the most from all of their data, a Lake House approach on AWS was introduced. This Lake House approach provides the capabilities you need to embrace data gravity by using a central data lake, a ring of purpose-built data services around that data lake, and the ability to easily move the data you need between these data stores.

In this post, we present how to build this Lake House approach on AWS so that you can get insights from exponentially growing data volumes and make decisions with speed and agility.

Lake House approach

As a modern data architecture, the Lake House approach is not just about integrating your data lake and your data warehouse; it's about connecting your data lake, your data warehouse, and all your other purpose-built services into a coherent whole. The data lake allows you to have a single place where you can run analytics across most of your data, while the purpose-built analytics services provide the speed you need for specific use cases such as real-time dashboards and log analytics.

This Lake House approach consists of the following key elements: a central data lake, purpose-built analytics services, and easy data movement between them. The following diagram illustrates this Lake House approach in terms of customer data in the real world and the data movement required between all of the data analytics services and data stores: inside-out, outside-in, and around the perimeter.

A layered and componentized data analytics architecture enables you to use the right tool for the right job, and provides the agility to iteratively and incrementally build out the architecture. You gain the flexibility to evolve your componentized Lake House to meet current and future needs as you add new data sources, discover new use cases and their requirements, and develop newer analytics methods.

You can organize this Lake House Architecture as a stack of five logical layers, where each layer is composed of multiple purpose-built components that address specific requirements. We describe these five layers in this section, but let's first talk about the sources that feed the Lake House Architecture.

The Lake House Architecture enables you to ingest and analyze data from a variety of sources. Many of these sources, such as line of business (LOB) applications, ERP applications, and CRM applications, generate highly structured batches of data at fixed intervals. In addition to internal structured sources, you can receive data from modern sources such as web applications, mobile devices, sensors, video streams, and social media. These modern sources typically generate semi-structured and unstructured data, often as continuous streams.

The ingestion layer in the Lake House Architecture is responsible for ingesting data into the Lake House storage layer. It provides the ability to connect to internal and external data sources over a variety of protocols. It can ingest and deliver batch as well as real-time streaming data into both the data warehouse and the data lake components of the Lake House storage layer.
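To make the ingestion layer more concrete, here is a minimal Python sketch of one way batch files and streaming records could land in an S3-based data lake; the bucket, object key, and Kinesis Data Firehose delivery stream names (example-lake-house-raw, example-clickstream) are hypothetical and not taken from this post.

# Minimal sketch of batch and streaming ingestion into the Lake House storage
# layer. Bucket, prefix, and delivery stream names are hypothetical.
import json

import boto3

s3 = boto3.client("s3")
firehose = boto3.client("firehose")

# Batch ingestion: land a fixed-interval LOB/ERP/CRM extract in the raw zone
# of the S3 data lake.
s3.upload_file(
    Filename="daily_orders_2022-10-01.csv",
    Bucket="example-lake-house-raw",          # hypothetical data lake bucket
    Key="sales/orders/ingest_date=2022-10-01/daily_orders.csv",
)

# Streaming ingestion: push a clickstream event to a Kinesis Data Firehose
# delivery stream configured to deliver into the same data lake.
event = {"user_id": "u-123", "page": "/checkout", "ts": "2022-10-01T12:00:00Z"}
firehose.put_record(
    DeliveryStreamName="example-clickstream",  # hypothetical delivery stream
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)

The Firehose delivery stream is used here only as one example of a managed path from continuous event streams into the data lake; other ingestion services can fill the same role.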
The data storage layer of the Lake House Architecture is responsible for providing durable, scalable, and cost-effective components to store and manage vast quantities of data. In a Lake House Architecture, the data warehouse and data lake natively integrate to provide an integrated, cost-effective storage layer that supports unstructured as well as highly structured and modeled data. The storage layer can store data in different states of consumption readiness, including raw, trusted-conformed, enriched, and modeled.

Structured data storage in the data warehouse

The data warehouse stores conformed, highly trusted data, structured into traditional star, snowflake, data vault, or highly denormalized schemas.
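As an illustration only, the following sketch creates a small star-style schema (one dimension table and one fact table) in Amazon Redshift through the Redshift Data API; the cluster identifier, database, user, and table definitions are hypothetical examples rather than schemas from this post.

# Minimal sketch: define a small star schema (one fact table and one
# dimension table) in the data warehouse via the Amazon Redshift Data API.
# Cluster, database, user, and table names are hypothetical.
import boto3

redshift_data = boto3.client("redshift-data")

dimension_ddl = """
CREATE TABLE IF NOT EXISTS dim_customer (
    customer_key  BIGINT PRIMARY KEY,
    customer_name VARCHAR(256),
    segment       VARCHAR(64)
)
"""

fact_ddl = """
CREATE TABLE IF NOT EXISTS fact_orders (
    order_key    BIGINT,
    customer_key BIGINT REFERENCES dim_customer (customer_key),
    order_date   DATE,
    order_amount DECIMAL(12, 2)
)
"""

# The Data API runs one statement per call, so submit the two DDL
# statements separately. (Key constraints are informational in Redshift.)
for statement in (dimension_ddl, fact_ddl):
    redshift_data.execute_statement(
        ClusterIdentifier="example-lakehouse-dw",  # hypothetical cluster
        Database="analytics",                      # hypothetical database
        DbUser="admin",                            # hypothetical database user
        Sql=statement,
    )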