1/16/2024

AWS architectural diagrams

Organizations of all sizes have recognized that data is one of the key enablers to increase and sustain innovation, and to drive value for their customers and business units. They are eagerly modernizing traditional data platforms with cloud-native technologies that are highly scalable, feature-rich, and cost-effective. As you look to make business decisions driven by data, you can be agile and productive by adopting a mindset that delivers data products from specialized teams, rather than through a centralized data management platform that provides generalized analytics.

In this post, we describe an approach to implement a data mesh using AWS native services, including AWS Lake Formation and AWS Glue. This approach enables lines of business (LOBs) and organizational units to operate autonomously by owning their data products end to end, while providing central data discovery, governance, and auditing for the organization at large, to ensure data privacy and compliance.

Benefits of a data mesh model

A centralized model is intended to simplify staffing and training by centralizing data and technical expertise in a single place, to reduce technical debt by managing a single data platform, and to reduce operational costs. Data platform groups, often part of central IT, are divided into teams based on the technical functions of the platform they support. For instance, one team may own the ingestion technologies used to collect data from numerous data sources managed by other teams and LOBs. A different team might own data pipelines, writing and debugging extract, transform, and load (ETL) code and orchestrating job runs, while validating and fixing data quality issues and ensuring data processing meets business SLAs.

However, managing data through a central data platform can create scaling, ownership, and accountability challenges, because central teams may not understand the specific needs of a data domain, whether due to data types and storage, security, data catalog requirements, or specific technologies needed for data processing. You can often reduce these challenges by giving ownership and autonomy to the team that owns the data, allowing them to build data products themselves rather than only being able to use a common central data platform.

For instance, product teams are responsible for ensuring the product inventory is updated regularly with new products and changes to existing ones. They’re the domain experts of the product inventory datasets. If a discrepancy occurs, they’re the only group who knows how to fix it. Therefore, they’re best able to implement and operate a technical solution to ingest, process, and produce the product inventory dataset. They own everything leading up to the data being consumed: they choose the technology stack, operate in the mindset of data as a product, enforce security and auditing, and provide a mechanism to expose the data to the organization in an easy-to-consume way.

This reduces overall friction for information flow in the organization, where the producer is responsible for the datasets they produce and is accountable to the consumer based on the advertised SLAs. This data-as-a-product paradigm is similar to Amazon’s operating model of building services. Service teams build their services, expose APIs with advertised SLAs, operate their services, and own the end-to-end customer experience. This is distinct from the world where someone builds the software, and a different team operates it. The end-to-end ownership model has enabled us to implement faster, with better efficiency, and to quickly scale to meet customers’ use cases. We aren’t limited by centralized teams and their ability to scale to meet the demands of the business. Each service we build stands on the shoulders of other services that provide the building blocks. The analogy in the data world would be the data producers owning the end-to-end implementation and serving of data products, using the technologies they selected based on their unique needs.

At AWS, we have been talking about the data-driven organization model for years, which consists of data producers and consumers. This model is similar to those used by some of our customers, and has been eloquently described recently by Zhamak Dehghani of Thoughtworks, who coined the term data mesh in 2019.

In this post, we demonstrate how the Lake House Architecture is ideally suited to help teams build data domains, and how you can use the data mesh approach to bring domains together to enable data sharing and federation across business units. This approach can enable better autonomy and a faster pace of innovation, while building on top of a proven and well-understood architecture and technology stack, and ensuring high standards for data security and governance. The following are key points when considering a data mesh design: