This article illustrates how Azure Synapse Analytics can be combined with the wide family of services hosted in the Azure cloud to build a modern data platform that meets the most common needs of an organization that wants to make full use of its data.
Analytics use cases
The analytics use cases covered by the architecture start from the different data sources (structured, semi-structured, unstructured and streaming) positioned on the left side of the diagram, while the data flow, which is broken down into stages (ingest, store, process, enrich and serve), runs from the bottom up.
Let’s see below how the various phases are structured.
Ingest
Use Azure Synapse/Azure Data Factory pipelines to load data through a wide range of connectors to databases (hosted either in the cloud or on-premises), semi-structured files (such as CSV or JSON) and unstructured data (such as images, videos or the results of REST API calls).
Pipelines can be triggered on a predefined schedule or in response to an event, or they can be started explicitly through REST APIs.
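As a rough illustration of the third option, the sketch below prepares (but does not send) the REST request that starts a pipeline run. It assumes the Azure Data Factory `createRun` endpoint with API version `2018-06-01`; the subscription, resource group, factory and pipeline names, the `loadDate` parameter and the bearer token are hypothetical placeholders, and acquiring a real access token is left out.

```python
import json
import urllib.request

# Azure Resource Manager endpoint used by the Data Factory REST API.
ARM_ENDPOINT = "https://management.azure.com"


def build_create_run_request(subscription_id, resource_group, factory_name,
                             pipeline_name, token, parameters=None):
    """Build (without sending) the POST request that starts a pipeline run."""
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        f"/pipelines/{pipeline_name}/createRun?api-version=2018-06-01"
    )
    # Pipeline parameters, if any, travel as a JSON object in the body.
    body = json.dumps(parameters or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
request = build_create_run_request(
    "00000000-0000-0000-0000-000000000000", "rg-analytics", "adf-demo",
    "IngestSales", token="<access-token>",
    parameters={"loadDate": "2024-01-31"},
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` with a valid token would submit it; the service is expected to answer with a JSON document containing the `runId` of the new run.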
Store
Organize your data lake (Azure Data Lake Storage Gen2) following best practices for zones, folder structure, file formats and access policies (ACLs) for each analytics scenario.
With the copy data activities of Azure Synapse/Azure Data Factory pipelines, you can stage the data copied from the data sources in the raw zone of the data lake; the data can be saved either as delimited text (CSV) or in a compressed columnar format as Parquet files.
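One possible way to keep the raw zone predictable is to derive every landing path from the source system, the entity and the load date. The sketch below is only an example of such a convention: the `raw/<source>/<entity>/<yyyy>/<mm>/<dd>` layout, the function names and the account and container names are assumptions made for illustration, not something prescribed by the architecture.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_zone_path(source, entity, load_date, extension="parquet"):
    """Return the relative data lake path for one staged extract."""
    folder = PurePosixPath(
        "raw", source, entity,
        f"{load_date:%Y}", f"{load_date:%m}", f"{load_date:%d}",
    )
    return str(folder / f"{entity}.{extension}")


def abfss_uri(account, container, path):
    """Build the abfss:// URI that addresses a path in ADLS Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"


# Hypothetical source system, entity, account and container names.
path = raw_zone_path("crm", "customers", date(2024, 1, 31))
print(path)
print(abfss_uri("contosolake", "datalake", path))
```

Partitioning by load date keeps each run's files separate, which makes reloading or reprocessing a single day straightforward.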
Process & Enrich
Use the Data Flow component of Azure Synapse/Azure Data Factory pipelines, serverless SQL queries or Spark notebooks to validate, transform and enrich the data, writing the results to the curated/prepared zone of the data lake.
Additionally, you can invoke machine learning models from SQL pools with standard T-SQL (through the PREDICT function) or from Spark notebooks; these ML models can be used to enrich datasets and generate additional insights, and they can be consumed from Azure Cognitive Services or directly from the Azure ML service.
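To give an idea of what T-SQL scoring looks like, the sketch below renders a PREDICT statement as a string that a client could submit to a dedicated SQL pool. It assumes an ONNX model stored in a table; the table names, the `model` and `id` columns and the `Score` output column are hypothetical, and the helper does no identifier sanitization, so it should only receive trusted values.

```python
def build_predict_query(model_table, model_id, input_table, output_column,
                        output_type="float"):
    """Render a T-SQL PREDICT statement for an ONNX model stored in a table."""
    return (
        f"SELECT d.*, p.{output_column}\n"
        f"FROM PREDICT(\n"
        f"    MODEL = (SELECT model FROM {model_table} WHERE id = {int(model_id)}),\n"
        f"    DATA = {input_table} AS d,\n"
        f"    RUNTIME = ONNX)\n"
        f"WITH ({output_column} {output_type}) AS p;"
    )


# Hypothetical model table, model id and feature table.
query = build_predict_query("dbo.models", 1, "dbo.customer_features", "Score")
print(query)
```

The statement returns every input row together with the predicted column, so the enriched result can be written back to a table or consumed directly by a report.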
You can store the data in Synapse SQL pool tables to implement an enterprise data warehouse or, using the serverless component of the Synapse engine, implement a logical data warehouse on top of the curated zone of the data lake.
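For the logical data warehouse option, a common pattern is to expose Parquet folders in the lake as views in a serverless SQL database. The sketch below renders such a view definition with `OPENROWSET`; the view name and the storage URL are hypothetical, and the helper assumes the caller supplies a trusted view name.

```python
def build_logical_view(view_name, storage_url):
    """Render a serverless SQL view that reads Parquet files from the lake."""
    # Double any single quote so the URL stays a valid T-SQL string literal.
    safe_url = storage_url.replace("'", "''")
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        f"    FORMAT = 'PARQUET'\n"
        f") AS [result];"
    )


# Hypothetical view name and curated-zone location.
ddl = build_logical_view(
    "dbo.vw_sales",
    "https://contosolake.dfs.core.windows.net/datalake/curated/sales/*.parquet",
)
print(ddl)
```

Because the view only stores the query, the data stays in the lake and no copy into SQL pool tables is needed.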
Serve
Load relevant data from the Azure Synapse SQL pool or the data lake into Power BI datasets or Azure Analysis Services tabular models for data visualization. In this case, the Power BI/Azure Analysis Services models act as the semantic model that simplifies the analysis of the data and their relationships. Business users can then analyze the data and extract detailed business insights through Power BI reports and dashboards.
Structured and unstructured data stored in Synapse or the data lake can also be used to build knowledge mining solutions and to apply AI to uncover further valuable business insights.