Microsoft Power BI gives organizations the insights they need to make confident decisions. But making the most of Power BI requires understanding the different ways you can work with it and taking the right approach for the right situation. A sometimes-overlooked capability that comes into play when data volumes start to exceed what you can do with just a Power BI dataset is Power BI dataflows. In those cases, we often recommend dataflows for the ease of development, increased efficiencies, and cost savings. Here’s why.
A dataflow is a collection of tables created in Power BI Workspaces, which is a Power Query process in cloud. It is an ETL (extract, transform, load) process that gets source data, transforms it, and loads it into Azure Data Lake Storage Gen2 storage account.
The key to dataflows is that they create an intermediary step between data sources and datasets, consolidating data from multiple sources in one place. This creates greater efficiencies across the board and provides a quicker path to actionable insights as users can take advantage of self-serve, no-code access to big data in Power BI.
We’ve seen that Power BI dataflows are most applicable when customers are looking for:
- Quick availability of cleaned up data: By default, dataflows store tables in ADLS Gen2, which is created by Power BI. We can also configure it to a specific ADLS Gen2 Storage Account, which can be accessed by other Azure services like Data Bricks or Data Factory, enabling quick data availability for AI and ML processes.
- A more efficient way to reuse information: There are many scenarios where organizations are sourcing the same table over and over, running multiple queries to fetch a table against the source. To avoid this, dataflows can do a Power Query operation and datasets can reuse the tables from dataflows in Power BI. Here’s an example: Say a date dimension is used in multiple datasets which is refreshed every time we refresh the datasets. If a dataflow is used for this date dimension, it is refreshed only once in the dataflow. From there, multiple Power BI datasets can use it. This helps minimize overall impact on data sources.
- Serves as a single source of truth: Dataflow implementation in small, medium, or large-scale organizations enforces a single source of truth for any data usage for reporting, analysis, ML, AI and data science needs. This provides much more standardization and control, as we can force analysts or developers to use tables from dataflows to build multiple views of datasets and visual dashboards.
- Certified dataflows reduce duplicate work: With certified dataflows, you can create pipelines that can be more easily leveraged across the organization so users aren’t duplicating development work. These certified dataflows that are fully QA’d and approved can be posted for use across the organization to help ensure data accuracy and reduce rework.
- Self-serve capabilities: Dataflows are no-SQL/no-code, so users without SQL or ETL backgrounds can leverage dataflows to build their own dashboards, since they don’t need to know the data source connection process. Even those with basic Power BI training can directly fetch tables from the data for reporting and analysis, propelling quicker insights.
- Works as an ETL process: For small-scale organizations with minimal data, dataflows can act as a no-code ETL process to quickly get the data from sources, clean, and shape it for their own needs.
- Acts as a centralized data warehouse: Dataflows sourced from enterprise warehouses (like Azure Synapse Analytics) can work as a centralized data warehouse for end users, power users, report developers, analysts, research teams, etc.
- Sharing workloads from datasets: Datasets can do both Power Query and Tabular Model functionalities that can end up in large and complex processes. Dataset refreshes start failing with increases in complexity and data size. Dataflows can be used to take up the ETL process (Power Query process) from datasets and the loads gets split between datasets and dataflows. Dataflows tables can be incrementally refreshed if the workspace is in Premium.
- Connection to other Microsoft services: Dataflows can be part of Microsoft Dataverse, which is a shared data model that lets you store and manage data from different source applications into ADLS so it can be used by several Microsoft technologies.
Visualize value in relevant use cases
Visuals help illustrate the possible use cases for dataflows. The example below demonstrates the concept of reusability: dataflow A is used to create two datasets, which are then used as the basis for follow-on reports:
Our second example shows different levels of transformation of data, where we can create multiple dataflows, which can then be used in one or more reports. Here, dataflow A is being used in dataflow B:
In this third example, we can take multiple dataflows (different subject entities) to create a dataset comprised of both source dataflows. Picture dataflow A coming from finance and dataflow B coming from sales – both are then used to populate Dataset AB for follow-on reporting:
Contact Inviso to leverage Power BI benefits
If you’re interested in learning how to increase efficiencies and reduce costs by improving the manageability, reliability, and scalability of your Power BI architecture, get in touch! We’re happy to talk with you about your Power BI needs and ways Inviso can help.