The Microsoft Fabric and Direct Lake revolution

By Jaouad Safouani, April 17, 2024

When first introduced, the lakehouse architecture was a significant advancement in data analytics. Today, Microsoft Fabric builds on that foundation, integrating a variety of Microsoft Azure services into a single unified data platform. Direct Lake is a feature within Fabric that allows Power BI users to directly access and analyze data stored in OneLake, making data analysis even quicker and easier. The combination of Fabric and Direct Lake opens up exciting new ways for data analysts to manage, analyze, and get insights from data more effectively. The payoff can be compelling: cost savings, increased efficiency, and the ability to drive greater business value and innovation.

Let’s take a look at one example of how this could play out in the real world. In this blog we’ll explore potential applications within the manufacturing industry. While we’ve selected manufacturing for illustrative purposes, the same general approach could be implemented across a variety of industries.

Looking ahead to the future of data analytics in manufacturing

Today, most manufacturers have to contend with vast amounts of data. Those with large numbers of SKUs, high daily sales volumes, or both face challenges analyzing all of the information at hand. Often, data is collected from retailers daily to support informed decision-making.

For example, sales representatives from manufacturing companies are frequently stationed on the sales floors of large retailers and are responsible for negotiating shelf space. They’re tasked with figuring out what to stock, where to place it, and how to arrange it. These decisions matter because strategic placement is crucial for maximizing visibility and sales.

Representatives rely on dashboards to compare current sales against the previous year’s data, which helps them prepare for seasonal spikes in demand. Typically, a large manufacturer would have an average of 200 to 600 sales representatives per retailer in the US alone, all of them opening the same planning and strategy dashboards every morning. Category management teams work with business intelligence developers, data scientists, and data engineers to develop dashboards for strategy, sales, and marketing.

Azure Databricks is a leading data platform for these types of demanding data management challenges and has proven its effectiveness over the years. Despite its successes, the platform has faced some limitations. Integrating Databricks with Fabric combines the strengths of both platforms and offers a more robust solution, as detailed below.

The architecture as it is: Databricks + Power BI

In a common scenario, a manufacturer relies on Databricks for heavy-duty data processing and Microsoft Power BI for semantic modeling and analytics. This combination offers powerful analytical capabilities and can handle billions of records from diverse retail sources, but it’s not without its challenges. Dashboards use live connections to a large semantic model for each retailer. The semantic model is built in DirectQuery mode because the data is too large to import into memory. The primary issues at hand include:

Scalability and concurrency limitations: As the volume of data grows, the existing infrastructure struggles to maintain optimal performance, particularly during peak analytics workloads. Clients run dashboards in the mornings, which results in queries being queued or even failing at times.

Cost efficiency: The cost of scaling compute and storage resources in Databricks to meet the demands of extensive data analysis becomes increasingly prohibitive. Autoscaling the SQL warehouse in Databricks helps up to a point, but can be costly.
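To make the concurrency issue more concrete, here is a rough back-of-the-envelope sketch in Python. It is a deliberately simplified model, not a description of how Databricks or Power BI actually schedule queries: it assumes every query arrives at once and holds one slot for a fixed time. The function name and all input values other than the 600 representatives are hypothetical.

```python
import math


def morning_backlog(users, queries_per_user, concurrent_slots, seconds_per_query):
    """Estimate how long a burst of dashboard queries takes to drain.

    Simplified model: all queries arrive together and each one occupies a
    single slot for a fixed duration, so the work completes in equal waves.
    """
    total_queries = users * queries_per_user
    waves = math.ceil(total_queries / concurrent_slots)
    return {
        "total_queries": total_queries,
        "waves": waves,
        "drain_seconds": waves * seconds_per_query,
    }


# Hypothetical inputs: 600 representatives (upper end of the range mentioned
# earlier), 10 visuals per dashboard, 100 concurrent slots, 3 s per query.
estimate = morning_backlog(
    users=600, queries_per_user=10, concurrent_slots=100, seconds_per_query=3
)
print(estimate)
# {'total_queries': 6000, 'waves': 60, 'drain_seconds': 180}
```

Even under these generous assumptions, the last user in the queue waits about three minutes for a single retailer, which is the kind of queuing the morning rush produces.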

Envisioned future architecture: Databricks + Microsoft Fabric

Building on the current architecture of Databricks and Power BI, the addition of Fabric introduces several transformative benefits that enhance data management and analytics capabilities:

  1. Enhanced concurrency and scalability: By leveraging Microsoft Fabric, manufacturers can significantly improve the concurrency of data queries and analytics operations. Fabric’s advanced data management capabilities allow for more efficient processing of large-scale datasets, reducing bottlenecks and enabling real-time insights.

According to Microsoft, Direct Lake queries are limited only by the size of the Fabric capacity. If those limits are exceeded, the semantic model falls back to DirectQuery. Direct Lake is as fast as Import mode once a measure has been run at least once within the session. Direct Lake mode caches just the data needed for a specific visual. Moreover, Import mode requires copying data into the model, which can be resource-intensive and time-consuming, while Direct Lake allows near-real-time access to data in the Lakehouse. The differences between the three query modes (Import, DirectQuery, and Direct Lake) are summarized in this visual:

(Image courtesy of Microsoft)
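The fallback behavior described above can be pictured with a small Python sketch. This is a toy model under stated assumptions, not Fabric’s actual logic: it checks a single guardrail (rows per table), and the function name, table names, and limit value are hypothetical. Real limits depend on the capacity size and are documented by Microsoft.

```python
def resolve_storage_mode(table_row_counts, max_rows_per_table):
    """Pick the query mode under a simplified rule.

    If every table fits within the guardrail, queries are served in
    Direct Lake mode; otherwise the model falls back to DirectQuery.
    Returns the mode and the list of tables that exceeded the limit.
    """
    oversized = [
        name
        for name, rows in table_row_counts.items()
        if rows > max_rows_per_table
    ]
    if oversized:
        return "DirectQuery", oversized
    return "Direct Lake", []


# Hypothetical guardrail and table sizes, for illustration only.
HYPOTHETICAL_MAX_ROWS = 1_500_000_000
tables = {"daily_sales": 2_400_000_000, "products": 45_000, "stores": 5_200}

mode, exceeded = resolve_storage_mode(tables, HYPOTHETICAL_MAX_ROWS)
print(mode, exceeded)
# DirectQuery ['daily_sales']
```

In practice, this means sizing the capacity (or trimming the largest fact tables) so that the model stays within its limits and keeps the Direct Lake performance benefit.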

Direct Lake connects the semantic model to a Delta table in the Lakehouse, which is stored as Parquet files. Semantic models using Direct Lake perform better than semantic models using DirectQuery. The technology behind Direct Lake lets the semantic model load data into memory on demand, bringing in only the data needed for a specific calculation.
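The idea of loading only the data a calculation needs can be illustrated with a toy Python class. This is a conceptual sketch, not how the Power BI engine is implemented: the class name, the in-memory dictionary standing in for the lake, and the sample values are all hypothetical.

```python
class OnDemandColumnStore:
    """Toy stand-in for a columnar table whose columns load lazily."""

    def __init__(self, source):
        self._source = source  # column name -> values (stands in for the lake)
        self._memory = {}      # columns that have been loaded into memory
        self.loads = 0         # number of times a column was read from source

    def column(self, name):
        # Load the column on first use; later calls hit the in-memory copy.
        if name not in self._memory:
            self._memory[name] = list(self._source[name])
            self.loads += 1
        return self._memory[name]

    def total(self, name):
        return sum(self.column(name))

    @property
    def loaded_columns(self):
        return sorted(self._memory)


# Hypothetical sales table with four columns.
lake = {
    "sku": ["A1", "B2", "C3"],
    "store": ["north", "south", "north"],
    "units": [3, 5, 2],
    "revenue": [30.0, 50.0, 20.0],
}
table = OnDemandColumnStore(lake)

print(table.total("units"))   # 10
print(table.total("units"))   # 10 (served from memory this time)
print(table.loaded_columns)   # ['units']
print(table.loads)            # 1
```

A measure that sums units touches one column out of four, and the second run is served from memory, which mirrors why a Direct Lake visual is faster after its first run.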

  2. Cost-effective data operations: Fabric’s computation model offers a more cost-effective solution for handling vast amounts of data. By optimizing resource utilization and leveraging Fabric’s integration with Power BI, manufacturing organizations can achieve substantial savings on operational costs.
  3. Seamless integration and simplified data flow: Integration between Databricks and Fabric simplifies the data management process. Data curated in Databricks can be moved to a Fabric Lakehouse, where it can be analyzed in Power BI using Direct Lake mode. This streamlined flow enhances the agility and speed of data analytics, enabling manufacturers to quickly adapt to evolving market trends and consumer demands.

(Figure: To-be architecture)
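As one possible illustration of the data flow in the list above, the Python sketch below builds the OneLake path of a Lakehouse table, which a Databricks job could use as a write target. The workspace, lakehouse, and table names are hypothetical, and the commented-out Spark write assumes a Databricks notebook where a DataFrame named `curated_df` exists and access to OneLake has already been configured. Other approaches, such as OneLake shortcuts, may fit better depending on the setup.

```python
def onelake_table_path(workspace, lakehouse, table):
    """Build the ABFS URI of a table inside a Fabric Lakehouse."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )


# Hypothetical names, for illustration only.
target = onelake_table_path("RetailAnalytics", "Curated", "daily_sales")
print(target)

# In a Databricks notebook (assumed, not run here), the curated DataFrame
# could then be written as a Delta table that Direct Lake can read:
# curated_df.write.format("delta").mode("overwrite").save(target)
```

Because the table lands in the Lakehouse in Delta format, a Direct Lake semantic model can be built on it without a separate import step.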

The outcome: A new era of data analytics

The updated architecture marks a strategic shift for data analysts in manufacturing. Not only does it resolve existing challenges of concurrency, cost, and complexity – it also unlocks new opportunities for innovation and deeper insights.

Analysts can now delve deeper into retail data, uncovering nuanced customer behaviors and market dynamics that drive more informed strategic decisions. For example, analysts can easily create semantic models using Direct Lake from the Databricks-curated layer within a Fabric Lakehouse, leading to powerful reports with high ROI for the business. Data enriched using ML models can be saved in the Lakehouse in Delta format and used by analysts for advanced business intelligence reporting. These enhancements greatly improve the user experience, giving users seamless access to reports and dashboards without latency issues.

Leading the charge in data analytics innovation

The manufacturing example shows how adding Fabric and Direct Lake mode to an existing Databricks analytics solution points to the future of data analytics. It’s a future where organizations across a wide range of industries – from manufacturing to healthcare to financial services and beyond – can overcome current limitations and embrace a more efficient, scalable, and cost-effective data analytics paradigm.

In an era where data is the currency of decision-making, businesses contemplating this transition are poised to lead the charge in innovation, leveraging the combined strengths of Databricks and Microsoft Fabric to redefine the possibilities of data analytics.

Author Jaouad Safouani
