Need to handle big data at scale? Azure SQL Database Hyperscale may be the solution

By July 29, 2021 No Comments
Server farm

Azure SQL Database Hyperscale is still somewhat of a hidden gem in the Azure data ecosystem, although it’s now been available for more than two years, reaching public preview in September 2018, and general availability (GA) in May 2019. Unlike other editions of Azure SQL (general purpose and business critical) and Azure SQL Managed Instance, Azure SQL Hyperscale is a more modular cloud offering in that the key operations of a database have been split into independent services. The compute, log processing, and storage are completely decoupled, allowing each to perform and grow independent of the others.

Reap the benefits of scalability

Compute can be scaled up and down quickly, and while it doesn’t currently auto-scale out of the box, logic apps, etc. can be created to do that. Read-write operations go directly into local SSD data cache on the compute tier and are then sent to a Premium Azure Blob Storage log landing zone. They are then picked up by the log service, which distributes those transactions to page servers in the storage tier and the local SSD data cache of the read-only replicas. This all happens at a rate of 100mb/sec.

Big data scale solutions

Source: Microsoft

Azure SQL Database supports database sizes up to 4TB and Azure SQL Managed Instance will support up to 8TB. Meanwhile, Azure Synapse is intended for big data that can be efficiently spread across its 60 nodes, so if your data size doesn’t have at least one key table with a minimum of 60 million records, it will be very difficult to reap the benefits of the Azure Synapse architecture.

On the other hand, Azure SQL Hyperscale will start at 40GB by default with one page server and will automatically add page servers incrementally for roughly every additional 128GB. You don’t have to design how to distribute your data across those page servers to get the best performance and you don’t have to be concerned about running out of space for the data or the log because both will scale automatically as needed.

Meet customer downstream data needs

From an analytics perspective, the biggest benefit is that Hyperscale allows you to have up to four read-only replicas to direct ad-hoc queries and other read-only operations away from the primary read-write node that would typically compete with its resources, possibly slow performance and force scaling up. The read-only processes specify an application intent of read-only in their connections and are automatically routed to one of the read-only replicas. Now in public preview, named replicas are available where you can have up to 30 read-only replicas with a different database name from the primary read-write database node.

The named replicas can be on separate logical servers, can have a different (less expensive) service level objective, and can have different authentication and different logins, so they are completely isolated from the primary node. This provides isolated user environments for downstream data consumers to have real-time access to data without competing for resources and allows each user group to determine the cost/performance tier best suited to their needs. It also protects the primary read-write node from failures to specify the read-only application intent in the connection string by processes that only require read-only access.

Finally, backup of the database is nearly instantaneous because it is all snapshot-based; the process of creating a backup does not impact the performance of the primary compute node or any of the read-only replicas. Restoring a database is very fast because it’s also snapshot-based and not based on reading a data file and a transaction log.

Azure SQL Hyperscale is very similar in architecture to Amazon Aurora and Google Cloud SQL. All three are the true SaaS implementation of a relational SQL database, taking advantage of a services-based architecture and independently scalable modular database operations. It’s an ideal fit for large OLTP databases, but it will take customer demand to get vendors to certify their software to run on it. From an analytics perspective, if you don’t need the ability to scale to more than 100TB in the near future, it’s a great cloud target platform for most on-premises SQL Server data warehouses, and the migration process will likely be much simpler than to Azure Synapse.

Questions? We’re here to help!

Find out how Azure SQL Hyperscale can help optimize your environment. Contact Inviso’s experienced team of Azure experts.

Jeff Bishop

Author Jeff Bishop

More posts by Jeff Bishop