<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Engineered Systems &#8211; Bugra Parlayan | Oracle Database &amp; Exadata Blog</title>
	<atom:link href="https://www.bugraparlayan.com.tr/category/engineered-systems/feed" rel="self" type="application/rss+xml" />
	<link>https://www.bugraparlayan.com.tr</link>
	<description>A technical blog</description>
	<lastBuildDate>Sat, 19 Jul 2025 19:07:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.2</generator>

<image>
	<url>https://www.bugraparlayan.com.tr/wp-content/uploads/2020/06/cropped-plsql-32x32.jpg</url>
	<title>Engineered Systems &#8211; Bugra Parlayan | Oracle Database &amp; Exadata Blog</title>
	<link>https://www.bugraparlayan.com.tr</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Creating an ASM Disk on Exadata FlashDisk</title>
		<link>https://www.bugraparlayan.com.tr/creating-an-asm-disk-on-exadata-flashdisk.html</link>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Sat, 19 Jul 2025 19:06:23 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[ASM]]></category>
		<category><![CDATA[direct path read]]></category>
		<category><![CDATA[disk drop]]></category>
		<category><![CDATA[exadata]]></category>
		<category><![CDATA[flash cache]]></category>
		<category><![CDATA[FlashDisk]]></category>
		<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[planned maintenance]]></category>
		<category><![CDATA[Temp Tablespace]]></category>
		<category><![CDATA[warm-up effect]]></category>
		<guid isPermaLink="false">https://www.bugraparlayan.com.tr/?p=1505</guid>

					<description><![CDATA[<p>Recently, we noticed performance drops at certain times on an Exadata server operating as a data warehouse (DWH). Upon analyzing AWR reports, the following wait event stood out: direct path read/write temp waits. We knew this event was caused by temporary (temp) data being written to or read from disk during query execution. Although we &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/creating-an-asm-disk-on-exadata-flashdisk.html">Creating an ASM Disk on Exadata FlashDisk</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Recently, we noticed performance drops at certain times on an Exadata server operating as a data warehouse (DWH). Upon analyzing <strong>AWR reports</strong>, the following wait event stood out:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>direct path read/write temp waits</p>
</blockquote>



<p>We knew this event was caused by temporary (temp) data being written to or read from disk during query execution. Although we had applied various SQL optimizations, they did not significantly reduce the wait times.</p>



<h3 class="wp-block-heading"><strong>Root Cause: Heavy TEMP Tablespace Usage</strong></h3>



<p>Large SQL queries in data warehouses—especially those involving operations like hash joins, sorts, and group by—require extensive use of TEMP space. In our case, the <strong>TEMP tablespace</strong> was located on <em>high-latency</em> disk groups by default, which increased IO wait times.</p>



<h3 class="wp-block-heading"><strong>Solution: Move TEMP Tablespace to FlashDisk on Exadata</strong></h3>



<p>To directly improve performance, we decided to move the TEMP tablespace to the <strong>FlashDisk</strong> layer on Exadata. This layer offers much higher IO throughput compared to traditional disks, making it particularly advantageous for handling temporary data operations.</p>



<p>P.S. Since we will drop and recreate the existing flash disks during this operation, there may be a minor performance impact while the work is in progress and until the flash cache warms up again. This change should therefore be performed during a planned, quiet maintenance window.</p>


<div class="wp-block-syntaxhighlighter-code "><pre class="brush: bash; title: ; notranslate">
# Flush dirty flash cache contents to disk on all cells
dcli -g cell_group -l root cellcli -e &quot;alter flashcache all flush&quot;
# Confirm the flush completed on every flash-based cell disk
dcli -g cell_group -l root cellcli -e &quot;LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror&quot; | grep FD
# Drop the flash cache, then recreate it at a reduced size to free flash space
dcli -g cell_group -l root cellcli -e &quot;drop flashcache&quot;
dcli -g cell_group -l root cellcli -e &quot;create flashcache all size=20.2875976562500T&quot;
# Create grid disks on the remaining flash space
dcli -g cell_group -l root cellcli -e &quot;CREATE GRIDDISK ALL FLASHDISK PREFIX='FLASHTMP'&quot;

</pre></div>
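
<p>Before handing the new grid disks over to ASM, it is worth confirming they were created and are active on every cell. A quick sanity check along these lines (the FLASHTMP prefix matches the grid disks created above):</p>

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: bash; title: ; notranslate">
# Verify the new flash grid disks exist and are active on all cells
dcli -g cell_group -l root cellcli -e &quot;list griddisk attributes name, size, status&quot; | grep FLASHTMP
</pre></div>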

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
sqlplus / as sysasm

alter system set asm_diskstring='o/*/DATA_*','o/*/RECO_*','o/*/FLASHTMP*';

CREATE DISKGROUP TEMPDG NORMAL REDUNDANCY
  DISK 'o/*/FLASHTMP*'
  ATTRIBUTE 'compatible.rdbms'='19.0.0.0.0',
            'compatible.asm'='19.0.0.0.0',
            'cell.smart_scan_capable'='TRUE',
            'au_size'='4M';
</pre></div>
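
<p>Once the disk group is created, a quick check from the ASM instance confirms that TEMPDG is mounted with the expected capacity:</p>

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
-- Confirm the new disk group is mounted and sized as expected
SELECT name, state, type, total_mb, free_mb
FROM v$asm_diskgroup
WHERE name = 'TEMPDG';
</pre></div>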

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
sqlplus / as sysdba

CREATE TEMPORARY TABLESPACE TEMP_FLASH TEMPFILE '+TEMPDG' SIZE 32G EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;

ALTER DATABASE DEFAULT TEMPORARY TABLESPACE TEMP_FLASH;

DROP TABLESPACE TEMP INCLUDING CONTENTS AND DATAFILES;

</pre></div><p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/creating-an-asm-disk-on-exadata-flashdisk.html">Creating an ASM Disk on Exadata FlashDisk</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Best Practices for Very Large Database (VLDB) Backup and Recovery:</title>
		<link>https://www.bugraparlayan.com.tr/best-practices-for-very-large-database-vldb-backup-and-recovery.html</link>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Sat, 14 Jun 2025 17:44:00 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[archive logs]]></category>
		<category><![CDATA[backup retention]]></category>
		<category><![CDATA[differential backup]]></category>
		<category><![CDATA[full backup]]></category>
		<category><![CDATA[incremental backup]]></category>
		<category><![CDATA[point-in-time recovery]]></category>
		<category><![CDATA[recovery point objective]]></category>
		<category><![CDATA[recovery time objective]]></category>
		<category><![CDATA[restore strategy]]></category>
		<guid isPermaLink="false">https://www.bugraparlayan.com.tr/?p=1480</guid>

					<description><![CDATA[<p>1. Executive Summary Backing up and recovering Very Large Databases (VLDBs) presents a critical yet increasingly complex challenge for organizations in today&#8217;s data-driven world. With data volumes growing exponentially, traditional backup methods often fall short in meeting performance targets, Recovery Time Objectives (RTOs), and Recovery Point Objectives (RPOs). This report examines best practices in VLDB &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/best-practices-for-very-large-database-vldb-backup-and-recovery.html">Best Practices for Very Large Database (VLDB) Backup and Recovery:</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">1. Executive Summary</h2>



<p>Backing up and recovering Very Large Databases (VLDBs) presents a critical yet increasingly complex challenge for organizations in today&#8217;s data-driven world. With data volumes growing exponentially, traditional backup methods often fall short in meeting performance targets, Recovery Time Objectives (RTOs), and Recovery Point Objectives (RPOs). This report examines best practices in VLDB backup and recovery, drawing on insights presented in Oracle MAA (Maximum Availability Architecture) blog posts, with a specific focus on Oracle&#8217;s Zero Data Loss Recovery Appliance (ZDLRA) solution.</p>



<p>The ZDLRA is a purpose-built engineered system designed to address these challenges. Its core strategies include &#8220;Incremental Forever&#8221; backups, which significantly reduce the load on production systems; real-time redo protection for near-zero data loss; and continuous recovery validation, enhancing the reliability of backups. These features are tailored to meet the unique demands of VLDBs, offering substantial improvements in achieving RTO and RPO targets. Oracle&#8217;s development and promotion of a specialized hardware/software appliance like ZDLRA suggest that traditional, software-only backup methods are increasingly inadequate for the scale and criticality of modern VLDBs. This implies the problem&#8217;s complexity has reached a point where integrated hardware and software solutions offer a more effective approach than generic software tools running on general-purpose hardware. This is a significant paradigm shift in high-end backup and recovery strategies. Consequently, organizations managing VLDBs must assess whether their current backup infrastructure can realistically meet future demands or if a specialized appliance approach is becoming a necessity.</p>



<h2 class="wp-block-heading">2. Introduction to VLDB Backup and Recovery Challenges</h2>



<p>Very Large Databases (VLDBs) typically contain terabytes to petabytes of data and are continuously growing. The sheer size and complexity of these databases introduce unique challenges in backup and recovery processes.</p>



<ul class="wp-block-list">
<li><strong>Defining VLDBs and Their Criticality:</strong> VLDBs are central repositories for businesses&#8217; core operations, customer data, financial records, and other vital information. Therefore, any data loss or prolonged downtime in these systems can lead to severe financial losses, reputational damage, and legal repercussions. Business continuity and regulatory compliance are primary drivers for robust backup and recovery strategies for VLDBs.</li>



<li><strong>Common Pain Points:</strong>
<ul class="wp-block-list">
<li><strong>Backup Windows:</strong> Completing full backups of VLDBs within limited timeframes without an acceptable impact on production performance is extremely difficult. As database size increases, full backup times lengthen, often encroaching on business hours and negatively affecting system performance.</li>



<li><strong>Recovery Time Objectives (RTO):</strong> Restoring and recovering massive databases quickly enough to meet business needs in the event of a disaster or failure is a major hurdle. Long recovery times lead to extended business disruptions and, consequently, increased costs.</li>



<li><strong>Recovery Point Objectives (RPO):</strong> There is always a risk of significant data loss due to the time gap between backups. Even hourly or more frequent backups can lead to unacceptable data loss in high-transaction-volume systems.</li>



<li><strong>Performance Impact:</strong> Backup operations generate significant I/O (Input/Output) and CPU (Central Processing Unit) load on production database servers. This load can degrade application performance, especially during peak hours.</li>



<li><strong>Storage Costs:</strong> Managing and storing large volumes of backup data incurs substantial storage costs. Long-term retention policies and multiple backup copies further escalate these costs.</li>



<li><strong>Complexity:</strong> Managing complex backup scripts, schedules, and recovery procedures creates a significant operational burden and increases the risk of human error.</li>
</ul>
</li>
</ul>



<p>These challenges are not just technical but also economic and operational. The &#8220;pain points&#8221; are interconnected; for example, trying to shrink backup windows with traditional methods can increase the performance impact, and aggressive RPO targets can lead to higher storage costs. Because VLDBs are large, backups are inherently time-consuming. Businesses demand short RTOs and near-zero RPOs. Attempting frequent full backups on VLDBs (for RPO) exacerbates backup window and performance impact issues. Using traditional incremental backups can lead to complex and lengthy recovery processes, jeopardizing RTO. This creates a cycle of trade-offs where optimizing one aspect negatively affects another. This highlights the need for a holistic solution that addresses these interconnected challenges simultaneously, rather than in isolation, which is the rationale for an integrated system like ZDLRA.</p>



<h2 class="wp-block-heading">3. Oracle&#8217;s Zero Data Loss Recovery Appliance (ZDLRA): A Purpose-Built Solution for VLDBs (Based on Oracle MAA Blog Insights)</h2>



<p>Oracle&#8217;s Zero Data Loss Recovery Appliance (ZDLRA) is a purpose-built engineered system developed to address the challenges encountered in backing up and recovering Very Large Databases (VLDBs). This section will examine the core features of ZDLRA and its significance in VLDB protection, based on points highlighted in Oracle MAA blogs.</p>



<h3 class="wp-block-heading">3.1. Overview of the ZDLRA Approach</h3>



<p>The ZDLRA is a purpose-built engineered system developed by Oracle to centralize and optimize database backup and recovery operations, focusing on protection, efficiency, and scalability for Oracle databases. It is crucial to emphasize that ZDLRA is not merely a software solution but a comprehensive one where hardware and software are co-engineered for optimal performance and reliability in the demanding context of VLDB protection. As the Oracle MAA blog post puts it, &#8220;ZDLRA is a purpose-built engineered system designed to maximize hardware and software to provide a highly available, zero data loss environment,&#8221; noting that software alone cannot achieve this. The implication is that ZDLRA&#8217;s integrated hardware and software approach is critical for meeting stringent RTO and RPO requirements.</p>



<h3 class="wp-block-heading">3.2. The &#8220;Incremental Forever&#8221; Strategy for VLDBs</h3>



<p>One of ZDLRA&#8217;s most notable features is its &#8220;Incremental Forever&#8221; or &#8220;virtual full&#8221; backup strategy. This strategy fundamentally changes the backup process for VLDBs.</p>



<ul class="wp-block-list">
<li><strong>Mechanism:</strong> After an initial full (Level 0) backup, only changed data blocks are sent from the production database to the ZDLRA. The ZDLRA then synthesizes full backups (&#8220;virtual fulls&#8221;) from these incremental backups. This eliminates the overhead of taking full backups every day. A minimal RMAN sketch of this flow follows this list.</li>



<li><strong>Benefits for VLDBs:</strong>
<ul class="wp-block-list">
<li><strong>Reduced Production Impact:</strong> &#8220;This strategy reduces the processing load on production systems by only transmitting changed data during daily incremental backups.&#8221;  This minimizes I/O and CPU load on the source database, which is critical for performance-sensitive VLDBs. Traditional Level 0 + Level 1 backups are problematic for VLDBs: Level 0s are too large and impact performance; recovery from many Level 1s is slow. ZDLRA&#8217;s &#8220;Incremental Forever&#8221; sends only changed blocks <em>after the initial full backup</em>. This dramatically reduces the daily backup workload on the production database.  </li>



<li><strong>Storage Efficiency:</strong> Efficient storage of incremental data on the ZDLRA and pointer-based virtual fulls &#8220;can lead to a 10X decrease in space consumption.&#8221;  This offers a significant cost advantage, especially when dealing with large data volumes.  </li>



<li><strong>Faster Backup Completion:</strong> Daily &#8220;backups&#8221; are essentially small incremental backups, significantly shortening the backup window.</li>



<li><strong>Efficient Recovery:</strong> It &#8220;allows for more efficient recovery compared to traditional RMAN incremental-based recovery.&#8221;  Restoring a virtual full backup is similar to restoring a true full backup, without the need to sequentially apply numerous incrementals. ZDLRA takes on the task of creating &#8220;virtual full&#8221; backups from these incrementals. This means the <em>appliance</em>, not the production server, does the heavy lifting. For recovery, the database can be restored from a virtual full, which is much faster than applying a long chain of traditional incrementals. This directly improves RTO.  </li>



<li><strong>Offloading of Backup Operations:</strong> &#8220;By offloading backup compression, deletion, validation, and maintenance operations to the appliance, production systems can focus on workloads.&#8221;  This further frees up production server resources.  </li>
</ul>
</li>
</ul>
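
<p>To make the client side of this concrete: after the one-time level 0, the protected database only ever runs level 1 backups, and the appliance synthesizes the virtual fulls. A minimal sketch, assuming RMAN channels are already configured against the Recovery Appliance; with real-time redo transport enabled, separate archived log backups are typically unnecessary:</p>

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
-- One-time seed: the only physical full backup ever taken
BACKUP INCREMENTAL LEVEL 0 DATABASE;

-- Each scheduled run thereafter: only changed blocks leave the
-- production host; the appliance turns every run into a new
-- virtual full backup in the catalog
BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE;
</pre></div>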



<p>This approach fundamentally changes the backup paradigm for Oracle databases. By shifting intelligence and workload from the source database to a specialized appliance, it allows production systems to dedicate their resources to business operations. It also simplifies recovery processes, reducing the potential for human error.</p>



<h3 class="wp-block-heading">3.3. Achieving Real-Time, Near-Zero Data Loss Protection (Near-Zero RPO)</h3>



<p>ZDLRA offers an innovative approach to minimizing the Recovery Point Objective (RPO).</p>



<ul class="wp-block-list">
<li><strong>Mechanism:</strong> &#8220;The Recovery Appliance uses Oracle&#8217;s real-time redo transport to deliver continuous, real-time data protection. Transactional changes (redo) are transmitted directly to the appliance, where archived redo log backups are created and stored.&#8221; This mechanism is similar to Data Guard redo transport but specifically designed for backup and recovery assurance. A hedged configuration sketch follows this list.</li>



<li><strong>Benefits:</strong> &#8220;This provides immediate, zero data loss protection of all changes, and directly addresses the RPO objective of minimizing data loss.&#8221;  This means recovery can be performed up to the last committed transaction received by the ZDLRA, achieving an RPO of seconds or sub-seconds rather than hours. Traditional RPO is often tied to the frequency of archived log backups or discrete incremental data backups. For VLDBs, there can still be gaps between these discrete operations (e.g., every 15 mins, 1 hour). Real-time redo transport sends redo data <em>as it&#8217;s generated</em> (or very close to it) to the ZDLRA. The ZDLRA then archives this redo. This means that even if a failure occurs between discrete incremental data backups, the redo logs up to (or very near) the point of failure are already secured on the ZDLRA. This dramatically improves RPO beyond what traditional scheduled backups can offer, allowing for recovery with minimal or no data loss.  </li>
</ul>
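
<p>For context, enabling real-time redo transport on a protected database looks much like adding a remote archive destination that points at the Recovery Appliance&#8217;s ingest service. A hedged sketch; the destination slot, TNS alias, unique name, and VPC user below are placeholders, not fixed names:</p>

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
-- Make the appliance a valid redo destination (names are site-specific)
ALTER SYSTEM SET log_archive_config='DG_CONFIG=(prod1,zdlra)' SCOPE=BOTH;

-- Ship redo asynchronously; &quot;zdlra&quot; is an assumed TNS alias for the
-- appliance's ingest service
ALTER SYSTEM SET log_archive_dest_3='SERVICE=zdlra ASYNC NOREGISTER VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=zdlra' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_state_3=ENABLE SCOPE=BOTH;

-- Redo is sent as the appliance's virtual private catalog (VPC) user;
-- RA_VPC_USER is a placeholder account name
ALTER SYSTEM SET redo_transport_user=RA_VPC_USER SCOPE=BOTH;
</pre></div>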



<p>This feature is a game-changer for businesses with extremely low tolerance for data loss. It elevates ZDLRA from merely a backup device to a key component of a high-availability and data protection strategy, approaching disaster recovery capabilities for recent transactions. It also implies a tighter integration with the database&#8217;s transaction processing cycle.</p>



<h3 class="wp-block-heading">3.4. Ensuring Data Integrity with Continuous Recovery Validation</h3>



<p>The reliability of backups is paramount for successful recovery. ZDLRA takes a proactive approach to this.</p>



<ul class="wp-block-list">
<li><strong>Process:</strong> &#8220;The appliance performs corruption detection throughout the backup cycle to validate data consistency and immediately alerts administrators if corruption is detected. It checks all incoming and replicated backups for block-level validity. Any corrupted data is detected, recorded, and alerted, allowing administrators to take action.&#8221;   </li>



<li><strong>Benefits:</strong> &#8220;This assurance of valid data is a key component for a successful recovery, directly impacting RTO by ensuring that restored data is usable and not corrupt.&#8221;  This proactive validation prevents the discovery of corruption only at the critical moment of recovery, which could severely impact RTO and business operations. Data block corruption can occur on the primary database and, if undetected, propagate to backups. Traditional validation might happen during the backup process (e.g., RMAN <code>VALIDATE DATABASE</code>) or as a separate scheduled task, but ZDLRA makes it an intrinsic part of data ingestion. By checking blocks as they arrive and as virtual fulls are created/maintained, ZDLRA provides an early warning system. If corruption is detected in an incoming backup, administrators are alerted immediately. This allows them to address the issue on the primary database or ensure subsequent backups are clean, rather than discovering the problem months later during a critical restore. This ensures that backups stored on ZDLRA are known to be good, which is fundamental for a predictable and successful RTO.  </li>
</ul>



<p>This feature increases confidence in the backup repository. It means that when a recovery is initiated, there is a much higher certainty that the restored data will be valid and uncorrupted, reducing the risk of failed recoveries or recoveries that bring back corrupt data, which can be worse than no recovery at all. This also reduces the need for extensive manual validation efforts.</p>



<h3 class="wp-block-heading">3.5. The Significance of an Engineered System Approach for VLDBs</h3>



<p>The idea that ZDLRA is not just software but an integrated hardware and software solution is fundamental to its effectiveness in VLDB protection. &#8220;The article emphasizes that the ZDLRA is a purpose-built engineered system designed to maximize hardware and software&#8230; It states that software alone cannot achieve this.&#8221; This co-engineering allows for optimizations in I/O, network traffic, storage management, and processing that would be difficult to achieve with general-purpose components. Protecting VLDBs efficiently requires high throughput for backups, fast access for restores, and robust processing for tasks like validation and virtual full creation. General-purpose hardware and backup software might not be optimally configured to work together for these specific, demanding Oracle database workloads. An engineered system allows the vendor (Oracle) to control and optimize all layers: the database-side agents, the network protocols used, the internal processing within the appliance, and the storage layout. This tight integration can lead to performance, reliability, and manageability benefits that are hard to replicate with a piecemeal approach.</p>



<p>The &#8220;engineered system&#8221; argument positions ZDLRA as a premium, high-performance solution where the whole is greater than the sum of its parts. It implies that Oracle has fine-tuned every component of the stack, from database interaction to storage within the appliance, for the specific task of Oracle database protection. While potentially carrying a higher upfront cost, the engineered system approach aims to deliver a lower TCO (Total Cost of Ownership) through operational efficiencies, reduced risk, and superior performance. It also signifies a single-vendor commitment to supporting the entire solution stack, potentially simplifying troubleshooting and support. This is a strategic choice for organizations where VLDB protection is a top-tier priority.</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Challenge Area</th><th>Traditional Approach Pain Points</th><th>ZDLRA Solution &amp; Key Features Leveraged</th></tr><tr><td>Backup Window</td><td>Long full backups, performance impact</td><td>Incremental Forever, offloaded processing</td></tr><tr><td>RTO</td><td>Slow recovery from many incrementals, risk of corrupt backup</td><td>Virtual Full Backups, Continuous Recovery Validation</td></tr><tr><td>RPO</td><td>Data loss since last backup (hours)</td><td>Real-Time Redo Transport</td></tr><tr><td>Production Impact</td><td>High CPU/IO during backups</td><td>Incremental Forever (sends only changes), offloaded processing (compression, validation)</td></tr><tr><td>Storage Consumption</td><td>Multiple full backups, large incrementals</td><td>Incremental Forever (stores deltas efficiently), space-efficient virtual fulls</td></tr><tr><td>Backup Integrity</td><td>Corruption detected late (at restore or via periodic checks)</td><td>Continuous Recovery Validation (proactive, during backup cycle)</td></tr><tr><td>Management Complexity</td><td>Complex scripting, scheduling, manual validation</td><td>Centralized appliance management, automated validation and virtual full creation</td></tr></tbody></table></figure>



<p>This table visually reinforces how ZDLRA directly addresses specific, long-standing pain points in VLDB management, making it easier to quickly grasp the benefits that would justify evaluating such a system.</p>



<h2 class="wp-block-heading">4. Key Considerations and Best Practices in ZDLRA-Centric VLDB Backup and Recovery Implementations</h2>



<p>While ZDLRA offers powerful capabilities that significantly improve VLDB backup and recovery processes, fully leveraging these capabilities requires careful planning, configuration, and adherence to operational best practices. This section will translate ZDLRA&#8217;s features into actionable considerations and best practices. Although the provided Oracle blog post summaries indicate they do not offer <em>additional</em> general best practices beyond ZDLRA itself, this section will focus on <em>how best to leverage</em> ZDLRA&#8217;s capabilities and what to pay attention to <em>within the ZDLRA context</em>.</p>



<h3 class="wp-block-heading">4.1. Optimizing Recovery Time Objective (RTO) with ZDLRA</h3>



<ul class="wp-block-list">
<li>Leverage ZDLRA&#8217;s virtual full backups for rapid restores. This significantly shortens recovery times.</li>



<li>Ensure ZDLRA sizing is adequate to meet restore performance demands. Insufficient resources can lead to missed RTO targets.</li>



<li>Regularly test recovery scenarios to validate RTOs. ZDLRA&#8217;s &#8220;Continuous Recovery Validation&#8221; ensures backups are valid, a prerequisite for meeting RTOs, but real-world tests confirm the entire process works as expected. A hedged restore-drill sketch follows this list.</li>
</ul>
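
<p>One low-impact way to rehearse part of this is an RMAN restore validation, which reads the backups needed for a chosen point in time without writing anything back. A minimal sketch (the timestamp is illustrative); full drills to a test host should complement it:</p>

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
-- Read (but do not restore) everything needed for a point-in-time recovery
RUN {
  SET UNTIL TIME &quot;TO_DATE('2025-06-01 12:00:00','YYYY-MM-DD HH24:MI:SS')&quot;;
  RESTORE DATABASE VALIDATE;
}
</pre></div>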



<h3 class="wp-block-heading">4.2. Minimizing Recovery Point Objective (RPO) with ZDLRA</h3>



<ul class="wp-block-list">
<li>Implement and monitor real-time redo transport diligently. This is the primary mechanism for achieving near-zero RPO. A monitoring sketch follows this list.</li>



<li>Understand and meet network requirements to ensure real-time redo transport does not lag. Insufficient network bandwidth or high latency can compromise RPO targets.</li>
</ul>
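
<p>From the protected database side, the standard archive-destination views give a quick health check of the redo stream; for example (assuming, as a placeholder, that the appliance is configured as destination 3):</p>

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
-- Check that redo shipping to the appliance is valid and not gapped
SELECT dest_id, status, gap_status, error
FROM v$archive_dest_status
WHERE dest_id = 3;
</pre></div>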



<h3 class="wp-block-heading">4.3. Managing Production System Performance</h3>



<ul class="wp-block-list">
<li>While ZDLRA&#8217;s &#8220;Incremental Forever&#8221; strategy  significantly reduces production impact, confirm this by monitoring baseline database performance metrics post-implementation.  </li>



<li>Optimize network bandwidth between production databases and the ZDLRA. This is critical for the efficiency of both incremental backups and real-time redo transport.</li>
</ul>



<h3 class="wp-block-heading">4.4. Ensuring Backup Data Integrity and Reliability</h3>



<ul class="wp-block-list">
<li>Rely on ZDLRA&#8217;s &#8220;Continuous Recovery Validation,&#8221; but also understand its alerting mechanisms and integrate them into operational monitoring systems. Early warnings allow for proactive resolution of potential issues.</li>



<li>Consider ZDLRA replication to a secondary ZDLRA for disaster recovery of the backup data itself. This ensures backups are protected even if the primary ZDLRA fails.</li>
</ul>



<h3 class="wp-block-heading">4.5. Storage Management and Efficiency within ZDLRA</h3>



<ul class="wp-block-list">
<li>Understand ZDLRA&#8217;s internal storage management, space reclamation, and how the &#8220;up to 10X decrease in space consumption&#8221;  is achieved and monitored.  </li>



<li>Plan retention policies carefully to balance recovery needs with storage capacity. Overly long retention periods can lead to unnecessary costs, while too short retention can limit recovery capabilities.</li>
</ul>



<h3 class="wp-block-heading">4.6. Network Configuration and Sizing</h3>



<ul class="wp-block-list">
<li>Emphasize the importance of dedicated, high-bandwidth, low-latency network connectivity between production databases and the ZDLRA, especially for real-time redo transport and large data transfers. The network should not be a bottleneck for backup and recovery performance.</li>
</ul>



<h3 class="wp-block-heading">4.7. Regular Testing and Validation of Recovery Procedures</h3>



<ul class="wp-block-list">
<li>Even with ZDLRA&#8217;s automation and validation, conduct periodic, full recovery drills to test the end-to-end process, human procedures, and infrastructure. This validates the entire recovery plan, not just the technology.</li>
</ul>



<p>Implementing ZDLRA is not a &#8220;set it and forget it&#8221; solution. While it automates and optimizes many aspects, careful planning, configuration, ongoing monitoring, and testing are still critical to realizing its full benefits. The &#8220;best practices&#8221; shift from managing the intricacies of RMAN scripts to managing the ZDLRA ecosystem. ZDLRA offers advanced features like &#8220;Incremental Forever,&#8221; &#8220;Real-Time Redo,&#8221; and &#8220;Continuous Validation.&#8221; These features have prerequisites and operational aspects (e.g., network for redo, monitoring alerts for validation, capacity planning for storage). Simply deploying the appliance does not guarantee optimal RTO/RPO or reliability. Administrators must understand how these features work, configure them correctly, monitor their performance, and integrate ZDLRA into broader DR and operational procedures. Regular testing is essential to confirm the entire system (database, network, ZDLRA, recovery procedures) performs as expected under pressure. The role of the Database Administrator (DBA) also evolves in this context. They may spend less time on low-level backup scripting and more on strategic data protection management for ZDLRA, capacity planning, and ensuring end-to-end recoverability of business services. Expertise specific to ZDLRA itself becomes important.</p>



<h2 class="wp-block-heading">5. Conclusion and Recommendations</h2>



<p>As presented in the Oracle MAA blog posts, the Zero Data Loss Recovery Appliance (ZDLRA) offers significant advantages in the realm of Very Large Database (VLDB) backup and recovery. These benefits include near-zero data loss through a vastly improved Recovery Point Objective (RPO), reliable Recovery Time Objective (RTO) via virtual full backups and continuous validation, reduced impact on production systems, and enhanced data integrity.</p>



<p>As an engineered system, ZDLRA represents a strategic approach to tackling the complexities of VLDB protection. The co-engineering of hardware and software allows for performance and reliability optimizations that are difficult to achieve with general-purpose solutions. This is a critical differentiator, especially in today&#8217;s environment where data volume and transaction rates challenge traditional methods.</p>



<p>However, it must be emphasized that while ZDLRA offers powerful capabilities, successful implementation requires careful planning, a full understanding of its features, and adherence to operational best practices, particularly concerning network configuration, monitoring, and regular recovery testing. Adopting ZDLRA is not merely a technical decision but signifies a commitment to a high level of data protection and availability, driven by the critical nature of the VLDBs it protects. This is an investment that should align with the value of the data and the cost of downtime/data loss.</p>



<p>It is important to note that this report focuses on ZDLRA-centric best practices highlighted in the provided Oracle blog post summaries. A comprehensive discussion of all VLDB backup and recovery techniques, including non-ZDLRA alternatives or complementary strategies like storage snapshots or Oracle Data Guard for DR beyond backup, would require additional resources beyond the scope of the provided material.</p>



<p>In conclusion, organizations should evaluate ZDLRA as part of their overall IT strategy, considering its integration with other systems, the skills required to manage it, and its alignment with long-term data growth and protection needs. When implemented and managed correctly, ZDLRA can provide unparalleled protection and recovery assurance for VLDBs, helping businesses secure one of their most valuable assets: their data.</p>



<p>Ref:</p>



<p><a href="https://blogs.oracle.com/maa/post/very-large-database-backup-and-recovery-best-practices" target="_blank" rel="noopener">https://blogs.oracle.com/maa/post/very-large-database-backup-and-recovery-best-practices</a></p>



<p><a href="https://blogs.oracle.com/maa/post/very-large-database-backup-and-recovery-best-practices-part-2" target="_blank" rel="noopener">https://blogs.oracle.com/maa/post/very-large-database-backup-and-recovery-best-practices-part-2</a></p>



<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/best-practices-for-very-large-database-vldb-backup-and-recovery.html">Best Practices for Very Large Database (VLDB) Backup and Recovery:</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Analysis of Delta Push and Delta Store Mechanisms within ZDLRA</title>
		<link>https://www.bugraparlayan.com.tr/analysis-of-delta-push-and-delta-store-mechanisms-within-zdlra.html</link>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Wed, 04 Jun 2025 19:35:29 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[Backup Optimization]]></category>
		<category><![CDATA[Block Change Tracking]]></category>
		<category><![CDATA[Change Tracking]]></category>
		<category><![CDATA[Delta Compression]]></category>
		<category><![CDATA[Delta Push]]></category>
		<category><![CDATA[Delta Store]]></category>
		<category><![CDATA[Incremental Forever]]></category>
		<category><![CDATA[Oracle ZDLRA]]></category>
		<category><![CDATA[Recovery Appliance]]></category>
		<category><![CDATA[RMAN Delta push]]></category>
		<guid isPermaLink="false">https://www.bugraparlayan.com.tr/?p=1475</guid>

					<description><![CDATA[<p>I. Introduction to Oracle Zero Data Loss Recovery Appliance (ZDLRA) Technology The Oracle Zero Data Loss Recovery Appliance (ZDLRA or Recovery Appliance), an engineered system specifically designed for Oracle Databases, was developed to eliminate data loss and significantly reduce the data protection workload on production database servers. Its primary goal is to protect transactions in &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/analysis-of-delta-push-and-delta-store-mechanisms-within-zdlra.html">Analysis of Delta Push and Delta Store Mechanisms within ZDLRA</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">I. Introduction to Oracle Zero Data Loss Recovery Appliance (ZDLRA) Technology</h2>



<p>The Oracle Zero Data Loss Recovery Appliance (ZDLRA or Recovery Appliance), an engineered system specifically designed for Oracle Databases, was developed to eliminate data loss and significantly reduce the data protection workload on production database servers. Its primary goal is to protect transactions in real-time, enabling databases to be recovered to within less than a second in the event of an outage or ransomware attack. This approach fundamentally differs from traditional backup solutions, which often lead to data loss measured in hours or even a day. ZDLRA works in tight integration with Oracle Database and Recovery Manager (RMAN) and offers capabilities not possible with general-purpose backup solutions. It is built upon Exadata hardware, from which it inherits performance and scalability features. Positioned for modern cybersecurity protection, ZDLRA offers features like backup immutability and continuous validation.</p>



<p>The data protection philosophy underlying this system represents a paradigm shift from reactive backup to proactive, continuous data protection. Traditional backup operations are performed periodically (e.g., nightly), which inherently carries the potential for data loss since the last backup. ZDLRA, on the other hand, captures changes as they occur through mechanisms like &#8220;real-time protection&#8221; and &#8220;real-time redo transport&#8221;. This continuous capture reduces the Recovery Point Objective (RPO) to sub-second levels, significantly mitigating the business risk associated with data loss and moving beyond mere backup to provide near-continuous data assurance.</p>



<p>The fact that ZDLRA is an &#8220;engineered system&#8221; (built on Exadata) is critical to its performance and reliability. General-purpose backup solutions run on general-purpose hardware and software, which may not be optimized for the unique I/O patterns and metadata intensity of Oracle Database backups. As an engineered system like Exadata, ZDLRA&#8217;s hardware (storage, nodes, InfiniBand) and software are co-engineered and pre-tuned for Oracle Database workloads. This co-engineering enables high throughput (up to 60 TB per hour per rack), efficient handling of Oracle-specific block formats, and the scalability required for enterprise-wide database protection. Thus, ZDLRA&#8217;s effectiveness stems not just from its software features but from its holistic system design, purpose-built for Oracle databases.</p>



<h3 class="wp-block-heading">A. &#8220;Incremental Forever&#8221; Backup Strategy: A Paradigm Shift</h3>



<p>ZDLRA implements an &#8220;incremental-forever&#8221; backup architecture. After an initial one-time full (level 0) backup, only incremental (level 1) backups are sent from the protected databases to the Recovery Appliance. This strategy eliminates the need for recurring full backups, which are resource-intensive on production systems and can impact application performance. The &#8220;incremental forever&#8221; approach significantly reduces backup windows, database server load (CPU, memory, I/O), and network traffic.</p>



<p>The &#8220;incremental forever&#8221; strategy is more than just sending fewer full backups; it is enabled by ZDLRA&#8217;s sophisticated backend processing (Delta Store) that synthesizes these incremental backups into readily usable &#8220;virtual full backups.&#8221; Simply sending only incremental backups without a mechanism to consolidate them would make restores complex and slow, requiring the sequential application of many incrementals. ZDLRA overcomes this by using the Delta Store to process incoming incremental backups and create &#8220;virtual full backups&#8221;. These virtual full backups are representations of a full backup at a specific point in time, constructed from the initial level 0 and subsequent level 1 block changes. This means that for recovery, RMAN can restore a single virtual full backup without the burden of applying numerous incremental backups on the client side, making the &#8220;incremental forever&#8221; strategy practical and highly efficient for recovery.</p>



<p>The following table summarizes the key concepts of ZDLRA and the benefits it offers:</p>



<p><strong>Table 1: Key ZDLRA Concepts and Benefits</strong></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Concept</th><th>Brief Definition in ZDLRA Context</th><th>Key Benefit to the Organization</th></tr><tr><td>Zero Data Loss</td><td>Goal of reducing data loss to virtually zero by protecting database transactions in real-time.</td><td>Minimizes critical data loss risk, enhances business continuity.</td></tr><tr><td>Incremental Forever</td><td>Only incremental backups are taken after the initial full backup, eliminating the need for periodic full backups.</td><td>Shortens backup windows, reduces load on production systems, saves storage.</td></tr><tr><td>Real-Time Redo Transport</td><td>Instantaneous transfer of database redo log changes to ZDLRA.</td><td>Provides sub-second RPO (Recovery Point Objective), minimizing data loss.</td></tr><tr><td>Virtual Full Backup</td><td>A logical backup synthesized on ZDLRA from incremental backups, behaving like a full backup.</td><td>Enables fast restores, uses storage space efficiently.</td></tr><tr><td>Sub-Second RPO</td><td>Reduction of data loss tolerance to below one second.</td><td>Minimizes data loss impact for business-critical applications.</td></tr><tr><td>Continuous Validation</td><td>Backup integrity and recoverability are continuously checked by ZDLRA.</td><td>Ensures reliable restores, reduces risk of corrupt backups.</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">II. Delta Push: The Continuous Data Ingestion Engine</h2>



<h3 class="wp-block-heading">A. Delta Push Concept and Objectives</h3>



<p>&#8220;Delta Push&#8221; is a term Oracle uses to describe the process by which protected databases send only the minimum necessary data (i.e., the &#8220;delta difference&#8221; or changes) to ZDLRA for protection. This process encompasses both incremental backups of data blocks and real-time transport of redo log changes. The primary objective is to minimize the impact on production systems by transmitting only unique changed data, thereby reducing CPU, I/O, and network load. It is a source-side optimization enabled by RMAN block change tracking and tight integration with the Oracle database.</p>



<p>&#8220;Delta Push&#8221; is more than just an incremental backup; it is a holistic strategy to capture <em>all</em> relevant database changes (data blocks and redo) with minimal production impact, forming the ingestion mechanism for ZDLRA&#8217;s continuous data protection. Traditional incremental backups capture changed data blocks at set intervals. Real-time redo transport continuously captures transaction log changes, even between incremental backups. One Oracle description states simply that &#8220;changes in the database are sent to ZDLRA using the Delta Push process,&#8221; while another clarifies that &#8220;Oracle calls Virtual Backups + Real-Time Redo as Delta Push.&#8221; Therefore, Delta Push is not a single technology but a combination of RMAN incremental backups and real-time redo transport working in concert to ensure comprehensive and timely capture of all database modifications. This dual approach is key to achieving both efficient backups and near-zero RPO.</p>



<h3 class="wp-block-heading">B. Operational Mechanisms</h3>



<h4 class="wp-block-heading">1. Leveraging RMAN Incremental Backups</h4>



<p>ZDLRA utilizes the RMAN &#8220;incremental backup&#8221; API to capture changes from the source database. After the initial level 0 backup, all subsequent backups are level 1 incremental backups. These can be cumulative backups that use the latest virtual level 0 as their baseline. The ZDLRA Backup Module (an SBT library) facilitates the transfer of these incremental backups from the protected database to the Recovery Appliance. RMAN block change tracking on the source database efficiently identifies changed blocks, so only those blocks are read and sent.</p>
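
<p>The client-side pieces here are standard RMAN. A hedged sketch, with the library path, wallet location, and credential alias as site-specific placeholders:</p>

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
-- In SQL*Plus: enable block change tracking so level 1 backups
-- read only changed blocks ('+DATA' is an assumed disk group)
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '+DATA';

-- In RMAN: configure a channel that loads the Recovery Appliance
-- backup module (paths and alias below are placeholders)
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/u01/app/oracle/lib/libra.so, ENV=(RA_WALLET=location=file:/u01/app/oracle/wallet credential_alias=zdlra-scan:1521/zdlra:dedicated)';
</pre></div>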



<p>ZDLRA fundamentally transforms the <em>purpose</em> and <em>outcome</em> of an RMAN incremental backup. While the RMAN command on the client-side might look similar, ZDLRA processes it not just as a standalone incremental backup, but as a component of a virtual full backup. Traditionally, an RMAN incremental backup is a set of blocks changed since the last backup of a certain level. On ZDLRA, however, &#8220;when a DELTA PUSH is executed, the results are automatically transformed into a VIRTUAL FULL backup in what is known as the Delta Store inside of ZDLRA&#8221;: the appliance opens the RMAN block, reads the data file blocks within it, and creates a new virtual level 0 using the backup already existing on ZDLRA. This indicates that ZDLRA doesn&#8217;t just store the incremental backup set; it actively parses it and integrates its constituent changed blocks into its versioned block store (Delta Store) to synthesize a new point-in-time full representation. This is a critical difference from how incremental backups are handled by traditional backup software.</p>



<h4 class="wp-block-heading">2. Real-Time Redo Transport: Achieving Sub-Second RPO</h4>



<p>This feature is key to ZDLRA&#8217;s &#8220;zero data loss&#8221; claim, providing a zero to sub-second RPO. Redo data (records of all database changes) is streamed from the protected database&#8217;s memory buffers (LGWR) directly to the Recovery Appliance, typically asynchronously to minimize performance impact. This is similar to Data Guard redo transport. ZDLRA validates the redo and writes it to a staging area. Upon a log switch on the protected database, ZDLRA converts these redo changes into compressed archived redo log file backups and tracks them in its catalog. If the redo stream terminates unexpectedly (e.g., database crash), ZDLRA can create a partial archived redo log, protecting transactions up to the last change received. With real-time redo transport enabled, this obviates the need for separate archived log backups from the database host to ZDLRA.</p>



<p>Real-time redo transport effectively decouples redo protection from the source database&#8217;s archiving process, enabling more resilient and immediate capture of transactions. Traditional redo protection often relies on the database successfully writing to its online redo logs and then archiving them. ZDLRA&#8217;s real-time redo transport taps into the redo stream <em>before</em> or concurrently with local archiving, sending it directly from memory. Even if the primary database crashes before successfully archiving a log, ZDLRA can construct a partial archive log from the redo it has already received. This means ZDLRA acts as an independent, highly available redo log destination, guaranteeing transaction capture even if the source database&#8217;s own archiving mechanism is disrupted, which is critical for sub-second RPO.</p>



<h3 class="wp-block-heading">C. Architectural Integration: Data Flow from Protected Database to ZDLRA</h3>



<p>Protected databases use the RMAN client and the Recovery Appliance Backup Module (SBT library) to communicate with ZDLRA. For incremental backups, RMAN identifies changed blocks. These blocks are packaged and sent via the backup module over the network to an HTTP Server Application (Servlet) on ZDLRA. Real-time redo is transported similarly to Data Guard (typically via Oracle Net) to a Remote File Server (RFS) process on ZDLRA. ZDLRA then validates, processes (compression, indexing), and stores the incoming data/redo blocks in the Delta Store. Metadata is updated in the Recovery Appliance Catalog.</p>



<p>The data flow architecture for Delta Push is bifurcated (separate paths for incremental block backups and real-time redo) but converges within ZDLRA to provide a unified data protection state. Incremental data block backups are inherently batch-oriented, typically scheduled RMAN operations, even if frequent. They are processed via the SBT interface. Real-time redo transport is a continuous, stream-based process, capturing transactional changes as they occur using Data Guard-like mechanisms. Both data streams—changed blocks and redo records—arrive at ZDLRA and are processed into and cataloged by the Delta Store. This dual-path ingestion allows ZDLRA to capture both the state of data blocks at specific points in time (via incrementals) and the continuous flow of transactions (via redo), combining the strengths of snapshot-like backups and continuous data replication to enable recovery to almost any point in time.</p>



<h3 class="wp-block-heading">D. Formation of Virtual Full Backups via Delta Push</h3>



<p>While Delta Push is the <em>mechanism</em> for sending changes, its direct result is to enable ZDLRA&#8217;s Delta Store to create and maintain &#8220;Virtual Full Backups&#8221;. Each Delta Push (incremental backup) operation results in a new Virtual Full Backup becoming available in the ZDLRA catalog. This means changes are tracked not just from the last physical full backup, but from the previous Virtual Full backup.</p>



<p>Delta Push acts as the continuous feed that allows the Delta Store to maintain a constantly up-to-date, yet historically deep, set of recovery points represented as virtual full backups. Delta Push transmits the &#8220;deltas&#8221; – the changed blocks. The Delta Store receives these deltas and intelligently integrates them with previously stored block versions. This integration allows ZDLRA to construct a logically complete backup image for a specific point-in-time corresponding to an ingested incremental backup. Therefore, Delta Push is not just about efficient data transfer; it is the critical data pipeline that fuels the Delta Store&#8217;s ability to offer fast, point-in-time recovery through virtual full backups, effectively creating a &#8220;time machine&#8221; for the database.</p>



<h2 class="wp-block-heading">III. Delta Store: The Intelligent Repository for Protected Data</h2>



<h3 class="wp-block-heading">A. Delta Store Architectural Overview</h3>



<p>The Delta Store is &#8220;the totality of all protected database backup data in the Recovery Appliance storage location&#8221;. It resides on a dedicated ASM disk group (typically named DELTA) on ZDLRA. It is described as the &#8220;brains&#8221; of the Recovery Appliance, responsible for validating, compressing, indexing, and storing incoming backup data. It is not merely a passive storage area; it actively manages backup data to enable efficient virtual full backups and space optimization.</p>



<p>The Delta Store is an application-aware storage layer, deeply integrated with Oracle Database block structures and RMAN metadata, which distinguishes it from general-purpose deduplication appliances. General-purpose deduplication appliances typically operate at a generic block level without understanding the internal structure of database files. ZDLRA&#8217;s Delta Store, by contrast, captures copies of each Oracle block and organizes them hierarchically; Oracle highlights its &#8220;Oracle context sensitivity,&#8221; meaning it opens RMAN blocks to inspect their contents and index the data blocks for each data file. This database awareness allows for more intelligent deduplication (block versioning rather than just hash-based deduplication of RMAN backup pieces), validation, and the creation of consistent virtual full backups. This intelligence is what enables ZDLRA to perform block-correctness and RMAN recoverability validation directly on the appliance, offloading the production server.</p>



<h3 class="wp-block-heading">B. Internal Structure and Data Organization</h3>



<h4 class="wp-block-heading">1. Delta Pools: Granular Management of Data File Backups</h4>



<p>The Delta Store contains &#8220;delta pools&#8221; for all data files across all protected databases. A delta pool is the set of data file blocks from which the Recovery Appliance constructs virtual full backups. Each distinct data file whose backups are sent to ZDLRA has its own dedicated delta pool. For example, <code>datafile 10</code> from database <code>prod1</code> has its own delta pool.</p>



<p>The concept of a delta pool signifies a highly granular and organized approach to managing backup data, enabling efficient block versioning and retrieval at the individual data file level. Databases consist of multiple data files, each with its own lifecycle of changes. By maintaining a separate delta pool per data file, ZDLRA can track and version blocks specifically for that file. When an incremental backup for a data file arrives, ZDLRA updates the relevant delta pool with the new block versions. This level of granularity is essential for constructing a virtual full backup, as ZDLRA can quickly locate the correct versions of all blocks for each data file belonging to a specific point-in-time backup by querying these distinct pools. It also likely aids in space management and reclamation, as old blocks can be managed within the context of their specific data file pool.</p>



<h4 class="wp-block-heading">2. Block Versioning and Indexing Mechanisms</h4>



<p>The Delta Store is effectively a database of block versions. As incremental backups (Delta Pushes) arrive, the changed blocks are indexed into the Delta Store. The Recovery Appliance receives an incremental backup, validates it, compresses it, and writes it to a delta store. It indexes the backup so that corresponding virtual full backups become available. The ZDLRA metadata database, which includes the RMAN recovery catalog, manages the metadata about these blocks and their versions.</p>



<p>The indexing of individual database blocks within the Delta Store, not just backup pieces, is the core enabler of &#8220;virtual full backups&#8221; and efficient space utilization. Traditional backups store entire backup pieces (full or incremental). Restoration requires locating and processing these pieces. ZDLRA, in contrast, extracts individual data blocks from incoming incremental backups and indexes these blocks. The Delta Store maintains various versions of these blocks. A &#8220;virtual full backup&#8221; is essentially a metadata construct – a list of pointers to the correct versions of all blocks (from various delta pools) that constitute the database at a specific point in time. This block-level versioning and indexing mean that unchanged blocks are stored only once, and new &#8220;full&#8221; backups are created logically by updating pointers, rather than physically re-copying all data. This is the essence of space efficiency and rapid virtual full creation.</p>



<h3 class="wp-block-heading">C. Creation and Management of Virtual Full Backups within Delta Store</h3>



<p>The Delta Store uses the ingested incremental backups (via Delta Push) to create virtual full backups. A virtual full backup is a pointer-based representation of a physical full backup at the time of the incremental backup. It appears as a standard level 0 backup in the RMAN catalog. To create a virtual full backup, ZDLRA converts an incoming incremental level 1 backup into a virtual representation of an incremental level 0 backup. It combines the new changed blocks from the incremental backup with the previous unchanged blocks already present in the Delta Store. These virtual full backups are typically 10 times more space-efficient than physical full backups.</p>
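
<p>Because virtual fulls are cataloged like ordinary level 0 backups, they can be inspected with the usual RMAN commands when connected to the Recovery Appliance catalog, for instance:</p>

<div class="wp-block-syntaxhighlighter-code "><pre class="brush: sql; title: ; notranslate">
-- Virtual full backups appear as normal incremental level 0 entries
LIST BACKUP OF DATABASE SUMMARY;
</pre></div>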



<p>The creation of virtual full backups is an ongoing, dynamic process within the Delta Store, triggered by each successful Delta Push, ensuring that the latest recovery points are always full representations. As the product documentation puts it, &#8220;Each Delta Push sends the latest version of each changed block. Those changed blocks are indexed into the Delta Store and combined with previous un-changed blocks to form a Virtual Full Backup.&#8221; It further notes, &#8220;After the process [backup], the catalog reflects all the new virtual full backups that are available.&#8221; This implies continuous synthesis. As new incremental data arrives, the Delta Store doesn&#8217;t just store the incremental; it actively processes it to update its pointers and metadata, making a new, comprehensive virtual full backup immediately available. This proactive synthesis is why ZDLRA can offer fast restores to recent points in time without the delay of manually applying many incrementals during the restore operation itself. The &#8220;merge&#8221; or &#8220;synthesis&#8221; happens upfront on ZDLRA.</p>



<h3 class="wp-block-heading">D. Storage Optimization and Efficiency</h3>



<h4 class="wp-block-heading">1. Advanced Compression Techniques (including <code>RA_FORMAT</code>)</h4>



<p>ZDLRA employs specialized block-level compression algorithms. A newer client-side library feature, <code>RA_FORMAT=TRUE</code> (introduced around ZDLRA 23.1), allows compression of the data <em>within</em> blocks before it is sent to ZDLRA. This is compatible with ZDLRA&#8217;s ability to create virtual full backups and validate stored backup sets. This client-side compression can compress the contents of TDE encrypted blocks as well as non-TDE blocks; if RMAN encryption is also on, non-TDE blocks are compressed and then encrypted. This compression reduces network bandwidth for backups and replication, and storage space on ZDLRA. Archive log compression (BASIC, LOW, MEDIUM, HIGH) can also be configured; LOW, MEDIUM, and HIGH do not require the Advanced Compression Option (ACO) on the protected database when ZDLRA is used.</p>
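


<p>As a hedged sketch, enabling the client-side format is done in the RMAN channel configuration for the backup module; the library path, wallet location, and connect string below are placeholders, and the exact <code>ENV</code> syntax should be confirmed against the documentation for the installed appliance version:</p>



<pre class="wp-block-code"><code># RA_FORMAT=TRUE turns on client-side, block-content compression in libra.so.
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/u01/app/oracle/lib/libra.so, ENV=(RA_WALLET=location=file:/u01/app/oracle/ra_wallet credential_alias=zdlra-scan:1521/zdlra, RA_FORMAT=TRUE)';</code></pre>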



<p>The <code>RA_FORMAT</code> feature represents a significant evolution in ZDLRA&#8217;s compression strategy, moving some intelligence to the client to optimize data <em>before</em> transmission and storage, and enabling effective compression even for TDE-encrypted data. Previously, RMAN compression would compress the entire backup set; if the data was TDE encrypted, this compressed backup set was unreadable by ZDLRA for its block-level operations. <code>RA_FORMAT=TRUE</code> compresses the <em>contents</em> of each block, leaving the block headers intact for ZDLRA to read. This allows ZDLRA to perform its virtual full backup creation and validation even on backups originating from TDE tablespaces, because the data within the blocks is compressed while the block structure ZDLRA needs is preserved. This overcomes a major challenge in backup efficiency for encrypted databases, offering both security (TDE) and storage/network efficiency (compression), which were often mutually exclusive or suboptimal with older methods.</p>



<h4 class="wp-block-heading">2. Automated Space Management and Delta Pool Optimization</h4>



<p>The Recovery Appliance performs automated delta pool space management. This includes deleting old or expired backups (on disk and on tape/cloud) based on recovery window goals and retention policies. ZDLRA also periodically reorganizes delta pools to improve restore performance by maintaining contiguity of blocks (delta pool optimization) as old blocks are deleted and new ones arrive.</p>



<p>Automated space management and delta pool optimization are critical for sustaining the long-term performance and efficiency of the &#8220;incremental forever&#8221; strategy. An unmanaged &#8220;incremental forever&#8221; system could lead to highly fragmented storage over time as myriad small changes accumulate and old data becomes obsolete. The deletion of old blocks reclaims space, which is vital for cost-effectiveness. The reorganization of delta pools addresses the potential performance degradation from fragmentation that could arise from frequent updates and deletions, ensuring restore operations remain fast by optimizing read access. These automated background tasks are therefore essential for the sustainability of the ZDLRA model, preventing it from becoming unwieldy or slow over long periods of operation.</p>



<p>The following table summarizes the internal components of the Delta Store and their roles:</p>



<p><strong>Table 2: Delta Store Internal Components and Roles</strong></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Component</th><th>Description/Structure</th><th>Primary Function within Delta Store</th><th>Contribution to ZDLRA Efficiency/Recovery</th></tr><tr><td>Delta Pool</td><td>A logical unit for each data file, containing all backed-up block versions for that specific data file.</td><td>Organizing and managing blocks belonging to a specific data file.</td><td>Granular block management, efficient versioning, and rapid construction of virtual full backups.</td></tr><tr><td>Block Version</td><td>A copy of a data block at a specific point in time.</td><td>Tracking data changes over time.</td><td>Space efficiency (only changed blocks stored), ability to restore to any point in time.</td></tr><tr><td>Index</td><td>Metadata structure tracking the locations and versions of blocks within the Delta Store.</td><td>Enabling rapid location of correct block versions when constructing virtual full backups.</td><td>Fast virtual full backup creation, efficient restore operations.</td></tr><tr><td>Virtual Full Backup Metadata</td><td>Set of pointers to the block versions that constitute a full backup at a specific point in time.</td><td>Providing a logical representation of a physical full backup.</td><td>Storage efficiency (pointers instead of physical full backups), appears as a standard level 0 backup to RMAN, fast recovery.</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">IV. Synergistic Architecture: Operation of Delta Push and Delta Store within ZDLRA</h2>



<h3 class="wp-block-heading">A. End-to-End Data Protection Workflow: From Transaction to Recoverable Backup</h3>



<p>The data protection process begins when a transaction occurs on the protected database. These changes are transmitted to ZDLRA almost instantaneously as part of the Delta Push mechanism. The end-to-end flow proceeds as follows; a configuration sketch for the real-time redo step appears after the list.</p>



<ol class="wp-block-list">
<li><strong>Transaction Occurs:</strong> Changes are made in the protected database.</li>



<li><strong>Real-Time Redo Push:</strong> LGWR (or an asynchronous process) sends redo data from memory buffers to ZDLRA. ZDLRA stages and validates this redo.  </li>



<li><strong>Incremental Backup (Delta Push):</strong> Periodically, RMAN performs an incremental level 1 backup. Changed blocks are identified via block change tracking.</li>



<li><strong>Data Transfer:</strong> The Recovery Appliance Backup Module sends these changed blocks to ZDLRA.  </li>



<li><strong>ZDLRA Ingestion and Processing (Delta Store):</strong>
<ul class="wp-block-list">
<li>Incoming incremental blocks are validated, compressed (if <code>RA_FORMAT=TRUE</code> or with ZDLRA-side compression), and indexed into their respective delta pools within the Delta Store.  </li>



<li>The Delta Store synthesizes a new virtual full backup using these new blocks and existing unchanged blocks.  </li>



<li>Redo logs are converted by ZDLRA into archived log backups upon log switch.  </li>
</ul>
</li>



<li><strong>Catalog Update:</strong> ZDLRA&#8217;s internal RMAN catalog is updated to reflect the new virtual full backup and archived redo logs, making them available for recovery.  </li>



<li><strong>Continuous Validation:</strong> ZDLRA continuously validates backups for recoverability.  </li>



<li><strong>Lifecycle Management:</strong> Policies for retention, replication to another ZDLRA, or archival to tape/cloud are applied.  </li>
</ol>



<p>The synergy between Delta Push (ingestion) and Delta Store (processing and storage engine) creates a closed-loop system for continuous data protection and recovery readiness. Delta Push continuously feeds changed data (blocks and redo) to ZDLRA. The Delta Store immediately processes this data, integrates it into its versioned block repository, and creates virtual full backups. The updated catalog then makes these new recovery points instantly available. This tight, automated loop ensures ZDLRA is always as up-to-date as possible with the state of protected databases, minimizing data loss risk and guaranteeing that recovery assets are constantly refreshed and validated.</p>



<h3 class="wp-block-heading">B. Role of Key ZDLRA Components</h3>



<h4 class="wp-block-heading">1. Recovery Appliance Backup Module (libra.so / SBT Library)</h4>



<p>This Oracle-supplied SBT library is installed on protected database hosts and is used by RMAN to transfer backup data to ZDLRA. It manages communication for backup and restore operations between RMAN and ZDLRA. With newer versions (e.g., ZDLRA 23.1), this library can also perform client-side compression and formatting (<code>RA_FORMAT=TRUE</code>).</p>



<p>The Recovery Appliance Backup Module is more than a simple data pipe; it&#8217;s an intelligent client-side agent that actively participates in optimizing the backup stream. Traditionally, SBT libraries are primarily interfaces for RMAN to write to third-party media managers. The ZDLRA backup module, especially with features like <code>RA_FORMAT</code>, performs pre-processing (compression, ZDLRA-specific formatting) on the client side. This client-side intelligence reduces load on ZDLRA for certain tasks, optimizes network traffic, and enables advanced features like effectively compressing TDE data before it even reaches the appliance. It acts as an essential, integrated part of the ZDLRA solution, not just a generic connector.</p>



<h4 class="wp-block-heading">2. Recovery Appliance Metadata Database and Catalog</h4>



<p>Residing on each Recovery Appliance, the metadata database manages metadata for all backups and contains the RMAN recovery catalog for all protected databases. This catalog is mandatory and is automatically updated by ZDLRA as backups are processed. It stores information about backup pieces, archived logs, virtual full backups, delta pools, and block versions, which is essential for orchestrating restores and managing space. ZDLRA uses two main disk groups: DELTA for backups and CATALOG for RMAN catalog tables.</p>



<p>ZDLRA&#8217;s centralized and self-managing RMAN catalog serves as the &#8220;single source of truth&#8221; for all protected databases, enabling simplified management and consistent recovery across the enterprise. In traditional environments, RMAN catalogs might be separate or controlfile-based, leading to management complexity for many databases. ZDLRA mandates and manages a central catalog within its own embedded RAC database. This catalog automatically reflects all virtual full backups and other recovery assets created by ZDLRA. Database administrators (DBAs) interact with this catalog via standard RMAN commands for restores, without needing to know the internal complexities of virtual backups or delta pools. This centralization and automation significantly simplify backup administration, especially in large environments.</p>
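


<p>In day-to-day use, &#8220;using&#8221; this catalog is an ordinary RMAN connection; a short sketch with illustrative credentials and connect string, where <code>REGISTER DATABASE</code> is a one-time enrollment after which virtual fulls show up as ordinary level 0 backups:</p>



<pre class="wp-block-code"><code># Connect to the protected database and the ZDLRA-hosted recovery catalog.
$ rman TARGET / CATALOG ravpc1/&lt;password&gt;@zdlra-scan:1521/zdlra

RMAN> REGISTER DATABASE;
RMAN> LIST BACKUP OF DATABASE SUMMARY;</code></pre>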



<h3 class="wp-block-heading">C. Control Flow and Policy Enforcement</h3>



<p>Protection policies are defined on ZDLRA to manage recovery window goals, data retention periods on disk and tape/cloud, replication, and other backup lifecycle aspects. These policies are applied to protected databases, and ZDLRA&#8217;s automated space management tasks (deletion of old backups, delta pool optimization) are driven by them. Enterprise Manager Cloud Control is typically used to manage and monitor ZDLRA and its policies.</p>



<p>ZDLRA&#8217;s policy-based management automates much of the backup lifecycle, abstracting complexity and ensuring adherence to defined service levels. Manual management of backup retention, replication, and tiering for hundreds of databases is error-prone and labor-intensive. ZDLRA allows administrators to define high-level protection policies (e.g., Gold, Silver, Bronze with different RPOs/retention). The appliance then automatically enforces these policies, managing space, creating virtual full backups, replicating data, and archiving to secondary storage. This automation ensures consistency, reduces administrative burden, and helps organizations meet their data protection SLAs reliably.</p>
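


<p>A hedged sketch of defining such a policy with the appliance&#8217;s <code>DBMS_RA</code> PL/SQL package; the policy name, description, and recovery window are illustrative, and <code>DELTA</code> is assumed to be the appliance&#8217;s default storage location:</p>



<pre class="wp-block-code"><code>-- Run on the Recovery Appliance as the RA administrator.
BEGIN
  DBMS_RA.CREATE_PROTECTION_POLICY(
    protection_policy_name => 'GOLD',
    description            => 'Tier-1 databases, 35-day recovery window',
    storage_location_name  => 'DELTA',
    recovery_window_goal   => INTERVAL '35' DAY);
END;
/</code></pre>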



<p>The following table illustrates the interactive workflow between Delta Push and Delta Store step-by-step:</p>



<p><strong>Table 3: Delta Push and Delta Store Interaction Workflow</strong></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Step No.</th><th>Action/Process</th><th>Responsible Component(s)</th><th>Key Outcome of the Step</th></tr><tr><td>1</td><td>Database Change</td><td>Protected Database</td><td>Data is modified.</td></tr><tr><td>2</td><td>Redo Sent</td><td>Protected Database (LGWR), ZDLRA (Delta Push Receiver)</td><td>Real-time redo data is transferred to and staged on ZDLRA.</td></tr><tr><td>3</td><td>Incremental Backup Initiated</td><td>Protected Database (RMAN)</td><td>Periodic incremental backup process is triggered.</td></tr><tr><td>4</td><td>Blocks Sent to ZDLRA</td><td>RA Backup Module, ZDLRA (Delta Push Receiver)</td><td>Changed data blocks are transmitted to ZDLRA.</td></tr><tr><td>5</td><td>ZDLRA Validates and Compresses</td><td>ZDLRA (Delta Store)</td><td>Incoming blocks are validated and compressed.</td></tr><tr><td>6</td><td>Blocks Indexed in Delta Pool</td><td>ZDLRA (Delta Store)</td><td>Changed blocks are added to the relevant delta pool and indexed.</td></tr><tr><td>7</td><td>Virtual Full Backup Created</td><td>ZDLRA (Delta Store)</td><td>A new virtual full backup is synthesized using new and existing blocks.</td></tr><tr><td>8</td><td>Catalog Updated</td><td>ZDLRA (Catalog)</td><td>New virtual full backup and archived logs become available in the RMAN catalog for recovery.</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">V. Conclusion: Technical Significance of Delta Push and Delta Store in ZDLRA</h2>



<h3 class="wp-block-heading">A. Summary of Core Architectural and Operational Principles</h3>



<p>Delta Push is an efficient, dual-pronged mechanism (RMAN incrementals + real-time redo) for transferring only necessary changes from protected Oracle databases to ZDLRA. Delta Store is the intelligent, Oracle-aware repository that ingests these changes, versions data blocks at a granular level within delta pools, and synthesizes space-efficient virtual full backups. This &#8220;incremental forever&#8221; approach, powered by Delta Push and Delta Store, minimizes production impact, dramatically reduces RPO, and simplifies recovery.</p>



<h3 class="wp-block-heading">B. Impact on Data Protection, Recovery Speed, and Efficiency</h3>



<p>The combined effect of Delta Push and Delta Store fundamentally redefines Oracle database backup and recovery from a periodic, resource-intensive chore to a continuous, low-impact, and highly reliable data assurance service.</p>



<ul class="wp-block-list">
<li><strong>Data Protection:</strong> Near-zero data loss (sub-second RPO) is achieved thanks to Delta Push&#8217;s real-time redo transport component. Enhanced resilience against ransomware is offered through immutable backups and rapid recovery capabilities. Continuous validation guarantees backup integrity.  </li>



<li><strong>Recovery Speed:</strong> Fast restores from virtual full backups are possible without the need to apply numerous incrementals on the production server. The &#8220;time machine&#8221; feature enables rapid rollback. The Dialog Semiconductor case study showed approximately 4x faster restores.  </li>



<li><strong>Efficiency:</strong> Significant reductions in backup windows, production server load (CPU, I/O), network traffic, and storage consumption are achieved through the incremental forever strategy, Delta Push, virtual full backups, and advanced compression. Backup operations are offloaded from database servers.  </li>
</ul>



<p>Traditional backups are disruptive events. Delta Push makes data ingestion minimally impactful, while Delta Store optimizes backup data for space and keeps it immediately ready in a &#8220;full&#8221; format for rapid recovery. Automation and continuous validation add layers of reliability. This transforms the entire data protection posture from a necessary evil to an integrated, efficient, and highly effective component of Oracle database operations, as evidenced by benefits like those at Dialog Semiconductor and reported overall TCO reductions.</p>



<p>Furthermore, the Delta Push and Delta Store architecture offers a robust defense mechanism against modern cyber threats like ransomware, not just by enabling fast recovery, but by ensuring the integrity and availability of recovery points up to the last moments before an attack. Ransomware attacks aim to encrypt data and backups, making recovery difficult or impossible. ZDLRA&#8217;s real-time redo capture via Delta Push allows recovery to within seconds of an attack. Delta Store&#8217;s continuous validation helps detect corruption early. Features like backup immutability protect the backups themselves. The ability to rapidly restore a clean, virtual full backup to a secure location means organizations can avoid paying ransoms. Thus, the technical design directly translates into enhanced cyber resilience, a critical requirement in today&#8217;s threat landscape.</p>



<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/analysis-of-delta-push-and-delta-store-mechanisms-within-zdlra.html">Analysis of Delta Push and Delta Store Mechanisms within ZDLRA</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Exadata Update Utilities: patchmgr and dbnodeupdate.sh</title>
		<link>https://www.bugraparlayan.com.tr/exadata-update-utilities-patchmgr-and-dbnodeupdate-sh.html</link>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Sat, 26 Apr 2025 14:56:09 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[cell node patching]]></category>
		<category><![CDATA[database node updates]]></category>
		<category><![CDATA[dbnodeupdate.sh]]></category>
		<category><![CDATA[dbnodeupdate.sh usage]]></category>
		<category><![CDATA[Exadata automation tools]]></category>
		<category><![CDATA[Oracle Exadata patching]]></category>
		<category><![CDATA[patchmgr]]></category>
		<category><![CDATA[patchmgr best practices]]></category>
		<category><![CDATA[rolling patch updates]]></category>
		<guid isPermaLink="false">https://www.bugraparlayan.com.tr/?p=1459</guid>

					<description><![CDATA[<p>1. Introduction Oracle Exadata Database Machine is a high-performance, optimized platform for Oracle Database workloads. Regularly updating the software components of this platform—including the operating system, Exadata system software, device drivers, and firmware—is crucial for addressing security vulnerabilities, fixing bugs, and leveraging new features. Oracle provides specialized utilities to manage these update processes. Two of &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/exadata-update-utilities-patchmgr-and-dbnodeupdate-sh.html">Exadata Update Utilities: patchmgr and dbnodeupdate.sh</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">1. Introduction</h2>



<p>Oracle Exadata Database Machine is a high-performance, optimized platform for Oracle Database workloads. Regularly updating the software components of this platform—including the operating system, Exadata system software, device drivers, and firmware—is crucial for addressing security vulnerabilities, fixing bugs, and leveraging new features. Oracle provides specialized utilities to manage these update processes. Two of the most commonly used tools are <code>patchmgr</code> and <code>dbnodeupdate.sh</code>. This document aims to provide a detailed technical comparison of these two utilities, explaining their functions, key differences, use cases, and parameters for effective <strong>Exadata patching</strong>.</p>



<h2 class="wp-block-heading">2. The <code>patchmgr</code> Utility</h2>



<h3 class="wp-block-heading">2.1. Definition and Purpose</h3>



<p><code>patchmgr</code> is a centralized utility designed to <strong>orchestrate and simplify</strong> software updates for Oracle Exadata infrastructure components. It allows administrators to update multiple components—database servers, storage servers, and network switches—using a single command structure, streamlining the <strong>Exadata update process</strong>.</p>



<h3 class="wp-block-heading">2.2. Scope and Capabilities</h3>



<ul class="wp-block-list">
<li><strong>Broad Component Coverage:</strong> <code>patchmgr</code> can update various Exadata components:
<ul class="wp-block-list">
<li>Oracle Exadata Storage Servers (Cells)</li>



<li>Oracle Exadata Database Servers (Compute Nodes)</li>



<li>RDMA Network Fabric Switches (RoCE Switches)</li>



<li>InfiniBand Network Fabric Switches</li>



<li>Management Network Switch (specific models)   </li>
</ul>
</li>



<li><strong>Orchestration:</strong> It manages the update sequence across multiple targets, supporting both rolling and non-rolling updates.
<ul class="wp-block-list">
<li><strong>Rolling Update:</strong> Updates components sequentially (one by one) to maintain overall system availability, ideal for RAC clusters or storage server grids.  </li>



<li><strong>Non-Rolling Update:</strong> Updates all specified components concurrently, which is faster but requires a complete system outage.  </li>
</ul>
</li>



<li><strong>Automation:</strong> For database server updates, <code>patchmgr</code> automates numerous steps, including stopping/starting databases and Grid Infrastructure, managing VMs, handling Oracle Enterprise Manager agents, taking OS backups, relinking Oracle Homes, and applying best practice configurations.  </li>



<li><strong>Centralized Execution:</strong> <code>patchmgr</code> can be executed from a database server within the Exadata system being patched or from a separate, central server (the &#8220;driving system&#8221;) running Oracle Linux or Oracle Solaris. This facilitates managing multiple Exadata systems from one location.  </li>



<li><strong>User and Concurrency:</strong> Can be run by <code>root</code> or a non-root user (requires <code>-log_dir</code>). Multiple <code>patchmgr</code> instances can run concurrently from the same software directory (using distinct <code>-log_dir</code> values) to patch different systems simultaneously.  </li>
</ul>



<h3 class="wp-block-heading">2.3. Platform Support</h3>



<ul class="wp-block-list">
<li><strong>Target Systems:</strong> The Exadata components updated by <code>patchmgr</code> (database servers, storage servers) typically run <strong>Oracle Linux</strong>.</li>



<li><strong>Driving System:</strong> The <code>patchmgr</code> utility itself can be executed from a server running either <strong>Oracle Linux</strong> or <strong>Oracle Solaris</strong>. This means you can initiate and manage the patching of Linux-based Exadata components from a Solaris management server.  </li>
</ul>



<h3 class="wp-block-heading">2.4. Key Parameters and Usage</h3>



<p>The general syntax for <code>patchmgr</code> is: <code>./patchmgr -&lt;component&gt; &lt;component_list_file&gt; -&lt;action&gt; &lt;required_arguments&gt; [optional_arguments]</code>. Worked invocation examples follow the parameter list below.</p>



<ul class="wp-block-list">
<li><strong><code>-&lt;component></code>:</strong> Specifies the component type:
<ul class="wp-block-list">
<li><code>-cells</code>: For Storage Servers.</li>



<li><code>-dbnodes</code>: For Database Servers.</li>



<li><code>-ibswitches</code>: For InfiniBand Switches.</li>



<li><code>-roceswitches</code>: For RoCE Switches.</li>
</ul>
</li>



<li><strong><code>&lt;component_list_file></code>:</strong> A text file listing the hostnames of the components to be updated.  </li>



<li><strong><code>-&lt;action></code>:</strong> Specifies the operation:
<ul class="wp-block-list">
<li><code>-upgrade</code>: Performs a software upgrade to a specified version.  </li>



<li><code>-rollback</code>: Rolls back to the previous software version.</li>



<li><code>-precheck</code>: Runs prerequisite checks before an upgrade or rollback.  </li>



<li><code>-backup</code>: Performs a backup (typically for DB nodes).</li>
</ul>
</li>



<li><strong><code>[optional_arguments]</code>:</strong> Modifies the action or behavior:
<ul class="wp-block-list">
<li><code>--rolling</code> / <code>-rolling</code>: Perform the action in a rolling fashion.  </li>



<li><code>--iso_repo &lt;path_to_iso></code> / <code>-iso_repo &lt;path_to_iso></code>: Specifies the path to the patch ISO file.  </li>



<li><code>--target_version &lt;version></code> / <code>-target_version &lt;version></code>: Specifies the target software version.  </li>



<li><code>--modify_at_prereq</code>: Allows removal of conflicting RPMs during precheck to resolve dependencies (for DB nodes).  </li>



<li><code>--force_remove_custom_rpms</code>: Forces removal of custom (non-Exadata) RPMs during an OS upgrade (for DB nodes).</li>



<li><code>--log_dir &lt;directory|auto></code>: Specifies a directory for log files or uses automatic naming. Required for concurrent execution or non-root users.  </li>



<li><code>--allow_active_network_mounts</code>: Allows patching to proceed even if active network mounts (like NFS) are detected.  </li>



<li><code>--dbnode_patch_base &lt;path></code>: Specifies the directory on target DB nodes where patch files will be extracted.  </li>



<li><code>--ignore_alerts</code>: Proceeds with patching despite active hardware alerts.  </li>



<li><code>--sequential_backup</code>: Backs up each node immediately before updating it during a rolling update (default is to back up all nodes first).  </li>



<li><code>--update_type &lt;type></code>: Selects security-only (<code>allcvss</code>) or full (<code>full</code>) update.  </li>



<li><code>--live-update-target</code>: Utilizes Exadata Live Update (on supported versions).  </li>
</ul>
</li>
</ul>
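


<p>A minimal sketch of how these pieces combine for a rolling database-node update driven from a central host; the host-list file, staging path, and version string are placeholders, and exact flag spellings can vary between <code>patchmgr</code> releases:</p>



<pre class="wp-block-code"><code># dbs_group: text file listing the database node hostnames, one per line.
# 1. Prerequisite check:
./patchmgr -dbnodes dbs_group -precheck -iso_repo /u01/stage/exadata_dbserver.zip -target_version &lt;version&gt;

# 2. Rolling upgrade, tolerating NFS mounts, with auto-named logs:
./patchmgr -dbnodes dbs_group -upgrade -iso_repo /u01/stage/exadata_dbserver.zip -target_version &lt;version&gt; -rolling -allow_active_network_mounts -log_dir auto</code></pre>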



<h2 class="wp-block-heading">3. The <code>dbnodeupdate.sh</code> Utility</h2>



<h3 class="wp-block-heading">3.1. Definition and Purpose</h3>



<p><code>dbnodeupdate.sh</code> is a shell script specifically used to update, roll back, or back up the software on a <strong>single Oracle Exadata database server (compute node)</strong>. Before <code>patchmgr</code> provided orchestration for database servers, updates often involved manually running <code>dbnodeupdate.sh</code> on each node sequentially.</p>



<h3 class="wp-block-heading">3.2. Scope and Capabilities</h3>



<ul class="wp-block-list">
<li><strong>Single Database Server Focus:</strong> <code>dbnodeupdate.sh</code> operates only on the database server where it is executed.  </li>



<li><strong>Core Update Engine:</strong> When <code>patchmgr</code> updates database servers, it essentially invokes <code>dbnodeupdate.sh</code> on each target node to perform the actual update, rollback, or backup tasks.  </li>



<li><strong>Operating System Updates:</strong> It handles updates for the Oracle Linux OS, device drivers, and firmware included in the Exadata system software patch. It supports major OS upgrades (e.g., OL6 to OL7, OL7 to OL8), although older <code>dbnodeupdate.sh</code> versions might be needed for older OS transitions.  </li>



<li><strong>Backup and Rollback:</strong> Automatically creates a backup of the root filesystem before starting an update. This backup enables rollback to the previous state using the <code>-r</code> option if the update fails or needs to be reverted.  </li>



<li><strong>Dependency Management:</strong> Includes the <code>-M</code> option to allow the removal of conflicting RPMs during the prerequisite check phase to resolve dependency issues.  </li>
</ul>



<h3 class="wp-block-heading">3.3. Platform Support</h3>



<ul class="wp-block-list">
<li><code>dbnodeupdate.sh</code> runs <em>on</em> the Exadata database server being updated. Modern Exadata database servers exclusively use <strong>Oracle Linux</strong>. While Solaris was supported on older Exadata compute nodes, <code>dbnodeupdate.sh</code> in current contexts targets Linux.</li>
</ul>



<h3 class="wp-block-heading">3.4. Key Parameters and Usage</h3>



<p>The basic usage is: <code>./dbnodeupdate.sh &lt;options&gt;</code>. Worked examples follow the option list below.</p>



<ul class="wp-block-list">
<li><strong><code>-u</code>:</strong> Initiates the update process.  </li>



<li><strong><code>-r</code>:</strong> Initiates a rollback from the pre-update backup.  </li>



<li><strong><code>-b</code>:</strong> Performs only the backup step.</li>



<li><strong><code>-c</code>:</strong> Runs the post-reboot completion steps after an update or rollback.  </li>



<li><strong><code>-l &lt;ISO_or_Repo_URL></code>:</strong> Specifies the path to the update ISO file or the YUM repository URL.  </li>



<li><strong><code>-s</code>:</strong> Stops Cluster Ready Services (CRS) before the update/rollback and restarts it afterward.  </li>



<li><strong><code>-p</code>:</strong> Runs the bootstrap phase (typically for upgrades from older versions).  </li>



<li><strong><code>-x &lt;helper_script_dir></code>:</strong> Specifies the directory containing helper scripts (often used with <code>-p</code>).  </li>



<li><strong><code>-M</code>:</strong> Allows removal of conflicting RPMs during prerequisite checks.  </li>



<li><strong><code>-v</code>:</strong> Performs only the prerequisite check.  </li>
</ul>
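


<p>A minimal single-node sketch using the options above; the ISO path is a placeholder, and the commands are run as <code>root</code> on the node being updated:</p>



<pre class="wp-block-code"><code>./dbnodeupdate.sh -u -l /u01/stage/exadata_dbserver.iso -v   # prerequisite check only
./dbnodeupdate.sh -u -l /u01/stage/exadata_dbserver.iso -s   # backup, then update, stopping CRS
# After the post-update reboot, run the completion step:
./dbnodeupdate.sh -c
# If the node must be reverted to the pre-update backup:
./dbnodeupdate.sh -r</code></pre>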



<h2 class="wp-block-heading">4. <code>patchmgr</code> vs. <code>dbnodeupdate.sh</code>: Key Differences</h2>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Feature</th><th><code>patchmgr</code></th><th><code>dbnodeupdate.sh</code></th></tr><tr><td><strong>Primary Purpose</strong></td><td><strong>Orchestration</strong> for Exadata infrastructure components</td><td><strong>Single</strong> Exadata DB server update/rollback/backup</td></tr><tr><td><strong>Scope</strong></td><td>DB Servers, Storage Servers, Network Switches</td><td>DB Servers Only</td></tr><tr><td><strong>Execution Mode</strong></td><td>Manages multiple targets; invokes <code>dbnodeupdate.sh</code> for DB nodes</td><td>Runs on a single target node</td></tr><tr><td><strong>Update Mode</strong></td><td>Rolling or Non-Rolling</td><td>Affects only the single node it runs on</td></tr><tr><td><strong>Run Location</strong></td><td>DB Node or separate Linux/Solaris server</td><td>Only on the target DB Node being updated</td></tr><tr><td><strong>Typical Use Case</strong></td><td>Standard multi-component/multi-node patching</td><td>Single node update/rollback; invoked by <code>patchmgr</code>; patching last node in rolling update; recovery scenarios</td></tr><tr><td><strong>Driving Platform</strong></td><td>Linux, <strong>Solaris</strong> <sup></sup></td><td>Linux (on the target DB Node)</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">5. Which Tool Should Be Used When?</h2>



<ul class="wp-block-list">
<li><strong>Use <code>patchmgr</code> when:</strong>
<ul class="wp-block-list">
<li>Updating multiple storage servers, database servers, or network switches (<strong>standard and preferred method</strong>).</li>



<li>Choosing between rolling or non-rolling update strategies.</li>



<li>Managing the update process from a central server.</li>



<li>Leveraging automated orchestration steps (service stop/start, backup, relink).</li>
</ul>
</li>



<li><strong>Use <code>dbnodeupdate.sh</code> when:</strong>
<ul class="wp-block-list">
<li>Updating or rolling back <strong>only one</strong> specific database server (e.g., testing on a single node).</li>



<li>Performing a rolling update with <code>patchmgr</code> and needing to patch the <strong>initial driving node last</strong> (by running <code>dbnodeupdate.sh</code> locally on it after other nodes are done).  </li>



<li>Recovering a system after a failed update/rollback by booting from a <strong>diagnostic ISO</strong>.  </li>



<li>As a manual alternative if <code>patchmgr</code> itself encounters issues or is unavailable.</li>
</ul>
</li>
</ul>



<p>As a general rule, <strong><code>patchmgr</code> is the standard utility</strong> for routine Exadata infrastructure patching. <code>dbnodeupdate.sh</code> should be considered an underlying component used by <code>patchmgr</code> for database nodes or a tool for specific single-node scenarios.</p>



<h2 class="wp-block-heading">6. Conclusion</h2>



<p><code>patchmgr</code> and <code>dbnodeupdate.sh</code> are essential tools for maintaining the currency and security of Oracle Exadata platforms. <code>patchmgr</code> serves as the primary orchestration utility, simplifying the update process across multiple Exadata components (database servers, storage servers, switches) and supporting both rolling and non-rolling strategies. It can be driven from Linux or Solaris systems. <code>dbnodeupdate.sh</code> is the core script that performs the actual update, rollback, or backup on individual Linux-based Exadata database servers, often invoked by <code>patchmgr</code> but also usable standalone for specific single-node tasks or recovery situations. Understanding the distinct roles and capabilities of each tool allows administrators to choose the appropriate method for their specific Exadata maintenance requirements, with <code>patchmgr</code> being the standard choice for most patching operations.</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/exadata-update-utilities-patchmgr-and-dbnodeupdate-sh.html">Exadata Update Utilities: patchmgr and dbnodeupdate.sh</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Comprehensive Guide to Oracle Exadata Automatic Hard Disk Scrubbing</title>
		<link>https://www.bugraparlayan.com.tr/comprehensive-guide-to-oracle-exadata-automatic-hard-disk-scrubbing.html</link>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Fri, 25 Apr 2025 16:48:27 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[DB_BLOCK_CHECKING Exadata]]></category>
		<category><![CDATA[Exadata cell offload]]></category>
		<category><![CDATA[Exadata data protection]]></category>
		<category><![CDATA[Exadata data validation]]></category>
		<category><![CDATA[Exadata disk scrubbing]]></category>
		<category><![CDATA[Exadata scrubbing]]></category>
		<category><![CDATA[Exadata silent corruption]]></category>
		<category><![CDATA[Oracle HCC scrubbing]]></category>
		<category><![CDATA[smart scan scrubbing]]></category>
		<category><![CDATA[storage level data integrity]]></category>
		<guid isPermaLink="false">https://www.bugraparlayan.com.tr/?p=1455</guid>

					<description><![CDATA[<p>I. Introduction: Overview of the Exadata Hard Disk Scrubbing Process Data integrity is a cornerstone of modern computing systems. Errors that may occur during the storage, reading, transmission, and processing of data can have devastating effects on business processes. Various error detection and correction mechanisms have been developed to mitigate these risks. One such mechanism &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/comprehensive-guide-to-oracle-exadata-automatic-hard-disk-scrubbing.html">Comprehensive Guide to Oracle Exadata Automatic Hard Disk Scrubbing</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">I. Introduction: Overview of the Exadata Hard Disk Scrubbing Process</h2>



<p>Data integrity is a cornerstone of modern computing systems. Errors that may occur during the storage, reading, transmission, and processing of data can have devastating effects on business processes. Various error detection and correction mechanisms have been developed to mitigate these risks. One such mechanism is the &#8220;data scrubbing&#8221; process.</p>



<h3 class="wp-block-heading">A. Data Scrubbing: General Concept</h3>



<p>Data scrubbing is an error correction technique that periodically inspects storage devices or main memory for errors and corrects detected errors using redundant data, such as checksums or backup copies of the data. Its primary purpose is to reduce the likelihood that single, correctable errors will accumulate over time and lead to uncorrectable errors. This ensures data integrity and minimizes the risk of data loss.</p>



<p>This technique is a widely used error detection and correction mechanism in memory modules (with ECC memory), RAID arrays, modern file systems like ZFS and Btrfs, and FPGAs. For example, a RAID controller can periodically read all hard disks in a RAID array to detect and repair bad blocks before applications access them, thereby reducing the probability of silent data corruption caused by bit-level errors.</p>



<h3 class="wp-block-heading">B. Exadata Automatic Hard Disk Scrubbing: Definition and Scope</h3>



<p>The Oracle Exadata platform employs a multi-layered approach to ensure data integrity. One of these layers is the <strong>Exadata Automatic Hard Disk Scrub and Repair</strong> feature. As part of the Exadata System Software (Cell Software), this feature automatically and periodically inspects the <strong>hard disk drives (HDDs)</strong> within the Storage Servers (Cells) when the disks are idle.</p>



<p>The primary goal of this process is to proactively detect and facilitate the repair of bad sectors or other physical/logical defects on the disks <em>before</em> applications attempt to access the affected data. This prevents &#8220;latent&#8221; or silent data corruption.</p>



<p>The scope of Exadata scrubbing is important. This feature primarily targets <strong>physical</strong> bad sectors on hard disks. It focuses on detecting physical media errors that might be missed by standard drive Error Correcting Code (ECC) mechanisms or operating system checks. This complements, but does not replace, higher-level logical consistency checks performed by the database (e.g., via the <code>DB_BLOCK_CHECKING</code> parameter) or the manually executable ASM disk scrubbing process. Furthermore, this automatic scrubbing process does not apply to Flash drives in Exadata; these drives are protected by different mechanisms.</p>



<p>A distinctive aspect of Exadata scrubbing is its proactive nature. While database block checks typically occur during I/O operations, Exadata scrubbing specifically targets data that has <strong>not been accessed for a long time</strong>, especially when disks are idle. This approach ensures that corruption in rarely used data is detected and repaired long before it can cause an access error at a critical moment.</p>



<h3 class="wp-block-heading">C. Differences Between Exadata Hard Disk Scrubbing and ASM Disk Scrubbing</h3>



<p>The term &#8220;scrubbing&#8221; can be used in different contexts within the Oracle ecosystem, so it&#8217;s crucial to distinguish Exadata&#8217;s automatic hard disk scrubbing from the disk scrubbing feature offered by Oracle Automatic Storage Management (ASM).</p>



<ul class="wp-block-list">
<li><strong>Exadata Automatic Hard Disk Scrubbing:</strong>
<ul class="wp-block-list">
<li><strong>Scope:</strong> Operates at the Exadata Storage Server (Cell) level, managed by the Cell Software.  </li>



<li><strong>Focus:</strong> Checks the integrity of <strong>physical</strong> sectors on hard disks.  </li>



<li><strong>Operation:</strong> Runs automatically based on a schedule configured in CellCLI.  </li>



<li><strong>Resource Usage:</strong> The checking process is local to the storage cell, consuming no CPU on database servers and generating no unnecessary network traffic during the check.  </li>



<li><strong>Monitoring:</strong> Monitored via CellCLI metrics and Cell alert logs.  </li>
</ul>
</li>



<li><strong>ASM Disk Scrubbing:</strong>
<ul class="wp-block-list">
<li><strong>Scope:</strong> Operates at the ASM disk group or file level, managed by ASM.  </li>



<li><strong>Focus:</strong> Searches for <strong>logical</strong> corruptions within ASM blocks/extents.  </li>



<li><strong>Operation:</strong> Typically triggered manually (via SQL*Plus or asmcmd) or through a script (e.g., a cron job).</li>



<li><strong>Resource Usage:</strong> The process occurs at the ASM layer and can potentially consume database server resources and inter-cell network traffic.</li>



<li><strong>Monitoring:</strong> Monitored via the <code>V$ASM_OPERATION</code> view and ASM alert logs (<code>alert_+ASM.log</code>).  </li>
</ul>
</li>
</ul>



<p>These two mechanisms are complementary. Exadata scrubbing finds physical errors, potentially preventing them from causing logical corruptions later, while ASM scrubbing can find logical inconsistencies that might arise from sources other than physical media errors (e.g., software bugs). Oracle documentation suggests that, given the automatic Exadata scrubbing present in Exadata 11.2.3.3 and later, periodic ASM disk scrubbing becomes less critical for the specific purpose of <em>proactive physical/latent error checking</em>. However, manual ASM scrubbing retains its value for on-demand logical validation of specific files or disk groups.</p>
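


<p>For that on-demand logical validation, a minimal sketch of manual ASM scrubbing; the disk group and file names are illustrative, and <code>REPAIR</code> and <code>POWER</code> are optional clauses:</p>



<pre class="wp-block-code"><code>-- Run against the ASM instance (e.g., via SQL*Plus with SYSASM privileges).
ALTER DISKGROUP data SCRUB POWER LOW;
ALTER DISKGROUP data SCRUB FILE '+DATA/PROD/DATAFILE/users.293.993031201' REPAIR POWER HIGH;</code></pre>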



<h2 class="wp-block-heading">II. Internal Mechanism of the Exadata Scrubbing Process</h2>



<p>The effectiveness of the Exadata Automatic Hard Disk Scrubbing process relies on the tight integration between the core components of the Exadata architecture: the Storage Servers (Cells) and Oracle Automatic Storage Management (ASM).</p>



<h3 class="wp-block-heading">A. Role of Exadata Storage Servers (Cells)</h3>



<p>The scrubbing process is executed by the Exadata System Software (specifically, the Cell Services &#8211; CELLSRV process) running on each Exadata Storage Server (Cell). The inspection is local to the cell where the scanned disk resides; data is not sent outside the cell during the sector check phase, which minimizes inter-cell network traffic for the inspection stage.</p>



<p>The Cell Software continuously monitors disk health and I/O utilization to determine when to start, pause, or throttle the scrubbing process. Typically, scrubbing begins or resumes when the average disk I/O utilization drops below a certain threshold (often cited as 25%).</p>



<h3 class="wp-block-heading">B. Interaction with Oracle ASM for Detection and Repair</h3>



<p>When the Exadata scrubbing process detects a bad sector on a hard disk, the procedure unfolds as follows:</p>



<ol class="wp-block-list">
<li><strong>Detection:</strong> The Cell Software identifies a physical read error or inconsistency during its periodic scan.</li>



<li><strong>Request Submission:</strong> The Cell Software that detected the faulty sector automatically sends a repair request to the Oracle ASM instance managing the disk group containing that disk.  </li>



<li><strong>Repair by ASM:</strong> Upon receiving the request, ASM orchestrates the repair by reading a healthy copy of the data block (extent) containing the bad sector from another storage server where a mirrored copy resides.  </li>
</ol>



<p>This interaction exemplifies Exadata&#8217;s &#8220;Intelligent Storage&#8221; philosophy; low-level physical error detection happens within the cell, while ASM, which understands the database structure and data placement, coordinates the logical repair.</p>



<h3 class="wp-block-heading">C. Leveraging ASM Mirroring for Data Recovery</h3>



<p>Oracle ASM mirroring (Normal or High Redundancy) is fundamental to Exadata&#8217;s data protection strategy, and the repair capability of the scrubbing process is entirely dependent on this mechanism.</p>



<p>ASM distributes redundant copies (extents) of data blocks across different failure groups (which in Exadata are typically the Storage Servers). This ensures data accessibility even if an entire cell becomes unavailable, as data can be accessed from other copies.</p>



<p>When ASM receives a repair request triggered by scrubbing, it follows these steps:</p>



<ol class="wp-block-list">
<li><strong>Locate Healthy Copy:</strong> ASM identifies a disk on a different storage cell that holds a valid copy of the affected data block. ASM knows which disks are &#8220;partners&#8221; and where mirrored copies are stored.  </li>



<li><strong>Read Data:</strong> ASM reads the correct data from the disk containing the healthy copy.</li>



<li><strong>Write Over Bad Sector:</strong> ASM uses the correct data read to overwrite the bad sector on the original disk, thus correcting the error.  </li>
</ol>



<p>The success of this repair mechanism hinges entirely on the existence of valid and accessible ASM mirrors. If a second disk failure occurs in a Normal Redundancy (two copies) disk group before a rebalance completes, or if all three copies become inaccessible simultaneously in a High Redundancy (three copies) group, scrubbing can <em>detect</em> the error, but ASM <em>cannot repair</em> it. This underscores why <strong>High Redundancy</strong> is strongly recommended for critical systems, as the extra copy significantly reduces the probability of losing all copies concurrently.</p>



<p>Furthermore, the scrubbing process not only repairs isolated bad sectors but can also serve as an early indicator of more severe disk problems. If numerous or persistent errors are detected during scrubbing, ASM may take the corresponding grid disk offline and initiate a rebalance operation to redistribute data onto the remaining healthy disks. In this context, scrubbing also acts as an early warning system that triggers ASM&#8217;s existing high availability (HA) mechanisms. Monitoring the <code>V$ASM_OPERATION</code> view during or after scrub periods is important for tracking such ASM recovery actions.</p>
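


<p>A minimal query for that check; non-empty rows indicate ASM recovery activity such as a rebalance in progress:</p>



<pre class="wp-block-code"><code>-- Run from the ASM instance.
SELECT group_number, operation, state, power, sofar, est_work, est_minutes
  FROM v$asm_operation;</code></pre>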



<h3 class="wp-block-heading">D. Types of Errors Detected</h3>



<p>Exadata Automatic Hard Disk Scrubbing primarily focuses on detecting <strong>physical bad sectors</strong> and <strong>latent media errors</strong> on hard disk drives that might not be caught by standard drive ECC or operating system checks. Damaged or worn-out sectors and other physical defects fall under this scope.</p>



<p>The &#8220;logical defects&#8221; mentioned above typically refer to low-level media inconsistencies rather than logical corruptions at the ASM or database level (which are the domain of ASM scrubbing). The main goal is to find such issues before they impact data access or lead to silent data corruption.</p>



<h2 class="wp-block-heading">III. Managing and Monitoring the Exadata Scrubbing Process</h2>



<p>Effectively utilizing the Exadata Automatic Hard Disk Scrubbing feature requires proper configuration and continuous monitoring. The primary tool for these tasks is the CellCLI (Cell Command Line Interface) utility.</p>



<h3 class="wp-block-heading">A. CellCLI Commands for Configuration</h3>



<p>CellCLI is the main command-line interface for managing Exadata storage server features. Scrubbing-related configuration is done using the <code>ALTER CELL</code> command and the following attributes:</p>



<ul class="wp-block-list">
<li><strong><code>hardDiskScrubInterval</code></strong>: Determines how often the automatic scrubbing process runs. Valid options are:
<ul class="wp-block-list">
<li><code>daily</code>: Every day</li>



<li><code>weekly</code>: Every week</li>



<li><code>biweekly</code>: Every two weeks (default)</li>



<li><code>none</code>: Disables automatic scrubbing and stops any running process.</li>



<li><em>Example:</em> To set weekly scrubbing: <code>CellCLI> ALTER CELL hardDiskScrubInterval=weekly</code>.  </li>
</ul>
</li>



<li><strong><code>hardDiskScrubStartTime</code></strong>: Sets when the next scheduled scrubbing process will start. Valid options are:
<ul class="wp-block-list">
<li>A specific date and time (e.g., in &#8216;YYYY-MM-DDTHH:MI:SS-TZ&#8217; format).</li>



<li><code>now</code>: Triggers the next scrubbing cycle to start immediately (after the current cycle finishes, or for the first run).</li>



<li><em>Example:</em> To start at a specific time: <code>CellCLI> ALTER CELL hardDiskScrubStartTime='2024-10-26T02:00:00-07:00'</code>.  </li>
</ul>
</li>
</ul>



<p>To view the current scrubbing settings, use the command: <code>CellCLI&gt; LIST CELL ATTRIBUTES hardDiskScrubInterval, hardDiskScrubStartTime</code></p>



<p>Note that configuration is done on a per-cell basis, meaning these settings apply to all hard disks within a specific storage server. However, the &#8220;Adaptive Scrubbing Schedule&#8221; feature can automatically adjust the <em>effective</em> run frequency for specific disks identified as problematic, although the base schedule is configured cell-wide.</p>
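


<p>Since the schedule is per cell, a common pattern is to broadcast the same CellCLI commands to every storage server with <code>dcli</code>; a minimal sketch, assuming a <code>cell_group</code> file that lists the cell hostnames one per line:</p>



<pre class="wp-block-code"><code># Set, then verify, the scrubbing schedule on all cells.
dcli -g cell_group -l celladmin "cellcli -e 'ALTER CELL hardDiskScrubInterval=weekly'"
dcli -g cell_group -l celladmin "cellcli -e 'LIST CELL ATTRIBUTES hardDiskScrubInterval, hardDiskScrubStartTime'"</code></pre>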



<h3 class="wp-block-heading">B. Monitoring Scrubbing Activity</h3>



<p>Several methods are available to understand the status and impact of the scrubbing process; a consolidated <code>dcli</code> sketch for monitoring across all cells follows the list:</p>



<ul class="wp-block-list">
<li><strong>CellCLI Metrics:</strong>
<ul class="wp-block-list">
<li>The most direct way to see real-time scrubbing activity is using the <code>LIST METRICCURRENT</code> command. Specifically, the <code>CD_IO_BY_R_SCRUB_SEC</code> metric shows the read I/O generated by scrubbing in MB/second for each hard disk (CD). Non-zero values indicate active scrubbing on that disk.  </li>



<li><em>Example Command:</em> <code>CellCLI> LIST METRICCURRENT WHERE name = 'CD_IO_BY_R_SCRUB_SEC'</code></li>



<li>Other related metrics (discoverable with <code>LIST METRICDEFINITION WHERE name like '%SCRUB%'</code>) might provide additional information about scrubbing wait times or resource usage.</li>
</ul>
</li>



<li><strong>Cell Alert Logs:</strong>
<ul class="wp-block-list">
<li>Informational messages indicating the start (<code>Begin scrubbing celldisk</code>) and finish (<code>Finished scrubbing celldisk</code>) of scrubbing operations are logged in the cell alert logs. These logs can be examined using ADRCI (Automatic Diagnostic Repository Command Interpreter) or directly from files under the <code>$CELLTRACE</code> directory. Messages related to errors encountered during scrubbing or disk issues will also appear in these logs.  </li>



<li><em>Example Command:</em> <code>CellCLI> LIST ALERTHISTORY WHERE message LIKE '%scrubbing%'</code></li>
</ul>
</li>



<li><strong>AWR Reports (Automatic Workload Repository):</strong>
<ul class="wp-block-list">
<li>AWR reports, particularly in their Exadata-specific sections, provide aggregated information about scrubbing I/O activity that occurred during a specific snapshot period. Look for metrics labeled &#8216;scrub I/O&#8217; in the report.  </li>



<li>Seeing high &#8216;scrub I/O&#8217; in AWR during periods of low application I/O is normal and expected. However, understanding whether high scrub I/O correlates with performance degradation requires analyzing the overall system load, IORM configuration, and other sections in AWR like &#8216;Exadata OS I/O Stats&#8217;. AWR provides historical context for evaluating impact over time, while CellCLI metrics offer a real-time view.  </li>
</ul>
</li>



<li><strong>Real-Time Insight:</strong>
<ul class="wp-block-list">
<li>If configured, scrubbing metrics like <code>CD_IO_BY_R_SCRUB_SEC</code> can be sent to a preferred dashboard for visual monitoring of scrubbing activity across all Exadata cells.  </li>
</ul>
</li>



<li><strong>ASM Views:</strong>
<ul class="wp-block-list">
<li>While Exadata scrubbing doesn&#8217;t directly log to <code>V$ASM_OPERATION</code>, if scrubbing triggers an ASM repair or a subsequent rebalance, those operations <em>can</em> be monitored in <code>V$ASM_OPERATION</code>. The <code>V$ASM_DISK_STAT</code> view might also reflect I/O patterns related to scrubbing or repair.  </li>
</ul>
</li>
</ul>
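


<p>The consolidated view mentioned above can be obtained by polling the scrub I/O metric on every cell at once; again a sketch assuming a <code>cell_group</code> host file:</p>



<pre class="wp-block-code"><code># Non-zero values (MB/s) indicate disks that are actively being scrubbed.
dcli -g cell_group -l celladmin "cellcli -e \"LIST METRICCURRENT WHERE name = 'CD_IO_BY_R_SCRUB_SEC'\""</code></pre>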



<h3 class="wp-block-heading">C. Starting, Stopping, and Checking Status</h3>



<ul class="wp-block-list">
<li><strong>Starting:</strong> Scrubbing starts automatically based on the <code>hardDiskScrubInterval</code> and <code>hardDiskScrubStartTime</code> settings. The <code>hardDiskScrubStartTime=now</code> setting can be used to trigger the next cycle immediately. There isn&#8217;t a direct command like &#8220;start scrubbing now.&#8221;  </li>



<li><strong>Stopping:</strong> To stop and disable automatic scrubbing, use the <code>hardDiskScrubInterval=none</code> command. This will also stop any currently running scrubbing process.  </li>



<li><strong>Status Check:</strong> There is no single &#8220;scrubbing status&#8221; command. The status is inferred through the monitoring methods described above (CellCLI metrics, logs, AWR) by looking at active I/O rates and log messages.</li>
</ul>



<h3 class="wp-block-heading">D. Table 1: Essential CellCLI Commands for Exadata Hard Disk Scrubbing</h3>



<p>The following table summarizes the key CellCLI commands used to manage and monitor the Exadata hard disk scrubbing process:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Command</th><th>Purpose</th><th>Example</th><th>Sources</th></tr><tr><td>`ALTER CELL hardDiskScrubInterval = [daily\</td><td>weekly\</td><td>biweekly\</td><td></td></tr><tr><td>`ALTER CELL hardDiskScrubStartTime = [&#8216;&lt;timestamp&gt;&#8217;\</td><td>now]`</td><td>Sets the start time for the next scheduled scrubbing operation.</td><td><code>ALTER CELL hardDiskScrubStartTime=now</code></td></tr><tr><td><code>LIST CELL ATTRIBUTES hardDiskScrubInterval, hardDiskScrubStartTime</code></td><td>Displays the current scrubbing schedule configuration.</td><td><code>LIST CELL ATTRIBUTES hardDiskScrubInterval, hardDiskScrubStartTime</code></td><td></td></tr><tr><td><code>LIST METRICCURRENT WHERE name = 'CD_IO_BY_R_SCRUB_SEC'</code></td><td>Monitors the real-time scrubbing I/O rate for each hard disk.</td><td><code>LIST METRICCURRENT WHERE name = 'CD_IO_BY_R_SCRUB_SEC'</code></td><td></td></tr><tr><td><code>LIST ALERTHISTORY WHERE message LIKE '%scrubbing%'</code></td><td>Checks logs for scrubbing start/finish/error messages.</td><td><code>LIST ALERTHISTORY WHERE message LIKE '%scrubbing%'</code></td><td><sup></sup></td></tr></tbody></table></figure>



<h2 class="wp-block-heading">IV. Performance Impacts of the Scrubbing Process</h2>



<p>While designed to proactively protect data integrity, Exadata Automatic Hard Disk Scrubbing does have an impact on system resources, particularly the I/O subsystem. Understanding and managing this impact is crucial.</p>



<h3 class="wp-block-heading">A. Resource Consumption (CPU, I/O)</h3>



<p>The primary resource consumed by the scrubbing process is <strong>Disk I/O</strong>. The operation involves reading sectors from the hard disks. On an otherwise idle system or disk, the scrubbing process can significantly increase disk utilization, potentially reaching close to 100% for the disk being scanned.</p>



<p><strong>CPU consumption</strong> on the storage server (Cell) for the scrubbing check itself is generally low, as it&#8217;s largely an I/O-bound operation. However, if scrubbing detects an error and triggers a repair via ASM, that repair process (reading the good copy and writing it to the bad location) can consume additional resources (CPU and network) across cells and potentially database nodes, although the Exadata architecture aims to minimize this impact.</p>



<h3 class="wp-block-heading">B. Designed Operating Window (Low I/O Utilization)</h3>



<p>A key design principle to minimize the performance impact of Exadata scrubbing is that the process only runs when the storage server detects low average I/O utilization. This threshold is commonly cited as 25%.</p>



<p>The system automatically pauses or throttles scrubbing activity when I/O demand from the database workload exceeds this threshold. This mechanism aims to prevent scrubbing from significantly impacting production workloads.</p>



<p>However, there&#8217;s a nuance to the &#8220;25% utilization&#8221; threshold: it does not mean absolute idleness. A persistent background I/O load may be running just below the threshold (e.g., 20-24%), and adding scrubbing I/O on top of that load raises total I/O further. While Exadata I/O Resource Management (IORM) prioritizes user I/O, even the modest added load from scrubbing can have a noticeable effect on applications that are highly sensitive to very low latency. Therefore, while &#8220;low impact&#8221; is the goal, &#8220;zero impact&#8221; is not guaranteed.</p>



<h3 class="wp-block-heading">C. Interaction with I/O Resource Management (IORM)</h3>



<p>Exadata I/O Resource Management (IORM) plays a critical role in managing the performance impact of background tasks like scrubbing. IORM prioritizes and schedules I/O requests within the storage server based on configured resource plans.</p>



<p>IORM automatically prioritizes database workload I/O (e.g., user queries, OLTP transactions) over background I/O processes like scrubbing. This keeps the impact of scrubbing activity on application performance minimal. IORM plans can also be configured to manage resources among different databases or workloads, indirectly affecting the resources available for background tasks like scrubbing.</p>



<h3 class="wp-block-heading">D. Potential Performance Impact and Mitigation Methods</h3>



<p>Despite being designed for low impact, scrubbing can cause spikes in disk utilization and potentially increase latency, especially when the system isn&#8217;t completely idle even though the &#8220;idle&#8221; threshold is met. Concerns about performance impact, though often raised in the context of general ASM scrubbing, can also apply to Exadata scrubbing.</p>



<p>To mitigate this potential impact, consider these strategies:</p>



<ul class="wp-block-list">
<li><strong>Scheduling:</strong> The most effective mitigation is to schedule the scrubbing process using <code>hardDiskScrubStartTime</code> and <code>hardDiskScrubInterval</code> during periods of genuinely low system activity (e.g., midnight, weekends).  </li>



<li><strong>Monitoring:</strong> Regularly assess when scrubbing runs and its actual impact in your specific environment using AWR and CellCLI metrics.  </li>



<li><strong>IORM Settings:</strong> Ensure IORM is configured appropriately for your workload priorities.  </li>



<li><strong>Adaptive Scheduling:</strong> Leverage Exadata&#8217;s adaptive scheduling feature. This automatically adjusts the frequency based on need, potentially reducing unnecessary runs on healthy disks.  </li>
</ul>



<h3 class="wp-block-heading">E. Factors Affecting Scrubbing Duration</h3>



<p>The time required to complete a scrubbing cycle depends on several factors:</p>



<ul class="wp-block-list">
<li><strong>Disk Size and Type:</strong> Larger capacity hard disks naturally take longer to scan. Estimates such as 8-12 hours for a 4TB disk, or 1-2 hours per terabyte on an idle system, have been mentioned. Modern High Capacity (HC) drives are much larger (18TB in X9M, 22TB in X10M), implying potentially much longer scrub times.  </li>



<li><strong>System Load:</strong> Since scrubbing pauses when user workload increases, the busier the system, the longer the <strong>total wall-clock time</strong> required to complete a scrub cycle. On a busy system, completing a cycle could take days.  </li>



<li><strong>Number of Errors Found:</strong> If many bad sectors are found, the time spent coordinating repairs with ASM can increase the total duration.  </li>



<li><strong>ASM Rebalance Activity:</strong> If scrubbing triggers a larger ASM rebalance operation, that separate process will consume its own resources and take time.  </li>



<li><strong>Configured Interval:</strong> While not affecting a single run&#8217;s duration, the interval determines how frequently the process starts.</li>
</ul>



<p>It&#8217;s noteworthy that duration estimates in the available documentation vary significantly. This highlights that estimates heavily depend on the Exadata generation (disk sizes/speeds), software version (potential efficiency improvements), and most importantly, the actual workload pattern and resulting &#8220;idle&#8221; time on the specific system. Relying on monitoring in your own environment is more accurate than general estimates. For instance, one observation noted a scrubbing rate of approximately 115MB/s per disk. At this rate, <em>continuously</em> scanning a 22TB disk (X10M) would take roughly 53 hours. Given that scrubbing runs intermittently based on load, the actual completion time could be considerably longer.</p>
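

<p>The back-of-envelope arithmetic behind that figure, expressed as a quick query (decimal TB and MB are assumed, matching vendor capacity conventions):</p>



<pre class="wp-block-code"><code>-- 22 TB at a sustained 115 MB/s, converted to hours
SELECT ROUND(22e12 / 115e6 / 3600, 1) AS hours_if_continuous FROM dual;

-- HOURS_IF_CONTINUOUS
-- -------------------
--                53.1</code></pre>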



<h2 class="wp-block-heading">V. Key Benefits of the Exadata Scrubbing Process</h2>



<p>Exadata Automatic Hard Disk Scrubbing is a valuable feature that significantly contributes to the data integrity and high availability capabilities of the Exadata platform.</p>



<h3 class="wp-block-heading">A. Proactive Detection of Latent Errors and Silent Data Corruption</h3>



<p>Its most fundamental benefit is the proactive discovery of physical media errors <em>before</em> they are encountered during normal database operations. This prevents &#8220;silent&#8221; data corruption, where errors occur on disk but remain undetected until the data is read (which could be much later). By checking data blocks that haven&#8217;t been accessed recently, it ensures such hidden threats are uncovered.</p>



<h3 class="wp-block-heading">B. Enhanced Data Integrity and Reliability</h3>



<p>By detecting physical errors and enabling their repair, the scrubbing process directly contributes to the overall data integrity and reliability of the Exadata platform. This feature complements other protection layers like Oracle HARD (Hardware Assisted Resilient Data) checks, ASM mirroring, and database-level checks, providing robust defense against data corruption.</p>



<h3 class="wp-block-heading">C. Automatic Repair Mechanism</h3>



<p>A significant advantage is that the feature automates not just detection but also the initiation of the repair process. In typical bad sector scenarios, both error detection <em>and</em> the triggering of repair via ASM happen automatically, requiring no manual intervention. This reduces administrative overhead and ensures timely correction of detected issues.</p>



<h3 class="wp-block-heading">D. Complements Other Exadata High Availability Features</h3>



<p>Scrubbing is part of Exadata&#8217;s comprehensive Maximum Availability Architecture (MAA) strategy. It works alongside features like redundant hardware components, Oracle RAC for instance continuity, ASM for storage virtualization and redundancy, HARD for I/O path validation, and potentially Data Guard for disaster recovery.</p>



<p>This reinforces Exadata&#8217;s &#8220;defense in depth&#8221; approach to data protection. HARD checks the I/O path during writes; database checks can verify logical structure; ASM provides redundant copies of data; and scrubbing proactively inspects the physical media at rest. No single feature covers all possible scenarios, but working together, they provide robust protection. Scrubbing forms a critical layer in this strategy, specifically targeting latent physical errors that might be missed by other mechanisms.</p>



<h2 class="wp-block-heading">VI. Evolution of the Scrubbing Feature Across Exadata Versions</h2>



<p>The Exadata Automatic Hard Disk Scrubbing feature has evolved along with the platform itself.</p>



<h3 class="wp-block-heading">A. Feature Introduction</h3>



<p>The Automatic Hard Disk Scrub and Repair feature was first introduced with <strong>Oracle Exadata System Software version 11.2.3.3.0</strong>. At that time, specific minimum database/Grid Infrastructure versions, such as 11.2.0.4 or 12.1.0.2, were required for the feature to function.</p>



<h3 class="wp-block-heading">B. Adaptive Scrubbing Schedule</h3>



<p>A significant enhancement arrived with <strong>Exadata System Software version 12.1.2.3.0</strong>: the Adaptive Scrubbing Schedule. With this feature, if the scrubbing process finds a bad sector on a disk, the Cell Software automatically schedules the <em>next</em> scrubbing job for <em>that specific disk</em> to run more frequently (typically weekly). This temporarily overrides the cell-wide <code>hardDiskScrubInterval</code> setting for that disk. If the subsequent, more frequent run finds no errors, the disk&#8217;s schedule reverts to the global <code>hardDiskScrubInterval</code> setting. This feature also requires specific minimum Grid Infrastructure versions to operate.</p>



<p>This adaptive approach makes scrubbing more efficient. Instead of frequently scanning all disks, it focuses more attention only on disks showing potential issues. This conserves I/O resources on healthy disks while providing quicker follow-up checks on suspect ones.</p>



<h3 class="wp-block-heading">C. Other Related Developments (Post-12.1.2.3.0)</h3>



<p>Available documentation primarily focuses on the introduction of the scrubbing feature and the adaptive scheduling enhancement. Detailed information about significant changes to algorithms, performance tuning (beyond IORM interaction), or reporting in later versions (post-12.x through 23.x) is not widely published. Consulting the release notes for specific Exadata System Software versions might be necessary for details on newer developments.</p>



<h3 class="wp-block-heading">D. Table 2: Evolution of Key Exadata Scrubbing Features</h3>



<p>The following table summarizes the key milestones in the development of the Exadata scrubbing feature:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Exadata Software Version</th><th>Key Feature/Enhancement</th><th>Description</th><th>Sources</th></tr><tr><td>11.2.3.3.0</td><td>Automatic Hard Disk Scrub and Repair (Introduction)</td><td>Introduced the core feature for automatic, periodic inspection and initiation of repair via ASM.</td><td></td></tr><tr><td>12.1.2.3.0</td><td>Adaptive Scrubbing Schedule</td><td>Automatically increases scrubbing frequency (e.g., to weekly) for disks where bad sectors were recently detected.</td><td><sup></sup></td></tr><tr><td>Post-12.1.2.3.0</td><td>(Other Enhancements Unspecified)</td><td>(Specific major enhancements for later versions are not detailed in the provided documentation)</td><td></td></tr></tbody></table></figure>



<h2 class="wp-block-heading">VII. Configuration and Best Practices</h2>



<p>To maximize the benefits of the Exadata Automatic Hard Disk Scrubbing feature, proper configuration and adherence to Oracle&#8217;s Maximum Availability Architecture (MAA) principles are important.</p>



<h3 class="wp-block-heading">A. Default Settings and Configuration Options</h3>



<ul class="wp-block-list">
<li><strong>Default Schedule:</strong> By default, the scrubbing process is configured to run <strong>every two weeks (<code>biweekly</code>)</strong>.  </li>



<li><strong>Configuration Options:</strong> The <code>hardDiskScrubInterval</code> (<code>daily</code>, <code>weekly</code>, <code>biweekly</code>, <code>none</code>) and <code>hardDiskScrubStartTime</code> (<code>&lt;timestamp></code>, <code>now</code>) attributes can be set via CellCLI, as shown in the sketch after this list.  </li>



<li><strong>No Intensity/Priority Setting:</strong> There is no direct CellCLI setting to control the &#8220;intensity&#8221; or &#8220;priority&#8221; of the scrubbing process itself. Its impact is primarily managed by the idle-time logic and IORM.  </li>
</ul>
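

<p>A minimal configuration sketch, run on a cell (or fanned out with <code>dcli</code>); the timestamp is a placeholder for your own low-load window, written in the ISO 8601 form Oracle&#8217;s examples use:</p>



<pre class="wp-block-code"><code>CellCLI> ALTER CELL hardDiskScrubInterval=biweekly
CellCLI> ALTER CELL hardDiskScrubStartTime='2025-08-03T02:00:00+03:00'
CellCLI> LIST CELL ATTRIBUTES hardDiskScrubInterval, hardDiskScrubStartTime</code></pre>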



<h3 class="wp-block-heading">B. Recommended Scheduling Strategies for Production Environments</h3>



<ul class="wp-block-list">
<li><strong>Use Defaults:</strong> For many environments, the default bi-weekly schedule and the automatic execution during low I/O periods are sufficient.  </li>



<li><strong>Customize Start Time:</strong> Rather than relying solely on <code>now</code> or random times, explicitly setting <code>hardDiskScrubStartTime</code> to known low-load periods (e.g., 2 AM Sunday morning) offers a more controlled approach.  </li>



<li><strong>Assess Workload:</strong> On very busy, 24/7 systems, evaluate if the <code>biweekly</code> interval allows enough time for the process to complete. If not, consider <code>weekly</code>, but closely monitor the performance impact. Disabling scrubbing (<code>none</code>) is generally not recommended unless there&#8217;s a specific, temporary reason, as it forfeits the proactive detection benefit.  </li>



<li><strong>Align with Maintenance Windows:</strong> Coordinate scrubbing schedules with other planned maintenance windows if possible, although the automatic throttling mechanism should prevent major conflicts.  </li>



<li><strong>Monitor Completion:</strong> Check logs to ensure scrubbing cycles complete successfully within the planned interval. If cycles consistently fail to complete due to high load, the scheduling strategy needs review.  </li>
</ul>



<h3 class="wp-block-heading">C. Importance of ASM Redundancy</h3>



<ul class="wp-block-list">
<li><strong>High Redundancy Recommendation:</strong> Using <strong>High Redundancy (3 copies)</strong> for ASM disk groups on Exadata is strongly recommended, especially for production databases.  </li>



<li><strong>Rationale:</strong> While scrubbing works with Normal Redundancy (2 copies), High Redundancy provides significantly better protection against data loss during the repair window (especially if an unrelated second failure occurs). Scrubbing&#8217;s repair capability depends on having a healthy mirror copy available.  </li>



<li><strong>Requirements:</strong> Properly implementing High Redundancy typically requires at least 5 failure groups (often 3 storage cells + 2 quorum disks on database servers for Quarter/Eighth Rack configurations).  </li>
</ul>



<h3 class="wp-block-heading">D. Integration with Overall MAA Strategy</h3>



<p>Scrubbing is just one part of the MAA best practices recommended by Oracle for Exadata:</p>



<ul class="wp-block-list">
<li><strong>Regular Health Checks:</strong> Run the <code>exachk</code> utility regularly (e.g., monthly) or rely on AHF (Autonomous Health Framework) to run it automatically to validate configuration against best practices, including storage and ASM settings.  </li>



<li><strong>Use Standby Database:</strong> While Exadata scrubbing and HARD checks protect against many issues, a physical standby database (Data Guard) on a separate Exadata system is critical for comprehensive protection against site failures, certain logical corruptions, and as a secondary validation source.  </li>



<li><strong>Monitoring:</strong> Implement comprehensive monitoring (OEM, AWR, CellCLI metrics, logs, Real-Time Insight) to track system health, performance, and background activities like scrubbing.  </li>



<li><strong>Testing:</strong> Validate recovery procedures and understand the behavior of features like scrubbing and ASM rebalance in your test environment.  </li>
</ul>



<h3 class="wp-block-heading">E. Table 3: Exadata Scrubbing Configuration Attributes and Best Practices</h3>



<p>This table consolidates key configuration parameters and actionable recommendations:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Parameter/Area</th><th>Configuration/Setting</th><th>Default</th><th>Recommendation</th><th>Sources</th></tr><tr><td><code>hardDiskScrubInterval</code></td><td><code>daily</code>, <code>weekly</code>, <code>biweekly</code>, <code>none</code></td><td><code>biweekly</code></td><td>Start with default. Consider <code>weekly</code> for busy systems if needed, monitoring impact. Avoid <code>none</code>.</td><td></td></tr><tr><td><code>hardDiskScrubStartTime</code></td><td><code>&lt;timestamp&gt;</code>, <code>now</code></td><td>None</td><td>Explicitly set to a known low-load window (e.g., weekend night).</td><td></td></tr><tr><td>ASM Redundancy</td><td>Normal (2 copies), High (3 copies)</td><td>Normal</td><td>Use <strong>High Redundancy</strong> for production disk groups to maximize repair success probability.</td><td></td></tr><tr><td>Monitoring</td><td>CellCLI Metrics, Cell Logs, AWR, ASM Views, <code>exachk</code></td><td>None</td><td>Regularly monitor scrubbing activity, completion status, performance impact, and overall system health (<code>exachk</code>).</td><td><sup></sup></td></tr><tr><td>Scheduling Strategy</td><td>Workload-dependent</td><td>Idle-based</td><td>Schedule during predictably low-load times; ensure cycles complete.</td><td><sup></sup></td></tr><tr><td>MAA Integration</td><td>Part of overall HA</td><td>None</td><td>Integrate with Data Guard, regular health checks, and robust monitoring per MAA guidelines.</td><td><sup></sup></td></tr></tbody></table></figure>



<h2 class="wp-block-heading">VIII. Conclusion</h2>



<p>Oracle Exadata Automatic Hard Disk Scrub and Repair is a proactive defense mechanism crucial for maintaining data integrity and high availability on the Exadata platform. By periodically scanning hard disks on storage servers for physical errors, this feature detects latent corruptions, especially in infrequently accessed data, before they can impact applications.</p>



<p>The core strength of the scrubbing process lies in the integration between Exadata System Software and Oracle ASM. While the Cell Software detects the error, ASM manages the automatic repair process using mirrored copies. The effectiveness of this repair capability is directly tied to the correctly configured redundancy of ASM disk groups, particularly High Redundancy, which is strongly recommended for production environments.</p>



<p>From a performance perspective, the scrubbing process is designed to run during periods of low I/O utilization detected by the system and is managed by IORM. This aims to minimize the impact on production workloads. However, it remains important for administrators to monitor scrubbing activity via CellCLI metrics, alert logs, and AWR reports, and potentially adjust the schedule based on their environment&#8217;s specific workload patterns.</p>



<p>Introduced in Exadata 11.2.3.3.0 and enhanced with Adaptive Scheduling in 12.1.2.3.0, this feature is an integral part of Exadata&#8217;s multi-layered data protection strategy (including HARD checks, ASM mirroring, RAC, Data Guard, etc.). Properly configuring and operating Exadata Automatic Hard Disk Scrubbing is critical for preserving data integrity, preventing unexpected outages, and maximizing the value of the Exadata investment. For best results, scrubbing configuration and operation should be considered within the framework of Oracle MAA best practices, supported by regular system health checks (<code>exachk</code>) and comprehensive monitoring.</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/comprehensive-guide-to-oracle-exadata-automatic-hard-disk-scrubbing.html">Comprehensive Guide to Oracle Exadata Automatic Hard Disk Scrubbing</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Oracle Exadata X11M Platform: Comprehensive Technical Analysis and Comparison with Previous Generations</title>
		<link>https://www.bugraparlayan.com.tr/oracle-exadata-x11m-platform.html</link>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Sun, 06 Apr 2025 14:05:00 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[data center solutions]]></category>
		<category><![CDATA[database infrastructure]]></category>
		<category><![CDATA[enterprise database systems]]></category>
		<category><![CDATA[Exadata X11M features]]></category>
		<category><![CDATA[Exadata X11M performance]]></category>
		<category><![CDATA[high performance database solutions]]></category>
		<category><![CDATA[Oracle database platform]]></category>
		<category><![CDATA[Oracle Exadata benefits]]></category>
		<category><![CDATA[Oracle Exadata X11M]]></category>
		<guid isPermaLink="false">https://www.bugraparlayan.com.tr/?p=1397</guid>

					<description><![CDATA[<p>Oracle Exadata X11M: Hardware Architecture Overview The core architecture of Oracle Exadata is based on a scale-out design featuring database servers, intelligent storage servers, and a high-speed, low-latency network fabric connecting these components. Exadata X11M combines this architecture with the latest hardware technologies and optimized software, creating a unique platform for Oracle Database workloads. The &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/oracle-exadata-x11m-platform.html">Oracle Exadata X11M Platform: Comprehensive Technical Analysis and Comparison with Previous Generations</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[



<h2 class="wp-block-heading">Oracle Exadata X11M: Hardware Architecture Overview</h2>



<p>The core architecture of Oracle Exadata is based on a scale-out design featuring database servers, intelligent storage servers, and a high-speed, low-latency network fabric connecting these components. Exadata X11M combines this architecture with the latest hardware technologies and optimized software, creating a unique platform for Oracle Database workloads. The co-engineered nature of hardware and software aims to maximize performance at every layer of the system.</p>



<h3 class="wp-block-heading">Database Server (Standard X11M)</h3>



<p>Exadata X11M database servers are designed for compute-intensive database operations. Key hardware specifications include:</p>



<ul class="wp-block-list">
<li><strong>CPU:</strong> Equipped with two <strong>AMD EPYC&#x2122; 9J25 processors</strong>, each featuring 96 cores per socket (2.6 GHz base, up to 4.5 GHz boost), totaling <strong>192 physical cores</strong> per server. These cores are stated to be up to 25% faster than their X10M counterparts.</li>



<li><strong>RAM:</strong> Utilizes high-performance <strong>6400MT/s DDR5</strong> DIMMs. Memory capacity options per server are <strong>512 GB</strong> (16x32GB), <strong>1.5 TB</strong> (24x64GB), <strong>2.25 TB</strong> (24x96GB), or <strong>3 TB</strong> (24x128GB). This provides up to 33% more DRAM bandwidth compared to X10M.</li>



<li><strong>System Storage:</strong> Standard configuration includes 2 x 3.84 TB NVMe devices, expandable to 4. Typically used for the OS and system software.</li>



<li><strong>RDMA Network Fabric:</strong> Features one dual-port ConnectX-7 (CX7) RDMA Network Fabric adapter (PCIe 5.0) providing <strong>2 x 100 Gb/s RoCE</strong> (RDMA over Converged Ethernet) ports in an active-active configuration for a total bandwidth of 200 Gb/s.</li>



<li><strong>Client/Management Network:</strong> Includes dedicated 1GbE Base-T ports for Admin and ILOM. Two dual-port 10/25 GbE adapters (CX6-LX) are factory-installed for client connectivity. Optional adapters include additional 10/25 GbE, 100 GbE (CX6-DX), or Quad 10GBASE-T.</li>
</ul>



<h3 class="wp-block-heading">Storage Servers (HC and EF)</h3>



<p>Exadata storage servers not only store data but also intelligently offload significant portions of SQL query processing from the database servers, reducing CPU load and network traffic. The X11M generation offers High Capacity (HC) and Extreme Flash (EF) storage server types:</p>



<ul class="wp-block-list">
<li><strong>CPU:</strong> Both HC and EF servers feature two <strong>AMD EPYC&#x2122; 9J15 processors</strong> (32 cores per socket, 2.95 GHz base, up to 4.4 GHz boost), providing <strong>64 cores per server</strong> dedicated to SQL offload and system operations. These are up to 11% faster than X10M storage server CPUs.</li>



<li><strong>RAM:</strong> Each server contains <strong>1.5 TB of 6400MT/s DDR5 DRAM</strong>, offering 33% more bandwidth than X10M.</li>



<li><strong>Exadata RDMA Memory (XRMEM):</strong> A significant portion of the RAM, <strong>1.25 TB</strong>, is configured as XRMEM, an ultra-low latency cache tier. Accessible directly via RDMA, XRMEM sits between the database buffer cache and Flash Cache, enabling OLTP read latencies as low as <strong>14µs</strong>. RDMA bypasses much of the OS and network stack overhead, accessing DRAM (XRMEM) which is inherently faster than flash, to achieve this low latency.</li>



<li><strong>Exadata Smart Flash Cache:</strong> Utilizes performance-optimized <strong>PCIe 5.0 NVMe flash cards</strong> (F680 v2).
<ul class="wp-block-list">
<li><strong>HC Server:</strong> 4 x 6.8 TB cards (Total 27.2 TB cache).</li>



<li><strong>EF Server:</strong> 4 x 6.8 TB cards (Total 27.2 TB cache).</li>



<li>This new generation flash provides up to <strong>2.2X faster</strong> analytical I/O (scan throughput) compared to X10M. SQL Scan throughput from flash reaches <strong>100 GB/s per server</strong>.</li>
</ul>
</li>



<li><strong>Persistent Storage:</strong>
<ul class="wp-block-list">
<li><strong>HC Server:</strong> 12 x 22 TB 7,200 RPM SAS HDDs (Total 264 TB raw capacity).</li>



<li><strong>EF Server (All-Flash):</strong> 4 x 30.72 TB capacity-optimized NVMe PCIe Flash drives (Total 122.88 TB raw capacity).</li>
</ul>
</li>



<li><strong>System Storage:</strong> 2 x 480 GB NVMe devices (likely for OS/Exadata software).</li>



<li><strong>RDMA Network Fabric:</strong> Same as database server: 1 x dual-port CX7 adapter (PCIe 5.0), 2&#215;100 Gb/s active-active RoCE (Total 200 Gb/s).</li>
</ul>



<p>The synergy between faster AMD EPYC cores, higher-bandwidth DDR5 RAM, significantly faster PCIe 5.0 flash scans, and the large XRMEM cache accessed via the 200 Gb/s RoCE network forms the foundation for X11M&#8217;s overall workload acceleration. Performance gains stem from the interaction of these components (e.g., faster CPUs fed by faster memory/flash over a faster network) rather than any single element.</p>



<p><strong>Table 1: Exadata X11M Server Component Specifications</strong></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Component</th><th>Database Server (X11M)</th><th>Storage Server (HC)</th><th>Storage Server (EF)</th></tr><tr><td><strong>CPU</strong></td><td>2 x AMD EPYC&#x2122; 9J25 (96 Cores/socket, 192 Total)</td><td>2 x AMD EPYC&#x2122; 9J15 (32 Cores/socket, 64 Total)</td><td>2 x AMD EPYC&#x2122; 9J15 (32 Cores/socket, 64 Total)</td></tr><tr><td><strong>RAM (Total Size)</strong></td><td>512GB / 1.5TB / 2.25TB / 3TB</td><td>1.5 TB</td><td>1.5 TB</td></tr><tr><td><strong>RAM (Type/Speed)</strong></td><td>DDR5 / 6400MT/s</td><td>DDR5 / 6400MT/s</td><td>DDR5 / 6400MT/s</td></tr><tr><td><strong>XRMEM Size</strong></td><td>N/A</td><td>1.25 TB</td><td>1.25 TB</td></tr><tr><td><strong>Flash Cache (Type)</strong></td><td>N/A</td><td>Perf. Opt. F680 v2 NVMe PCIe 5.0</td><td>Perf. Opt. F680 v2 NVMe PCIe 5.0</td></tr><tr><td><strong>Flash Cache (Size)</strong></td><td>N/A</td><td>4 x 6.8 TB (Total 27.2 TB)</td><td>4 x 6.8 TB (Total 27.2 TB)</td></tr><tr><td><strong>Persistent Storage (Type)</strong></td><td>System NVMe (2&#215;3.84TB, exp. to 4)</td><td>SAS HDD (7200 RPM)</td><td>Cap. Opt. NVMe PCIe Flash</td></tr><tr><td><strong>Persistent Storage (Size)</strong></td><td>&#8211;</td><td>12 x 22 TB (Total 264 TB Raw)</td><td>4 x 30.72 TB (Total 122.88 TB Raw)</td></tr><tr><td><strong>RDMA Network</strong></td><td>1x Dual Port CX7 (PCIe 5.0), 2x100Gb/s Active-Active RoCE</td><td>1x Dual Port CX7 (PCIe 5.0), 2x100Gb/s Active-Active RoCE</td><td>1x Dual Port CX7 (PCIe 5.0), 2x100Gb/s Active-Active RoCE</td></tr></tbody></table></figure>






<h2 class="wp-block-heading">Introducing the Exadata X11M-Z Variant</h2>



<p>Alongside the standard high-performance configurations, the Exadata X11M family introduces a more economical entry-level option designated <strong>&#8220;X11M-Z&#8221;</strong>. This variant is designed for customers with smaller workloads or those seeking access to core Exadata capabilities at a lower cost point. The X11M-Z replaces the fixed &#8220;Eighth Rack&#8221; concept of previous generations, offering a more flexible starting configuration. Customers can start with X11M-Z servers and scale out by adding more Z servers or standard X11M servers as needs grow. This approach potentially broadens Exadata&#8217;s market reach by attracting customers with smaller initial needs or budgets who might otherwise opt for less optimized platforms or cloud services lacking Exadata features.</p>



<h3 class="wp-block-heading">X11M-Z Database Server Specifications</h3>



<ul class="wp-block-list">
<li><strong>CPU:</strong> Utilizes a single-socket configuration with <strong>1 x 32-core AMD EPYC&#x2122; 9J15</strong> processor (2.95 GHz base, up to 4.4 GHz boost).</li>



<li><strong>RAM:</strong> Uses 6400MT/s DDR5 DIMMs. Memory options are <strong>768 GB</strong> (12x64GB) or <strong>1.125 TB</strong> (12x96GB).</li>



<li><strong>System Storage:</strong> Standard 2 x 3.84 TB NVMe devices, expandable to 4.</li>



<li><strong>RDMA Network Fabric:</strong> Same as standard X11M DB server: 1 x dual-port CX7 adapter (PCIe 5.0), 2&#215;100 Gb/s active-active RoCE (Total 200 Gb/s).</li>



<li><strong>Client/Management Network:</strong> Similar options to standard X11M DB server, but with only one optional field-installable adapter slot.</li>
</ul>



<h3 class="wp-block-heading">X11M-Z High Capacity (HC-Z) Storage Server Specifications</h3>



<ul class="wp-block-list">
<li><strong>CPU:</strong> Single-socket configuration with <strong>1 x 32-core AMD EPYC&#x2122; 9J15</strong> processor.</li>



<li><strong>RAM:</strong> <strong>768 GB</strong> 6400MT/s DDR5 DRAM.</li>



<li><strong>XRMEM:</strong> <strong>576 GB</strong> of the RAM is allocated as XRMEM.</li>



<li><strong>Flash Cache:</strong> Two 6.8 TB performance-optimized F680 v2 NVMe PCIe 5.0 flash cards.</li>



<li><strong>Persistent Storage:</strong> Six 22 TB 7,200 RPM SAS HDDs.</li>



<li><strong>System Storage:</strong> 2 x 480 GB NVMe devices.</li>



<li><strong>RDMA Network Fabric:</strong> Same as standard HC/EF storage servers: 1 x dual-port CX7 adapter (PCIe 5.0), 2&#215;100 Gb/s active-active RoCE (Total 200 Gb/s).</li>
</ul>



<h3 class="wp-block-heading">Positioning and Use Cases</h3>



<p>The <strong>X11M-Z</strong> variant is ideal for smaller databases, development/test environments, or departmental applications that require core Exadata software features (Smart Scan, Storage Indexes, HCC, IORM) and performance characteristics but do not need the full scale of standard X11M. It offers a lower entry cost and a reduced power consumption footprint. The single-socket design inherently consumes less power than dual-socket standard servers, aligning with the overall theme of improved energy efficiency and sustainability in X11M. Offering a lower-power hardware option alongside software-based power management strengthens Exadata&#8217;s efficiency narrative.</p>



<p><strong>Table 2: Exadata X11M vs. X11M-Z Key Hardware Differences</strong></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Component</th><th>Standard X11M Server</th><th>X11M-Z Server</th></tr><tr><td><strong>Server Type</strong></td><td>Database / Storage (HC/EF)</td><td>Database / Storage (HC-Z)</td></tr><tr><td><strong>CPU Sockets</strong></td><td>2</td><td>1</td></tr><tr><td><strong>CPU Cores/Server</strong></td><td>DB: 192 / Storage: 64</td><td>DB: 32 / Storage: 32</td></tr><tr><td><strong>Max RAM/Server</strong></td><td>DB: 3 TB / Storage: 1.5 TB</td><td>DB: 1.125 TB / Storage: 768 GB</td></tr><tr><td><strong>XRMEM Size (Storage)</strong></td><td>1.25 TB</td><td>576 GB</td></tr><tr><td><strong>Flash Cache (Storage)</strong></td><td>4 x 6.8 TB Cards (HC/EF)</td><td>2 x 6.8 TB Cards (HC-Z)</td></tr><tr><td><strong>Disk Drives (HC)</strong></td><td>12 x 22 TB</td><td>6 x 22 TB (HC-Z)</td></tr></tbody></table></figure>






<h2 class="wp-block-heading">Generational Performance Leap: Exadata X11M vs. X10M</h2>



<p>Exadata X11M delivers significant performance increases across all major workload types compared to the previous X10M generation. These gains stem from the combination of updated hardware components (CPU, memory, flash, network) and continuous software optimizations. Oracle offering these improvements at the <strong>same starting price</strong> as X10M enhances the platform&#8217;s value proposition.</p>



<h3 class="wp-block-heading">AI Vector Search Acceleration</h3>



<p>With the rise of AI-powered applications, <strong>AI Vector Search</strong> for semantic similarity search, RAG applications, recommendation systems, and anomaly detection has become critical. Exadata X11M is explicitly optimized for these workloads.</p>



<p>Exadata&#8217;s unique <strong>AI Smart Scan</strong> feature plays a central role by offloading vector distance calculations and top-K filtering to storage servers. Data is processed in XRMEM or Flash Cache on the storage servers, minimizing network traffic and database server load.</p>



<p>X11M provides the following concrete gains over X10M for AI Vector Search:</p>



<ul class="wp-block-list">
<li><strong>Persistent Vector Index (IVF) Searches:</strong> Up to <strong>55% faster</strong> due to hardware acceleration and transparent storage offload.</li>



<li><strong>In-Memory Vector Index (HNSW) Queries:</strong> Up to <strong>43% faster</strong>, benefiting from faster database server CPUs and memory.</li>



<li><strong>Software Optimizations (All Exadata Platforms):</strong>
<ul class="wp-block-list">
<li><strong>4.7X more data filtering</strong> on storage servers (due to improved top-K efficiency).</li>



<li>Up to <strong>32X faster queries</strong> when searching BINARY vector dimensions. This offers significant speedup with minimal accuracy impact for some use cases.</li>



<li>Offload of vector distance projection to storage servers.</li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading">OLTP Performance Enhancements</h3>



<p>Low latency and high IOPS are crucial for financial transactions, e-commerce, and other critical OLTP systems. Exadata X11M delivers significant improvements:</p>



<ul class="wp-block-list">
<li><strong>IOPS:</strong> Delivers up to <strong>25.2 Million</strong> 8K SQL read IOPS in a single rack. Competitive claims mention significantly higher IOPS compared to rivals like Pure Storage (33x) and Dell PowerMax (3.3x).</li>



<li><strong>Latency:</strong> SQL 8K read latency is reduced to as low as <strong>14 microseconds (14μs)</strong> thanks to XRMEM and RDMA. This is a 21% improvement over X10M&#8217;s 17µs. Cloud comparisons claim up to 70x lower latency than AWS RDS and Azure SQL.</li>



<li><strong>Throughput:</strong> Up to <strong>25% faster</strong> serial transaction processing and up to <strong>25% higher</strong> concurrent transaction throughput compared to X10M, driven by faster cores.</li>



<li><strong>Flash Performance:</strong> Single block reads from flash are up to <strong>43% faster</strong>.</li>
</ul>



<h3 class="wp-block-heading">Analytics Workload Acceleration</h3>



<p>Data warehouses, reporting, and large-scale analytics demand high scan throughput. Exadata X11M shows significant progress here:</p>



<ul class="wp-block-list">
<li><strong>Scan Throughput:</strong>
<ul class="wp-block-list">
<li>Analytical I/O (scan throughput) on storage servers is up to <strong>2.2X faster</strong> than X10M. This is enabled by faster PCIe 5.0 flash and potentially the ability to scan from both flash and XRMEM concurrently.</li>



<li>Flash scan throughput reaches <strong>100 GB/s per server</strong>.</li>



<li>Scanning columnar data cached in XRMEM reaches <strong>500 GB/s per server</strong>.</li>



<li>Total scan throughput per rack can reach up to <strong>8.5 TB/s</strong> (likely from XRMEM). This is a substantial increase from the 1 TB/s per rack claimed for X10M.</li>
</ul>
</li>



<li><strong>Query Processing:</strong> Up to <strong>25% faster</strong> analytics query processing compared to X10M. This results from faster database server cores and faster storage offload processing.</li>



<li><strong>Database In-Memory:</strong> Database In-Memory scans increase up to <strong>500 GB/s</strong>. Exadata extends In-Memory columnar formats into Flash Cache and XRMEM, leveraging ultra-fast SIMD Vector instructions.</li>
</ul>



<p>This balanced improvement across workloads reinforces Exadata&#8217;s suitability for diverse and consolidated environments. The significant gains allow more demanding workloads to coexist efficiently on the same platform. While hardware upgrades provide the raw potential, software optimizations like AI Smart Scan enhancements, RDMA protocols, and Exadata System Software unlock this potential. Some software optimizations, like the 32x faster binary vector queries, are available across Exadata platforms (with appropriate software), demonstrating software value independent of hardware, but achieving peak potential on the latest hardware like X11M.</p>



<h2 class="wp-block-heading">Key Software and Efficiency Improvements</h2>



<p>Beyond raw performance, Exadata X11M introduces significant software and hardware features focused on improving operational efficiency and resource utilization, helping to lower Total Cost of Ownership (TCO) and address modern data center concerns like sustainability and management overhead.</p>



<h3 class="wp-block-heading">Advanced Power Management</h3>



<p>Introduced with Exadata System Software 25.1 and specifically designed for X11M database servers, these capabilities help organizations meet energy efficiency goals and reduce operational costs. Key features include:</p>



<ul class="wp-block-list">
<li><strong>Core Disablement:</strong> When the active core count (<code>pendingCoreCount</code>) is set to 128 or lower, the system automatically powers off 64 unused CPU cores (32 per socket), saving approximately 80 Watts per server without impacting performance for the licensed cores. Cores can be re-enabled if needed (see the DBMCLI sketch after this list).</li>



<li><strong>Power Capping:</strong> Allows setting an overall power consumption target for the database server, useful for regulatory compliance or cooling management. Performance scales linearly with the power cap (e.g., a 10% power reduction results in roughly a 10% peak processing capacity reduction).</li>



<li><strong>Low Power Mode:</strong> Enables scheduling automatic transitions to a low power mode during anticipated low-usage periods (e.g., nights, weekends). The system automatically exits low power mode if demand unexpectedly increases (based on CPU, I/O, or network thresholds) to maintain performance.</li>
</ul>
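

<p>As a sketch of how core disablement is driven, assuming the long-standing capacity-on-demand attribute <code>pendingCoreCount</code> on the database server (the dedicated attributes for power capping and low power mode are not named here, since they are not confirmed in this text):</p>



<pre class="wp-block-code"><code>DBMCLI> LIST DBSERVER ATTRIBUTES coreCount, pendingCoreCount
DBMCLI> ALTER DBSERVER pendingCoreCount=128
DBMCLI> LIST DBSERVER ATTRIBUTES coreCount, pendingCoreCount</code></pre>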



<p>Combined with the ability to consolidate more workloads onto fewer systems due to X11M&#8217;s higher performance, these power management features can lead to significant savings in infrastructure, power, cooling, and data center space costs.</p>



<h3 class="wp-block-heading">Storage Efficiency</h3>



<ul class="wp-block-list">
<li><strong>Exascale Free Space Management (Exadata SW 25.1):</strong> This enhancement significantly reduces the amount of free space Exascale storage pools require compared to traditional ASM disk groups to successfully complete an automatic data rebalance after a storage device failure. For example, with 9+ HC servers, Exascale requires only 3% free space versus 9% for ASM. This increases usable storage capacity. Such improvements indicate the maturation of the Exascale architecture beyond basic pooling.</li>



<li><strong>Hybrid Columnar Compression (HCC):</strong> A standard Exadata feature that significantly compresses data (often 5x-20x), especially for analytics, saving storage costs and improving performance by reducing I/O (see the SQL sketch after this list).</li>



<li><strong>Exascale Thin Cloning:</strong> Redirect-on-write technology enables space-efficient clones, particularly useful for dev/test environments.</li>
</ul>
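

<p>For illustration, HCC is declared with standard DDL at the table or partition level; the table names here are hypothetical:</p>



<pre class="wp-block-code"><code>-- Warehouse-style compression for data that is still queried regularly
CREATE TABLE sales_history
  COMPRESS FOR QUERY HIGH
  AS SELECT * FROM sales;

-- Maximum compression for rarely accessed data
ALTER TABLE sales_history MOVE COMPRESS FOR ARCHIVE HIGH;</code></pre>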



<h3 class="wp-block-heading">Other Notable Software Features (Exadata SW 25.1)</h3>



<p>Exadata System Software 25.1 introduces other improvements relevant to the X11M platform:</p>



<ul class="wp-block-list">
<li><strong>Automatic Tuning of ASM Rebalance:</strong> Dynamically adjusts the <code>asm_power_limit</code> based on available I/O bandwidth and client database workload presence. It speeds up rebalance when resources are free and slows it down to prioritize user workloads, minimizing performance impact and reducing manual tuning needs. This contrasts with the traditional static power limit approach.</li>



<li><strong>Simpler Package Management:</strong> Streamlines management of additional non-Exadata software packages during database server updates (that don&#8217;t change the major OS version), reducing maintenance window duration.</li>



<li><strong>Exascale Volume Cloning:</strong> Ability to create clones directly from existing Exascale volumes.</li>



<li><strong>Secure Fabric Default:</strong> Secure internal communication layer is now recommended and enabled by default.</li>



<li><strong>Cache Observability Enhancements:</strong> Improved monitoring of Exadata cache performance via <code>ecstat</code> utility enhancements.</li>



<li><strong>Faster Cisco Switch Upgrades:</strong> Reduces downtime associated with network switch software updates.</li>
</ul>



<p>These efficiency-focused features demonstrate that X11M offers a holistic approach, addressing not just raw speed but also operational costs, management complexity, and environmental impact.</p>



<h2 class="wp-block-heading">Comparative Summary: Exadata X11M vs. X10M</h2>



<p>Exadata X11M represents a significant step up from its predecessor, X10M, offering substantial hardware and software enhancements that translate into tangible performance and efficiency gains. The table below summarizes key technical and performance differences:</p>



<p><strong>Table 3: Feature and Performance Comparison (Exadata X11M vs. X10M)</strong></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Feature/Metric</th><th>Exadata X10M</th><th>Exadata X11M</th><th>Improvement</th></tr><tr><td><strong>Hardware</strong></td><td></td><td></td><td></td></tr><tr><td>DB Server CPU</td><td>2x AMD EPYC&#x2122; 9J14 (96 Cores/socket)</td><td>2x AMD EPYC&#x2122; 9J25 (96 Cores/socket)</td><td>Up to 25% faster core performance</td></tr><tr><td>Storage Server CPU</td><td>2x AMD EPYC&#x2122; (32 Cores/socket)</td><td>2x AMD EPYC&#x2122; 9J15 (32 Cores/socket)</td><td>Up to 11% faster core performance</td></tr><tr><td>Memory Type/Speed</td><td>DDR5 / 4800MT/s</td><td>DDR5 / 6400MT/s</td><td>33% more bandwidth</td></tr><tr><td>Flash Technology</td><td>PCIe 4.0 NVMe</td><td>PCIe 5.0 NVMe</td><td>Faster (Up to 2.2x for Analytics I/O)</td></tr><tr><td>Network Fabric Speed</td><td>2x100Gb/s RoCE (Active-Active)</td><td>2x100Gb/s RoCE (Active-Active)</td><td>Same nominal speed</td></tr><tr><td>XRMEM Latency</td><td>&lt; 17µs</td><td>&lt; 14µs</td><td>Up to 21% lower</td></tr><tr><td><strong>AI Vector Search</strong></td><td></td><td></td><td></td></tr><tr><td>IVF Search Speedup</td><td>Baseline</td><td>Up to 55% faster</td><td>Up to 55%</td></tr><tr><td>HNSW Search Speedup</td><td>Baseline</td><td>Up to 43% faster</td><td>Up to 43%</td></tr><tr><td>Binary Vector Query Speedup</td><td>Baseline (Software dependent)</td><td>Up to 32x faster (Software Opt.)</td><td>Up to 32x (Software Opt.)</td></tr><tr><td><strong>OLTP</strong></td><td></td><td></td><td></td></tr><tr><td>Max Read IOPS (per rack)</td><td>25.2 Million</td><td>25.2 Million</td><td>Equal (Likely network/protocol limited)</td></tr><tr><td>SQL Read Latency</td><td>&lt; 17µs</td><td>&lt; 14µs</td><td>Up to 21% lower</td></tr><tr><td>Serial Transaction Speedup</td><td>Baseline</td><td>Up to 25% faster</td><td>Up to 25%</td></tr><tr><td>Concurrent Throughput Increase</td><td>Baseline</td><td>Up to 25% more</td><td>Up to 25%</td></tr><tr><td><strong>Analytics</strong></td><td></td><td></td><td></td></tr><tr><td>Analytics I/O Speedup (Storage)</td><td>Baseline</td><td>Up to 2.2x faster</td><td>Up to 2.2x</td></tr><tr><td>Analytics Query Processing Speedup</td><td>Baseline</td><td>Up to 25% faster</td><td>Up to 25%</td></tr><tr><td>Max In-Memory Scan Speed (Server)</td><td>~227 GB/s (Implied/Est. for X10M)</td><td>500 GB/s (from XRMEM)</td><td>&gt;2x increase (with XRMEM scan)</td></tr><tr><td><strong>Efficiency</strong></td><td></td><td></td><td></td></tr><tr><td>Advanced Power Management Features</td><td>No</td><td>Yes (Core disable, Capping, Low Power Mode)</td><td>New</td></tr><tr><td>X11M-Z Option</td><td>No (Eighth Rack existed)</td><td>Yes</td><td>New (More flexible entry-level)</td></tr></tbody></table></figure>






<h2 class="wp-block-heading">Conclusion</h2>



<p>Oracle Exadata X11M is a powerful engineered system representing a significant advancement over the previous X10M generation in performance, efficiency, and flexibility. The integration of the latest AMD EPYC processors, faster DDR5 memory, PCIe 5.0 flash, and Exadata RDMA Memory (XRMEM) provides a substantial hardware uplift.</p>



<p>These hardware improvements, combined with intelligent software optimizations like AI Smart Scan and features introduced in Exadata System Software 25.1 (Advanced Power Management, Automatic ASM Rebalance Tuning, enhanced Exascale storage management), further amplify X11M&#8217;s capabilities. The platform delivers demonstrable performance gains across critical workloads, including AI Vector Search (up to 55% faster IVF, 32x faster binary queries), OLTP (up to 25% higher throughput, latency down to 14µs), and Analytics (2.2x faster storage I/O, up to 500 GB/s In-Memory scans).</p>



<p>Crucially, these advancements are offered at the same price point as the previous generation, making X11M a compelling value proposition. Enhanced power management and consolidation potential contribute to lower TCO and improved sustainability, while the introduction of the X11M-Z variant makes the platform accessible to a broader range of customers.</p>



<p>With deployment flexibility across on-premises, OCI, Cloud@Customer, and major multicloud environments (Azure, AWS, Google Cloud), Exadata X11M allows organizations to run their critical Oracle Database workloads wherever needed without application changes.</p>



<p>In summary, Oracle Exadata X11M stands as one of the most advanced and strategic platforms available for organizations seeking peak performance, scalability, and efficiency for their Oracle Database workloads. Its strong emphasis on AI capabilities signals its readiness for future enterprise computing trends.</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/oracle-exadata-x11m-platform.html">Oracle Exadata X11M Platform: Comprehensive Technical Analysis and Comparison with Previous Generations</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Exadata 25ai Future : Automatic Tuning of ASM Rebalance Operations</title>
		<link>https://www.bugraparlayan.com.tr/exadata-25ai-future-automatic-tuning-of-asm-rebalance-operations.html</link>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Tue, 18 Mar 2025 13:12:00 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[ASM rebalance automation]]></category>
		<category><![CDATA[ASM rebalance operations]]></category>
		<category><![CDATA[automatic tuning ASM rebalance]]></category>
		<category><![CDATA[dynamic asm_power_limit]]></category>
		<category><![CDATA[Exadata I/O management]]></category>
		<category><![CDATA[Exadata storage optimization]]></category>
		<category><![CDATA[Exadata system software 25.1]]></category>
		<category><![CDATA[Oracle ASM performance]]></category>
		<category><![CDATA[Oracle database automation]]></category>
		<category><![CDATA[Oracle Exadata 25ai]]></category>
		<guid isPermaLink="false">https://www.bugraparlayan.com.tr/?p=1389</guid>

					<description><![CDATA[<p>The Oracle Exadata platform continuously evolves, delivering industry-leading performance, availability, and scalability for Oracle Database workloads. A key part of this evolution is the increasing role of automation in database infrastructure management. Oracle Automatic Storage Management (ASM), a volume manager and file system specifically designed for Oracle databases, plays a critical role in simplifying storage &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/exadata-25ai-future-automatic-tuning-of-asm-rebalance-operations.html">Exadata 25ai Future : Automatic Tuning of ASM Rebalance Operations</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[



<p>The Oracle Exadata platform continuously evolves, delivering industry-leading performance, availability, and scalability for Oracle Database workloads. A key part of this evolution is the increasing role of automation in database infrastructure management. <strong>Oracle Automatic Storage Management (ASM)</strong>, a volume manager and file system designed specifically for Oracle databases, plays a critical role in simplifying storage administration. One of ASM&#8217;s core functions is ensuring balanced data distribution across disks after storage configuration changes (like adding or removing disks) or hardware failures &#8211; a process known as <strong>ASM rebalance</strong>. Rebalancing is vital for maintaining data redundancy and optimizing <strong>database performance</strong>.</p>



<p>Oracle Exadata System Software 25ai (specifically release 25.1.0) introduces several innovations. Among these is the <strong>&#8220;Automatic Tuning of ASM Rebalance Operations&#8221;</strong> feature. This document provides an in-depth look at this new capability, explaining its mechanism, comparing it to traditional methods, and highlighting its key benefits for <strong>Exadata performance</strong> and <strong>storage management</strong>.</p>



<p><strong>1. Defining &#8220;Automatic Tuning of ASM Rebalance Operations&#8221; in Exadata 25ai</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> Introduced with Oracle Exadata System Software 25.1.0, &#8220;Automatic Tuning of ASM Rebalance Operations&#8221; is a capability that dynamically adjusts the speed or power of <strong>ASM rebalance</strong> operations based on the system&#8217;s real-time I/O (Input/Output) status and current database workloads. This feature allows ASM to intelligently manage the intensity of its data movement operations, considering the overall health and <strong>performance</strong> of the Exadata system.  </li>



<li><strong>Core Purpose:</strong> The primary goal is twofold:<ol><li>Ensure <strong>ASM rebalance</strong> operations (e.g., restoring data redundancy after disk failure, redistributing data after adding storage, resynchronizing data post-rolling update) complete as quickly as possible.  </li><li>Minimize the negative performance impact on critical database workloads (like OLTP transactions or analytical queries) running concurrently on the system while the rebalance is in progress.  </li></ol>Essentially, this feature aims for rebalance operations that are both efficient (fast) and &#8220;neighbor-friendly,&#8221; sharing system resources harmoniously with other essential processes.</li>
</ul>



<p><strong>2. How Automatic ASM Rebalance Tuning Works</strong></p>



<p>The core principle involves the Exadata system software automatically and dynamically managing the <code>ASM_POWER_LIMIT</code> initialization parameter, which traditionally required manual tuning by a Database Administrator (DBA).</p>



<ul class="wp-block-list">
<li><strong>Dynamic <code>asm_power_limit</code> Adjustment:</strong>
<ul class="wp-block-list">
<li>The <code>ASM_POWER_LIMIT</code> parameter dictates how many resources (primarily I/O bandwidth and CPU) a rebalance operation consumes, and thus determines its speed. Traditionally, DBAs set a static value: a high value (e.g., 11, or up to 1024 in newer versions) speeds up rebalancing but can negatively impact other workloads, while a low value is less intrusive but slows down the rebalance (see the manual-tuning sketch after this list).  </li>



<li>The automatic tuning feature in <strong>Exadata 25ai</strong> replaces this static approach. The system software continuously adjusts the <code>asm_power_limit</code> based on real-time system conditions.  </li>
</ul>
</li>



<li><strong>Monitored System Conditions:</strong> The tuning mechanism primarily monitors:
<ul class="wp-block-list">
<li><strong>I/O Performance and Available Bandwidth:</strong> Exadata software constantly observes the available I/O bandwidth and overall I/O performance of the storage subsystem. If significant I/O capacity is free (low I/O contention), the software automatically increases <code>asm_power_limit</code> to accelerate the rebalance.  </li>



<li><strong>Client Database Workloads:</strong> The system detects active client database workloads requiring I/O resources. When such workloads are present, the software automatically lowers <code>asm_power_limit</code> to protect their performance. This allows the rebalance to run less aggressively in the background, freeing up resources for priority tasks.  </li>
</ul>
</li>



<li><strong>Underlying Algorithms/Enhancements:</strong> Oracle documentation states this feature is enabled by specific &#8220;enhancements and algorithms&#8221; added in Exadata System Software 25.1.0. However, the precise technical details of these algorithms (specific metrics used, decision thresholds, adjustment frequency) are not publicly disclosed, considered part of Exadata&#8217;s proprietary optimizations.  </li>
</ul>
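

<p>For context, this is the manual workflow that automatic tuning supersedes; standard SQL against an ASM instance, with power values picked arbitrarily for illustration:</p>



<pre class="wp-block-code"><code>-- Old approach: pick a static default rebalance power for the instance
ALTER SYSTEM SET asm_power_limit = 4;

-- Or override it per operation, trading speed against workload impact
ALTER DISKGROUP data REBALANCE POWER 11;

-- Watch progress and estimated time remaining
SELECT group_number, operation, state, power, est_minutes
FROM   v$asm_operation;</code></pre>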



<p>This feature represents a shift from reactive manual tuning (adjusting <code>asm_power_limit</code> after observing issues) to proactive, automated optimization. It continuously balances the critical resource of I/O bandwidth between the need for rebalancing and the demands of ongoing database operations. This aligns with Oracle&#8217;s broader automation and autonomous database strategy, reducing manual intervention and increasing the self-managing capabilities of the Exadata platform, particularly in <strong>storage management</strong>.</p>



<p><strong>3. Solving Traditional ASM Rebalance Challenges</strong></p>



<p>While effective, the traditional manual approach to ASM rebalancing presented several challenges:</p>



<ul class="wp-block-list">
<li><strong>Challenges of Manual Methods:</strong>
<ul class="wp-block-list">
<li><strong>The <code>asm_power_limit</code> Dilemma:</strong> Administrators faced a constant trade-off: complete the rebalance quickly (especially after critical events like disk failure) or slow it down to protect production workload performance. Finding the optimal static value often required trial-and-error and continuous monitoring. Incorrect settings could lead to unnecessarily long rebalances (increasing risk exposure) or unacceptable application slowdowns.  </li>



<li><strong>Potential Performance Impact:</strong> High <code>asm_power_limit</code> values generate intense I/O activity, potentially starving normal database operations of required bandwidth and causing slowdowns.</li>



<li><strong>&#8220;Overbalancing&#8221; Inefficiency:</strong> Performing storage changes like adding <em>then</em> dropping disks in separate steps could trigger two full rebalance operations, nearly doubling the I/O load and duration compared to a single combined operation. This was a common inefficiency in manual procedures.  </li>



<li><strong>Difficulty Adapting to Variable Workloads:</strong> A static <code>asm_power_limit</code> couldn&#8217;t dynamically adapt to changing workload profiles (e.g., batch processing vs. OLTP). Rebalance might run slower than possible during idle periods or cause contention during peak hours.</li>



<li><strong>Unawareness of Higher Power Limits:</strong> Not all administrators might realize that <code>ASM_POWER_LIMIT</code> can go up to 1024 (not just 11) in newer ASM versions, potentially running rebalances slower than necessary.  </li>
</ul>
</li>



<li><strong>Solutions Provided by Automatic Tuning:</strong> The <strong>Exadata 25ai</strong> automatic tuning feature addresses these issues:
<ul class="wp-block-list">
<li><strong>Eliminates Manual Tuning:</strong> Dynamically adjusts <code>asm_power_limit</code>, removing the need for administrators to manually find and manage this complex balance.  </li>



<li><strong>Dynamic Balancing Act:</strong> Automatically balances rebalance speed (increasing when resources are free) against workload priority (slowing rebalance when critical workloads are detected).  </li>



<li><strong>Optimal Resource Utilization:</strong> Aims for the most efficient use of system resources (especially I/O) based on the current situation, speeding up rebalance during quiet times and protecting workloads during busy periods.</li>



<li><strong>Consistency:</strong> Strives to minimize the impact of rebalancing on other operations, contributing to more predictable and consistent system <strong>performance</strong> regardless of workload.</li>
</ul>
</li>
</ul>



<p>Automatic tuning also reduces operational risks associated with manual configuration errors (incorrect <code>asm_power_limit</code>, inefficient &#8220;overbalancing&#8221; steps). By automating these decisions, the likelihood of such errors and their potential negative impact on performance or rebalance duration is significantly lowered.</p>



<p><strong>4. Key Benefits of Automatic ASM Rebalance Tuning</strong></p>



<p>The &#8220;Automatic Tuning of ASM Rebalance Operations&#8221; feature offers tangible advantages for Exadata administrators and users:</p>



<ul class="wp-block-list">
<li><strong>Minimized Performance Impact:</strong> This is the primary benefit. The system automatically throttles rebalance activity when it detects critical database workloads, reducing I/O and CPU consumption. This ensures minimal degradation in application and end-user performance, crucial for latency-sensitive OLTP systems.  </li>



<li><strong>Enhanced Data Resilience and Distribution Health:</strong>
<ul class="wp-block-list">
<li><strong>Faster Rebalancing:</strong> When system resources (especially I/O bandwidth) are available, the feature accelerates the rebalance by increasing <code>asm_power_limit</code>. This significantly shortens the time needed to restore data redundancy after a disk or storage server failure, reducing the window of vulnerability to subsequent failures and improving overall data resilience.  </li>



<li><strong>Efficient Data Distribution:</strong> Speeds up the process of evenly distributing data across disk groups after configuration changes (like adding/removing storage). This promotes efficient use of storage resources and prevents potential performance bottlenecks.  </li>
</ul>
</li>



<li><strong>Improved Manageability:</strong>
<ul class="wp-block-list">
<li><strong>Reduced Administrative Burden:</strong> Eliminates the need for DBAs and system administrators to constantly monitor, evaluate, and manually adjust <code>asm_power_limit</code> during rebalance operations. This boosts <strong>operational efficiency</strong> and frees up administrators for more strategic tasks.  </li>



<li><strong>Simplified Operations:</strong> Greatly simplifies rebalance management, especially in dynamic environments with frequent storage changes or highly variable workload profiles.</li>
</ul>
</li>



<li><strong>Synergy with Exadata Exascale:</strong>
<ul class="wp-block-list">
<li><strong>Ideal for Dynamic, Large-Scale Environments:</strong> The <strong>Exadata Exascale</strong> architecture introduces more flexible, shared, and dynamic management of storage and compute resources. Automatic rebalance tuning helps manage the inherent complexity of storage operations in such large-scale, potentially multi-tenant environments. It works synergistically with other Exascale storage efficiency features like &#8220;Improved Free Space Management&#8221;  to create a more autonomous and efficient storage layer.  </li>



<li><strong>Complements Exascale Efficiency Goals:</strong> A core goal of Exascale is to improve storage resource utilization and reduce costs. Automatic rebalance tuning directly supports this by ensuring rebalance operations themselves use resources efficiently. By preventing unnecessary performance dips and reducing operational overhead, it reinforces Exascale&#8217;s overall value proposition of performance, efficiency, and agility. The dynamism of Exascale might lead to more frequent or complex rebalance scenarios, making manual management even more challenging. Automatic tuning naturally adapts to this dynamism, aiding Exascale in achieving its goals.  </li>
</ul>
</li>
</ul>



<p><strong>5. Configuration and Monitoring</strong></p>



<ul class="wp-block-list">
<li><strong>Configuration:</strong> Current Oracle Exadata System Software 25.1.0 documentation does not provide specific commands or interface options to enable, disable, or fine-tune the &#8220;Automatic Tuning of ASM Rebalance Operations&#8221; feature. This strongly suggests the feature is <strong>enabled by default</strong> on compatible Exadata systems and likely requires no direct user configuration, unlike other offload controls like <code>CELL_OFFLOAD_PROCESSING</code>.  </li>



<li><strong>Monitoring:</strong>
<ul class="wp-block-list">
<li><strong>No Direct Monitoring Mechanism:</strong> The documentation does not describe specific V$ views or metrics to directly monitor the <em>internal workings</em> of the automatic tuning mechanism itself (e.g., the dynamically calculated <code>asm_power_limit</code> value or influencing factors).  </li>



<li><strong>Indirect Monitoring (Observing Effects):</strong> The feature&#8217;s presence and effectiveness can be observed indirectly using standard ASM and system monitoring tools:
<ul class="wp-block-list">
<li><strong><code>V$ASM_OPERATION</code> / <code>GV$ASM_OPERATION</code>:</strong> These core ASM views show the status (<code>STATE</code>), type (<code>OPERATION</code>), current power level (<code>POWER</code>), progress (<code>SOFAR</code>, <code>EST_WORK</code>), and estimated time remaining (<code>EST_MINUTES</code>) for ongoing rebalance operations; a sample query follows this list. Observing fluctuations in <code>POWER</code> (if reflected dynamically), <code>EST_RATE</code>, or <code>EST_MINUTES</code> during a rebalance, especially correlating with changes in system load, could indicate automatic tuning at work. For instance, <code>EST_MINUTES</code> might increase when workload increases (rebalance slows) and decrease when workload drops.</li>



<li><strong>I/O Performance Metrics:</strong> Exadata-specific tools like <code>ecstat</code>  or standard OS (<code>iostat</code>) and database (AWR, ASH) monitoring tools can track overall I/O performance (IOPS, throughput MB/s, latency). OCI Database Management service also offers ASM performance monitoring. Observing changes in these metrics during rebalance (e.g., increased rebalance I/O) while ensuring the I/O performance (especially latency) of critical workloads remains stable would demonstrate the effectiveness of automatic tuning.  </li>



<li><strong>Workload Performance:</strong> The most crucial indicator is the performance of critical applications and database workloads during rebalance operations. Key Performance Indicators (KPIs) like end-user response times and transaction throughput should not degrade significantly, proving the feature is achieving its primary goal.</li>
</ul>
</li>
</ul>
</li>
</ul>
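


<p>A minimal sketch of such indirect monitoring, using only the standard view described above, sampled periodically from the ASM instance:</p>



<pre class="wp-block-code"><code>-- Sample during a rebalance; POWER and EST_MINUTES moving in step
-- with workload changes would suggest the automatic tuning at work.
SELECT inst_id, group_number, operation, state, power,
       sofar, est_work, est_rate, est_minutes
  FROM gv$asm_operation;</code></pre>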



<p>While Oracle likely enables this feature by default due to confidence in its stability, standard monitoring practices remain essential. Administrators should continue to use tools like <code>V$ASM_OPERATION</code> and monitor overall system/workload I/O performance during rebalances to verify expected behavior and detect any potential anomalies.</p>



<p><strong>Manual vs. Automatic ASM Rebalance Tuning Comparison</strong></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Feature</th><th>Traditional Manual Tuning</th><th>Exadata 25ai Automatic Tuning</th></tr><tr><td><strong><code>asm_power_limit</code> Setting</strong></td><td>Static, manually set by DBA <sup></sup></td><td>Dynamic, automatically adjusted by Exadata software <sup></sup></td></tr><tr><td><strong>Core Mechanism</strong></td><td>Operates at a fixed parallelism/resource limit</td><td>Continuously adjusts limit based on system I/O &amp; workload <sup></sup></td></tr><tr><td><strong>Factors Considered</strong></td><td>DBA experience, general system expectations</td><td>Real-time I/O bandwidth, active client workloads <sup></sup></td></tr><tr><td><strong>Primary Goal</strong></td><td>Manually balance speed vs. low impact <sup></sup></td><td>Maximize speed AND minimize impact automatically <sup></sup></td></tr><tr><td><strong>Workload Impact</strong></td><td>Can be significant at high limits <sup></sup></td><td>Automatically reduced when workload detected <sup></sup></td></tr><tr><td><strong>Rebalance Speed</strong></td><td>Tied to static limit, potentially suboptimal</td><td>Automatically increased when resources available <sup></sup></td></tr><tr><td><strong>Administrative Effort</strong></td><td>Requires ongoing monitoring &amp; potential tuning <sup></sup></td><td>Minimal/None <sup></sup></td></tr><tr><td><strong>Adaptability</strong></td><td>Cannot adapt to variable workloads without intervention</td><td>Automatically adapts to changing I/O &amp; workload conditions <sup></sup></td></tr></tbody></table></figure>



<p><strong>Conclusion</strong></p>



<p>The <strong>&#8220;Automatic Tuning of ASM Rebalance Operations&#8221;</strong> feature in Exadata System Software 25ai marks a significant advancement for Oracle&#8217;s flagship database machine platform. It makes the critical, yet potentially disruptive, <strong>ASM rebalance</strong> process smarter, more efficient, and considerably more sensitive to concurrent system workloads.</p>



<p>By dynamically adjusting the <code>ASM_POWER_LIMIT</code> based on real-time I/O conditions and database workload demands, the feature intelligently accelerates rebalancing when resources permit and throttles it during peak usage times, minimizing performance impact.</p>



<p>This automatic tuning directly enhances Exadata&#8217;s overall <strong>performance</strong>, availability, and manageability. It improves system resilience by speeding up redundancy restoration, reduces administrative overhead by eliminating manual tuning, and lowers operational risks associated with manual configuration errors.</p>



<p>&#8220;Automatic Tuning of ASM Rebalance Operations&#8221; aligns perfectly with Oracle&#8217;s broader strategy towards greater automation and autonomous capabilities within the Exadata and database ecosystem. By automating this crucial aspect of <strong>storage management</strong>, it contributes to making Exadata a more efficient, self-managing platform, particularly beneficial in dynamic, large-scale <strong>Exadata Exascale</strong> environments. This feature is a valuable innovation reinforcing Exadata&#8217;s leadership in <strong>database performance</strong> and ease of management.</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/exadata-25ai-future-automatic-tuning-of-asm-rebalance-operations.html">Exadata 25ai Future : Automatic Tuning of ASM Rebalance Operations</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AI Smart Scan on Oracle Exadata: Accelerating AI Vector Search for RAG and Similarity Search</title>
		<link>https://www.bugraparlayan.com.tr/ai-smart-scan-on-oracle-exadata-accelerating-ai-vector-search-for-rag-and-similarity-search.html</link>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Wed, 12 Feb 2025 12:50:00 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<guid isPermaLink="false">https://www.bugraparlayan.com.tr/?p=1385</guid>

					<description><![CDATA[<p>1. Introduction: Exadata and the Rise of In-Database AI 1.1. Oracle Exadata: High-Performance Database Platform Oracle Exadata stands as a premier enterprise database platform, engineered to run Oracle Database workloads with exceptional performance, availability, and security across all scales and criticalities. Its architecture integrates high-performance database servers, intelligent storage servers, and an ultra-fast, low-latency internal &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/ai-smart-scan-on-oracle-exadata-accelerating-ai-vector-search-for-rag-and-similarity-search.html">AI Smart Scan on Oracle Exadata: Accelerating AI Vector Search for RAG and Similarity Search</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[



<h2 class="wp-block-heading">1. Introduction: Exadata and the Rise of In-Database AI</h2>



<h3 class="wp-block-heading">1.1. Oracle Exadata: High-Performance Database Platform</h3>



<p><strong>Oracle Exadata</strong> stands as a premier enterprise database platform, engineered to run <strong>Oracle Database workloads</strong> with exceptional performance, availability, and security across all scales and criticalities. Its architecture integrates high-performance database servers, <strong>intelligent storage servers</strong>, and an ultra-fast, low-latency internal network fabric (typically RDMA over InfiniBand). The core philosophy is the co-design of hardware and software, enabling unique optimizations for database operations at both compute and storage layers.</p>



<p>Exadata is optimized for both Online Transaction Processing (OLTP) and Data Warehousing (DW)/Analytics. It has evolved to support modern workloads like in-memory analytics, <strong>Artificial Intelligence (AI)</strong>, and Machine Learning (ML), facilitating efficient mixed-workload consolidation. Its scale-out design allows balanced expansion of compute, storage, and network resources to meet growing demands.</p>



<h3 class="wp-block-heading">1.2. The Trend Towards In-Database AI</h3>



<p>Integrating <strong>AI and ML capabilities</strong> directly into database platforms is a significant trend. This approach minimizes or eliminates the need to move data to separate systems for analysis or model training, reducing complexity, latency, and security risks associated with data movement. Processing data in place allows AI/ML algorithms to leverage the database&#8217;s transactional capabilities, security models, and consistency guarantees.</p>



<p>Oracle addresses this through its <strong>Converged Database</strong> strategy, managing diverse data types (relational, JSON, graph, spatial, and now vector) and workloads within a single database engine. This aims to eliminate data silos and management complexity associated with specialized databases.</p>



<h3 class="wp-block-heading">1.3. Report Focus: AI Smart Scan for Vector Processing</h3>



<p>This technical report provides an in-depth analysis of the <strong>AI Smart Scan</strong> feature on the <strong>Oracle Exadata</strong> platform, specifically its <strong>vector processing</strong> capabilities. It defines AI Smart Scan within the Exadata context, explains how it accelerates <strong>Oracle AI Vector Search</strong>, details the <strong>offloading mechanism</strong> for vector tasks to Exadata storage servers (especially within the <strong>Exascale architecture</strong>), and documents claimed performance gains.</p>



<p>The report highlights the importance of this feature for modern <strong>database AI applications</strong>, particularly <strong>Similarity Search</strong> and <strong>Retrieval-Augmented Generation (RAG)</strong> for Generative AI.</p>



<h2 class="wp-block-heading">2. Understanding AI Smart Scan in the Exadata Context</h2>



<h3 class="wp-block-heading">2.1. Evolution from Traditional Smart Scan (SQL Offload)</h3>



<p>Understanding <strong>AI Smart Scan</strong> requires knowledge of its predecessor, the traditional <strong>Smart Scan</strong> (or SQL Offload), a cornerstone of Exadata. This technology pushes data-intensive SQL processing from database servers to <strong>intelligent storage servers</strong>.</p>



<p>In conventional architectures, full table scans move all data blocks across the network to the database server for filtering (<code>WHERE</code> clauses) and projection (<code>SELECT</code> list), consuming network bandwidth and database server CPU.</p>



<p>Exadata Smart Scan optimizes this by sending SQL filters and column projections to the storage servers. Storage servers apply these filters <em>as data is read</em>, returning only relevant rows and columns to the database server. This dramatically reduces data transfer and database server CPU usage, boosting <strong>database performance</strong>.</p>
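


<p>As a quick, hedged illustration, offload eligibility can be checked from the execution plan; a minimal sketch, assuming an illustrative <code>SALES</code> table:</p>



<pre class="wp-block-code"><code>-- Table and predicate are illustrative; run on an Exadata database.
EXPLAIN PLAN FOR
  SELECT cust_id, amount
    FROM sales
   WHERE amount &gt; 10000;

-- TABLE ACCESS STORAGE FULL in the output marks a scan that the
-- storage servers can process via Smart Scan.
SELECT plan_table_output
  FROM TABLE(DBMS_XPLAN.DISPLAY());</code></pre>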



<h3 class="wp-block-heading">2.2. Defining AI Smart Scan</h3>



<p><strong>AI Smart Scan</strong> is a set of Exadata-specific optimizations designed to accelerate the <strong>AI Vector Search</strong> capabilities introduced in Oracle Database 23ai. It extends the Smart Scan philosophy to AI vector operations, specifically offloading compute-intensive <strong>vector distance calculations</strong> and <strong>Top-K nearest neighbor filtering</strong> to the storage servers.</p>



<p>Introduced with Oracle Exadata System Software 24.1 and requiring Oracle Database 23ai or later, AI Smart Scan leverages the processing power of Exadata storage servers and the high-speed internal network to optimize vector-based queries.</p>



<h3 class="wp-block-heading">2.3. Core Objectives: Performance and Efficiency</h3>



<p>The primary goal of <strong>AI Smart Scan</strong> is to achieve <strong>orders-of-magnitude performance improvements</strong> for <strong>AI Vector Search</strong> queries, especially on large datasets. This is accomplished by addressing the computational intensity of vector processing.</p>



<p>By offloading distance calculations and Top-K filtering to the <strong>Exadata storage servers</strong> where data resides, AI Smart Scan achieves:</p>



<ol class="wp-block-list">
<li><strong>Reduced Database Server Load:</strong> Frees up database server CPU resources for other tasks, improving overall system throughput.</li>



<li><strong>Minimized Network Traffic:</strong> Only filtered results (e.g., top K vectors) are sent back, significantly reducing data movement, crucial for high-dimensional vectors.</li>



<li><strong>Low Latency and High Throughput:</strong> Processing data closer to the source, combined with Exadata&#8217;s low-latency RDMA network, results in faster query responses.</li>
</ol>



<p>AI Smart Scan represents a logical extension of Exadata&#8217;s core principle: moving processing closer to the data. It adapts the proven SQL offload architecture to the demanding requirements of modern <strong>AI workloads</strong>, particularly the compute-heavy nature of <strong>vector processing</strong>. The claim of &#8220;orders of magnitude&#8221; performance gains signifies a fundamental architectural advantage, positioning Exadata as a high-performance platform for vector search, competitive with specialized vector databases.</p>



<h2 class="wp-block-heading">3. Accelerating AI Vector Search with AI Smart Scan</h2>



<h3 class="wp-block-heading">3.1. Oracle AI Vector Search Fundamentals</h3>



<p><strong>Oracle AI Vector Search</strong> is an integrated database capability enabling semantic search based on meaning, not just keywords. It uses <strong>vector embeddings</strong> &#8211; multi-dimensional numerical representations &#8211; to capture the semantic meaning of structured and unstructured data (text, images, audio). Semantically similar items have vectors closer in the vector space.</p>



<p>Key use cases include:</p>



<ul class="wp-block-list">
<li><strong>Semantic Search:</strong> Searching documents and products by meaning.</li>



<li><strong>Recommendation Systems:</strong> Suggesting similar items based on user preferences.</li>



<li><strong>Anomaly Detection:</strong> Identifying outliers.</li>



<li><strong>Image/Video Search:</strong> Finding visually similar content.</li>



<li><strong>Retrieval-Augmented Generation (RAG):</strong> Enhancing Large Language Model (LLM) accuracy with relevant enterprise data.</li>
</ul>



<p>Oracle Database 23ai provides:</p>



<ul class="wp-block-list">
<li><strong><code>VECTOR</code> Data Type:</strong> Native storage for vector embeddings.</li>



<li><strong>Vector Indexes:</strong> Optimized indexes (HNSW for in-memory, IVF for disk-based) to accelerate similarity searches.</li>



<li><strong>SQL Operators/Functions:</strong> New SQL capabilities (<code>VECTOR_DISTANCE</code>) for performing similarity searches and combining them with other data types (a minimal sketch follows this list).</li>
</ul>
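


<p>A minimal, hedged sketch of these capabilities, assuming an illustrative <code>DOCS</code> table and a bind variable <code>:query_vec</code> holding the embedded search phrase:</p>



<pre class="wp-block-code"><code>-- Native vector storage (dimension count and format are illustrative):
CREATE TABLE docs (
  id        NUMBER PRIMARY KEY,
  body      CLOB,
  embedding VECTOR(768, FLOAT32)
);

-- Disk-based IVF vector index:
CREATE VECTOR INDEX docs_ivf_idx ON docs (embedding)
  ORGANIZATION NEIGHBOR PARTITIONS
  DISTANCE COSINE;

-- Top-K similarity search, the shape of query AI Smart Scan offloads:
SELECT id
  FROM docs
 ORDER BY VECTOR_DISTANCE(embedding, :query_vec, COSINE)
 FETCH APPROX FIRST 10 ROWS ONLY;</code></pre>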



<h3 class="wp-block-heading">3.2. AI Smart Scan&#8217;s Role in Acceleration</h3>



<p><strong>AI Smart Scan</strong> is central to optimizing <strong>AI Vector Search</strong> query performance, especially for large-scale vector data scans. It addresses the compute-intensive nature of finding nearest neighbors in vast vector datasets.</p>



<p>Acceleration mechanisms include:</p>



<ul class="wp-block-list">
<li><strong>Compute Offload:</strong> Intensive vector distance calculations and Top-K filtering are executed on storage servers, not the database server.</li>



<li><strong>Parallel Processing:</strong> Exadata&#8217;s scale-out architecture allows these offloaded operations to run in parallel across multiple storage servers.</li>



<li><strong>Data Reduction:</strong> Only filtered results (top K vectors) are sent back to the database server, minimizing network traffic.</li>



<li><strong>Hardware Optimization:</strong> AI Smart Scan leverages Exadata&#8217;s ultra-fast storage tiers (XRMEM, Smart Flash Cache) and low-latency RDMA network.</li>
</ul>



<p>These combined mechanisms deliver <strong>low-latency</strong> responses and <strong>high-throughput</strong> processing for AI Vector Search queries. Oracle&#8217;s strategy of integrating AI Vector Search into the database and accelerating it with <strong>AI Smart Scan</strong> embodies the &#8220;bring AI to the data&#8221; approach, simplifying architectures compared to using separate vector databases and offering significant efficiency gains, especially for existing Oracle users. The focus on handling &#8220;massive volumes&#8221; and &#8220;high concurrency&#8221; positions this technology for demanding, mission-critical enterprise AI workloads.</p>



<h2 class="wp-block-heading">4. Offloading Vector Processing to Exascale Storage Servers</h2>



<h3 class="wp-block-heading">4.1. The Offload Mechanism Explained</h3>



<p><strong>AI Smart Scan</strong> pushes the most CPU-intensive vector search steps, <strong>vector distance calculations</strong> and <strong>Top-K filtering</strong>, down to the <strong>Exadata storage servers</strong>.</p>



<ol class="wp-block-list">
<li><strong>Vector Distance Calculation Offload:</strong> Computations using metrics like Cosine Similarity or Euclidean Distance are performed directly on storage servers, distributing the CPU load.</li>



<li><strong>Top-K Filtering Offload:</strong> Identifying the &#8216;K&#8217; nearest neighbors to a query vector also happens at the storage layer. Only these top candidates (or potential improvements) are returned, preventing unnecessary network transfer of irrelevant vectors.</li>
</ol>



<p>This <strong>storage offload</strong> dramatically reduces network traffic between database and storage servers, conserving bandwidth and lightening the load on the database server, especially critical for high-dimensional vectors.</p>



<h3 class="wp-block-heading">4.2. Leveraging Exadata Hardware for Vector Processing</h3>



<p>AI Smart Scan&#8217;s effectiveness is tightly coupled with Exadata&#8217;s hardware:</p>



<ul class="wp-block-list">
<li><strong>Exadata RDMA Memory (XRMEM) &amp; Smart Flash Cache:</strong> AI Smart Scan processes vector data at &#8220;memory speed&#8221; using these ultra-fast storage tiers, offering much lower latency than traditional storage. <strong>RDMA (Remote Direct Memory Access)</strong> allows direct data transfer to database server memory, bypassing network stacks for further latency reduction and throughput gains.</li>



<li><strong>Scale-out Architecture:</strong> Offloaded vector operations are parallelized across all available storage servers in Exadata&#8217;s scale-out design, leveraging numerous CPU cores for rapid processing of large datasets.</li>
</ul>



<h3 class="wp-block-heading">4.3. Enhancements in Exadata System Software 25.1</h3>



<p>Oracle continuously refines AI Smart Scan. Release 25.1.0 introduced key improvements:</p>



<ul class="wp-block-list">
<li><strong>Enhanced Top-K Filtering:</strong> Storage servers maintain a running Top-K set locally, only sending results that improve the current best set back to the database server. This significantly reduces network traffic and improves performance, especially for large K values.</li>



<li><strong>INT8 and BINARY Vector Support:</strong> AI Smart Scan now supports these compact, efficient formats alongside high-precision FLOAT types. BINARY vectors offer up to <strong>32x smaller size</strong> and <strong>40x faster distance computation</strong> compared to FLOAT32, with minimal impact on search quality in some tests. INT8 offers 4x compression with negligible quality difference in evaluations. This caters to diverse accuracy vs. performance needs.</li>



<li><strong>Vector Distance Projection Offload:</strong> When a query selects the vector distance (<code>SELECT VECTOR_DISTANCE(...)</code>), this calculation is now also offloaded to storage servers, further reducing network traffic by avoiding the transfer of large vectors just to compute the distance on the database server.</li>
</ul>



<p>Support for INT8/BINARY formats broadens AI Smart Scan&#8217;s applicability beyond high-accuracy scenarios to use cases prioritizing performance and efficiency. Offloading distance projection demonstrates Oracle&#8217;s commitment to refining the offload mechanism for maximum network optimization.</p>
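


<p>A minimal, hedged sketch of the 25.1-era formats (table and column names are illustrative; BINARY dimensions must be a multiple of 8, and <code>HAMMING</code> is the usual metric for them):</p>



<pre class="wp-block-code"><code>-- Compact vector formats alongside FLOAT32:
CREATE TABLE docs_compact (
  id    NUMBER PRIMARY KEY,
  emb_i VECTOR(768, INT8),
  emb_b VECTOR(1024, BINARY)
);

-- Projecting the distance itself; per the 25.1 enhancement this
-- computation also runs on the storage servers, so full vectors
-- never cross the network just to be measured:
SELECT id,
       VECTOR_DISTANCE(emb_b, :query_vec, HAMMING) AS dist
  FROM docs_compact
 ORDER BY dist
 FETCH FIRST 5 ROWS ONLY;</code></pre>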



<h3 class="wp-block-heading">4.4. Exascale Architecture and Vector Offload</h3>



<p><strong>Oracle Exadata Exascale</strong> is a next-generation architecture merging Exadata&#8217;s performance with cloud elasticity and cost-effectiveness. Its loosely-coupled design, separating compute and storage into shared resource pools, provides an ideal foundation for <strong>AI Smart Scan offload</strong>.</p>



<p>The <strong>Exascale intelligent storage cloud</strong> hosts the storage servers targeted by AI Smart Scan. Shared storage pools and the RDMA fabric enable efficient distribution and parallel execution of offloaded vector tasks. Exascale combines the elasticity needed for AI workloads with the raw performance delivered by AI Smart Scan, making Exadata attractive for dynamic, cloud-native AI applications.</p>



<h2 class="wp-block-heading">5. Measured Performance Gains for AI Vector Search</h2>



<p>Oracle reports significant performance improvements for <strong>AI Vector Search</strong> on Exadata using <strong>AI Smart Scan</strong>.</p>



<h3 class="wp-block-heading">5.1. General Acceleration Claims</h3>



<p>Marketing and technical documents often cite acceleration of <strong>up to 30X</strong> or <strong>up to 32X</strong> compared to non-offloaded architectures or potentially older Exadata versions. These figures highlight the fundamental benefit of the offload mechanism.</p>



<h3 class="wp-block-heading">5.2. Exadata X11M Platform Gains</h3>



<p>Compared to the previous generation X10M, the latest <strong>Exadata X11M</strong> platform delivers specific gains:</p>



<ul class="wp-block-list">
<li><strong>Persistent Vector Index (IVF) Searches:</strong> Up to <strong>55% faster</strong> due to intelligent storage offload.</li>



<li><strong>In-Memory Vector Index (HNSW) Queries:</strong> Up to <strong>43% faster</strong>.</li>
</ul>



<p>These improvements reflect the combined effect of newer hardware (AMD EPYC processors) and software optimizations on X11M.</p>



<h3 class="wp-block-heading">5.3. Software Optimizations (All Platforms)</h3>



<p>Optimizations in <strong>Exadata System Software 25.1</strong> benefit all modern Exadata platforms:</p>



<ul class="wp-block-list">
<li><strong>Data Filtering:</strong> <strong>4.7X more</strong> data filtering capacity in storage servers.</li>



<li><strong>BINARY Vectors:</strong> Queries using the newly supported BINARY format can run up to <strong>32X faster</strong> than with FLOAT32 vectors, due to smaller size and faster distance computation.</li>



<li><strong>BINARY Distance Computation:</strong> The distance calculation itself for BINARY vectors can be up to <strong>40X faster</strong> than for FLOAT32.</li>
</ul>



<h3 class="wp-block-heading">5.4. Summary Table of Performance Claims</h3>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><th>Performance Claim</th><th>Comparison Point / Context</th><th>Relevant Hardware / Software</th><th>Source Snippet IDs</th></tr><tr><td>Up to 30X faster AI Vector Search</td><td>Traditional architecture / Non-offload</td><td>Exadata (General) / ESS 24ai+</td><td><sup></sup></td></tr><tr><td>Up to 32X faster AI Vector Search</td><td>Traditional architecture / Previous X10M</td><td>Exadata Exascale / ESS 24ai+</td><td><sup></sup></td></tr><tr><td>Up to 55% faster IVF searches</td><td>Exadata X10M platform</td><td>Exadata X11M / ESS 25.1+</td><td><sup></sup></td></tr><tr><td>Up to 43% faster HNSW queries</td><td>Exadata X10M platform</td><td>Exadata X11M / ESS 25.1+</td><td><sup></sup></td></tr><tr><td>4.7X more data filtering</td><td>Previous software versions</td><td>All Exadata Platforms / ESS 25.1+</td><td><sup></sup></td></tr><tr><td>Up to 32X faster BINARY query</td><td>Queries with FLOAT32 vectors</td><td>All Exadata Platforms / ESS 25.1+</td><td><sup></sup></td></tr><tr><td>Up to 40X faster BINARY distance calc</td><td>Distance calc with FLOAT32 vectors</td><td>All Exadata Platforms / ESS 25.1+</td><td><sup></sup></td></tr></tbody></table></figure>






<p>The variety in performance claims highlights that actual gains depend on multiple factors: Exadata hardware generation, software version, vector type (FLOAT, INT8, BINARY), dimensionality, index type (IVF, HNSW), dataset size, and query complexity. Users should evaluate these nuances for their specific scenarios. The strong focus on <strong>AI Vector Search performance</strong> in recent Exadata X11M and ESS 25.1 announcements underscores its strategic importance for Oracle, positioning Exadata as a key platform for the growing AI/ML market.</p>



<h2 class="wp-block-heading">6. Importance for Modern Database Applications and AI</h2>



<p>Exadata&#8217;s accelerated <strong>AI Vector Search</strong> via <strong>AI Smart Scan</strong> has significant implications for modern applications, especially those driven by AI.</p>



<h3 class="wp-block-heading">6.1. Enabling In-Database AI Applications</h3>



<p>AI Smart Scan facilitates running AI logic directly within the Oracle database where the data resides, offering key advantages:</p>



<ul class="wp-block-list">
<li><strong>Reduced Data Movement:</strong> Minimizes costly, complex, and potentially insecure data transfers to external AI platforms.</li>



<li><strong>Enhanced Performance:</strong> Processing data locally with offload capabilities significantly improves query latency for AI algorithms.</li>



<li><strong>Improved Data Security:</strong> Sensitive data remains within the database boundary, protected by Oracle&#8217;s robust security features.</li>



<li><strong>Simplified Architecture:</strong> Reduces the need for complex integrations between disparate data stores and AI tools.</li>



<li><strong>Consistency:</strong> AI operations can benefit from database ACID guarantees.</li>
</ul>



<h3 class="wp-block-heading">6.2. Powering RAG (Retrieval-Augmented Generation) Architectures</h3>



<p><strong>Retrieval-Augmented Generation (RAG)</strong> enhances LLM responses by grounding them in external, often private or real-time, data. RAG mitigates LLM limitations like knowledge cut-offs and potential inaccuracies (&#8220;hallucinations&#8221;).</p>



<p><strong>AI Smart Scan-accelerated AI Vector Search</strong> is ideal for the crucial <em>retrieval</em> step in RAG. It efficiently finds semantically relevant information (documents, records) within the Oracle database based on a user&#8217;s query (converted to a vector). This retrieved context is then fed to the LLM, enabling it to generate more accurate, relevant, and trustworthy responses based on specific enterprise data. This is vital for building reliable generative AI applications like chatbots and internal knowledge systems.</p>



<h3 class="wp-block-heading">6.3. Driving Similarity Search Use Cases</h3>



<p>AI Vector Search excels at finding semantically similar items beyond simple keyword matching. <strong>AI Smart Scan</strong> makes large-scale similarity searches efficient, enabling applications like:</p>



<ul class="wp-block-list">
<li><strong>Product/Content Recommendations:</strong> Suggesting items similar to user preferences.</li>



<li><strong>Visual Search:</strong> Finding similar images/videos.</li>



<li><strong>Document Similarity:</strong> Locating related documents in large corpora.</li>



<li><strong>Fraud/Anomaly Detection:</strong> Identifying patterns similar to known fraud or deviations from normal behavior.</li>



<li><strong>Customer Support:</strong> Matching queries to relevant knowledge base articles.</li>



<li><strong>Bioinformatics/Medicine:</strong> Comparing medical images or symptoms to known cases.</li>
</ul>



<p>In all these scenarios, AI Smart Scan&#8217;s offload capabilities ensure efficient execution on large datasets. This integration of high-performance vector processing within the <strong>Converged Database</strong> is a key differentiator for Oracle, simplifying architectures and potentially lowering TCO compared to using separate specialized databases. For RAG, performing the retrieval step securely within the database before potentially sending filtered context to an LLM is a critical advantage for protecting sensitive enterprise data.</p>



<h2 class="wp-block-heading">7. Conclusion and Summary</h2>



<h3 class="wp-block-heading">7.1. Key Findings Summarized</h3>



<ul class="wp-block-list">
<li><strong>AI Smart Scan Defined:</strong> A critical, Exadata-specific optimization accelerating <strong>AI Vector Search</strong> (introduced in Oracle DB 23ai) by offloading compute-intensive vector distance calculations and Top-K filtering to intelligent storage servers.</li>



<li><strong>Offload Mechanism:</strong> Leverages Exadata hardware (RDMA, XRMEM, Flash Cache) to perform vector operations at memory speed near the data, drastically reducing network traffic and database server load.</li>



<li><strong>Continuous Improvement (ESS 25.1):</strong> Enhancements include more efficient Top-K filtering, support for performant INT8/BINARY vector formats, and offloading of vector distance projection.</li>



<li><strong>Performance Gains:</strong> Oracle claims significant speedups (up to 30X/32X) over non-offloaded methods. Exadata X11M offers further gains (up to 55% faster IVF, 43% faster HNSW vs. X10M), and software optimizations provide boosts like 32X faster queries with BINARY vectors. Actual results vary based on configuration and workload.</li>



<li><strong>Importance for Modern AI:</strong> Enables efficient <strong>in-database AI</strong>, simplifies architectures, enhances security, and is crucial for high-performance <strong>RAG</strong> retrieval and various large-scale <strong>similarity search</strong> applications.</li>
</ul>



<h3 class="wp-block-heading">7.2. Future Outlook and Strategic Significance</h3>



<p>Oracle&#8217;s ongoing investment signals the strategic importance of <strong>database AI</strong>. Future enhancements to <strong>AI Smart Scan</strong> might include broader operation offload and deeper integration with cloud architectures like <strong>Exascale</strong>.</p>



<p>The shift towards <strong>in-database AI</strong> promises more agile, efficient, and secure solutions by processing data where it resides. <strong>Exadata AI Smart Scan</strong> is central to Oracle&#8217;s <strong>Converged Database</strong> strategy, offering a compelling alternative to specialized vector databases by combining Exadata&#8217;s proven enterprise capabilities with cutting-edge AI workload acceleration.</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/ai-smart-scan-on-oracle-exadata-accelerating-ai-vector-search-for-rag-and-similarity-search.html">AI Smart Scan on Oracle Exadata: Accelerating AI Vector Search for RAG and Similarity Search</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
<title>What Makes Oracle Exadata Fast? Part 2</title>
		<link>https://www.bugraparlayan.com.tr/oracle-exadatayi-hizli-yapan-nedir-bolum-2.html</link>
					<comments>https://www.bugraparlayan.com.tr/oracle-exadatayi-hizli-yapan-nedir-bolum-2.html?noamp=mobile#respond</comments>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Mon, 22 Mar 2021 14:05:00 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[exadata]]></category>
		<category><![CDATA[exadata x8]]></category>
		<category><![CDATA[oracle exadata]]></category>
		<guid isPermaLink="false">http://www.bugraparlayan.com.tr/?p=823</guid>

<description><![CDATA[<p>Active Storage &#8211; Cell Offload: Oracle Exadata has a data-processing technology it calls Cell Offload Processing, which runs on the storage (the Cells) rather than on the database nodes. This means the processing itself runs on the storage. Other products on the market perform data processing on the server, so they show less performance compared to this technology. In addition, Oracle Exadata runs many workloads on the Cells to achieve higher performance than other &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/oracle-exadatayi-hizli-yapan-nedir-bolum-2.html">What Makes Oracle Exadata Fast? Part 2</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p><strong>Active Storage – Cell Offload</strong><br>Oracle Exadata has a data-processing technology it calls Cell Offload Processing, which runs on the storage (the Cells) rather than on the database nodes. In practice this means the processing itself runs on the storage. Other products on the market perform data processing on the server, so they show less performance compared to this technology. In addition, Oracle Exadata runs many workloads on the Cells to achieve higher performance than other database servers; this is why other all-flash arrays cannot reach Exadata performance.<br></p>



<p>This feature, which is enabled by default, is controlled by the <code>CELL_OFFLOAD_PROCESSING</code> parameter (a short sketch follows the list below). Some capabilities of the Cell Offload technology are as follows:</p>



<ul class="wp-block-list"><li>SQL Offload</li><li>XML &amp; JSON Offload</li><li>RMAN Backup (BCT) Filtering</li><li>Data file vs. REDO I/O Segregation</li><li>Encryption/Decryption Offload</li><li>Fast Data File Creation</li></ul>



<p><strong>Massively Parallel Processing (MPP)</strong><br>MPP is a technology designed so that a large number of processors, or multiple servers, work toward the same goal. When you use the shared-memory capability on Exadata, all resources can be mobilized for your workloads.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/02/image023.gif" alt="" class="wp-image-557759"/></figure>



<p>MPP can grow horizontally, and unlike a shared-everything architecture there is no loss; compared to a shared-nothing architecture it has the advantage that a single database concept is used.</p>



<p><strong>Bloom Filters</strong><br>Rather than processing JOIN operations on storage directly, Exadata uses Bloom filters that simplify them. The filter criteria are pushed down to the Exadata storage layer to filter the relevant data. The final step of the process happens on RAC, but under the Bloom filtering architecture the bulk of the work takes place on the Cells.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/02/AsnPV.png" alt="" class="wp-image-557768"/></figure>



<p><strong>Fast Node &amp; Cell Death Detection</strong><br>A typical cluster uses a standard architecture over Ethernet. Although this is supported on all platforms and performs well, it has some disadvantages. From its earliest releases, Exadata has been built on the Maximum Availability Architecture (MAA) to prevent stalls inside the cluster and to rule out failures between nodes.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/02/oracle-database-12c-with-real-application-clusters-rac-high-availability-ha-best-practices-6-638.jpg" alt="" class="wp-image-557769"/></figure>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/02/screen-shot-2018-02-09-at-12-56-35-pm.png" alt="" class="wp-image-557763"/></figure>



<p>The chart above shows the wait time incurred when a node drops out of a RAC cluster. Under MAA this is detected in 30 seconds or less, whereas on conventional systems other than Exadata it can take up to 120 seconds.</p>



<p>As noted above, Exadata is an integrated solution: it uses the speed of InfiniBand to detect failed nodes in RAC. A sample chart is shown below.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/02/screen-shot-2018-02-09-at-1-07-51-pm.png" alt="" class="wp-image-557764"/></figure>



<p><strong>Large Write Caching &amp; TEMP Performance</strong></p>



<p>Exadata takes full advantage of this technology by keeping data larger than 128 KB in the Flash Cache. This architecture has always moved in parallel with the capacity being manufactured, so the Flash Cache size grows at the same rate in every newly released Exadata machine.</p>



<p><strong>Adaptive SQL Optimization</strong><br>Although this capability is not exclusive to the Exadata machine, it is critically important for it. Now that most DWH environments have been moved onto Exadata, optimizing SQL statements matters enormously. Exadata always optimizes the SQL execution plan for top efficiency, and if a change in the plan is detected it can adjust quickly.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/02/tgsql_vm_069.png" alt="" class="wp-image-557770"/></figure>



<p><strong>Optimizing storage use and I/O through compression</strong><br>Oracle Exadata offers a unique compression technology it calls Hybrid Columnar Compression. HCC delivers substantial capacity savings in large databases and is an innovative approach to data compression.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/02/hcc.jpg" alt="" class="wp-image-557771"/></figure>



<p><strong>Mission Critical HA</strong><br>Oracle Exadata is designed in full alignment with high-availability architecture: it is redundant against all kinds of failures, human errors, and outages. In addition, integrated with the Data Guard architecture, it provides strong protection against disasters that might occur in any scenario.</p>



<p>Across these two parts we have tried, as best we could, to describe the distinctive features that make Exadata fast. See you in our other articles.</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/oracle-exadatayi-hizli-yapan-nedir-bolum-2.html">What Makes Oracle Exadata Fast? Part 2</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.bugraparlayan.com.tr/oracle-exadatayi-hizli-yapan-nedir-bolum-2.html/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
<title>What Makes Oracle Exadata Fast? Part 1</title>
		<link>https://www.bugraparlayan.com.tr/oracle-exadata-yi-hizli-yapan-nedir-bolum-1.html</link>
					<comments>https://www.bugraparlayan.com.tr/oracle-exadata-yi-hizli-yapan-nedir-bolum-1.html?noamp=mobile#respond</comments>
		
		<dc:creator><![CDATA[Bugra Parlayan]]></dc:creator>
		<pubDate>Mon, 22 Mar 2021 14:04:00 +0000</pubDate>
				<category><![CDATA[Engineered Systems]]></category>
		<category><![CDATA[exadata]]></category>
		<guid isPermaLink="false">http://www.bugraparlayan.com.tr/?p=821</guid>

<description><![CDATA[<p>Dear Friends, As you know, the X8M version of the Oracle Exadata machine has been announced. In many circles we frequently hear questions like: why is this product so expensive, and does it give us our money&#8217;s worth? As we entered 2020, in the third week of January, Cem Zorba, one of the Oracle veterans, presented excellent information about the product at a private event. I wanted to share the research and experience &#8230;</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/oracle-exadata-yi-hizli-yapan-nedir-bolum-1.html">What Makes Oracle Exadata Fast? Part 1</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Dear Friends,</p>



<p>As you know, the X8M version of the Oracle Exadata machine has been announced. In many circles we frequently hear questions like: why is this product so expensive, and does it give us our money&#8217;s worth? As we entered 2020, in the third week of January, Cem Zorba, one of the Oracle veterans, presented excellent information about the product at a private event. I wanted to share the research I have done and the experience I have gathered, as best I can. In this article we will examine what makes Oracle Exadata so good and so unrivaled.</p>



<p>Today we see Oracle Exadata compared most often with the IBM Power series. You may also see Exadata compared with products that have all-flash storage, such as Dell/EMC or Pure Storage. In many benchmark tests you will notice that the Exadata machine (I emphasize: the machine) opens up a clear gap over these products. So how does that happen?</p>



<p><strong>Integrated Hardware and Software</strong><br>First, I should note that the initial difference actually emerges at the foundation of the architecture. Exadata comes to market as an integrated product in both its software and its hardware, while its rivals present themselves as components. When the Exadata machine was first announced it used HP products inside the rack; with the Sun Microsystems acquisition it now uses products built to Oracle&#8217;s own strategy, and its components are as follows:<br><br></p>



<ul class="wp-block-list"><li>Clustered Servers</li><li>Cluster Interconnect Network</li><li>Storage Network</li><li>Active Storage</li><li>Software</li></ul>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/01/sagug002.gif" alt="" class="wp-image-556292"/></figure>



<p>The integrated nature of the stack means we must compare apples with apples and pears with pears. For example, if you compare the Exadata machine with an IBM server, you ignore the Exadata Cells, that is, the storage; if you compare it with full-flash storage products, this time you ignore the servers on top. The Exadata machine is a whole, and the parts that make up that whole create synergy and give us more. For this reason, any comparison should be made on the same footing, against an equally integrated architecture.</p>



<p><strong>Full integration with Oracle Database</strong><br>Oracle Database is indisputably the best database in the world and can run with strong performance on almost every platform. The operating-system experience it gained by supporting open-source systems, and the innovation it applied, carried it to a point ahead of contemporary systems, and the need for an integrated architecture became clear. Exadata&#8217;s arrival created a fully integrated stack together with Oracle Database and Oracle Linux, producing a completely stable product whose pieces talk to one another.</p>



<p><strong>High Availability, Redundancy, and Scaling</strong><br>Exadata&#8217;s highly available, scalable design also runs fully redundant, and this fact underlies the design of the Exadata machine. For example, an Exadata contains at least three Exadata Cells. Having three of them gives us, first of all, redundancy and the ability to apply patches without downtime: on Oracle Exadata you can patch the storage without interruption, or take the existing storage servers into maintenance in rotation. At the same time, thanks to the two nodes on the machine, the cluster structure comes built in.</p>



<p><strong>Large Memory Capabilities (DRAM)</strong><br>Unlike other large, speed-hungry database servers, Exadata comes with substantial DRAM. As of the X7-2 model it ships with 1.5 TB of DRAM per node, and each node can scale out to 12 TB of capacity. Thanks to DRAM, database operations are processed in memory before touching physical disk, which delivers performance.</p>



<p><strong>What makes Exadata special?<br></strong>Above, I tried to convey the most prominent features of the Oracle Exadata machine in their simplest form. That said, the underlying features also deserve a mention:</p>



<ul class="wp-block-list"><li>SQL Offload</li><li>Active Storage – Cell Offload</li><li>Massively Parallel Processing (MPP) Design</li><li>Bloom Filters</li><li>Storage Indexes</li><li>High Bandwidth, Low Latency Storage Network</li><li>High Bandwidth, Low Latency Cluster Interconnect</li><li>Fast Node Death Detection</li><li>Smart Flash Cache</li><li>Write-Back Flash Cache</li><li>Large Write Caching &amp; Temp Performance</li><li>Smart Flash Logging</li><li>Smart Fusion Block Transfer</li><li>NVMe Flash Hardware</li><li>Columnar Flash Cache</li><li>In-Memory Fault Tolerance</li><li>Adaptive SQL Optimization</li><li>Exadata Aware Optimizer Statistics</li><li>DRAM Cache in Storage</li></ul>



<p><strong>SQL Offload</strong><br>The innovation Exadata introduced at its debut appeared as SQL Offload. As mentioned above, Exadata being an integrated architecture, together with the storage structure it calls Cells, turned the routine flow of queries upside down. Normally the data requested from a database is processed on the server, whereas data requested through Exadata is returned from the Cells, which yields serious performance gains.</p>



<p>Put differently, the queries you issue are processed by the CPUs on the storage, not those on the node.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/01/screen-shot-2018-02-09-at-2-20-45-pm.png" alt="" class="wp-image-556289"/></figure>



<p><strong>High Bandwidth, Low Latency Storage Network</strong><br>Exadata communicates among the products it houses over InfiniBand. Because this network runs isolated from the network where the Exadata machine is placed, it is defined as a private network, which provides secure communication; Oracle&#8217;s recommendations advise against extending this network.<br><br>The InfiniBand fabric inside Exadata runs at 40 gigabits per second. Bandwidth matters in DWH environments and big-data work, and thanks to InfiniBand the Exadata machine can handle the data traffic without strain.<br><br>Furthermore, the REDO LOG concept is vital for the Oracle database. Since redo log writes are sensitive to I/O waits and commit operations, storage-side processing needs to run at maximum efficiency; InfiniBand resolves this without problems.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/01/A2848-IB_cnnct_Extdta-full_rck-1024x768.jpg" alt="" class="wp-image-556301"/></figure>



<p><strong>DRAM Cache in Storage</strong></p>



<p>Oracle Exadata uses DRAM technology first for data access, so existing data is read from the DRAM cache before the disk. On top of the Exadata Smart Flash Cache speed, this yields 2.5 times more throughput.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/01/screen-shot-2018-02-10-at-11-49-55-am.png" alt="" class="wp-image-556303"/></figure>



<p>As the visual example above shows, Exadata&#8217;s memory structure delivers high performance with minimal I/O. The video below can be followed for a deeper look at the DRAM cache feature.</p>



<figure><iframe width="730" height="411" src="https://www.youtube.com/embed/jsEwiYdU2gE?feature=oembed" allowfullscreen=""></iframe></figure>



<p><strong>In-Memory Fault Tolerance</strong><br>Oracle&#8217;s In-Memory option has been supported on all platforms since it was announced with release 12c. Running this feature on an Oracle Exadata machine brings certain advantages, the most important being the in-memory fault tolerance that is, again, unique to Exadata.</p>



<figure class="wp-block-image"><img decoding="async" src="https://www.cozumpark.com/wp-content/uploads/2020/01/screen-shot-2018-02-09-at-4-28-41-pm.png" alt="" class="wp-image-556320"/></figure>



<p>In-Memory Fault Tolerance means data can be mirrored effectively between the Oracle database nodes running on Exadata. This feature can let developers use the DRAM cache more aggressively.</p>
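


<p>A minimal, hedged sketch (the <code>SALES</code> table is illustrative; <code>DUPLICATE</code> and <code>DUPLICATE ALL</code> are the In-Memory subclauses available on engineered systems):</p>



<pre class="wp-block-code"><code>-- Mirror the in-memory copy of a table on one additional RAC node:
ALTER TABLE sales INMEMORY DUPLICATE;

-- Keep a full in-memory copy on every node (for read-mostly data):
ALTER TABLE sales INMEMORY DUPLICATE ALL;</code></pre>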



<p>We have reached the end of this article introducing the physical and architectural features of the Oracle Exadata machine. See you in Part 2.</p>
<p>The post <a rel="nofollow" href="https://www.bugraparlayan.com.tr/oracle-exadata-yi-hizli-yapan-nedir-bolum-1.html">What Makes Oracle Exadata Fast? Part 1</a> appeared first on <a rel="nofollow" href="https://www.bugraparlayan.com.tr">Bugra Parlayan | Oracle Database &amp; Exadata Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.bugraparlayan.com.tr/oracle-exadata-yi-hizli-yapan-nedir-bolum-1.html/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
