ASM Health Checker Found 1 New Failures

Troubleshooting the "ASM Health Checker Found 1 New Failures" Alert

If you are managing an Oracle Database environment using Automatic Storage Management (ASM), encountering the alert "ASM health checker found 1 new failures" can be a jarring experience. This message is usually triggered by the Oracle Health Monitor (HM), a framework designed to detect and analyze failures in components of the database and ASM instances.

When this alert surfaces in your alert log or monitoring dashboard (like Enterprise Manager), it means ASM has identified a specific issue that could potentially impact the availability or performance of your storage layer.

Here is a deep dive into what this error means, how to diagnose it, and the steps to resolve it.

1. Understanding the ASM Health Checker

The ASM Health Checker is part of the broader Oracle Health Monitor. It runs periodic checks—and can be triggered manually—to assess the integrity of:

ASM Metadata (Disk headers, File Directory, Alias Directory)

Disk Group health

Process responsiveness

When a "new failure" is reported, Oracle has logged a diagnostic entry into its ADR (Automatic Diagnostic Repository). The alert doesn't tell you the problem directly; it tells you that a report is waiting for your review.

2. Immediate Diagnostic Steps

To fix the failure, you first have to identify it. You can do this via the Command Line Interface (CLI) using ADRCI.

Step A: Access ADRCI

Log in to your grid infrastructure server and run:

    adrci

Step B: Set the Home Path

Check which home is reporting the error (usually the ASM home):

    show homes
    set homepath diag/asm/+asm/+asm1

(Adjust the home path based on your SID.)

Step C: List the Failures

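Run the following command to list the Health Monitor runs recorded in this home:

    show hm_run

To read the findings of a specific run, generate and display its report:

    create report hm_run <run_name>
    show report hm_run <run_name>

ADRCI has no list failure command; LIST FAILURE belongs to the RMAN Data Recovery Advisor. The run report gives the finding, its priority (such as CRITICAL or HIGH), and a brief description of what went wrong.

3. Common Causes for ASM Failures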

While the "1 new failure" could technically be anything, it usually falls into one of these three categories:

A. Disk Corruption or Metadata Inconsistency

The most common cause is an inconsistency in the ASM metadata. This can happen due to an unexpected power loss, a bug in the storage firmware, or "lost writes."

The Fix: Run an internal ASM check.

    ALTER DISKGROUP <diskgroup_name> CHECK ALL;

B. Offline Disks or Path Issues

If a path to a physical disk is lost (due to HBA failure or cable issues), ASM might mark the disk as "OFFLINE." If the diskgroup is still mounted but missing a member, the Health Checker will flag it.
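The disk check described here can be automated once the rows of v$asm_disk have been fetched. The following is a minimal sketch, not an Oracle-provided tool: it assumes the rows are already available as dictionaries, that "ONLINE" refers to the MODE_STATUS column, and that the sample disk names and paths are purely hypothetical.

```python
from typing import Dict, List


def find_unhealthy_disks(rows: List[Dict[str, str]]) -> List[str]:
    """Return the name (or path, if unnamed) of every disk that is not an ONLINE member."""
    flagged = []
    for row in rows:
        online = row.get("mode_status", "").upper() == "ONLINE"
        member = row.get("header_status", "").upper() == "MEMBER"
        if not (online and member):
            # Disks that have left a group may have no name, so fall back to the path.
            flagged.append(row.get("name") or row.get("path", "<unknown>"))
    return flagged


# Hypothetical rows, shaped like a SELECT from v$asm_disk.
sample = [
    {"name": "DATA_0000", "path": "/dev/mapper/d0", "mode_status": "ONLINE", "header_status": "MEMBER"},
    {"name": "DATA_0001", "path": "/dev/mapper/d1", "mode_status": "OFFLINE", "header_status": "MEMBER"},
    {"name": "", "path": "/dev/mapper/d2", "mode_status": "ONLINE", "header_status": "FORMER"},
]

print(find_unhealthy_disks(sample))
```

Any disk returned by the function is a candidate for the path and HBA checks mentioned above.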

The Fix: Check v$asm_disk to ensure all disks are ONLINE and HEADER_STATUS is MEMBER.

C. Resource Exhaustion

Sometimes the failure is not about the disks themselves, but about the ASM instance’s ability to manage them, such as running out of processes or memory in the SGA.

4. How to Resolve the Failure

Once you’ve identified the failure, you can ask Oracle for repair advice. ADVISE FAILURE and REPAIR FAILURE are Data Recovery Advisor commands, so they are run from an RMAN session, and they apply only if the failure also appears in RMAN's list failure output.

Advise on Failure:

    advise failure <failure_id>;

This will generate a report explaining the impact and recommending a script or manual action to fix it.

Execute Repair: If Oracle provides a repair script, you can run:

    repair failure;

Note: Always back up your metadata and ensure you have a valid backup before running automated repair scripts on production storage.

5. Clearing the Alert

After the underlying issue is resolved (e.g., the disk is back online or the metadata is repaired), you need to "close" the failure so the health checker stops reporting it. CHANGE FAILURE, like LIST FAILURE, is a Data Recovery Advisor command, so run it from RMAN rather than ADRCI:

    list failure;
    change failure <failure_id> closed;

Use list failure to get the ID, and close the failure only after verifying the fix.

The "ASM health checker found 1 new failures" alert is a call to action to check your storage integrity. By using ADRCI to drill down into the specific failure ID, you can move from a vague warning to a concrete resolution plan.

Pro Tip: Regularly monitor your v$asm_operation view. If you see long-running "REBAL" (rebalance) operations following a failure, ensure your ASM_POWER_LIMIT is set high enough to complete the recovery quickly without impacting database I/O.
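As a rough illustration of this tip, the sketch below filters rows taken from v$asm_operation for rebalance operations whose estimated remaining time exceeds a limit. The 60-minute threshold and the sample rows are assumptions chosen for the example, not Oracle recommendations.

```python
from typing import Dict, List


def slow_rebalances(rows: List[Dict], max_minutes: int = 60) -> List[Dict]:
    """Return running REBAL operations whose est_minutes exceeds max_minutes."""
    return [
        row for row in rows
        if row["operation"] == "REBAL"
        and row["state"] == "RUN"
        and row["est_minutes"] > max_minutes
    ]


# Hypothetical rows, shaped like SELECT operation, state, est_minutes FROM v$asm_operation.
sample_ops = [
    {"operation": "REBAL", "state": "RUN", "est_minutes": 240},
    {"operation": "REBAL", "state": "WAIT", "est_minutes": 0},
    {"operation": "REBAL", "state": "RUN", "est_minutes": 12},
]

for op in slow_rebalances(sample_ops):
    print(f"Rebalance still needs about {op['est_minutes']} minutes; review ASM_POWER_LIMIT")
```

Only the first sample row is reported, because it is both running and over the threshold.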


The alert "ASM Health Checker found 1 new failures" is a critical notification from Oracle's Automatic Storage Management (ASM) health monitoring system. It typically appears in the ASM alert logs or via automated email notifications when a storage-related incident is detected.

Failure Overview

This specific message indicates that the Fault Diagnosability Infrastructure has identified a new incident in the Automatic Diagnostic Repository (ADR). While "1 new failure" is a generic count, it often points to one of the following underlying issues:

Disk Group Instability: A disk may have failed, leading to a loss of redundancy or a disk group being forced to dismount.

Metadata Corruption: Corruption in ASM metadata blocks (typically within the first 250 blocks) detected during routine operations or rebalancing.

Rebalance Failures: An error occurring during the addition or removal of disks, often accompanied by background process (ARB0) alerts.

Resource State Changes: CRS (Cluster Ready Services) resources moving to an INTERMEDIATE or OFFLINE state due to storage latency or connectivity issues.

Immediate Diagnostic Actions

To identify the exact cause, execute the following steps within your environment:

Check the ADRCI Utility: Use the ADR Command Interpreter (ADRCI) to list the Health Monitor runs recorded for the ASM home.

    adrci> show hm_run

Each run has a unique name. The report for a run (create report hm_run <run_name>, then show report hm_run <run_name>) provides a description of the problem.

Inspect ASM Alert Logs: Locate the log file (usually in the trace directory of your Oracle Base) to see the events leading up to the "1 new failure" message. Look for:

ORA-15xxx errors (ASM-specific).

SUCCESS: ALTER DISKGROUP... followed by immediate GMON dumping or failure notes.
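Searching the alert log by hand gets tedious on a busy system, so the scan can be scripted. The sketch below is one possible approach under stated assumptions: the log has already been read into a list of lines, the health checker message appears with the wording quoted in this article, and the three sample lines are a hypothetical excerpt rather than real log output.

```python
import re
from typing import Iterable, List, Tuple

# Pattern for the health checker message and for ASM-specific ORA-15xxx codes.
HEALTH_RE = re.compile(r"ASM Health Checker found (\d+) new failures?", re.IGNORECASE)
ORA_RE = re.compile(r"\bORA-15\d{3}\b")


def scan_alert_log(lines: Iterable[str]) -> Tuple[List[Tuple[int, int]], List[str]]:
    """Return (line number, failure count) pairs and the sorted set of ORA-15xxx codes seen."""
    hits = []
    codes = set()
    for number, line in enumerate(lines, start=1):
        match = HEALTH_RE.search(line)
        if match:
            hits.append((number, int(match.group(1))))
        codes.update(ORA_RE.findall(line))
    return hits, sorted(codes)


# Hypothetical excerpt; a real alert log carries timestamps and more context.
sample_log = [
    "ORA-15032: not all alterations performed",
    "ORA-15078: ASM diskgroup was forcibly dismounted",
    "ASM Health Checker found 1 new failures",
]

hits, codes = scan_alert_log(sample_log)
print(hits)
print(codes)
```

The line numbers in the first result show where to start reading backwards for the events that led up to the alert.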

Run Data Recovery Advisor: If the failure involves data loss or disk group mounting issues, use RMAN to get a repair recommendation:

    RMAN> list failure;
    RMAN> advise failure;

Query V$ Views: Verify the status of your disks and current operations:

Disk Status: SELECT name, path, mount_status, header_status, state FROM v$asm_disk;

Active Operations: SELECT operation, state, est_minutes FROM v$asm_operation;

Decoding the Alert: "ASM Health Checker Found 1 New Failures" – Causes, Fixes, and Prevention

If you manage Oracle Grid Infrastructure (GI) or a standalone Automatic Storage Management (ASM) instance, one notification can send a chill down your spine: "ASM health checker found 1 new failures."

This message, often found in your alert log, crsd.log, or email alerts from Enterprise Manager (EM12c/13c), indicates that the automated ASM Health Checker has detected a new issue affecting the integrity, availability, or performance of your ASM environment. Ignoring it is not an option; unresolved failures can lead to disk group mount issues, I/O latency, or even database crashes.

This article provides a 360-degree breakdown of this alert: what triggers it, how to diagnose the root cause, step-by-step repair procedures, and long-term prevention strategies.


What "ASM Health Checker found 1 new failures" means


Essay: “asm health checker found 1 new failures” — diagnosis, causes, and remediation

Introduction

The terse message “asm health checker found 1 new failures” appears straightforward but carries significant operational weight: it signals that an ASM (Automatic Storage Management, or a similarly named subsystem) health-check routine has detected a failure. Whether that ASM is Oracle ASM, a cloud Autoscaling/Service Mesh monitor, or a custom “Application Service Monitor,” the phrasing implies an automated health-scan discovered one additional fault relative to its prior baseline. This essay examines the message’s possible meanings, root causes, investigative approach, risk implications, and systematic remediation and prevention strategies. The aim is to move from alarm to actionable resolution, and from reactive fixes to durable system hardening.

  1. Interpreting the message
  2. Immediate triage checklist (first 15–60 minutes)
  3. Root-cause analysis (systematic approach)
  4. Common root causes and how they manifest
  5. Remediation steps (concrete actions)
  6. Validation and recovery verification
  7. Post-incident actions (SRE-style)
  8. Design considerations for health checkers to reduce false positives and improve signal
  9. Risk assessment and business impact

Conclusion

“asm health checker found 1 new failures” is more than a log line: it is an early warning. Responding effectively requires prompt triage, methodical diagnosis, and decisive remediation, combined with post-incident learning and engineering improvements to reduce recurrence. By classifying possible causes (storage, probe, resource, network, regression, auth), following a disciplined RCA approach, and implementing monitoring and automation best practices, teams can convert such alerts from frightening unknowns into manageable events and steadily improve system resilience.

Appendix: Minimal quick runbook (steps to execute immediately)

  1. Capture the alert details and correlate logs/metrics.
  2. Identify the affected resource (disk/pod/node/service).
  3. Attempt a manual probe/connection to reproduce failure.
  4. If production-impacting, trigger failover/scaleup and notify on-call.
  5. Apply targeted remediation (replace disk, fix probe, rollback deployment).
  6. Verify health across multiple intervals; monitor for recurrence.
  7. Create postmortem and assign permanent fixes.

— End —

Troubleshooting Guide: ASM Health Checker Found 1 New Failure

If you are managing an Oracle database environment and receive the alert "ASM Health Checker found 1 new failure," it’s time to pay attention. While Oracle Automatic Storage Management (ASM) is robust, this specific notification indicates that the internal diagnostic framework has detected an issue that could potentially impact disk group availability or performance.

Here is a comprehensive breakdown of what this error means, how to diagnose it, and the steps to resolve it.

1. Understanding the ASM Health Checker (CHMA)

The ASM Health Checker is part of the Oracle Check Framework. It runs periodic checks on the ASM instance, disk groups, and metadata to ensure everything is operating within healthy parameters.

When it reports a "new failure," it means a specific "check" (such as disk connectivity, metadata consistency, or space usage) has moved from a PASS to a FAIL state.

2. Immediate Step: Identify the Failure

The alert itself is generic. To find out what actually failed, you need to query the ASM instance. Run this SQL command in your ASM instance:

    SELECT check_name, failure_pri, status, repair_script
    FROM v$asm_healthcheck_status
    WHERE status = 'FAILED';

If this view is not available in your release, review the Health Monitor runs in ADRCI (show hm_run) instead.

Common culprits include:

Disk Offline: One or more disks in a disk group are no longer accessible.

Metadata Corruption: Inconsistencies in the ASM metadata (e.g., File Directory or Disk Directory).

Space Issues: A disk group is nearing 100% capacity, risking an instance crash.

Stale Quorum: Issues with voting files in a CRS/Grid Infrastructure environment.

3. Deep Dive into the Logs

To get the granular details, look at the ASM Alert Log. You can usually find this in your Oracle Base directory:

    $ORACLE_BASE/diag/asm/+asm/+asm1/trace/alert_+asm1.log

Search for the timestamp of the alert. You will often see a corresponding ORA- error code (like ORA-15078 or ORA-15032) that provides the exact technical reason for the health check failure.

4. How to Resolve the Failure

Scenario A: Disk Connectivity Issues

If the health checker found a disk failure, check the OS-level connectivity.

Command: lsdsk (within ASMCMD) or fdisk -l (Linux).

Fix: If a disk is "OFFLINE," try to bring it online using:

    ALTER DISKGROUP <diskgroup_name> ONLINE DISK <disk_name>;

Scenario B: Metadata Inconsistency

If the health check indicates metadata issues, you may need to run a manual check on the disk group.

Action: Execute the CHECK command:

    ALTER DISKGROUP <diskgroup_name> CHECK ALL;

Note: This checks for consistency but does not fix errors. If errors are found, you may need to involve Oracle Support.

Scenario C: Space Pressure

If the failure is related to "Insufficient Space," rebalance the disk group or add new disks immediately.
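Space pressure is easier to act on when it is expressed as a number. The sketch below computes the used percentage from the total_mb and free_mb columns of v$asm_diskgroup and flags groups that cross a threshold or report a negative usable_file_mb. The 90% threshold, the function names, and the sample rows are assumptions made for illustration.

```python
from typing import Dict, List


def used_percent(total_mb: float, free_mb: float) -> float:
    """Percentage of raw disk group space in use, rounded to one decimal place."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return round(100.0 * (total_mb - free_mb) / total_mb, 1)


def groups_under_pressure(rows: List[Dict], threshold: float = 90.0) -> List[str]:
    """Names of disk groups at or above the threshold, or with negative usable_file_mb."""
    flagged = []
    for row in rows:
        too_full = used_percent(row["total_mb"], row["free_mb"]) >= threshold
        # A negative usable_file_mb means redundancy could not be restored after a disk failure.
        no_headroom = row["usable_file_mb"] < 0
        if too_full or no_headroom:
            flagged.append(row["name"])
    return flagged


# Hypothetical rows, shaped like a SELECT from v$asm_diskgroup.
sample_groups = [
    {"name": "DATA", "total_mb": 102400, "free_mb": 5120, "usable_file_mb": 2560},
    {"name": "FRA", "total_mb": 51200, "free_mb": 25600, "usable_file_mb": 12800},
    {"name": "RECO", "total_mb": 20000, "free_mb": 8000, "usable_file_mb": -500},
]

print(groups_under_pressure(sample_groups))
```

Checking usable_file_mb alongside the raw percentage matters for redundant disk groups, where free_mb alone overstates how much can safely be written.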

Action: Check free space:

    SELECT name, free_mb, total_mb, usable_file_mb FROM v$asm_diskgroup;

5. Clearing the Alert

Once you have fixed the underlying physical or logical issue, the Health Checker should automatically update during its next run. However, if the status remains "Failed" in the views, you can manually trigger a re-run of the health check or use ADRCI to purge the alert.

Summary Checklist

Query v$asm_healthcheck_status to identify the specific check.

Review the ASM Alert Log for specific ORA- error codes.

Verify Physical Disks at the OS level to ensure no hardware failure.

Check Disk Group Capacity to ensure you haven't hit a "disk full" state.

By catching these "1 new failures" early, you prevent minor disk hiccups from turning into major database outages.

The error message "ASM Health Checker found 1 new failures" typically appears in the Oracle ASM alert logs when the system detects an issue with a disk or disk group. This message indicates that a failure has been logged in the Automatic Storage Management (ASM) health check framework, often related to disk group dismounts, header corruption, or voting file issues.

Oracle ASM Health Check Failure Report

Report Field | Description / Details
Alert Message | ASM Health Checker found 1 new failures
System Component | Oracle Automatic Storage Management (ASM)
Detection Source | ASM Alert Log (typically located at diag/asm/+asm//trace/alert_+asm.log)
Incident Status | (Requires immediate investigation to prevent data loss or service disruption)

Potential Causes & Findings

Disk Group Dismount: A disk group may have been forced to dismount due to lost connectivity or multiple disk failures in a failure group.

Disk Header Corruption: The metadata (headers) on one or more ASM disks may be corrupted or in a "FORMER" or "PROVISIONED" status instead of "MEMBER".

Voting File Issues: If the ASM disk group hosts the Cluster Registry (OCR) or Voting Disks, a failure can cause node evictions or cluster instability.

Storage Latency/I/O Timeouts: The health checker may trigger a failure if it waits too long (e.g., >15 seconds) for I/O operations to complete on a specific disk.

ASM Health Checker Found 1 New Failure: What It Means and How to Resolve It

The Automatic Storage Management (ASM) health checker is a crucial tool in Oracle databases that monitors the health and integrity of the storage infrastructure. When the ASM health checker reports a new failure, it's essential to understand the implications and take corrective actions to prevent data loss or system downtime. In this blog post, we'll discuss what an ASM health checker failure means, how to investigate the issue, and steps to resolve it.

What does an ASM health checker failure mean?

When the ASM health checker detects a problem, it logs an error message indicating that a failure has been detected. The message may look like this:

"ASM health checker found 1 new failure"

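This message indicates that the ASM health checker has detected a single failure in the storage system. The failure could be related to various issues, such as a disk error, unstable storage connectivity, insufficient disk space, or an ASM configuration error.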

Investigating the ASM health checker failure

To investigate the failure, follow these steps:

  1. Check the ASM alert log: The ASM alert log provides detailed information about the failure, including the error message, timestamp, and affected disk group. You can find the alert log in the $ORACLE_BASE/diag/asm/+ASM/<instance_name>/trace directory.
  2. Run the asmcmd command: The asmcmd command-line tool provides a comprehensive view of the ASM configuration and status. Run asmcmd with the lsdg command to list the disk groups and their status: asmcmd lsdg
  3. Check the disk group status: Pass the disk group name to lsdg to check the status of the affected disk group: asmcmd lsdg <disk_group_name>

Resolving the ASM health checker failure

Once you've identified the root cause of the failure, take corrective actions to resolve the issue:

  1. Replace a failed disk: If the failure is due to a disk error, replace the disk and re-add it to the ASM disk group.
  2. Check and correct connectivity: Verify that the storage connections are stable and functioning correctly.
  3. Free up disk space: If the failure is due to insufficient disk space, free up space by deleting unnecessary files or expanding the disk group.
  4. Reconfigure ASM: If the failure is due to an ASM configuration error, reconfigure ASM with the correct settings.

Best practices to prevent ASM health checker failures

To minimize the likelihood of ASM health checker failures:

  1. Monitor ASM alerts: Check the ASM alert log regularly and respond promptly to any errors or warnings.
  2. Perform routine maintenance: Check disk space and replace failed disks on a regular schedule.
  3. Test and validate ASM configurations: Test and validate ASM configurations to ensure they are correct and optimal.

By understanding the causes of ASM health checker failures and taking proactive steps to prevent them, you can ensure the reliability and performance of your Oracle database storage infrastructure.

An "ASM health checker found 1 new failures" message in Oracle (AHF/ORAchk) signals a logged incident in the Automatic Diagnostic Repository (ADR), often caused by disk connectivity issues, failed rebalances, or metadata corruption. Immediate investigation requires using ADRCI to identify the specific incident and checking V$ASM_DISK for failed or dropped disks. Detailed diagnostic procedures are available from the Oracle Help Center.

4. Mismatched Disk Group Compatibility

If compatible.asm, compatible.rdbms, or compatible.advm values are set incorrectly relative to the GI version, the health checker will report advisories as failures.
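A quick sanity check of these attributes can be sketched as follows. It assumes the usual ordering constraints (compatible.asm no higher than the GI software version, and compatible.rdbms and compatible.advm no higher than compatible.asm); treat those rules, the function names, and the sample version strings as assumptions to confirm against the documentation for your release.

```python
from typing import List, Optional, Tuple


def parse_version(text: str, width: int = 5) -> Tuple[int, ...]:
    """Turn '12.1.0.2' into (12, 1, 0, 2, 0) so that versions compare numerically."""
    parts = [int(piece) for piece in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts[:width])


def check_compatibility(gi_version: str, asm: str, rdbms: str,
                        advm: Optional[str] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means no mismatch was found."""
    problems = []
    if parse_version(asm) > parse_version(gi_version):
        problems.append(f"compatible.asm ({asm}) is higher than the GI version ({gi_version})")
    if parse_version(rdbms) > parse_version(asm):
        problems.append(f"compatible.rdbms ({rdbms}) is higher than compatible.asm ({asm})")
    if advm is not None and parse_version(advm) > parse_version(asm):
        problems.append(f"compatible.advm ({advm}) is higher than compatible.asm ({asm})")
    return problems


# Hypothetical attribute values for two disk groups.
print(check_compatibility("19.0.0.0.0", asm="19.0.0.0.0", rdbms="12.1.0.2.0", advm="19.0.0.0.0"))
print(check_compatibility("12.2.0.1.0", asm="19.0", rdbms="11.2.0.4"))
```

Padding each version to a fixed number of components avoids the pitfall of comparing version strings lexically, where "9.0" would sort above "19.0".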