Friday, May 29, 2009

Fast-Start Failover on Oracle Error Conditions

Using Oracle 11g Data Guard configuration you can initiate Fast-Start Failover
based on different Oracle errors - Configure Conditions for Fast-start Failover

I decided to test it and created DG configuration using not only documentation
but running "ENABLE FAST_START FAILOVER" command, getting errors :) ,
looking into drc...log files (it is 11g, they are under trace directory),
correcting problems like: standby redo log files, flashback mode, increasing
Fast-Start Failover Lag Limit and finally got SUCCESS status for
"SHOW CONFIGURATION VERBOSE":
DGMGRL> show configuration verbose

Configuration
Name: bee
Enabled: YES
Protection Mode: MaxPerformance
Databases:
bee - Primary database
beesby - Physical standby database
- Fast-Start Failover target

Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: beesby
Observer: oel1.pythian
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE

Current status for "bee":
SUCCESS

To see conditions which are configured to initiate FSF you need to execute
another command (I usually run help command and do copy/paste
of commands that I need):
DGMGRL> SHOW FAST_START FAILOVER;

Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: beesby
Observer: oel1.pythian
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE

Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES

Oracle Error Conditions:
(none)

There were predefined logically named health conditions for Fast-Start Failover
and no Oracle Error Conditions.

Using help command and provided syntax I set FSF for ORA-00001 error (!)
that means failover should be initiated after violation of unique constraint
(That's terrific condition!)
DGMGRL> ENABLE FAST_START FAILOVER CONDITION 1;
Succeeded.

alert.log also mentioned it but in decoded manner:
OS Pid: 2964 executed alter system set events '1 trace name failover level 1'

where PID=2964 belonged to DMON process. The error now is on conditions list:
DGMGRL> SHOW FAST_START FAILOVER;

Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: beesby
Observer: oel1.pythian
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE

Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES

Oracle Error Conditions:
ORA-00001: unique constraint (%s.%s) violated

The only thing left wass to raise the error - I created the table with PK and tried to insert
duplicated value, got the error:
SQL> insert into t2 values(1);
insert into t2 values(1)
*
ERROR at line 1:
ORA-00001: unique constraint (AG.SYS_C009503) violated


And failover was started:
alert.log on primary
...
OS Pid: 2964 executed alter system set events '1 trace name failover level 1'
Wed May 27 19:30:58 2009
Foreground: user-configured ORA-1 requesting FSFO
Wed May 27 19:31:03 2009
A user-configurable Fast-Start Failover condition was detected. The primary is shutting down due to ORA-00001: unique constraint (%s.%s) violated.
Database bee will not be automatically reinstated.
USER (ospid: 3587): terminating the instance
Instance terminated by USER, pid = 3587
...

alert.log on standby
...
Attempting Fast-Start Failover due to detection of condition ORA-00001: unique constraint (%s.%s) violated.
Database bee will not be automatically reinstated.
...
Failover succeeded. Primary database is now beesby.
...

observer output
...
19:31:05.81 Wednesday, May 27, 2009
Initiating Fast-Start Failover to database "beesby"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "beesby"
19:31:19.15 Wednesday, May 27, 2009
...

Wonderful, failover was done and now I had to reinstate the standby

And that was completed but through several steps (I did not use static configuration of global database name
in listener - that might was the cause of failure to accomplish it at once - I need to check it):

  1. startup new standby database in mount mode (bee)
  2. REINSTATE DATABASE bee (from dgmgrl connected as sys/sys from beesby environment)
  3. startup standby database in mount mode (bee)
  4. REINSTATE DATABASE bee (from dgmgrl connected as sys/sys from beesby environment)
=== 1 ===
SQL>
startup mount
...
Database mounted.

=== 2 ===
[oracle@oel1 beesby ~]$ dgmgrl sys/sys
DGMGRL for Linux: Version 11.1.0.7.0 - Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL>
REINSTATE DATABASE bee
Reinstating database "bee", please wait...
Operation requires shutdown of instance "bee" on database "bee"
Shutting down instance "bee"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "bee" on database "bee"
Starting instance "bee"...
Unable to connect to database
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Failed.
You are no longer connected to ORACLE
Please connect again.
Unable to start instance "bee"
You must start instance "bee" manually
Reinstatement of database "bee" failed
DGMGRL> exit

=== 3 ===
SQL> startup mount
...
Database mounted.

===4 ===
[oracle@oel1 beesby ~]$ dgmgrl sys/sys
DGMGRL for Linux: Version 11.1.0.7.0 - Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL>
REINSTATE DATABASE bee
Reinstating database "bee", please wait...
Reinstatement of database "bee" succeeded


DGMGRL>
show configuration verbose

Configuration
Name: bee
Enabled: YES
Protection Mode: MaxPerformance
Databases:
beesby - Primary database
bee - Physical standby database
- Fast-Start Failover target

Fast-Start Failover: ENABLED
Threshold: 30 seconds
Target: bee
Observer: oel1.pythian
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE

Current status for "bee":
SUCCESS


Have a good day!

Thursday, May 28, 2009

TNS names parsing in Data Guard configurations

Building DG configurations for Oracle 10g and 11g databases
I noticed that 11g does not do transformation of tns name
provided at stage of DG creation with regard to Oracle 10g

That might be useful if you need to tune some parameters
of SQL*Net - SDU, RECV_BUF_SIZE, SEND_BUF_SIZE

But some problems can be hidden if full string of connection
is not shown but dr....log can shed some light on them.


Have a good day!

Sunday, May 17, 2009

Top 10 topics for Data Guard on Metalink by usage

Metalink Knowledge Base Data Guard top 10 notes by usage

  1. 10.2 Data Guard Physical Standby Switchover(751600.1)
  2. Data Guard Support for Heterogeneous Primary and Standby Systems in Same Data Guard Configuration (413484.1)
  3. Applying Patchset with a 10g Physical Standby in Place (278641.1)
  4. The Gains and Pains of Nologging Operations (290161.1)
  5. Steps To Create Physical Standby Database (736863.1)
  6. Oracle10g: Data Guard Switchover and Failover Best Practices (387266.1)
  7. MAA - Creating a RAC Physical Standby for a RAC Primary (380449.1)
  8. Script to Collect Data Guard Primary Site Diagnostic Information (241374.1)
  9. MAA - Data Guard Redo Transport and Network Best Practices (387174.1)
  10. Creating a 10gr2 Data Guard Physical Standby database with Real-Time apply (343424.1)
As you can see the 2nd place is for Heterogeneous standbies and also MAA
best practices are definitely on the edge of usage.
I think it shows that standby and Data Guard configuration are used not only to
increase availability of systems but also to migrate from one platform to another
as for example it is possible to migrate from Windows 32-bit to Linux 64-bit
with Oracle 11g and minimal downtime.

Have a good day!

Friday, May 1, 2009

silent installation of agent 10.2.0.5

I recently installed agents 10.2.0.5 on SunOS (SPARC) 5.9 and Linux x86-64
silently and noted the same problem during secure command - the screen
had being erased if password was not sent as a parameter
in emctl secure agent command.

If a password is there - everything is fine, agent was secured (there is less
information about completion in output for that version of agent)

Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
Agent is already stopped... Done.
Securing agent... Started.
Securing agent... Successful.


But I have to mention that I spent several hours trying
to install agent silently (on Solaris 9) and it was failing:

./runInstaller -silent -responseFile /downloads/agent10.2.0.5/solaris/response/additional_agent.rsp \
FROM_LOCATION="/downloads/agent10.2.0.5/solaris/agent/stage/products.xml" \
BASE_DIR="/opt/app/oracle/agent10g" \
SHOW_ROOTSH_CONFIRMATION=true \
b_doAgentConfig=false \
SHOW_ROOTSH_CONFIRMATION=true \
b_doAgentConfig=false \
METALINK_USERNAME=""
Starting Oracle Universal Installer...

Checking installer requirements...

Checking operating system version: must be 5.8, 5.9 or 5.10. Actual 5.9
Passed

Checking Temp space: must be greater than 250 MB. Actual 14145 MB Passed
Checking swap space: must be greater than 500 MB. Actual 16220 MB Passed

All installer requirements met.

Preparing to launch Oracle Universal Installer
from /tmp/OraInstall2009-04-30_01-11-35AM.
Please wait ...
Please specify a Unique Oracle Base directory,
or remove previously installed products and their directories.
Directory name should contain only valid alphanumeric characters,
including '_', or '.' characters.


But the directory
/opt/app/oracle/agent10g did not even exist there
and there was no entry about that oracle home neither in inventory.xml
nor in oratab

Resolution was simple - I created a copy of original response file,
removed comments and empty rows, changed parameters accordingly
the environment and it worked!. The installation had been completed,
root.sh had been executed(new requirement came with 10.2.0.5)
and everything was completed.

/tmp/additional_agent.rsp

RESPONSEFILE_VERSION=2.2.1.0.0
UNIX_GROUP_NAME="oinstall"
FROM_LOCATION="../stage/products.xml"
BASEDIR="/opt/app/oracle"
b_upgrade=false
SHOW_WELCOME_PAGE=false
SHOW_NODE_SELECTION_PAGE=false
SHOW_CUSTOM_TREE_PAGE=false
SHOW_COMPONENT_LOCATIONS_PAGE=false
SHOW_SUMMARY_PAGE=false
SHOW_INSTALL_PROGRESS_PAGE=false
SHOW_REQUIRED_CONFIG_TOOL_PAGE=false
SHOW_CONFIG_TOOL_PAGE=false
SHOW_RELEASE_NOTES=false
SHOW_ROOTSH_CONFIRMATION=false
SHOW_END_SESSION_PAGE=false
SHOW_EXIT_CONFIRMATION=false
NEXT_SESSION=false
NEXT_SESSION_ON_FAIL=false
DEINSTALL_LIST={"oracle.sysman.top.agent","10.2.0.5.0"}
SHOW_DEINSTALL_CONFIRMATION=false
SHOW_DEINSTALL_PROGRESS=false
b_silentInstall=true
b_doAgentConfig=false
SHOW_XML_PREREQ_PAGE=false
SHOW_END_OF_INSTALL_MSGS=false
ACCEPT_LICENSE_AGREEMENT=true
TOPLEVEL_COMPONENT={"oracle.sysman.top.agent","10.2.0.5.0"}
SHOW_SPLASH_SCREEN=false
SELECTED_LANGUAGES={"en"}
COMPONENT_LANGUAGES={"en"}
DECLINE_SECURITY_UPDATES=true
METALINK_USERNAME=""
sl_OMSConnectInfo={"grid","4889"}

./runInstaller -silent -responseFile /tmp/additional_agent.rsp


Worth to mention that port should be http for OMS connection
even if OMS is secured (1159 by default)

I hope your installation will go smooth and quick

Have a good day!