Thursday, August 7, 2008

Killing health check

Do you think a health check can kill?
Well... at least it can put the Oracle agent into the "not running"
state and leave hung agent processes behind.

This does not happen with every combination of agent and database versions.
The problem affects Oracle agent 10.2.0.4 monitoring 10.2.0.3 or
11.1.0.6 databases on the Linux x86-64 platform.

The symptoms look like this: you have just upgraded your agents to
version 10.2.0.4 and seen a fine ("green") status in Grid Control,
but later (or the next day) you find the agents down and not running.
What happened?

Most likely, you have hit the problem described in Note:566607.1:
the agent opened too many hc_database_name.dat files
(located in $ORACLE_HOME/dbs) and crashed.
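If you suspect this, one way to confirm it on Linux is to count how many
of the agent's open file descriptors point at hc_*.dat files; a number
that keeps growing between runs suggests the leak. Below is a minimal
Python sketch, assuming Linux /proc and that you already know the agent's
PID (for example from ps). The helper name count_hc_fds is hypothetical,
not part of any Oracle tool, and reading another user's /proc/<pid>/fd
requires running as the agent owner or root.

```python
import fnmatch
import os
import sys


def count_hc_fds(pid, pattern="hc_*.dat"):
    """Count open file descriptors of process `pid` whose target file
    name matches `pattern`. Linux only: walks /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except FileNotFoundError:
            # The descriptor was closed between listdir() and readlink().
            continue
        # Sockets and pipes show up as e.g. "socket:[123]" and never match.
        if fnmatch.fnmatch(os.path.basename(target), pattern):
            count += 1
    return count


if __name__ == "__main__":
    # Usage: python count_hc_fds.py <agent_pid>  (defaults to this process)
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"{count_hc_fds(pid)} open hc_*.dat descriptors in process {pid}")
```

Running it a few minutes apart against the same agent PID gives a rough
picture: a stable count is normal, a steadily climbing one is not.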

That is because the agent tried to read the health check files using
a slightly different structure (1544 bytes instead of the 1552 bytes
used before), failed, and left the file open. It keeps doing this
until the process hits the "too many open files" error.
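The failure mode is a classic descriptor leak: an early return on an
error path skips the close. The Python sketch below only imitates that
pattern; the real agent code is not public, and the function names and
the "size must equal the record size" check are assumptions for
illustration, with 1544 taken from the post. It uses low-level os.open()
because, unlike Python file objects, a raw descriptor is never closed
automatically.

```python
import os

# Structure size the upgraded agent expects, per the post (assumption:
# the agent treats any other file size as unreadable).
AGENT_RECORD_SIZE = 1544


def read_hc_leaky(path, record_size=AGENT_RECORD_SIZE):
    """Buggy pattern: returns early on a size mismatch without closing."""
    fd = os.open(path, os.O_RDONLY)
    if os.fstat(fd).st_size != record_size:
        return None  # BUG: fd is never closed on this path
    data = os.read(fd, record_size)
    os.close(fd)
    return data


def read_hc_fixed(path, record_size=AGENT_RECORD_SIZE):
    """Same logic, but try/finally guarantees the descriptor is closed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if os.fstat(fd).st_size != record_size:
            return None
        return os.read(fd, record_size)
    finally:
        os.close(fd)
```

Calling read_hc_leaky() repeatedly on a file written with the old
1552-byte layout, as a periodic health check would, leaks one descriptor
per call until open() fails with EMFILE; read_hc_fixed() keeps the count
flat.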

As a remedy, the patch for Bug 5872000 can be applied to the database,
or the instance health check can be disabled in Grid Control.

Applying the patch is probably the better option, but it requires
bouncing the database. That makes me think I should prepare even
agent upgrades carefully, since they can end up requiring a database
patch!


Have a good day!
