About me

Name: Ivan Kartik
Location: Bratislava, Slovakia
I work as a Senior Database Administrator in Bratislava, Slovak Republic. My interests are RDBMSs (mainly Oracle) and Unix-like operating systems. In my free time I watch or play ice hockey, and I also like to play golf.

Just in case you run into this...

We hit a nasty bug on 10.2.0.4 RAC on HP-UX 11.23 (Itanium). Every so often, the number of open files reaches the system limit (defined by nfile).
You can check the current number of open files with the "glance" command ("system tables report" view) or with the "lsof" command, which provides more detailed output.
According to the lsof output, the racgimon process holds a large number of open file descriptors on

$ORACLE_HOME/dbs/hc_.dat
and this number increases every 60 seconds. At the same time, a new error message is written to the
$ORACLE_HOME/log/dwh1/racg/imon_.log
file.
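
To see which files account for the open descriptors, the lsof output can be summarized per file name. This is only a minimal sketch: count_open_by_name is a hypothetical helper, and it assumes the default lsof layout (one header line, file name in the last column) and file names without spaces.

```shell
# Hypothetical helper: reads lsof output on stdin and prints
# "<count> <file name>" for each distinct name, highest count first.
# Assumes one header line and the file name in the last column.
count_open_by_name() {
    awk 'NR > 1 { n[$NF]++ } END { for (f in n) print n[f], f }' | sort -rn
}

# Usage (assumes lsof is on PATH):
#   lsof -c racgimon | count_open_by_name | head
```

Running it twice, a few minutes apart, should show the count for the hc_ file growing if you are hitting the same problem.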

Here is the text of the error message:

2009-01-15 22:24:13.124: [ RACG][15][28099][15][ora...inst]:
GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:mmap failed with status: 12
GIM-00091: OS failure message: Not enough space
GIM-00092: OS failure occurred at: sskgmsmr_13
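
To confirm that the failures really arrive about once a minute, the timestamps of the GIM-00104 entries can be pulled out of the log. A rough sketch, assuming the log layout shown above (a timestamp line followed by the GIM messages); failure_times is a hypothetical helper and IMON_LOG is a placeholder for your imon log file.

```shell
# Hypothetical helper: reads an imon log on stdin and prints the
# timestamp of every entry that contains GIM-00104.
failure_times() {
    awk '
        # Remember the most recent timestamp, e.g. "2009-01-15 22:24:13.124:"
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            ts = $1 " " $2
            sub(/:$/, "", ts)   # drop the trailing colon
        }
        /GIM-00104/ { print ts }
    '
}

# Usage (IMON_LOG is a placeholder for the path to your imon log):
#   failure_times < "$IMON_LOG" | tail
```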

Is there a patch? Yes, this bug is known (see Doc ID 739557.1) and of course there is a patch for this bug: #7298531.

Now you may ask: so why write this post?
The answer is pretty simple: under some circumstances this patch may not work.
When that happens, CRS will not start after the patch has been applied.
Rolling the patch back is also not possible without tweaking the prerootpatch.sh script, because this script expects a correctly running CRS.
In our case, we are still waiting for a working patch for our environment.

Is there any workaround?
Yes, at least two workarounds are possible.

1. Racgimon killer
The RAC Global Instance Monitor, aka the racgimon process, is responsible for the cluster-wide health check. If this process dies, it is respawned automatically.
With this workaround we set a threshold for the number of file descriptors opened by racgimon. Once this limit is reached, racgimon is killed, which closes all of its file descriptors. All we need is a shell script scheduled in cron (one run per day is quite sufficient).


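#!/usr/bin/bash

FD_THRESHOLD=20000          # Threshold for file descriptors opened by racgimon
LSOF=/usr/local/bin/lsof    # Location of the lsof command
# Number of lsof output lines for racgimon (includes one header line)
FD_CURRENT=$($LSOF -c racgimon | wc -l)
# PIDs of all running racgimon processes
RACGIMON_PIDS=$(ps -aef | grep racgimon | grep -v grep | awk '{print $2}')

# Kill racgimon once the threshold is exceeded; it is respawned automatically
if [ $FD_CURRENT -gt $FD_THRESHOLD ]; then
    for pid in $RACGIMON_PIDS; do
        kill -9 $pid
    done
fi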


2. Remove instances from CRS applications
Stop the running instance and execute the following command:

srvctl remove instance -d DBNAME -i INSTANCE1

Start the instance.
Repeat these steps for the other instances in the cluster.

Note: From this moment on, you can't use "srvctl" to start your instances, the instances will not start up automatically after a reboot, and they will disappear from the "crs_stat" output.
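
To verify that an instance is really gone from CRS, you can filter the "crs_stat" output for instance resources. This is just a sketch: list_inst_resources is a hypothetical helper, and it assumes that crs_stat prints one NAME=... line per resource and that instance resources are named ora.<database>.<instance>.inst, so check this against your own output first.

```shell
# Hypothetical helper: reads crs_stat output on stdin and prints the
# names of resources that look like database instances.
# Assumes lines of the form NAME=ora.<database>.<instance>.inst
list_inst_resources() {
    sed -n 's/^NAME=\(ora\..*\.inst\)$/\1/p'
}

# Usage (assumes crs_stat is on PATH):
#   crs_stat | list_inst_resources
```

After the removal, the instance you removed should no longer be listed.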

Posted on Thursday, January 22, 2009