Ivan Kartik - Oracle and Linux Blog

Installation article for 11g release 2 on Solaris x86_64

I was installing the new (second) release of the Oracle 11g database on Solaris x86 (Intel/AMD compatible), 64-bit version. The installation went fine; I faced just one error (.../root.sh: /usr/xpg4/bin/grep: not found) during execution of the root.sh script at the end of the installation. Since I was unable to find the right package, and a review of the script showed that grep -E is not used anywhere in it, I created a symlink as a workaround. As usual I wrote an installation article, which you can find in the right menu of this page.
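For reference, the workaround boils down to something like this (run as root; I'm assuming the standard /usr/bin/grep is present and that root.sh does not need any XPG4-specific grep behaviour):

# Point the missing XPG4 grep at the regular grep binary
mkdir -p /usr/xpg4/bin
ln -s /usr/bin/grep /usr/xpg4/bin/grep

After creating the symlink, root.sh ran through without complaints.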

Rest in peace good old HTML version of Metalink!

Oracle had announced the retirement of the old (HTML) version of Metalink, which really scared me. I asked my colleagues and friends in DBA positions which version they use or prefer. All answers (except one) were the same: HTML. I understand that Oracle fell in love with the "Open Laszlo" technology some time ago, but the new "My Oracle Support" (a.k.a. the Flash version) is really a piece of #%%^#$#$. I used the good old HTML version until its official death. Now I have no choice, as since last weekend the HTML version no longer works and I must use "My Oracle Support". And I have to say it's a painful experience. My last login to this site took "only" 19 minutes (and I have a really good internet connection), so the "Benefits" listed on the page, such as "Improved System Stability" or "Faster Problem Resolution", have become funny to me. In case you were using the direct link (http://www.oracle.com/technology/support/metalink/index.html) to the Oracle Certification Matrix (which was accessible without needing to log in to Metalink) in your posts, keep in mind that this application has disappeared too.

Exploring and playing with NAS Raidon SL3620-2S-LB2

I needed a low-cost NAS for Oracle, especially as shared storage for RAC. So I was searching for a product that would meet all my requirements, such as Gigabit Ethernet, RAID 0 and 1 capability, EXT2/EXT3 support and, of course, support for SATA drives. I found the Raidon SL3620-2S-LB2 and bought it because it met all of my needs.

I put in two 750GB SATA drives and created a RAID 1 array without problems, as the web-based console is very intuitive. I set up the NFS mount for RAC, but unfortunately I was unable to set RAC-specific parameters such as no_root_squash using the web console. This parameter is needed to allow changing the ownership of files and directories (more info). So if you are installing RAC on NFS and you aren't able to create voting files (disks) during CRS installation, or you are getting "chown: changing ownership of `...': Operation not permitted", then a missing no_root_squash will probably be the reason. So I checked (using nmap) which ports are open on the NAS.


# nmap -nsS 192.168.1.1|egrep 'PORT|open'
PORT     STATE SERVICE
80/tcp   open  http
111/tcp  open  rpcbind
917/tcp  open  unknown
2006/tcp open  invokator
2049/tcp open  nfs

So neither a Telnet nor an SSH service is running on my NAS. Therefore I used one of my utilities to explore all directories and files on the web server (a rough sketch of such a probe follows). Gotcha! I found http://192.168.1.1/cgi/telnet/telnet.cgi, which cannot be found in the Web Console. This page (script) can be used to enable or disable the telnet daemon.
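For the curious, the probe is nothing more than trying candidate paths against the web server and watching the HTTP status codes; here is a minimal sketch (the path list is illustrative only, not my actual utility):

# Probe a few candidate CGI paths and report anything that is not a 404
for path in cgi/telnet/telnet.cgi cgi/ssh/ssh.cgi cgi/system/system.cgi; do
    code=`curl -s -o /dev/null -w '%{http_code}' "http://192.168.1.1/$path"`
    [ "$code" != "404" ] && echo "found: /$path (HTTP $code)"
done

With telnet enabled through that page, we check the nmap output again: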


# nmap -nsS 192.168.1.1|egrep 'PORT|open'
PORT     STATE SERVICE
23/tcp   open  telnet
80/tcp   open  http
111/tcp  open  rpcbind
917/tcp  open  unknown
2006/tcp open  invokator
2049/tcp open  nfs

Eureka, there is a telnet service running. Now I could try to log on via telnet with the credentials of the "admin" user.


$ telnet 192.168.1.1
Trying 192.168.1.1...
Connected to 192.168.1.1.
Escape character is '^]'.

mystore login: admin
Password:

BusyBox v1.00-rc3 (2007.05.09-07:20+0000) Built-in shell (ash)
Enter 'help' for a list of built-in commands.

mystore> uname -a
Linux mystore 2.6.15 #136 Wed Jun 27 13:16:02 CST 2007 armv4l unknown

According to the banner it is clear that this NAS runs Linux (2.6.15 kernel) with the well-known BusyBox project (http://www.busybox.net) on an ARM processor. So I could use "vi" to modify the /etc/exports file and add the parameters for the NFS export. But first I needed to log in as the root user, which has the same password as admin.


mystore> cat /etc/exports
# /mnt/IDE1 *(rw,no_root_squash,no_all_squash,sync)
/mnt/md1/storage *(rw,no_root_squash)
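With the export modified, a quick sanity check from a Linux RAC node shows whether root squashing is really gone; this is just a sketch, and the mount point and user/group names below are examples:

# Mount the export and repeat the chown that previously failed
mount -t nfs 192.168.1.1:/mnt/md1/storage /u02/oradata
touch /u02/oradata/ownership_test
chown oracle:oinstall /u02/oradata/ownership_test   # "Operation not permitted" means root is still squashed
rm /u02/oradata/ownership_test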

Thanks to this change I was able to install RAC successfully. Of course I'm going deeper, because I want to install SSH (I hate telnet) and an iSCSI target. If that installation is successful, I'll try to compile the latest kernel to add OCFS2 and Btrfs (experimental, but included since 2.6.28) support (I don't know yet whether it is possible to replace the kernel). My next update may come soon...

New HW for my blog

My blog was inaccessible for a few days due to a HW failure (the HDD died after a power failure... yes, we are using a UPS :-D). My friends and I decided to say goodbye to our old HW and hello to a new one. Old machine specification:


# cat /proc/cpuinfo |egrep 'processor|model\ name'
processor       : 0
model name      : AMD Athlon(tm) XP 2400+

# free|grep Mem|awk {'print $1$2'}
Mem:1034836

New machine specification:


# cat /proc/cpuinfo |egrep 'processor|model\ name'
processor       : 0
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
processor       : 1
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
processor       : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
processor       : 3
model name      : Intel(R) Xeon(TM) CPU 2.40GHz

# free|grep Mem|awk {'print $1$2'}
Mem:3115900

Personally I like AMD but you can see the difference...

Just in case you are facing this...

We hit a nasty bug on 10.2.0.4 RAC on HP-UX 11.23 (Itanium). Every once in a while the number of open files reaches the system limit (defined by nfile). You can check the current number of open files with the "glance" command ("system tables report" view) or with the "lsof" command, which provides more detailed output. According to the lsof output we found that the racgimon process keeps plenty of open file descriptors on $ORACLE_HOME/dbs/hc_.dat and that this number increases every 60 seconds. At the same time a new error message is written to the $ORACLE_HOME/log/dwh1/racg/imon_.log file. Here is the text of the error message:

2009-01-15 22:24:13.124: [    RACG][15][28099][15][ora...inst]:
GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:mmap failed with status: 12
GIM-00091: OS failure message: Not enough space
GIM-00092: OS failure occurred at: sskgmsmr_13
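To watch the leak yourself, counting racgimon's descriptors on the health-check file is enough; a minimal sketch (assuming lsof lives in the same location as in the script below):

# Number of file descriptors racgimon holds on the hc_ health-check file;
# with the bug present, the count keeps growing every few minutes
/usr/local/bin/lsof -c racgimon | grep -c hc_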
Is there a patch? Yes, this bug is known (see Doc ID 739557.1) and of course there is a patch for it: #7298531. Now you may ask why I wrote this post. The answer is pretty simple: under some circumstances the patch may not work, and if you try to apply it under those circumstances, CRS will not start after the patch has been applied. Rollback of the patch is then not possible without tweaking the prerootpatch.sh script, because that script expects a correctly running CRS. In our case we are still waiting for a patch that works in our environment.

Is there any workaround? Yes, at least two workarounds are possible.

1. Racgimon killer

The RAC Global Instance Monitor, a.k.a. the racgimon process, is responsible for the cluster-wide health check. If this process dies it is respawned/restarted automatically. With this workaround we set a threshold for the number of file descriptors opened by racgimon; if the limit is reached, racgimon is killed and thus all file descriptors held by it are closed. So all we need is to create a shell script and schedule it in cron (one execution per day is quite sufficient).

#!/usr/bin/bash

FD_THRESHOLD=20000          # threshold for file descriptors opened by racgimon
LSOF=/usr/local/bin/lsof    # location of the lsof command
# current number of descriptors held by racgimon and the PID(s) of the process
FD_CURRENT=`$LSOF -c racgimon | wc -l`
RACGIMON_PIDS=`ps -aef | grep racgimon | grep -v grep | awk '{print $2}'`

# kill racgimon once the threshold is exceeded - CRS respawns it with a clean FD table
if [ $FD_CURRENT -gt $FD_THRESHOLD ]; then
    for pid in $RACGIMON_PIDS; do
        kill -9 $pid
    done
fi
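
For completeness, a crontab entry that runs the script once a day could look like this (the script path is a placeholder):

# run once a day at 03:00 as the user that owns the racgimon process
0 3 * * * /home/oracle/scripts/kill_racgimon_fd.sh >/dev/null 2>&1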
2. Remove instances from CRS applications

Stop the running instance and execute the following command:

   srvctl remove instance -d DBNAME -i INSTANCE1
   
Start the instance and repeat this step for the other instances in the cluster. Note: from this moment on you can't use "srvctl" to start your instances, the instances will not start automatically after a reboot, and the instance(s) will disappear from the "crs_stat" output.
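Once a working patch is finally in place, the instances can be registered with CRS again; a sketch, assuming the standard 10.2 srvctl syntax (DBNAME, INSTANCE1 and NODE1 are placeholders):

   srvctl add instance -d DBNAME -i INSTANCE1 -n NODE1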
