Ivan Kartik - Oracle and Linux Blog

ACFS issue with latest UEK3 kernels on OEL 7/RHEL 7

One of my colleagues has been facing to weird behaviour of ACFS during installation of two nodes RAC on RHEL 7 and OEL7 (started on RHEL, then tried OEL). ACFS volume creation (during mkfs execution) one of the nodes has either hung or thrown error like this:

mkfs.acfs: version                   = 12.1.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/goldengate-32
mkfs.acfs: CLSU-00100: operating system function: ioctl failed with error data: 1
mkfs.acfs: CLSU-00101: operating system error message: Operation not permitted
mkfs.acfs: CLSU-00103: error location: OI_0
mkfs.acfs: ACFS-00546: failed to change on-disk signature
mkfs.acfs: ACFS-01004: /dev/asm/goldengate-32 was not formatted.

After couple of days of his struggling I decided to help him to find out the problem because my colleague is very experienced thus I knew that he has done everything correctly (well, definitely it wasn't his first rodeo). Moreover I have done similar installation just few days before, without any problem. And that was interesting. Firstly I checked logs and of course configuration (udev, multipath, etc.), it had been correct and traced (using one of my closest friends since 2000) that process with this result:

2216  stat("/dev/asm/goldengate-34", {st_mode=S_IFBLK|0770, st_rdev=makedev(251, 17409), ...}) = 0
2216  open("/dev/ofsctl", O_RDWR)       = 9
2216  ioctl(9, 0xffffffffc1387015, 0x7ffdbee52880) = -1 EPERM (Operation not permitted)

 And this was really weird as permissions for these devices are defined by Udev configuration which is created during installation.

Gathered all important information about environment I've created an action plan. Installation of the same environment on my virtual machine. Well I'd rather say almost the same as I decided to not install the clustered environment in first step and of course used hardware was different and hypervisor has been as well. Version of all software was pretty much same as in original (my colleague's) environment, except one so important thing - Linux kernel. I decided to use original (non UEK) kernel first.
Installation went smoothly, ACFS had smooth configuration and later functionality too, so two options (or unanswered question) remained - whether it's kernel related or RAC related problem where the first option was my preference according to previous tracing.
So I've installed exact version (3.8.13-118.6.2.el7uek) of Unbreakable Kernel provided by Oracle just as my colleague did. And here is the result (Note that it was an intention to perform all steps manually):

$ uname -a
Linux el1 3.8.13-118.6.2.el7uek.x86_64 #2 SMP Thu May 19 13:15:51 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
$ acfsdriverstate supported
ACFS-9200: Supported

# acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.

# lsmod | grep ora
oracleacfs           3498177  0
oracleadvm            594197  0
oracleoks             503994  2 oracleacfs,oracleadvm

$ asmcmd volinfo --all
no volumes found
$ asmcmd volcreate -G GG GOLDENGATE -s 4G
$ asmcmd volinfo --all
Diskgroup Name: GG

         Volume Name: GOLDENGATE
         Volume Device: /dev/asm/goldengate-34
         State: ENABLED
         Size (MB): 4096
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

$ mkfs -t acfs /dev/asm/goldengate-34
mkfs.acfs: version                   = 12.1.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/goldengate-34
mkfs.acfs: CLSU-00100: operating system function: ioctl failed with error data: 1
mkfs.acfs: CLSU-00101: operating system error message: Operation not permitted
mkfs.acfs: CLSU-00103: error location: OI_0
mkfs.acfs: ACFS-00546: failed to change on-disk signature
mkfs.acfs: ACFS-01004: /dev/asm/goldengate-34 was not formatted.

Bingo! I've got the same problem on my test machine, so I was able to reproduce the problem. Now it's clear that is not related to a clustered configuration. Ok, let's see the trace output:

2061  stat("/dev/asm/goldengate-34", {st_mode=S_IFBLK|0770, st_rdev=makedev(251, 17409), ...}) = 0
2061  open("/dev/ofsctl", O_RDWR)       = 9
2061  ioctl(9, 0xffffffffc1387015, 0x7ffd26305310) = -1 EPERM (Operation not permitted)

and permissions:

$ ll /dev/asm/.asm_ctl_spec
brwxrwx--- 1 root dba 251, 0 Jun  1 23:13 /dev/asm/.asm_ctl_spec
$ ll /dev/asm/goldengate-34
brwxrwx--- 1 root dba 251, 17409 Jun  1 23:24 /dev/asm/goldengate-34
$ ll /dev/ofsctl
brw-rw-r-- 1 root dba 250, 0 Jun  1 23:24 /dev/ofsctl

Output from tracing was the same and permissions are the same as defined in Udev rules and also the are correct. So according to this result I decided to remove ACFS related configuration and check the configuration and functionality of ACFS again using several older UEK3 kernels. Here is the result from the test of one of them (3.8.13-118.4.2.el7uek.x86_64):

$ uname -a
Linux el1 3.8.13-118.4.2.el7uek.x86_64 #2 SMP Tue Mar 22 20:46:48 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
$ acfsdriverstate supported
ACFS-9200: Supported

# acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.
# lsmod | grep ora
oracleacfs           3498177  0
oracleadvm            594197  0
oracleoks             503994  2 oracleacfs,oracleadvm

$ asmcmd volinfo --all
no volumes found
$ asmcmd volcreate -G GG GOLDENGATE -s 4G
$ asmcmd volinfo --all
Diskgroup Name: GG

         Volume Name: GOLDENGATE
         Volume Device: /dev/asm/goldengate-34
         State: ENABLED
         Size (MB): 4096
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

$ mkfs -t acfs /dev/asm/goldengate-34
mkfs.acfs: version                   = 12.1.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/goldengate-34
mkfs.acfs: volume size               = 4294967296  (   4.00 GB )
mkfs.acfs: Format complete.

# mount -t acfs /dev/asm/goldengate-34 /mnt/
# mount | grep mnt
/dev/asm/goldengate-34 on /mnt type acfs (rw,relatime,device,rootsuid,ordered)
# touch /mnt/testfile
# ll /mnt/testfile
-rw-r--r-- 1 root root 0 Jun  2 00:52 /mnt/testfile

 It works, we were able to format, mount and use the ACFS volume thus now we know that downgrade the UEK3 kernel is a workaround which solves the issue with ACFS until time when bug will be fixed by newer release of UEK3 kernel.

 A bonus case: What if the Sys Admin will upgrade kernel thus we use the "buggy" version of UEK3 kernel on already configured ACFS volumes? I did several repeating tests on already configured and previously working ACFS volume. Let see what will happen:

1st run:

$ uname -a
Linux el1 3.8.13-118.6.2.el7uek.x86_64 #2 SMP Thu May 19 13:15:51 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux

# acfsload start
ACFS-9391: Checking for existing ADVM/ACFS installation.
ACFS-9392: Validating ADVM/ACFS installation files for operating system.
ACFS-9393: Verifying ASM Administrator setup.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9322: completed

$ asmcmd volenable -G GG GOLDENGATE
$ asmcmd volinfo --all
Diskgroup Name: GG

         Volume Name: GOLDENGATE
         Volume Device: /dev/asm/goldengate-34
         State: ENABLED
         Size (MB): 4096
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage: ACFS
         Mountpath: /mnt

# mount -t acfs /dev/asm/goldengate-34 /mnt/
# mount | grep mnt
/dev/asm/goldengate-34 on /mnt type acfs (rw,relatime,device,rootsuid,ordered)
# touch /mnt/testfile2
# ll /mnt/testfile*
-rw-r--r-- 1 root root 0 Jun  2 00:52 /mnt/testfile
-rw-r--r-- 1 root root 0 Jun  2 01:18 /mnt/testfile2

Result of 1st run - working.

2nd run:

# Kernel panic - not syncing: Holding spin lock
rid: 1925, comm: mount Tainted: PF 0 3.8.13-118.6.2.e17uek.x86_64 #2
all Trace:
[<ffffffff81574fb0>] panic+Oxc8/0x1d7
[<ffffffffa0484a7a>] __KsPanic+0x9a/Oxa0 [oracleoks]
[<ffffffffa072275d>] ? OfsDoMountRecovery+Oxldd/Ox2b0 [oracleacfs]
[<ffffffffa0722a5d>] OfsDoPhaselRecovery+0x22d/Ox4e0 [oracleacfs]
[<ffffffffa0722e28>] OfsWaitForRecoveryToComplete+0x118/0x3c0 [oracleacfs]
[<ffffffffa0667446>] OfsSetupVolume+Oxc6/0x2ff0 [oracleacfs]
[<ffffffffa048507c>] ? KsSleepEvent+Ox8c/Oxce [oracleoks]
[<ffffffff81081e90>] ? wake_up_bit+Ox30/0x30
[<ffffffffa04851bc>] ? KsCreateSystemThread+Ox10c/Ox1b0 [oracleoks]
[<ffffffffa066b6dd>] OfsMountVolume+0x136d/Ox27e0 [oracleacfs]
[<ffffffffa0766e8f>] ofs_fill_sb+0x6f/Ox7e0 [oracleacfs]
[<ffffffff8118af58>] mount_bdev+0x1b8/0x200
[<ffffffffa0766e20>] ? ofs_parse_flags+0x140/0x140 [oracleacfs]
[<ffffffffa0763ec7>] ofs_mount+Ox107/0x2f0 [oracleacfs]
[<ffffffff8118b8e9>] mount_fs+0x39/0x1b0
[<ffffffff811a5dc7>] ? alloc_vfsmnt+Oxd7/0x1b0
[<ffffffff811a5f3f>] vfs_kern_mount+Ox5f/Oxf0
[<ffffffff811a8250>] do_mount+0x220/0xaf0
[<ffffffff8114343b>] ? strndup_user+Ox4b/Oxf0
[<ffffffff811a8ba3>] sys_mount+0x83/0xc0
[<ffffffff81587179>] system_call_fastpath+0x16/0x1b

Result of the 2nd run was kernel panic, machine hung of course.

3rd run:

# acfsload start
ACFS-9391: Checking for existing ADVM/ACFS installation.
ACFS-9392: Validating ADVM/ACFS installation files for operating system.
ACFS-9393: Verifying ASM Administrator setup.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
acfsutil plogconfig: CLSU-00100: operating system function: ioctl failed with error data: 22
acfsutil plogconfig: CLSU-00101: operating system error message: Invalid argument
acfsutil plogconfig: CLSU-00103: error location: OI_0
acfsutil plogconfig: ACFS-03500: Unable to access kernel persistent log entries.
ACFS-9225: Failed to start OKS persistent logging.
ACFS-9322: completed

Result of 3rd run: finished with CLSU-00100: operating system function: ioctl failed with error data: 22 , CLSU-00101: operating system error message: Invalid argument, CLSU-00103: error location: OI_0 error messages. After several tries of manual startup it has started successfully.

I did seven more runs and final result was 4x working, 5x  kernel panic, 1x error message during startup, so there is 60% chance that your system won't work.

Final conclusions

  • problem is not related to clustered environment
  • based on several tests (not shown in this article) 11gR2 (11.2.0.4 + APR 2016 PSU) and 12cR1 (12.1.0.2 + Apr 2016 PSU) are affected by this issue
  • problem is related to specific UEK3 kernel versions, to be precise: 3.8.13-118.6.1.el7uek and 3.8.13-118.6.2.el7uek
  • (currently) last properly working version of UEK3 is 3.8.13-118.4.2.el7uek
  • supported version of default kernels (non UEK) is one of possible solution/workaround
  • using older UEK3 kernel is a temporary workaround, not solution until the bug will be fixed. There are two reasons to have updated kernel (security, bugfix)

 

Update:

Updated kernel 3.8.13-118.8.1.el7uek.x86_64 has been released and this bug doesn't occur with this version (and probably later versions which will come in future).

 Hope that helps...

 

Configuring automatic startup of Oracle Database under systemd on RHEL 7/OEL 7/CentOS 7

There are several methods how to start Oracle Database automatically during/after OS boot. You can use Oracle CRS, other clusterware or init implemented in Linux. Starting RHEL 7 SysV init has been replaced by systemd or simply said systemd is the new init system.

In these days not only group of Linux users became polarized but also whole Linux world became polarized as well. Mostly Redhat based distributions have adopted systemd, other distributions are reluctant to implement systemd and either are continuing to use SysV init or migrated to another solutions e.g. Upstart. Despite this fact most of the Oracle certified Linux distributions (such as Redhat or SuSE and their (even not certified) clones) are using the systemd.

This post briefly shows how to configure systemd service for automatic start of Oracle Databases and Listener and these steps are applicable for Redhat Enterprise Linux 7, Oracle Enterprise Linux 7, CentOS 7 or SuSE Linux Enteprise Server 12 and Fedora 15 (or later).

Typically systemd startup configuration consists of two parts:

  • unit file - using ".service" suffix (in case of service), typically stored in /usr/lib/systemd/system or /etc/systemd/system directory for units provided by installed packages or /usr/lib/systemd/user or /etc/systemd/user directory for units installed by administrator
  • environment file (optional) - typically stored in /etc/sysconfig directory on RHEL and it's clones. We don't need it in our case.

Creating the unit for automatic startup/shutdown of Oracle Database manually

Logon as root user create and edit /etc/systemd/system/oracle-rdbms.service and add following content:

# /etc/systemd/system/oracle-rdbms.service
#   Invoking Oracle scripts to start/shutdown Instances defined in /etc/oratab
#   and starts Listener

[Unit]
Description=Oracle Database(s) and Listener
Requires=network.target

[Service]
Type=forking
Restart=no
ExecStart=/opt/oracle/12102/bin/dbstart /opt/oracle/12102
ExecStop=/opt/oracle/12102/bin/dbshut /opt/oracle/12102
User=oracle

[Install]
WantedBy=multi-user.target

 Note that this configuration assumes that our ORACLE_HOME is /opt/oracle/12102. It's recommended to use PIDFile while using "forking" type but we don't need it. As you can see well known scripts (shipped with Oracle Database) are executed for startup/shutdown using path to Oracle Home in order to specify the Oracle Home for Listener process. As shown these scripts are executed under "oracle" user account/privileges. More over service can be started once network is cofigured (started) and service starts in multi-user level (more less equivalent of runlevel 3 in SysV init)

Now we have to reload systemd in order to register unit file (as root) and enable the service.

systemctl daemon-reload
systemctl enable oracle-rdbms

 So, now the startup service should be created and enabled but to be sure we can check it by following command (Note: first line is OS command, other lines is the output):

systemctl status oracle-rdbms
oracle-rdbms.service - Oracle Database(s) and Listener
   Loaded: loaded (/etc/systemd/system/oracle-rdbms.service; enabled)

According to output our service is enabled succesfully and should be started on next OS boot. To start the service without reboot of machine you can use following command:

systemctl start oracle-rdbms

 

Creating the unit for automatic startup/shutdown of Oracle Database using script

I have created a simple script which automatically performs above tasks for creating and enabling startup service. This scripts contains simple checks (as I've tried to make the script bulletproof), then lists available Oracle homes that exist on OS and then asks to specify Oracle home from which the Listener will be started. Note that it's important to specify Oracle home for the highest version of Oracle software as Listener will be handling connections for all Oracle homes.

#!/usr/bin/bash

# This script configures systemd startup service for Oracle Databases and Listener
# Ivan Kartik http://ivan.kartik.sk

if [ `whoami` != "root" ]; then
   echo "root login required!"
   exit
fi

if [ `uname -s` != "Linux" ]; then
   echo "This is not Linux!"
   exit
fi

if [ `ps -e|grep " 1 ?"|cut -d " " -f15` != "systemd" ]; then
   echo "Systemd is not present, use Init scripts instead!"
   exit
fi

echo "List of existing Oracle Homes:"
echo "------------------------------"
cat `cat /etc/oraInst.loc|grep inventory_loc|cut -d '=' -f2`/ContentsXML/inventory.xml|grep "HOME NAME"|cut -d '"' -f 4
echo 

echo "Enter ORACLE_HOME of Oracle Listener [$ORACLE_HOME]:"
read NEWHOME

case "$NEWHOME" in
     "")         ORAHOME="$ORACLE_HOME" ;;
     *)          ORAHOME="$NEWHOME" ;;
esac

if [ -z $ORAHOME ]; then
   echo "Error: Missing value!"
   exit
fi

if [ -f $ORAHOME/bin/lsnrctl ]; then
   echo '# /etc/systemd/system/oracle-rdbms.service
# Ivan Kartik http://ivan.kartik.sk
#   Invoking Oracle scripts to start/shutdown Instances defined in /etc/oratab
#   and starts Listener

[Unit]
Description=Oracle Database(s) and Listener
Requires=network.target

[Service]
Type=forking
Restart=no
ExecStart='$ORAHOME'/bin/dbstart '$ORAHOME'
ExecStop='$ORAHOME'/bin/dbshut '$ORAHOME'
User=oracle

[Install]
WantedBy=multi-user.target' > /etc/systemd/system/oracle-rdbms.service

    systemctl daemon-reload
    systemctl enable oracle-rdbms
    echo "Done! Service oracle-ordbms has been configured and will be started during next boot."
    echo "If you want to start service now, execute: systemctl start oracle-rdbms"
else
   echo "Error: No Listener script under specified ORACLE_HOME: $ORAHOME"
   exit
fi
 

You either can copy/paste this code or download here: http://ivan.kartik.sk/scripts/oracle_systemd_service.sh

Final check of service and started databases (Note: Output from systemctl status has been shortened):

# cat /etc/oratab |grep :Y
ORA12CR1:/opt/oracle/12102:Y
ORA11GR2:/opt/oracle/11204:Y

# systemctl status oracle-rdbms
oracle-rdbms.service - Oracle Database(s) and Listener
   Loaded: loaded (/etc/systemd/system/oracle-rdbms.service; enabled)
   Active: active (running) since Mon 2015-11-15 14:51:13 CET; 54s ago
  Process: 425 ExecStart=/opt/oracle/12102/bin/dbstart /opt/oracle/12102 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/oracle-rdbms.service
           ├─ 452 /opt/oracle/12102/bin/tnslsnr LISTENER -inherit
           ├─1155 ora_pmon_ORA12CR1
           ├─1171 ora_vktm_ORA12CR1
           ├─1177 ora_gen0_ORA12CR1
           ├─1181 ora_mman_ORA12CR1
           ├─1183 ora_diag_ORA12CR1
           ├─1185 ora_dbrm_ORA12CR1
           ├─1195 ora_ckpt_ORA12CR1
           ├─1197 ora_smon_ORA12CR1
           ├─1199 ora_reco_ORA12CR1
           ├─1201 ora_lreg_ORA12CR1
           ├─1289 ora_mman_ORA11GR2
.....   
           ├─1291 ora_dbw0_ORA11GR2
           ├─1293 ora_lgwr_ORA11GR2
           ├─1295 ora_ckpt_ORA11GR2
           ├─1297 ora_smon_ORA11GR2
           ├─1299 ora_reco_ORA11GR2
           ├─1301 ora_mmon_ORA11GR2
           ├─1303 ora_mmnl_ORA11GR2
           ├─1350 ora_qmnc_ORA11GR2
           └─1450 ora_q001_ORA11GR2

Nov 15 14:50:57 oel01 dbstart[425]: Processing Database instance "ORA12CR1": log file /opt/oracle/12102/startup.log
Nov 15 14:51:07 oel01 dbstart[425]: Processing Database instance "ORA11GR2": log file /opt/oracle/11204/startup.log
Nov 15 14:51:13 oel01 systemd[1]: Started Oracle Database(s) and Listener.

For little comparison of difference commands or usage regarding SysV init and systemd, here is very nice cheat sheet created by guys from Linoxide.com it's downloadable here: http://images.linoxide.com/systemd-vs-sysVinit-cheatsheet.pdf

Systemd Homepage: http://www.freedesktop.org/wiki/Software/systemd/

 

Home