Ivan Kartik - Oracle and Linux Blog

Oracle Database 12cR2 online documentation available

Following the Cloud-first strategy Oracle has announced the general availability of Oracle Database 12c Release 2 via the new Oracle Exadata Express Cloud Service. Software for on-premises installation has not been released yet but if you want to learn what's new or what is coming in the 12cR2 you can browse documentation which is available at: http://docs.oracle.com/en/database/

ACFS issue with latest UEK3 kernels on OEL 7/RHEL 7

One of my colleagues has been facing to weird behaviour of ACFS during installation of two nodes RAC on RHEL 7 and OEL7 (started on RHEL, then tried OEL). ACFS volume creation (during mkfs execution) one of the nodes has either hung or thrown error like this:

mkfs.acfs: version                   = 12.1.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/goldengate-32
mkfs.acfs: CLSU-00100: operating system function: ioctl failed with error data: 1
mkfs.acfs: CLSU-00101: operating system error message: Operation not permitted
mkfs.acfs: CLSU-00103: error location: OI_0
mkfs.acfs: ACFS-00546: failed to change on-disk signature
mkfs.acfs: ACFS-01004: /dev/asm/goldengate-32 was not formatted.

After couple of days of his struggling I decided to help him to find out the problem because my colleague is very experienced thus I knew that he has done everything correctly (well, definitely it wasn't his first rodeo). Moreover I have done similar installation just few days before, without any problem. And that was interesting. Firstly I checked logs and of course configuration (udev, multipath, etc.), it had been correct and traced (using one of my closest friends since 2000) that process with this result:

2216  stat("/dev/asm/goldengate-34", {st_mode=S_IFBLK|0770, st_rdev=makedev(251, 17409), ...}) = 0
2216  open("/dev/ofsctl", O_RDWR)       = 9
2216  ioctl(9, 0xffffffffc1387015, 0x7ffdbee52880) = -1 EPERM (Operation not permitted)

 And this was really weird as permissions for these devices are defined by Udev configuration which is created during installation.

Gathered all important information about environment I've created an action plan. Installation of the same environment on my virtual machine. Well I'd rather say almost the same as I decided to not install the clustered environment in first step and of course used hardware was different and hypervisor has been as well. Version of all software was pretty much same as in original (my colleague's) environment, except one so important thing - Linux kernel. I decided to use original (non UEK) kernel first.
Installation went smoothly, ACFS had smooth configuration and later functionality too, so two options (or unanswered question) remained - whether it's kernel related or RAC related problem where the first option was my preference according to previous tracing.
So I've installed exact version (3.8.13-118.6.2.el7uek) of Unbreakable Kernel provided by Oracle just as my colleague did. And here is the result (Note that it was an intention to perform all steps manually):

$ uname -a
Linux el1 3.8.13-118.6.2.el7uek.x86_64 #2 SMP Thu May 19 13:15:51 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
$ acfsdriverstate supported
ACFS-9200: Supported

# acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.

# lsmod | grep ora
oracleacfs           3498177  0
oracleadvm            594197  0
oracleoks             503994  2 oracleacfs,oracleadvm

$ asmcmd volinfo --all
no volumes found
$ asmcmd volcreate -G GG GOLDENGATE -s 4G
$ asmcmd volinfo --all
Diskgroup Name: GG

         Volume Name: GOLDENGATE
         Volume Device: /dev/asm/goldengate-34
         State: ENABLED
         Size (MB): 4096
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

$ mkfs -t acfs /dev/asm/goldengate-34
mkfs.acfs: version                   = 12.1.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/goldengate-34
mkfs.acfs: CLSU-00100: operating system function: ioctl failed with error data: 1
mkfs.acfs: CLSU-00101: operating system error message: Operation not permitted
mkfs.acfs: CLSU-00103: error location: OI_0
mkfs.acfs: ACFS-00546: failed to change on-disk signature
mkfs.acfs: ACFS-01004: /dev/asm/goldengate-34 was not formatted.

Bingo! I've got the same problem on my test machine, so I was able to reproduce the problem. Now it's clear that is not related to a clustered configuration. Ok, let's see the trace output:

2061  stat("/dev/asm/goldengate-34", {st_mode=S_IFBLK|0770, st_rdev=makedev(251, 17409), ...}) = 0
2061  open("/dev/ofsctl", O_RDWR)       = 9
2061  ioctl(9, 0xffffffffc1387015, 0x7ffd26305310) = -1 EPERM (Operation not permitted)

and permissions:

$ ll /dev/asm/.asm_ctl_spec
brwxrwx--- 1 root dba 251, 0 Jun  1 23:13 /dev/asm/.asm_ctl_spec
$ ll /dev/asm/goldengate-34
brwxrwx--- 1 root dba 251, 17409 Jun  1 23:24 /dev/asm/goldengate-34
$ ll /dev/ofsctl
brw-rw-r-- 1 root dba 250, 0 Jun  1 23:24 /dev/ofsctl

Output from tracing was the same and permissions are the same as defined in Udev rules and also the are correct. So according to this result I decided to remove ACFS related configuration and check the configuration and functionality of ACFS again using several older UEK3 kernels. Here is the result from the test of one of them (3.8.13-118.4.2.el7uek.x86_64):

$ uname -a
Linux el1 3.8.13-118.4.2.el7uek.x86_64 #2 SMP Tue Mar 22 20:46:48 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
$ acfsdriverstate supported
ACFS-9200: Supported

# acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.
# lsmod | grep ora
oracleacfs           3498177  0
oracleadvm            594197  0
oracleoks             503994  2 oracleacfs,oracleadvm

$ asmcmd volinfo --all
no volumes found
$ asmcmd volcreate -G GG GOLDENGATE -s 4G
$ asmcmd volinfo --all
Diskgroup Name: GG

         Volume Name: GOLDENGATE
         Volume Device: /dev/asm/goldengate-34
         State: ENABLED
         Size (MB): 4096
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

$ mkfs -t acfs /dev/asm/goldengate-34
mkfs.acfs: version                   = 12.1.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/goldengate-34
mkfs.acfs: volume size               = 4294967296  (   4.00 GB )
mkfs.acfs: Format complete.

# mount -t acfs /dev/asm/goldengate-34 /mnt/
# mount | grep mnt
/dev/asm/goldengate-34 on /mnt type acfs (rw,relatime,device,rootsuid,ordered)
# touch /mnt/testfile
# ll /mnt/testfile
-rw-r--r-- 1 root root 0 Jun  2 00:52 /mnt/testfile

 It works, we were able to format, mount and use the ACFS volume thus now we know that downgrade the UEK3 kernel is a workaround which solves the issue with ACFS until time when bug will be fixed by newer release of UEK3 kernel.

 A bonus case: What if the Sys Admin will upgrade kernel thus we use the "buggy" version of UEK3 kernel on already configured ACFS volumes? I did several repeating tests on already configured and previously working ACFS volume. Let see what will happen:

1st run:

$ uname -a
Linux el1 3.8.13-118.6.2.el7uek.x86_64 #2 SMP Thu May 19 13:15:51 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux

# acfsload start
ACFS-9391: Checking for existing ADVM/ACFS installation.
ACFS-9392: Validating ADVM/ACFS installation files for operating system.
ACFS-9393: Verifying ASM Administrator setup.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9322: completed

$ asmcmd volenable -G GG GOLDENGATE
$ asmcmd volinfo --all
Diskgroup Name: GG

         Volume Name: GOLDENGATE
         Volume Device: /dev/asm/goldengate-34
         State: ENABLED
         Size (MB): 4096
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage: ACFS
         Mountpath: /mnt

# mount -t acfs /dev/asm/goldengate-34 /mnt/
# mount | grep mnt
/dev/asm/goldengate-34 on /mnt type acfs (rw,relatime,device,rootsuid,ordered)
# touch /mnt/testfile2
# ll /mnt/testfile*
-rw-r--r-- 1 root root 0 Jun  2 00:52 /mnt/testfile
-rw-r--r-- 1 root root 0 Jun  2 01:18 /mnt/testfile2

Result of 1st run - working.

2nd run:

# Kernel panic - not syncing: Holding spin lock
rid: 1925, comm: mount Tainted: PF 0 3.8.13-118.6.2.e17uek.x86_64 #2
all Trace:
[<ffffffff81574fb0>] panic+Oxc8/0x1d7
[<ffffffffa0484a7a>] __KsPanic+0x9a/Oxa0 [oracleoks]
[<ffffffffa072275d>] ? OfsDoMountRecovery+Oxldd/Ox2b0 [oracleacfs]
[<ffffffffa0722a5d>] OfsDoPhaselRecovery+0x22d/Ox4e0 [oracleacfs]
[<ffffffffa0722e28>] OfsWaitForRecoveryToComplete+0x118/0x3c0 [oracleacfs]
[<ffffffffa0667446>] OfsSetupVolume+Oxc6/0x2ff0 [oracleacfs]
[<ffffffffa048507c>] ? KsSleepEvent+Ox8c/Oxce [oracleoks]
[<ffffffff81081e90>] ? wake_up_bit+Ox30/0x30
[<ffffffffa04851bc>] ? KsCreateSystemThread+Ox10c/Ox1b0 [oracleoks]
[<ffffffffa066b6dd>] OfsMountVolume+0x136d/Ox27e0 [oracleacfs]
[<ffffffffa0766e8f>] ofs_fill_sb+0x6f/Ox7e0 [oracleacfs]
[<ffffffff8118af58>] mount_bdev+0x1b8/0x200
[<ffffffffa0766e20>] ? ofs_parse_flags+0x140/0x140 [oracleacfs]
[<ffffffffa0763ec7>] ofs_mount+Ox107/0x2f0 [oracleacfs]
[<ffffffff8118b8e9>] mount_fs+0x39/0x1b0
[<ffffffff811a5dc7>] ? alloc_vfsmnt+Oxd7/0x1b0
[<ffffffff811a5f3f>] vfs_kern_mount+Ox5f/Oxf0
[<ffffffff811a8250>] do_mount+0x220/0xaf0
[<ffffffff8114343b>] ? strndup_user+Ox4b/Oxf0
[<ffffffff811a8ba3>] sys_mount+0x83/0xc0
[<ffffffff81587179>] system_call_fastpath+0x16/0x1b

Result of the 2nd run was kernel panic, machine hung of course.

3rd run:

# acfsload start
ACFS-9391: Checking for existing ADVM/ACFS installation.
ACFS-9392: Validating ADVM/ACFS installation files for operating system.
ACFS-9393: Verifying ASM Administrator setup.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
acfsutil plogconfig: CLSU-00100: operating system function: ioctl failed with error data: 22
acfsutil plogconfig: CLSU-00101: operating system error message: Invalid argument
acfsutil plogconfig: CLSU-00103: error location: OI_0
acfsutil plogconfig: ACFS-03500: Unable to access kernel persistent log entries.
ACFS-9225: Failed to start OKS persistent logging.
ACFS-9322: completed

Result of 3rd run: finished with CLSU-00100: operating system function: ioctl failed with error data: 22 , CLSU-00101: operating system error message: Invalid argument, CLSU-00103: error location: OI_0 error messages. After several tries of manual startup it has started successfully.

I did seven more runs and final result was 4x working, 5x  kernel panic, 1x error message during startup, so there is 60% chance that your system won't work.

Final conclusions

  • problem is not related to clustered environment
  • based on several tests (not shown in this article) 11gR2 (11.2.0.4 + APR 2016 PSU) and 12cR1 (12.1.0.2 + Apr 2016 PSU) are affected by this issue
  • problem is related to specific UEK3 kernel versions, to be precise: 3.8.13-118.6.1.el7uek and 3.8.13-118.6.2.el7uek
  • (currently) last properly working version of UEK3 is 3.8.13-118.4.2.el7uek
  • supported version of default kernels (non UEK) is one of possible solution/workaround
  • using older UEK3 kernel is a temporary workaround, not solution until the bug will be fixed. There are two reasons to have updated kernel (security, bugfix)

 

Update:

Updated kernel 3.8.13-118.8.1.el7uek.x86_64 has been released and this bug doesn't occur with this version (and probably later versions which will come in future).

 Hope that helps...

 

New package dependency for OPatch utility

My colleague has got this error message:

$ opatch lsinventory
OPatch/opatch: line 750: bc: command not found
Invalid maximum heap size: -Xmxm
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

OPatch failed with error code 1

A "bc" is an arbitrary precision calculator language (witch is often required for math calculation in Bash scripts, when you are using floating-point numbers for example) and many of us are using the bc command as simple command-line calculator and now OPatch does it the as well. It is used just for setting maximum heap size for JRE (this is being done in setUpJREMemoryOPtions() function of opatch command) which in newer versions is using "bc" command. The idea behind this post is beacause many Linux/Unix (power) users don't know about "bc", surprisingly even some SysAdmins.

$ OPatch_old/opatch version
OPatch Version: 12.1.0.1.7

OPatch succeeded.

$ OPatch_new/opatch version
OPatch_new/opatch: line 750: bc: command not found
Invalid maximum heap size: -Xmxm
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

OPatch failed with error code 1

# rpm -ih bc-1.06.95-13.el7.x86_64.rpm
################################## [100%]
Updating / installing
################################## [100%]

$ OPatch_new/opatch version
OPatch Version: 12.1.0.1.12

OPatch succeeded.

To solve the issue simply execute (to install the bc package):

# yum install bc

and put the bc-*.rpm to your list of required packages by Oracle software.

 

Update 20.3.2017:

Recently I have tested Opatch utility shipped in Oracle Database 12c Release 2 (Opatch version 12.2.0.1.6). Following error message appears if "bc" package is not installed while opatch_env.sh exists in OPatch main directory.

./opatch lsinventory
./opatch: line 908: bc: command not found
Invalid maximum heap size: -Xmxm
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

OPatch failed with error code 1

If opatch_env.sh is missing in OPatch main directory "bc" package is not required. Anyway I still recommend to have "bc" package installed by default.

Upgrade to 12c: Put a spotlight on Database client

It's been almost one year I had touched for last time draft version of this article and thanks to couple days of free time I could finish it so I'm happy that it will not remain as unfinished symphony :-). Everybody talks today about upgrade Oracle Database to 12c or discussing many new features of 12c, no matter if they are interesting or not so interesting, really important or not so important, extra paid or not.
You've decided that you can rely on new version, it seems stable, you gained sufficient information about new and obsoleted/deprecated stuff and made couple of tests of upgrade process on your testing environment. Everything looks fine, so you proceed to real upgrade. Database is upgraded and everything seems to be OK but your applications and colleagues are reporting problems with access to freshly upgraded database. Well... database upgrade is not just about database upgrade.

Case 1: ORA-28040: No matching authentication protocol

This error is thrown when used version of authentication protocol by client does not meet minimum authentication protocol allowed for clients.
In 12cR1 comes with default settings for minimal version of authentication protocol set to 11, thus all clients which are using older version (older than 11g) will produce above error.

Workaround: Add following line to sqlnet.ora (on server side)

SQLNET.ALLOWED_LOGON_VERSION_CLIENT=8


Case 2: ORA-28040: No matching authentication protocol or ORA-03134: Connections to this server version are no longer supported

This error is thrown when you are using database link to older release for the same reason as in previous case.

Workaround: Add following line to sqlnet.ora

SQLNET.ALLOWED_LOGON_VERSION_SERVER=8

Case 3: Segmentation fault

Solution:
Password versions problem. Upgrade your client (at least to 10gR1 version) as it's obviously older than old.

If you hit one of above problems it means that you didn't pay (so much) attention (or you weren't so precise) to clients and 3rd party software that is connecting to your database and the quality of your preparation (I will discuss the preparation phase in one of future posts) for upgrade is... questionable because you probably didn't (and you definitely should) read this:

Client / Server Interoperability Support Matrix for Different Oracle Versions (Doc ID 207303.1)
and this:
Database Net Services Reference - https://docs.oracle.com/database/121/NETRF/sqlnet.htm#NETRF006

Note that above parameters in sqlnet.ora are just bloody workaround, it's not a solution. By setting of lower version by these parameters you've just bought a uncertain amount of time. Again I suggest to check the MOS Doc ID 207303.1.

Getting more information about Database client

In this part we take a look at some possibilities how to get more information about clients in order to identify who and what is using old client and which version client is used for particular session.
For that purpose we can use (G)V$SESSION_CONNECT_INFO (view based on X$KSUSECON) which since 11gR1 version is a great source of additional information about connections. This view (and X$ table) has been present even in earlier versions of Oracle Database (AFAIK I've seen it in 9i) but since 11gR1 provides more detailed thus useful information. For our report we will use it in combination with (G)V$SESSION view.

SYS@ORA11GR2:SQL>
SELECT
      /* a.sid, a.serial#, */ b.username,
      a.client_version, a.osuser,
      b.machine, b.module, b.program,
      a.client_charset , a.client_driver,
      substr(a.client_connection, 0, 4) client_conn,
      b.service_name,
      substr(b.server,0,1) server
FROM gv$session_connect_info a, gv$session b
WHERE a.sid = b.sid
      and a.serial# = b.serial#
      and b.module <> 'DBMS_SCHEDULER'
      and network_service_banner like 'TCP%'
 14  ORDER BY client_version, username;

USERNAME  CLIENT_VER OSUSER     MACHINE       MODULE                       PROGRAM                        CLIENT_CHARSET  CLIENT_DRI CLIE SERVICE_NAME  S
--------- ---------- ---------- ------------- ---------------------------- ------------------------------ --------------- ---------- ---- ------------- -
TEST1     11.1.0.7.0 oracle     lab01         JDBC Thin Client             JDBC Thin Client               Unknown         jdbcthin   Hete MYSERV        N
TEST1     11.2.0.2.0 updater    lab01         JDBC Thin Client             JDBC Thin Client               Unknown         jdbcthin   Hete ORA11GR2.TEST N
TEST2     11.2.0.2.0 oracle     lab01         JDBC Thin Client             JDBC Thin Client               Unknown         jdbcthin   Hete ORA11GR2.TEST D
TEST2     11.2.0.3.0 root       lab01         java@lab01 (TNS V1-V3)       java@lab01 (TNS V1-V3)         UTF8            jdbcoci    Hete ORA11GR2.TEST N
TEST3     11.2.0.4.0 kartik     MYDOM\LAB10   ngator.exe                   ngator.exe                     EE8MSWIN1250               Hete ORA11GR2.TEST D
TEST3     12.1.0.2.0 oracle     lab00         oracle@lab00 (TNS V1-V3)     oracle@lab00 (TNS V1-V3)       EE8ISO8859P2               Homo ORA11GR2.TEST N
TEST4     12.1.0.2.0 oracle     lab00         oracle@lab00 (TNS V1-V3)     oracle@lab00 (TNS V1-V3)       AL32UTF8                   Hete ORA11GR2.TEST D
TEST4     Unknown    oracle     oel01         httpd@oel01 (TNS V1-V3)      httpd@oel01 (TNS V1-V3)        Unknown                    Hete MYSERV        N
TEST5     Unknown    gtw        oel01         dbgtw.bin@oel01 (TNS V1-V3)  dbgtw.bin@oel01 (TNS V1-V3)    Unknown                    Homo ORA11GR2.TEST D
TEST5     Unknown    apache     oel02         httpd@oel02 (TNS V1-V3)      httpd@oel02 (TNS V1-V3)        Unknown                    Hete ORA11GR2.TEST N
TEST6     Unknown    apache     lab01         httpd@lab01 (TNS V1-V3)      httpd@lab01 (TNS V1-V3)        Unknown                    Hete ORA11GR2.TEST N
TEST6     Unknown    apache     oel02         httpd@oel02 (TNS V1-V3)      httpd@oel02 (TNS V1-V3)        Unknown                    Hete ORA11GR2.TEST N
TEST7     Unknown    kartik     mybook        SQL*Plus                     sqlplus@mybook (TNS V1-V3)     Unknown                    Homo ORA11GR2.TEST D

 According to above output we can easily identify servers which are using older version of Oracle Database client, older than 11gR1 to be precise. That session have the "Unknown" value in second column. And also we have other useful informations about client usage. For example, when we look at columns MACHINE, MODULE, PROGRAM, CLIENT_CONNECTION (CLIE column in above output) we know that sessions from OEL01 server are established from an Apache server which is running application (we can guess that there is some fastcgi, PHP, PL/SQL or other Apache module) and some other (probably standalone) application and both of them are using OCI connection. OK, this report is fine but it covers only current sessions but we can have clients/application/users which are connecting iregulary or occasionally.
So we need to be sure that we can identify the most of older clients as we can but this good concept for further step which will be automatic gathering of information.

For purpose of recording information about old clients we may use AFTER LOGON trigger.

CREATE TABLE systools.t_oclient_info
(
    version        varchar2(40),
    username       varchar2(30),
    osuser         varchar2(30),
    machine        varchar2(64),
    hostip         varchar2(15),
    module         varchar2(48),
    terminal       varchar2(30),
    oci_library    varchar2(27),
    client_charset varchar2(40),
    client_driver  varchar2(9),
    client_conn    varchar2(13),
    service_name   varchar2(64)
) TABLESPACE users;

CREATE OR REPLACE TRIGGER al_tr_log_oclient_info
AFTER LOGON ON DATABASE
WHEN (USER != 'SYS')
BEGIN
  INSERT INTO systools.t_oclient_info
  SELECT a.client_version,
         sys_context('USERENV', 'SESSION_USER') username,
         sys_context('USERENV', 'OS_USER') osuser,
         sys_context('USERENV', 'HOST') machine,
         sys_context('USERENV', 'IP_ADDRESS', 15),
         sys_context('USERENV', 'MODULE') module,
         sys_context('USERENV', 'TERMINAL') terminal,
         a.client_oci_library oci_library,
         a.client_charset,
         a.client_driver,
         substr(a.client_connection, 0, 4) client_conn,
         sys_context('USERENV', 'SERVICE_NAME') service_name
  FROM v$session_connect_info a
  WHERE  a.sid = sys_context('USERENV', 'SID')
         and a.network_service_banner like 'TCP%'
         and a.client_version = 'Unknown'
         and not exists (SELECT null
                          FROM systools.t_oclient_info
                          WHERE osuser = sys_context('USERENV', 'OS_USER')
                            and username = sys_context('USERENV', 'SESSION_USER')
                            and hostip = sys_context('USERENV', 'IP_ADDRESS', 15)
                            and module = sys_context('USERENV', 'MODULE'));
END;
/

Note that On-logon trigger can produce/add some overhead (in case of frequently connecting applications), so it's important to keep it as minimal as it's possible and do only necessary work. Logging is executed for all users except SYS, you might consider to reduce number of users if you are sure that no old client is connecting to particular user.
I've used information from SYS_CONTEXT instead of joining the V$SESSION view. The side effect of this change is that we lost PROGRAM column but at the same time we've got HOSTIP column with an IP address.
Also you should keep the T_OCICLIENT_INFO table as small as possible and that's the reason why I added the "not exists" condition (you can remove it, so no checking for existing entry, just pure insert for each session) and also you can limit number of rows in table by removing "OSUSER=sys_context('USERENV', 'OS_USER')" from conditions. Also keep on mind that only 10gR2 and older clients will be logged (as the client version number is returned for 11gR1). And finally note that this is a my way, perhaps it might not be a best way but from my point this was effective solution, as always you can choose your own way.

Now you need to give some time in order to log most (ideally and hopefully all) of the clients.

SYS@ORA11GR2:SQL> select * from systools.t_oclient_info;

VERSION    USERNAME  OSUSER     MACHINE   HOSTIP          MODULE                         TERMINAL   OCI_LIBRARY  CLIENT_CHARSET  CLIENT_DRI CLIE SERVICE_NAME
---------- --------- ---------- --------- --------------- ------------------------------ ---------- ------------ --------------- ---------- ---- -------------
Unknown    TEST7     kartik     mybook    192.168.56.1    sqlplus@mybook (TNS V1-V3)     pts/6      Unknown      Unknown                    Homo ORA11GR2.TEST
Unknown    TEST4     oracle     oel01     192.168.56.101  httpd@oel01 (TNS V1-V3)                                Unknown                    Hete MYSERV
Unknown    TEST5     gtw        oel01     192.168.56.101  dbgtw.bin@oel01 (TNS V1-V3)    pts/5      Unknown      Unknown                    Homo ORA11GR2.TEST
Unknown    TEST5     apache     oel02     192.168.56.102  httpd@oel02 (TNS V1-V3)                                Unknown                    Hete ORA11GR2.TEST
Unknown    TEST6     apache     oel02     192.168.56.102  httpd@oel02 (TNS V1-V3)                                Unknown                    Hete ORA11GR2.TEST
Unknown    TEST6     apache     lab01     192.168.56.111  httpd@lab01 (TNS V1-V3)                                Unknown                    Hete ORA11GR2.TEST

 Deep diving aka Oracle Client version identification and tracing for "Unknown"s

OK, we have got identified all clients older than 11gR1 and I hope (or believe) that previous part is sufficient for you. But there could be some reasons when you might need to know exact version of Oracle database client.
For instance you don't have access to remote computer which is connecting to your database and it might be very time consuming to gather this information or you have some legacy application
which can work with (or supports) the 12cR1 database but it's not so easy to replace the client or there could be another issue or reason typical for corporate environment (My story was that somebody didn't do his job by providing required information/action and I couldn't wait with the upgrade any longer so I had to know what to expect after upgrade thus prevent some unexpected behaviour due to old client).
By knowing of exact version we can better predict and address potential implications, issues or bugs (for example by checking existing bug database or knowledge base at MOS site) for that client version.

Question is how we can detect version remotely? What exactly we are looking for? And thanks to my little research (for security related reason) more than ten years ago I knew where to start. For better understanding we have to get answer for those two questions and to answer these questions we can use output from good old SQLNet (server) tracing. I've used 10gR2 client and 12cR1 database for that purpose. In produced trace file we will focus on handshake/negotiation phase where can find the needed hint(s) (Note I'm not going to explain whole structure of trace information nor detailed description of packets, it's not an topic and also it would need separate article for full topic coverage and that's not my intention).

undefined

So we've got first interesting line/entry:
nsconneg: vsn=313, lov=300, opt=0xc01, sdu=2048, tdu=32767, ntc=0x7f08

Where VSN represents the protocol version which can help to identify release of the client. So we can easily find that:

Version         Dec      Hex
-----------------------------
8               310     0x136
9iR1            311     0x137
9iR2            312     0x138
10gR1/10gR2     313     0x139
11gR1/11gR2     314     0x13a
12cR1           315     0x13b

But this method is not sufficient as:
1. This information is not reliable as the value may change. For instance 11.2.0.4 JDBC (in this case provided by Instant client package) contained following entry:
nsconneg: vsn=310, lov=300, opt=0xc41, sdu=8192, tdu=32767, ntc=0x4f98

2. This information doesn't provide exact version number.


So we have the look further and we can see that there is another interesting part wich comes with other two lines and this is the information we are looking for:
nau_rpv: Version received for Linux: Version 10.2.0.5.0 - UNKNOWN
nau_rpv: Connection version for Linux: Version 10.2.0.5.0 - UNKNOWN

Now let's check the trace file again and the corresponding packet in order to find out the location of this information.

undefined

If you look at the above picture you can see the TNS packet and the version in red rectangles. Also note the "DEADBEEF" sequence which is the way how we can identify the right packet in order to get this information. Using utility I've created years ago I translated the Hex value from red rectangles to Dec value and version value in order to check the version:

$ ./oravsn.sh a200500

HEX: A200500 VSNUM: 169870592 VER: 10.2.0.5.0

 And you can do the same thing by different way for example by using the Oracle Database:

SQL> select sys_op_version(to_number('a200500', 'xxxxxxxxx')) version from dual;

VERSION
----------------------------------------
10.2.0.5.0

SQL> select to_number('b200400', 'xxxxxxxxx') vsnum from dual;

     VSNUM
----------
 186647552

 Now we can use this information as input to search in the MOS bug database or knowledge base to find out existing or known issues associated with this version.

Maybe you ask why Oracle clients version 10gR2 and below have "Unknown" value instead of version number. Answer for this question can be found in trace file as well but we have to use 11gR1 client or later. For this purpose I've created another trace file using the 11gR2 client.

undefined

As you can see at the picture above there is SESSION_CLIENT_VERSION sent with the value 186647552 which is decimal representation of Oracle version.

$ ./oravsn.sh 186647552

HEX: b200400 VSNUM: 186647552 VER: 11.2.0.4.0

When you check SQLNet server trace files for clients older than 11gR1 you will find that this information is missing there and that's the reason for an "Unknown" value.

Automatic information gathering for "Unknown"s

Now we have to solve the question how to collect these information automatically and what tools we need to use for that purpose.
Firstly we need to think about the rules and known facts. For instance:
a) overhead must be as low as possible
b) we need to limit information gathering for specific hosts only
c) output should be as minimal as possible
d) Database is running on Linux

Based on above points it's clear that using of server side sqlnet tracing is pointless as it conflicts with a, b and c.
As the Database is running on Linux box with the 2.6 kernel so I chose the solution based on combination of iptables/netfilter (local firewall) and wireshark. Of course in case of different OS I would have to choose different solution.

So let's see the setup:

# iptables -N ORATRACE
# iptables -A ORATRACE -p tcp -m string --hex-string "|deadbeef|" --algo kmp --from 50 --to 110 -j NFLOG --nflog-group 15
# iptables -A ORATRACE -p tcp -j ACCEPT
# iptables -A INPUT -s 192.168.56.1 -p tcp -m state --state ESTABLISHED -m tcp --dport 1521 -j ORATRACE
# iptables -A INPUT -s 192.168.56.101 -p tcp -m state --state ESTABLISHED -m tcp --dport 1521 -j ORATRACE
# iptables -A INPUT -s 192.168.56.102 -p tcp -m state --state ESTABLISHED -m tcp --dport 1521 -j ORATRACE

I've created a new ORATRACE chain. All established connections from three specific hosts (logged in systools.t_oclient_info - see the first section of the article) which are connection to port 1521 (where the Oracle Listener is listening) are redirected to ORATRACE chain. Each packet which comes to ORATRACE chain is checked (using of Knuth–Morris–Pratt string searching algorithm) for presence of the "DEADBEEF" sequence. If this sequence has been found packet is logged using NFLOG. Note that this setting is not permanent use iptables-save in order to save this configuration.

Using a Wireshark tool we read the specific NFLOG group (15) and write it to file in Hexadecimal form as a text to file:

# tshark -i nflog:15 -x >> /home/oracle/oracle_clients.log &

Now we can periodically check the oracle_clients.log file and if there is existing entry for specific source IP address we can remove specific rule for that address from the iptables (You can eventually write and setup script for that purpose).
Now we check what we've got.

At first content of one of logged packet:

 24.460554 192.168.56.1 -> 192.168.56.103 TNS 292 Request, Data (6), SNS

0000  02 00 00 0f 08 00 01 00 08 00 01 02 05 00 0a 00   ................
0010  00 00 00 00 08 00 04 00 00 00 00 02 10 00 08 00   ................
0020  00 06 20 02 0a 00 27 00 00 00 45 81 06 00 0f 00   .. ...'...E.....
0030  00 01 00 00 06 00 11 00 00 0e 00 00 12 00 10 00   ................
0040  08 00 27 1e 87 94 0a 00 27 00 00 00 08 00 00 00   ..'.....'.......
0050  d4 00 09 00 45 00 00 d0 13 b4 40 00 40 06 34 bb   ....E.....@.@.4.
0060  c0 a8 38 01 c0 a8 38 67 c4 d5 05 f1 83 88 54 66   ..8...8g......Tf
0070  8d 96 78 7b 80 18 00 e5 0b 57 00 00 01 01 08 0a   ..x{.....W......
0080  04 f4 41 41 01 ae a9 05 00 9c 00 00 06 00 00 00   ..AA............
0090  00 00 de ad be ef 00 92 0a 20 04 00 00 04 00 00   ......... ......
00a0  04 00 03 00 00 00 00 00 04 00 05 0a 20 04 00 00   ............ ...
00b0  08 00 01 00 00 76 4f fd 8f 35 88 00 12 00 01 de   .....vO..5......
00c0  ad be ef 00 03 00 00 00 04 00 04 00 01 00 01 00   ................
00d0  02 00 01 00 03 00 00 00 00 00 04 00 05 0a 20 04   .............. .
00e0  00 00 02 00 03 e0 e1 00 02 00 06 fc ff 00 02 00   ................
00f0  02 00 00 00 00 00 04 00 05 0a 20 04 00 00 0c 00   .......... .....
0100  01 00 11 06 10 0c 0f 0a 0b 08 02 01 03 00 03 00   ................
0110  02 00 00 00 00 00 04 00 05 0a 20 04 00 00 03 00   .......... .....
0120  01 00 03 01

And now we use grep for shortening the output in order for better reading:

$ cat oracle_clients.log | egrep 'TNS|be ef'| grep -v 00c0
 21.740456 192.168.56.102 -> 192.168.56.103 TNS 292 Request, Data (6), SNS
0090  00 00 de ad be ef 00 92 09 20 08 00 00 04 00 00   ......... ......
 24.460554 192.168.56.1 -> 192.168.56.103 TNS 292 Request, Data (6), SNS
0090  00 00 de ad be ef 00 92 0a 20 04 00 00 04 00 00   ......... ......
 27.850304 192.168.56.101 -> 192.168.56.103 TNS 292 Request, Data (6), SNS
0090  00 00 de ad be ef 00 92 0a 10 05 00 00 04 00 00   ................

$ ./oravsn.sh a200400

HEX: a200400 VSNUM: 169870336 VER: 10.2.0.4.0

$ ./oravsn.sh a100500

HEX: a100500 VSNUM: 168822016 VER: 10.1.0.5.0

$ ./oravsn.sh 9200800

HEX: 9200800 VSNUM: 153094144 VER: 9.2.0.8.0

According to above output host 192.168.56.1 is connecting using 10.2.0.4 version of client, host 192.168.56.101 using 10.1.0.5 and third host 192.168.56.102 shows is using 9.2.0.8 version of client (which makes him #1 candidate for client upgrade).

 Final thoughts

Be aware of that as Oracle Database comes with new features by each release, it also comes with changes in SQLNet protocol (since 11gR1 it has changed significantly), changes of default settings (SDU for example) and changes in authentication. Keep on mind these facts when you plan to upgrade your Database so you have to put a spotlight on Database client as well. It's possible to upgrade clients after upgrade of Database but I suggest to do that as first step (before Database upgrade) whenever it's possible. Otherwise you should be at least aware of possible consequences and implications of using older client version.

 

A misleading error message...

Oracle Database might be sometimes very surprising by produced error messages. Here is one of them which has occurred during adding of datafile to existing tablespace.

Examine following command and it's output:

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SYS@EDMREP AS SYSDBA> alter tablespace DATA add datafile '/storage/edm/oradata318/EDMREP/data36.dbf' size 30G;
alter tablespace DATA add datafile '/storage/edm/oradata318/EDMREP/data36.dbf' size 30G
*
ERROR at line 1:
ORA-01119: error in creating database file '/storage/edm/oradata318/EDMREP/data36.dbf'
ORA-17502: ksfdcre:4 Failed to create file /storage/arclog/EDMREP/archivelog/2016_02_18/o1_mf_1_350404_1p9RIDy_X_.arc

 OK, we've got "ORA-01119: error in creating database file" followed by "ORA-17502: ksfdcre:4 Failed to create file". Let's check description for ORA-17502 error which says: "File creation failed due to either insufficient OS permission or the file already exists." Well... of course the file exists as it's archive redo log file (as DB is in archivelog mode) and also it's couldn't be insufficient OS permission problem as archive redo log file had been created successfully.

Well, at this point I already knew the reason (or I was almost sure what the real reason might be) of command failure but let's continue in (guess what) game. So let's take a look at permissions first and other information at OS level:

oracle@[charon ~]$ cd /storage/arclog/EDMREP/archivelog/2016_02_18/
oracle@[charon /storage/arclog/EDMREP/archivelog/2016_02_18]$ touch test
oracle@[charon /storage/arclog/EDMREP/archivelog/2016_02_18]$ ls -la test
-rw-r--r-- 1 oracle dba 0 Feb 18 10:06 test
oracle@[charon /storage/arclog/EDMREP/archivelog/2016_02_18]$ rm test
oracle@[charon ~]$ ls -la /storage/arclog/EDMREP/archivelog/2016_02_18/o1_mf_1_350404_1p9RIDy_X_.arc
-r--r----- 1 oracle dba 91896320 Feb 18 10:05 /storage/arclog/EDMREP/archivelog/2016_02_18/o1_mf_1_350404_1p9RIDy_X_.arc
oracle@[charon ~]$ df -h | grep arclog
/dev/vx/dsk/edmdata_datadg/ecm_arclog_vol   150G    40G       104G    28%    /storage/arclog

As shown above permissions are correct, destination directory is accessible, writeable and has sufficient of space available. Archive redo log file is present but this is expected behaviour of course. Logically and according to above checks obviously this error is completely misleading and took us away from real cause. Now back to real reason of the error.

oracle@[charon ~]$ df -h | grep oradata318
/dev/vx/dsk/edmdata_datadg/edm_oradata318_vol   900G   871G        29G    97%    /storage/edm/oradata318
oracle@[charon ~]$ df -h | grep oradata319
/dev/vx/dsk/edmdata_datadg/edm_oradata319_vol   900G   661G       238G    74%    /storage/edm/oradata319

Yes, this is the real cause of error message. There is no free space for 30 GB datafile in /storage/edm/oradata318 mount point, so we use next available.

SYS@EDMREP AS SYSDBA> alter tablespace DATA add datafile '/storage/edm/oradata319/EDMREP/data36.dbf' size 30G;

Tablespace altered.

As you can see it's obvious that text of ORA-17502 message was irrelevant and moreover completely misleading...

Oracle EM Cloud Control 12c: Oracle Agent Error UploadMaxDiskUsedPct: 98

Recently I've got a notification which said: "Agent has stopped monitoring. The following errors are reported : COLL_DISABLED|DISK_FULL". This message is pretty clear and straightforward. ORACLE_HOME for agent is located in /opt mount point.

[oracle@sol01 ~ ]$ df -h | grep opt
rpool/OPT              120G    109G        11G    99%    /opt

After the cleaning in order to make more space (note: default value for UploadMaxDiskUsedPct is 98 percent) agent usually resumes operations but this time it has remained in disabled state even if disk had sufficient amount of space (32 GB).

[oracle@sol01 ~ ]$ df -h | grep opt
rpool/OPT              120G    67G        32G    68%    /opt

[oracle@sol01 ~ ]$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version     : 12.1.0.2.0
OMS Version       : 12.1.0.2.0
Protocol Version  : 12.1.0.1.0
Agent Home        : /opt/oracle/agent12c/agent_inst
Agent Binaries    : /opt/oracle/agent12c/core/12.1.0.2.0
Agent Process ID  : 22454
Parent Process ID : 22163
Agent URL         : https://sol01.local:3875/emd/main/
Repository URL    : https://cloud.local:4903/empbs/upload
Started at        : 2016-02-08 17:59:34
Started by user   : oracle
Last Reload       : 2016-02-16 14:24:27
Last successful upload                       : 2016-02-16 14:14:40
Last attempted upload                        : 2016-02-16 14:14:40
Total Megabytes of XML files uploaded so far : 42.78
Number of XML files pending upload           : 0
Size of XML files pending upload(MB)         : 0
Available disk space on upload filesystem    : 1.30%
Collection Status                            : [COLLECTIONS_HALTED(
  UPLOAD SYSTEM Threshold (UploadMaxDiskUsedPct: 98) exceeded with 98.69987)]
Heartbeat Status                             : Ok
Last attempted heartbeat to OMS              : 2016-02-16 14:26:39
Last successful heartbeat to OMS             : 2016-02-16 14:26:39
Next scheduled heartbeat to OMS              : 2016-02-16 14:27:39

---------------------------------------------------------------
Agent is Running and Ready

In this case solution is simple and you don't have to restart agent, you can manually invoke upload instead...

[oracle@sol01 ~ ]$ emctl upload
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully

...and then agent will resume operation.

[oracle@sol01 ~ ]$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version     : 12.1.0.2.0
OMS Version       : 12.1.0.2.0
Protocol Version  : 12.1.0.1.0
Agent Home        : /opt/oracle/agent12c/agent_inst
Agent Binaries    : /opt/oracle/agent12c/core/12.1.0.2.0
Agent Process ID  : 22454
Parent Process ID : 22163
Agent URL         : https://sol01.local:3875/emd/main/
Repository URL    : https://cloud.local:4903/empbs/upload
Started at        : 2016-02-08 17:59:34
Started by user   : oracle
Last Reload       : 2016-02-16 14:24:27
Last successful upload                       : 2016-02-16 14:29:40
Last attempted upload                        : 2016-02-16 14:29:40
Total Megabytes of XML files uploaded so far : 42.78
Number of XML files pending upload           : 0
Size of XML files pending upload(MB)         : 0
Available disk space on upload filesystem    : 32.53%
Collection Status                            : Collections enabled
Heartbeat Status                             : Ok
Last attempted heartbeat to OMS              : 2016-02-16 14:29:39
Last successful heartbeat to OMS             : 2016-02-16 14:29:39
Next scheduled heartbeat to OMS              : 2016-02-16 14:30:39

---------------------------------------------------------------
Agent is Running and Ready


 

Newer posts → Home ← Older posts