Sunday, March 22, 2020

Error SLOS depend-msg [No such file or directory] when installing RAC in LXC Containers

I was testing the installation of Oracle 19c in LXC containers. A while ago I was able to install Oracle RAC 12.1.0.2 and RAC 12.2.0.1 in LXC containers, so I expected Oracle RAC 19c to behave the same way, but...


During the execution of "root.sh" I received the following errors:

2019-11-27 10:19:59.203 :    CSSD:2389941504: [     INFO] clssscInitGlobalCTX: PERF_TIME started CLSFA for Flex
2019-11-27 10:19:59.203 :    CSSD:2389941504: [     INFO] Starting CSS daemon in exclusive mode with a role of hub
2019-11-27 10:19:59.205 :    GPNP:2389941504: clsgpnp_Init: [at clsgpnp0.c:708] '/u01/app/19.3/grid' in effect as GPnP home base.
2019-11-27 10:19:59.205 :    GPNP:2389941504: clsgpnp_Init: [at clsgpnp0.c:774] GPnP pid=37618, cli=clsuGpnpg GPNP comp tracelevel=1, depcomp tracelevel=0, tlsrc:init, apitl:0, tstenv:0, devenv:0, envopt:0, flags=2003
2019-11-27 10:19:59.214 :    GPNP:2389941504: clsgpnpkwf_initwfloc: [at clsgpnpkwf.c:404] Using FS Wallet Location : /u01/app/19.3/grid/gpnp/rac235test8/wallets/peer/

2019-11-27 10:19:59.214 :    GPNP:2389941504: clsgpnpkwf_initwfloc: [at clsgpnpkwf.c:416] Wallet readable. Path: /u01/app/19.3/grid/gpnp/rac235test8/wallets/peer/

2019-11-27 10:19:59.287 :    CSSD:2389941504: [     INFO] clssscInitGlobalCTX: Environment is production
2019-11-27 10:19:59.287 :    CSSD:2389941504: [     INFO] (:CLSN00143:)clssscInitGlobalCTX: CSSD process cannot get real-timepriority
2019-11-27 10:19:59.288 :    CSSD:2389941504: [     INFO] clsssc_logose: slos [-2], SLOS depend-msg [No such file or directory], SLOS error-msg [2]
2019-11-27 10:19:59.288 :    CSSD:2389941504: [     INFO] clsssc_logose: SLOS other info is [process is not running in real-time. rc = 0].

2019-11-27 10:19:59.288 :    CSSD:2389941504: [     INFO] (:CLSN00143:)clssscInitGlobalCTX: set priority system call had failed
2019-11-27 10:19:59.288 :    CSSD:2389941504: [     INFO] (:CLSN00143:)clssscInitGlobalCTX: set priority system call had failed calling clssscExit
2019-11-27 10:19:59.288 :    CSSD:2389941504: [     INFO] (:CSSSC00011:)clssscExit: A fatal error occurred during initialization
2019-11-27 10:19:59.288 :    CSSD:2389941504: [     INFO] clssscagSendNLSToAgent: Sending msg id 1730, size 56, product CRS, facility CRS, to agent
2019-11-27 10:19:59.773 :    CSSD:2230044416: [     INFO] clssscagSelect: endpoint(0x290) authenticated with user(root)
2019-11-27 10:19:59.773 :    CSSD:2230044416: [     INFO] clssscagProcessInitialMsg: Handshake successful with agent 1
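The lines that matter in the trace above are the ones tagged (:CLSN00143:) together with "process is not running in real-time". As a minimal sketch, a trace file could be checked for this signature with a small helper like the one below. The function name cssd_rt_failure is hypothetical and not part of any Oracle tool, and the trace file path is whatever your installation uses.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle utility): reports whether a CSSD trace
# file contains the real-time priority failure shown in this post.
# Argument: path to the trace file to inspect.
cssd_rt_failure() {
    trace="$1"
    # Both markers are copied from the trace excerpt in this post.
    if grep -q 'CLSN00143' "$trace" 2>/dev/null &&
       grep -q 'process is not running in real-time' "$trace" 2>/dev/null; then
        echo "rt-priority-failure"
        return 0
    fi
    echo "not-found"
    return 1
}
```

Calling cssd_rt_failure with the path of the CSSD trace prints "rt-priority-failure" when both markers are present and "not-found" otherwise.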


Some other errors from ocss.log:

 CRS-4000: Command Start failed, or completed with errors.
>End Command output
2019-11-20 15:54:03: The exlusive mode cluster start failed, see Clusterware alert log for more information
2019-11-20 15:54:03: Executing cmd: /u01/app/19.3/grid/bin/clsecho -p has -f clsrsc -m 119
2019-11-20 15:54:03: Executing cmd: /u01/app/19.3/grid/bin/clsecho -p has -f clsrsc -m 119
2019-11-20 15:54:03: Command output:
>  CLSRSC-119: Start of the exclusive mode cluster failed
>End Command output
2019-11-20 15:54:03: CLSRSC-119: Start of the exclusive mode cluster failed
2019-11-20 15:54:03: ###### Begin DIE Stack Trace ######
2019-11-20 15:54:03:     Package         File                 Line Calling
2019-11-20 15:54:03:     --------------- -------------------- ---- ----------
2019-11-20 15:54:03:  1: main            rootcrs.pl            355 crsutils::dietrap
2019-11-20 15:54:03:  2: crsinstall      crsinstall.pm        2439 main::__ANON__
2019-11-20 15:54:03:  3: crsinstall      crsinstall.pm        2334 crsinstall::perform_initial_config
2019-11-20 15:54:03:  4: crsinstall      crsinstall.pm        1026 crsinstall::perform_init_config
2019-11-20 15:54:03:  5: crsinstall      crsinstall.pm        1184 crsinstall::init_config
2019-11-20 15:54:03:  6: crsinstall      crsinstall.pm         446 crsinstall::CRSInstall
2019-11-20 15:54:03:  7: main            rootcrs.pl            552 crsinstall::new
2019-11-20 15:54:03: ####### End DIE Stack Trace #######

2019-11-20 15:54:03: ROOTCRS_BOOTCFG checkpoint has failed
2019-11-20 15:54:03:      ckpt: -ckpt -oraclebase /u01/app/grid -chkckpt -name ROOTCRS_BOOTCFG
2019-11-20 15:54:03: Invoking "/u01/app/19.3/grid/bin/cluutil -ckpt -oraclebase /u01/app/grid -chkckpt -name ROOTCRS_BOOTCFG"
2019-11-20 15:54:03: trace file=/u01/app/grid/crsdata/rac235test8/crsconfig/cluutil10.log
2019-11-20 15:54:03: Running as user oracle: /u01/app/19.3/grid/bin/cluutil -ckpt -oraclebase /u01/app/grid -chkckpt -name ROOTCRS_BOOTCFG
2019-11-20 15:54:03: Removing file /tmp/2Kp4_6j0kW
2019-11-20 15:54:03: Successfully removed file: /tmp/2Kp4_6j0kW
2019-11-20 15:54:03: pipe exit code: 0

2019-11-20 15:54:03: /bin/su successfully executed


I remember that when I was installing Oracle 12.1.0.2 I had exactly the same errors. Fortunately, at that time Oracle registered it as a bug and fixed the problem in PSU 12.1.0.2.160719.
The fix was also included in Oracle RAC 12.2.0.1, which is why I didn't hit the problem with that version. However, it seems Oracle forgot to include this fix in 19c (I didn't test 18c).

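The problem is related to the containers themselves: as the trace shows, the CSSD process cannot get real-time priority. After some investigation, it seems the following LXC configuration parameters, together with installing and enabling chrony, helped to fix the problem: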


lxc.cgroup.cpu.rt_runtime_us = 950000
lxc.cap.drop = mac_admin mac_override sys_module sys_rawio

yum install chrony
systemctl enable chronyd.service
systemctl start chronyd.service
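To confirm that the container actually received a real-time budget, the value set through lxc.cgroup.cpu.rt_runtime_us can be read back from inside the container. The sketch below assumes cgroup v1 with the cpu controller exposing a cpu.rt_runtime_us file. The helper name check_rt_budget is hypothetical, and the path you pass to it depends on how cgroups are mounted on your system.

```shell
#!/bin/sh
# Hypothetical helper: inspects a cgroup v1 cpu.rt_runtime_us file and
# reports whether the group has any real-time runtime budget.
# Argument: path to the cpu.rt_runtime_us file (assumed location, varies by system).
check_rt_budget() {
    file="$1"
    if [ ! -r "$file" ]; then
        # File absent or unreadable: the kernel may lack RT group scheduling,
        # or the path does not match this system's cgroup layout.
        echo "missing"
        return 2
    fi
    value=$(cat "$file")
    if [ "$value" = "0" ]; then
        # A budget of 0 means processes in this group cannot run real-time.
        echo "none"
        return 1
    fi
    echo "ok"
    return 0
}
```

For example, check_rt_budget /sys/fs/cgroup/cpu/cpu.rt_runtime_us (an assumed path) should print "ok" once the 950000 setting is in effect. As a second check, running "chrt -f 1 true" as root inside the container tries to start a process under SCHED_FIFO; if it fails, CSSD will likely fail to get real-time priority as well.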


