Thursday, April 23, 2020

Patching Oracle Cloud Classic Database Services kills the EMCC Agent

As you know, after successfully patching a database you always need to test all the applications that depend on it.

I found this behavior after patching a database that resides in Oracle Cloud Classic Infrastructure.

All the applications were working fine, without any issues… until someone tried to start the normal monitoring process through Oracle Enterprise Manager Cloud Control.

A support ticket reported that the EMCC agent did not seem to be working, so I thought: no problem, I will start the agent from the server as usual, or maybe it needs an upload and a reload of its data (a common issue when you work with EMCC agents).

I logged on to the server as the oracle user, went to the path where the agent was supposed to be installed and… SURPRISE! The EMCC agent was not there anymore. The whole installation had disappeared. I checked the Oracle Inventory and found nothing: there was no trace that the agent had ever existed.

I immediately started a forensic analysis to find the cause of the problem. Of course, the patching process carried out the day before seemed to be the culprit.

This is the actual scenario, symptoms, cause and workarounds:

Scenario:
  • Database version: 12.2.0.1
  • Oracle Home: /u01/app/oracle/product/12.2.0/dbhome_1/
  • Infrastructure: Oracle Cloud Classic Infrastructure
  • Tool used to install the patch: the dbaascli command line tool, but it happens even if you use the GUI Console.
  • EMCC Agent version: 13c, but it happens with any EMCC Agent version.
Symptoms:
  • The EMCC agent was not working; the EMCC Console reported that the agent was not active.
  • The Agent Home directory /u01/app/oracle/product/13.3/agent/agent_inst had disappeared.
  • The Oracle Inventory at /u01/app/OraInventory/Contents.xml listed only the database Home.
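The inventory symptom can be checked with a one-line search. This is only a minimal sketch: `home_in_inventory` is a hypothetical helper, and it assumes the inventory file records each home with a `LOC="..."` attribute, which you should confirm against your own inventory file before relying on it.

```shell
#!/bin/sh
# Sketch: report whether a given Oracle Home path is registered in an
# inventory file. Assumption: each home is recorded with LOC="<path>".
# home_in_inventory is a hypothetical name, not part of any Oracle tool.
home_in_inventory() {
    inventory_file="$1"   # inventory XML file on your server
    home_path="$2"        # home to look for, written exactly as registered
    # -F: fixed-string match, -q: no output, only the exit status
    grep -qF "LOC=\"$home_path\"" "$inventory_file" 2>/dev/null
}
```

The function returns 0 when the home is listed and non-zero otherwise, so it can be used as `home_in_inventory <inventory file> <home path> && echo registered`. The match is exact, so a trailing slash in the path matters.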
Cause:

As I said, the patching was done with the dbaascli command line tool. I won't go into details; you can check an article covering the whole process at the following link (LINK).

What I did was investigate the log files generated during the patch application and track all the operations done.

You can find a series of log files in the directory /var/opt/oracle/log/dbpatchm/
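A quick way to narrow down that directory is to search every log for warnings and failed commands. The sketch below is my own helper, not something dbaascli provides: `scan_patch_logs` is a hypothetical name, the search terms are a guess based on the messages seen in this case, and it relies on `grep -r`, which GNU and BSD grep support.

```shell
#!/bin/sh
# Sketch: list suspicious lines in the patch logs with file name and line number.
# scan_patch_logs is a hypothetical helper; the default directory is the one
# mentioned in this post and can be overridden with the first argument.
scan_patch_logs() {
    log_dir="${1:-/var/opt/oracle/log/dbpatchm}"
    # -r: recurse into the directory, -n: show line numbers, -E: extended regex
    grep -rnE 'WARN|ERROR|non-zero status' "$log_dir" 2>/dev/null
}
```

Calling `scan_patch_logs` with no argument searches the default directory. The pattern can be adjusted if your logs use different wording.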



The default log file is dbpatchm.log, so I investigated that first. I found the following steps:

2020-02-26 18:21:50 - INFO: started copying needed config files into golden image
2020-02-26 18:21:50 - INFO: em home /u01/app/oracle/product/13.3/agent/ is being copied to /u01/download/app/oracle
2020-02-26 18:21:50 - Output from cmd cp -pR /u01/app/oracle/product/13.3/agent/ /u01/download/app/oracle run on localhost  is:
2020-02-26 18:25:55 - cmd took 245 seconds


It seems that the tool copies the Agent Home to a temporary directory, which is fine.
Then I found a lot of lines detailing the patching process, with nothing strange so far.
But what happens after patching?


2020-02-26 18:31:17 - INFO: em home /u01/app/oracle/product/13.3/agent/agent_inst is being attached to the inventory
2020-02-26 18:31:17 - Output from cmd cd /u01/app/oracle/product/12.2.0/dbhome_1/oui/bin; ./runInstaller -silent -attachHome ORACLE_HOME="/u01/app/oracle/product/13.3/agent/agent_inst" ORACLE_HOME_NAME="EM_HOME" run on localhost  is:
2020-02-26 18:31:17 -
The user is root. Oracle Universal Installer cannot continue installation if the user is root.
2020-02-26 18:31:17 - cmd took 0 seconds
2020-02-26 18:31:17 - WARN : non-zero status returned
  Command: cd /u01/app/oracle/product/12.2.0/dbhome_1/oui/bin; ./runInstaller -silent -attachHome ORACLE_HOME="/u01/app/oracle/product/13.3/agent/agent_inst" ORACLE_HOME_NAME="EM_HOME"
  Exit: 255
  Excerpt:
The user is root. Oracle Universal Installer cannot continue installation if the user is root.

I found that during the process, the tool creates a new directory for the software and moves the old contents, including the agent home, to /u01/download. After that it tries to copy the agent home back and register it again in the inventory, but the registration fails for what is, in my opinion, a silly reason: registering the agent in the inventory must be done as the oracle user, not as root, yet the process that invokes the task (patching the database) must run as root. How is this possible? Is it an Oracle bug? Or maybe I am not following the right path for patching? You can draw your own conclusions.
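To make the root problem concrete, here is a minimal sketch of how the same attach command could be handed to the oracle account when the caller is root. `build_attach_cmd` is a hypothetical helper of mine; the runInstaller arguments are copied from the log excerpt, and the sketch assumes the owning OS account is named oracle and that the paths contain no single quotes. It only prints the command and does not run it.

```shell
#!/bin/sh
# Sketch: print the attach command so that it ends up running as oracle.
# build_attach_cmd is a hypothetical helper; nothing here is executed.
build_attach_cmd() {
    uid="$1"          # numeric user id of the caller, e.g. $(id -u)
    db_home="$2"      # Oracle Home that contains oui/bin/runInstaller
    agent_home="$3"   # home to attach to the inventory
    cmd="cd $db_home/oui/bin; ./runInstaller -silent -attachHome ORACLE_HOME=\"$agent_home\" ORACLE_HOME_NAME=\"EM_HOME\""
    if [ "$uid" -eq 0 ]; then
        # OUI refuses to run as root, so wrap the command for the oracle account
        printf "su - oracle -c '%s'\n" "$cmd"
    else
        printf '%s\n' "$cmd"
    fi
}
```

For example, `build_attach_cmd "$(id -u)" <database home> <agent home>` prints a command you can review before running it yourself.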

I tested the process two more times in a test environment:
  • First, I manually stopped the agent and then patched the database using dbaascli. Same result.
  • Then, I stopped the agent and patched using the GUI console through a web browser. Same result.
Workaround:
Frustrated, I can think of two methods to avoid the issue in future patching tasks:
  • Before patching, copy the installation directory to a new location; after patching, copy it back and manually re-attach the Agent Home to the inventory. (This is what the tool tries to do but fails.)
  • Reinstall the agent binaries in a different location outside /u01/app before patching. But maybe this makes no sense, because my original installation follows the Oracle best practices.
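The first workaround can be sketched as two small functions. This is only an illustration under my own assumptions: the function names and the backup location are hypothetical, the agent should be stopped before copying, the agent home path must be given without a trailing slash, and the re-attach to the inventory still has to be done separately as the oracle user.

```shell
#!/bin/sh
# Sketch: copy the agent home aside before patching and put it back afterwards.
# Function names and the backup location are hypothetical.
backup_agent_home() {
    agent_home="$1"   # e.g. /u01/app/oracle/product/13.3/agent (no trailing slash)
    backup_dir="$2"   # a location that the patching process does not touch
    mkdir -p "$backup_dir" || return 1
    # -p keeps ownership, mode and timestamps; -R copies recursively
    cp -pR "$agent_home" "$backup_dir/"
}

restore_agent_home() {
    agent_home="$1"
    backup_dir="$2"
    # Refuse to overwrite a home that is still present
    if [ -e "$agent_home" ]; then
        return 1
    fi
    mkdir -p "$(dirname "$agent_home")" || return 1
    cp -pR "$backup_dir/$(basename "$agent_home")" "$agent_home"
}
```

Run `backup_agent_home` before patching and `restore_agent_home` with the same two arguments afterwards. Running them as the owner of the files (or as root) is what lets `cp -p` preserve ownership.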
Summary:

I decided to share this post because during my analysis I never found an Oracle document or bug reporting a similar issue. Now that you know the cause of the problem, maybe you can think of a better way to avoid it.


