Monday, November 18, 2013

Resizing MGMT_TABLESPACE in Grid Control 12c

We noticed the MGMT_TABLESPACE in our 12c Grid Control instance was very big. Almost 380gb of data was stored in this tablespace.
Because of all the targets registered in this Grid Control instance and the frequent usage of this Grid Control, I did not think anything was wrong.
Untill someone looked into this and found out it was a bug in 12c...

Note 1502370.1 describes this bug and also the solution for this. I had to install a patch and truncate the em_job_metrics table. After applying the patch and truncating the table, there was only around 10gb of data left in the MGMT_TABLESPACE.
Because I already experienced something like this 5 years ago in a 10g OEM instance (
http://basklaassen.blogspot.com/2008/06/resizing-oem-database-10g.html), I knew that is was not easy to reclaim the space in this tablespace. And we had a a lot to reclaim, only 10gb data in this tablespace of 380gb...

I decided to check my Oracle support, maybe someone els already had the same problem. I found bug 17461366, making it impossible to reorg the mgmt_tablespace because of aq objects. Same problem I faced five years ago. Because I really wanted to reclaim the free space (370gb), I decided to follow my own notes. Allthough these steps ware performed on a 10g OEM environment and now I was working with a 12c Grid Control environment.

Just like 5 years ago, I could think of two ways to reclaim the space.
1. export mgmt_tablespace, drop the tablespace and import it again 
2. export sysman, drop repository and, run scripts from the oms_home and import sysman again

Export MGMT_TABLESPACE

Unfortunately, dropping the mgmt_tablespace was a mission impossible. The export succeeded without errors, but I was not able to drop the tablespace. After dropping some related objects that were showing in the errors I recieved, I decided this was not going to work. I restored the database and executed option 2.

Export Sysman user(s)

I decided to follow note 388090.1. This note describes a platform migration for a 10g Grid Control environment, but I could not find any document about a 12c Grid Control environment.

 Just to be sure, I also exported (expdp) the other sysman users (sysman_apm, sysman_mds, sysman_opss and sysman). Also set job_queue_processes to zero
 
The next step was to drop the current repository using the repmanger utility ($OMS_HOME/sysman/admin/emdrep/bin)
The repmanager also did not drop the mgmt_tablespace, I had to drop the tablespace myself. But after deleting the repository I was able to drop the tablespace. The drop repository did drop all the sysman users and other tablespaces

According to the note, before the impdp, I had to run some scripts from the $OMS_HOME/sysman/admin/emdrep/sql/core/latest/admin directory.
These script should create the sysman users and tablespaces. The following sql's were executed :
- admin_create_tablespaces.sql
- admin_create_repos_user.sql
- admin_pre_import.sql
- admin_sys_procs.sql
- admin_profiles.sql
- admin_grants_repos_user.sql
- admin_grants_view_user.sql

Now, you should be ready to inport the sysman scheme again (impdp). But the import showed a lot of errors. Mean reason, the sysman_ro user was not available and the mgmt_ad4j_ts tablespace was not created. Again I decided to drop the sysman user again and start over again. But this time, instead of running the scripts before the import I created a whole new repository using Repmanager.

The new repository did have the sysman_ro user and also the missing tablespace.
At this time I started the import againg, using the 'table_exist_action=replace' option. The repository creation already created the sysman objects, so I wanted the impdp to replace the already created tables with the tables from the dump. I noticed one table creation error in the logfile, the em_job_type_creds_info table was not created by impdp. I had to create this one after the impdp.
Also the other sysman users were not created by the repmanager. I had to create and import these users  (sysman_.mds, sysman_apm and sysman_opss). 

After the import I returned to the note again, to check for post import steps.
Again in $OMS_HOME/sysman/admin/emdrep/sql/core/latest/admin
- admin_recompile_invalid.sql
- admin_create_synonyms.sql
- admin_post_import.sql

Reset the job_queue_processes to its original  value and submit the EM dbms jobs
- admin_submit_dbms_jobs.sql


After compiling, there were still problems with aq objects.
I dropped the aq objects (exec dbms_aqadm.drop_queue_table) and created (dbms_aqadm.create_queue_table) them again. This solved the problemens and also cleared the invalid objects

At this point, there were no invalid objects anymore, so the oms could be started again to see what happens.
Oms started oke and I was able to login 12c Grid Control.

With 12c Grid Control running, I noticed the oms was bouncing every 12 minutes by itself.
I checked the  emctl.msg file in /gc_inst/em/EMGC_OMS1/sysman/log for errors and found the following error.

HealthMonitor Nov 15, 2013 12:57:48 PM PbsAdminMsgListener error: PbsAdminMsgListener thread timed out.
Critical error err=3 detected in module PbsAdminMsgListener
OMS will be restarted. A full thread dump will be generated in the log file






It seemed the em_cntr_queue was missing ? I checked and found out some other queues were also missing (not in the dump ?) I recreated the missing queues (check the queues in an ota Grid environment) and that solved the restart error. Now oms was not restarting by itself anymore.

Reclaimable free space 

After these steps (took me two days...) I reclaimed 370gb. The mgmt_tablespace is now 10gb (instead of the 380gb before). The total size of the entiry database was shrinked from more then 400gb to 22gb !!