Thursday, March 14, 2013

Automatic Shared Memory Management problem?

From time to time, one of our 10g databases (10.2.0.5) seems to 'hang'.
Our monitoring reports a 'time out' on several checks, and when we try to connect, the SQL session hangs. No connection is possible.
A few days ago, something like this happened again. Instead of bouncing the database, I decided to look for clues to find out why the database was 'hanging'.
The server itself showed nothing unusual, and nothing strange was going on at that time. The alert.log of the database showed nothing either, neither before nor during the hang, and the other 'dump' directories did not help me in any way.
By then, the database was responding again.

The next day I decided to run an AWR report using one snapshot from before the hang and one from after it. The wait event 'SGA: allocation forcing component growth' was in the top 3 wait events. I also remembered that the MMON process sometimes uses 100% CPU during a hang. I did not notice that this time, but I decided to take a closer look at this event.

Querying v$sga_resize_ops, I found the following going on around the time the database was 'hanging'.
We noticed the hang at about 4:05 pm. v$sga_resize_ops shows:

WHEN               COMPONENT               OPER_TYPE
-----------------  ----------------------  ---------
Mrt-12:16:08:11    DEFAULT buffer cache    SHRINK
Mrt-12:16:08:11    shared pool             GROW
Mrt-12:16:08:11    DEFAULT buffer cache    SHRINK
Mrt-12:16:08:11    shared pool             GROW
Mrt-12:16:08:11    DEFAULT buffer cache    SHRINK
Mrt-12:16:08:11    shared pool             GROW
Mrt-12:16:08:11    DEFAULT buffer cache    SHRINK
Mrt-12:16:08:11    shared pool             GROW
Mrt-12:16:08:11    DEFAULT buffer cache    SHRINK
Mrt-12:16:08:11    shared pool             GROW
Mrt-12:16:08:11    DEFAULT buffer cache    SHRINK


The SGA was resized 406 times (!!!) within a single second (16:08:11). From 16:08:12 onwards, the resizing looked normal (?) again:

Mrt-12:16:08:11    shared pool             GROW
Mrt-12:16:08:12    DEFAULT buffer cache    SHRINK
Mrt-12:16:08:12    shared pool             GROW
Mrt-12:16:08:12    DEFAULT buffer cache    SHRINK
Mrt-12:16:08:12    shared pool             GROW
Mrt-12:16:08:14    DEFAULT buffer cache    SHRINK
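A burst like this is easy to miss when scrolling through hundreds of rows, so counting operations per second helps. Below is a minimal Python sketch of that idea. It assumes the rows have already been fetched from v$sga_resize_ops as (WHEN, COMPONENT, OPER_TYPE) tuples, with WHEN formatted to one-second resolution as in the listings above. The fetching step is left out. The helper names and the threshold of 10 operations per second are hypothetical choices for illustration. The sample rows are synthetic and only mimic the pattern described here: 406 alternating operations at 16:08:11, followed by the few rows shown for 16:08:12 and 16:08:14.

```python
from collections import Counter

# Synthetic rows mimicking the listings above (NOT real query output):
# 406 alternating SHRINK/GROW operations within 16:08:11 ...
rows = [
    ("Mrt-12:16:08:11",
     "DEFAULT buffer cache" if i % 2 == 0 else "shared pool",
     "SHRINK" if i % 2 == 0 else "GROW")
    for i in range(406)
]
# ... followed by the calmer tail shown in the second listing.
rows += [
    ("Mrt-12:16:08:12", "DEFAULT buffer cache", "SHRINK"),
    ("Mrt-12:16:08:12", "shared pool", "GROW"),
    ("Mrt-12:16:08:12", "DEFAULT buffer cache", "SHRINK"),
    ("Mrt-12:16:08:12", "shared pool", "GROW"),
    ("Mrt-12:16:08:14", "DEFAULT buffer cache", "SHRINK"),
]


def ops_per_second(rows):
    """Count resize operations per WHEN value (one-second resolution)."""
    return Counter(when for when, _component, _oper_type in rows)


def resize_storms(rows, threshold=10):
    """Return {WHEN: count} for seconds with more than `threshold` operations.

    The default threshold of 10 is a hypothetical value, not an Oracle limit.
    """
    counts = ops_per_second(rows)
    return {when: n for when, n in counts.items() if n > threshold}


if __name__ == "__main__":
    print(resize_storms(rows))
```

Running it prints `{'Mrt-12:16:08:11': 406}`. Only the second with the resize storm is flagged. The tail seconds, with 4 and 1 operations, stay below the threshold.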


I will have to check this the next time one of our databases hangs, but this does not look good to me.
I found some related bugs on My Oracle Support (MOS), but none of them had a fix. A workaround is to disable ASMM by setting sga_target to 0.
