Three Turves: Monitoring Application Locks in Oracle

Oracle provides a variety of forms of locking protection for SQL operations to ensure that correct transactional semantics are provided to applications. Nine times out of ten when looking at issues related to locking, we will be interested in the locking facilities used by SQL, for example by monitoring row lock contention and tracking down which objects are the subject of the contention.

For some purposes, the locking provided by SQL operations does not meet the needs of the application. For these situations, Oracle provides the DBMS_LOCK package which provides application level locking.

This posting describes how to understand application level locking activity from AWR, ASH and other sources of information. It will help you to understand Oracle wait events like 'enq: UL - contention' and diagnose one possible cause of 'missing' response time in which some of the info provided by the Oracle AWR report may not be quite what you would expect.

Locking Strategies

There are two types of strategy that I've encountered when looking at application level locking. The first strategy uses DBMS_LOCK.REQUEST to try to acquire a lock and optionally specify the length of time for which the application is prepared to wait if another session already has the lock.

The second strategy is one that I would call 'polling' in which the application performs a rapid check to try to acquire a shared resource. This may use DBMS_LOCK with a very short timeout or it could (more likely) use a completely different implementation. If the resource cannot be acquired then the application waits using DBMS_LOCK.SLEEP for a period of time before retrying.

Trying it Out

I wanted to be able to run some predictable tests and then look at the results using various monotoring tools. For this purpose, I used some sample code that in the form of a PL/SQL package creates and then acquires an application lock called 'printer_lock' using DBMS_LOCK. I then started two sessions using SQL*Plus. In the first session, I created the lock and acquired it. In the second session I tried again to acquire the same lock which of course caused this session to wait because the lock was held by the other session.

I also added my own procedure to call DBMS_LOCK.SLEEP. In this case I didn't do any real 'locking' - I just wanted to see how a session calling this procedure would show up in the monitoring reports.

Visibility in Performance Reports

AWR can shed light on the wait time for application level locking using DBMS_LOCK.REQUEST. From this we can get a good understanding of whether our application might be getting delayed by application locks.

The first sections that contain information that we're interested are the 'wait class' and 'wait events'. Let's look at the more detailed 'wait events' section first. Here we will see the wait event 'enq: UL - contention' providing details of the total time waited while using APIs such as DBMS_LOCK.REQUEST (but not SLEEP) to acquire a lock. This info is also summarized into the wait class under 'Application'.

If the locking is being executed by PL/SQL code then AWR may also show us the top-level calling PL/SQL procedure in the 'SQL by Elapsed Time' section. The time spent waiting to acquire the lock will be counted in the elapsed time of this procedure.

The ASH report provides similar visibility, again highlighting the same wait event and the calling procedure. It can also provide details of the session that is waiting. Because ASH works by sampling it may give a slightly different picture from that shown by AWR and can be a better tool for some types of problem.

Because the meaning of any locks is application defined, neither AWR nor ASH provide any information regarding the resource that is the subject of any contention. We will look at how to diagnose this later.

For applications using a polling strategy, the visibility is not so good. In fact it would be very easy to miss it completely. The database performance reports seem to assume that time spent in DBMS_LOCK.SLEEP is of little of no interest to the audience of an AWR or ASH report. It is visible in the AWR report as the wait event 'PL/SQL lock timer', but you will need to look carefully to find this event because it appears very low down in the table (possibly right at the end), even if the time waited is very high.

The thing which is perhaps more surprising is that time spend in DBMS_LOCK.SLEEP is not counted in the elapsed time of the calling procedure. This means that for any procedure that sleeps in this way, you can't rely on the elapsed time reported by AWR. The same principle seems to apply to ASH - if the ASH sample catches the procedure sleeping like this then it doesn't seem to register it as a 'hit' on the calling procedure at all. This may make sense to someone looking at a problem from a database infrastructure perspective, but to an application developer looking at an AWR or ASH report it is very counter-intuitive.

System Tables and Views

As I said earlier, neither AWR nor ASH will shed much light on the resources that are the subject of application level locking conflicts. To get any detail on this we will need to query some system tables and views.

The first query looks for user locks. The most interesting columns are CTIME (which I think is the number of seconds since the lock was acquired), the SID which is the session currently holding the lock and ID1 which is used in the second query

SQL> select * from v$lock where TYPE = 'UL';
ADDR KADDR SID TY ID1 ID2 LMODE REQUEST -------- -------- ---------- -- ---------- ---------- ---------- ---------- CTIME BLOCK ---------- ---------- 1EF3E270 1EF3E29C 35 UL 1073741921 0 6 0 76 0
The second query looks up the lock name using the ID from the previous query. Clearly since we are dealing with application level locking, the meaning of the lock name is application-defined, so you will need to use your knowledge of the application to figure out what it means and whether you can do anything to improve the situation. Note that the 'expiration' column does not relate to the tenure of the lock by any session, but rather the lifetime of this particular application-level lock. It can be set when the lock is created by DBMS_LOCK.ALLOCATE_UNIQUE and defaults to 10 days.

SQL> select * from dbms_lock_allocated where lockid=1073741921;

NAME
--------------------------------------------------------------------------------

LOCKID EXPIRATIO
---------- ---------
printer_lock
1073741921 21-MAY-11

Naturally you could join the two queries, but I will leave that as an exercise for the reader.

Lock Housekeeping

The previous section mentioned the expiration time of an application level lock. In some circumstances this may also have a performance impact for your application. The reason is buried in the following sentence from the Oracle documentation…

"Subsequent calls to ALLOCATE_UNIQUE may delete expired locks to recover space."

The implication of this is that the housekeeping of the dbms_lock_allocated system table can also get mixed up with your application's execution. The system table is indexed only by name but the deletion statement used for housekeeping accesses the table by expiration date so the housekeeping query may cause a performance problem if your application creates a lot of locks with different names.

Here is an example AWR from an application that is being hit by this issue

Three Turves

Wednesday, 18 May 2011

Monitoring Application Locks in Oracle

No comments:

What's on my Desk?

About Me

Keywords

Blog Archive