Executable Programs
When a user requests a program to run, the operating system performs a number of operations to
transform the executable program on disk into a running program. First, the directories in the user’s current
PATH environment variable must be scanned to find the correct copy of the program. Then, the system
loader (not to be confused with the ld command, which is the binder) must resolve any external references
from the program to shared libraries.
To represent the user’s request, the operating system creates a process, which is a set of resources, such
as a private virtual address segment, required by any running program.
The operating system also automatically creates a single thread within that process. A thread is the current
execution state of a single instance of a program. In AIX Version 4 and later, access to the processor and
other resources is allocated on a thread basis, rather than a process basis. Multiple threads can be
created within a process by the application program. Those threads share the resources owned by the
process within which they are running.
Finally, the system branches to the entry point of the program. If the program page that contains the entry
point is not already in memory (as it might be if the program had been recently compiled, executed, or
copied), the resulting page-fault interrupt causes the page to be read from its backing storage.
Interrupt Handlers
The mechanism for notifying the operating system that an external event has taken place is to interrupt the
currently running thread and transfer control to an interrupt handler. Before the interrupt handler can run,
enough of the hardware state must be saved to ensure that the system can restore the context of the
thread after interrupt handling is complete. Newly invoked interrupt handlers experience all of the delays of
moving up the hardware hierarchy (except page faults). Unless the interrupt handler was run very recently
(or the intervening programs were very economical), it is unlikely that any of its code or data remains in
the TLBs or the caches.
When the interrupted thread is dispatched again, its execution context (such as register contents) is
logically restored, so that it functions correctly. However, the contents of the TLBs and caches must be
reconstructed on the basis of the program’s subsequent demands. Thus, both the interrupt handler and the
interrupted thread can experience significant cache-miss and TLB-miss delays as a result of the interrupt.
Waiting Threads
Whenever an executing thread makes a request that cannot be satisfied immediately, such as a
synchronous I/O operation (either explicit or as the result of a page fault), the thread is put in a wait state
until the request is complete. Normally, this results in another set of TLB and cache latencies, in addition
to the time required for the request itself.
Dispatchable Threads
When a thread is dispatchable, but not actually running, it is accomplishing nothing useful. Worse, other
threads that are running may cause the thread’s cache lines to be reused and real memory pages to be
reclaimed, resulting in even more delays when the thread is finally dispatched.
Currently Dispatched Threads
The scheduler chooses the thread that has the strongest claim to the use of the processor. The
considerations that affect that choice are discussed in Performance Overview of the CPU Scheduler. When
the thread is dispatched, the logical state of the processor is restored to the state that was in effect when
the thread was interrupted.
Current Instructions
Most machine instructions can execute in a single processor cycle if no TLB or cache
miss occurs. In contrast, if a program branches rapidly among different areas of the program and accesses
data from a large number of different areas, causing high TLB and cache-miss rates, the average number
of processor cycles per instruction (CPI) might be much greater than one. The program is said to
exhibit poor locality of reference. It might be using the minimum number of instructions necessary to do its