tape drive writes out the current data, stops, and waits for the data stream to start again. Once it
does, the drive must search for the end of data, reposition its write heads, and continue writing.
This is known as head-positioning overhead.
4. File-based backup software with software data compression turned on will typically utilize the
CPU(s) heavily. To achieve optimal backup throughput in this case, make sure you have plenty of
CPU power (a higher-speed Pentium or Pentium Pro processor, or dual processors if the
compression algorithm is multi-threaded). Use the Performance Monitor (for example, the
Processor: % Processor Time counter) to determine your CPU utilization level; if the CPU becomes
saturated, backup performance can suffer greatly. Hardware data compression, on the other hand,
places no compression load on the CPU(s); only the system I/O imposes some overhead on the
processors.
5. When using the on-line backup method, your backup throughput can be limited by SQL Server and
how fast it can read from the disk subsystem. For SQL Server version 6.0 and above, however, we
will see that in most cases this is no longer a limitation.
6. With DLT tape drives, the block size used to write the data to the tape media can become the
limiting factor, preventing the benefits of compression from becoming apparent. We will discuss
this further in the next section.
7. It is extremely difficult to predict backup duration and throughput without knowledge of the
compressibility of your data. Even once that is known, any prediction carries a fair degree of
inaccuracy. In highly volatile environments, where the data changes rapidly, even past experience
with the same database is a poor guide.
Consequently, for the rest of this document, we have chosen to present performance numbers with a
database that has a very low percentage of repeated data and unused space, and thus a low
compression ratio [22]. Results with such a database should be treated as the “worst-case” scenario;
any database that contains compressible data would yield higher backup throughputs. Where
applicable and known, we will provide additional considerations for data sets with higher
compression ratios.
An advantage of software compression tools is that they usually report the amount of compression
achieved on the user’s data, expressed as a percentage or ratio. Some backup applications are able to
report such a number for hardware compression as well (e.g., Cheyenne ARCserve). Without
assistance from the application, however, determining a precise compression ratio for your data using
hardware compression can be difficult at best. To measure the compression achieved on our test
database (see above) with the DLT tape drive, we used a special SCSI tape utility that counts blocks
sent versus blocks written. Short of such a utility, however, it may be necessary to employ a bit of
creativity.
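For reference, a compression ratio and a percentage of space savings are related by a simple formula.
The worked figure below uses the hardware compression ratio from footnote [22]; the software figure
in that footnote differs slightly from this formula due to rounding of the ratio.

    savings = 1 - 1 / ratio
    e.g., 1 - 1 / 1.18 = 0.1525, or ~15.25% (the hardware figure in footnote [22])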
One way to estimate compression is to back up a large, representative sample of your data to a single
tape drive and measure the time, or throughput [23]. Next, disable compression on the drive and repeat
the operation. Divide the two throughput numbers; the quotient is your estimated compression ratio, as
illustrated below. For DLT drives, this method will only work with applications that employ a large
block size, such as SQL Server 6.5 but not 6.0 (see the following section). When using this method
with SQL Server 6.5, we found that the number returned correlated quite closely to the compression
ratio measured with the tape utility.
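As a simple illustration (the throughput figures here are hypothetical, not measured results):

    Throughput, compression enabled:   5.4 GB/hr
    Throughput, compression disabled:  4.5 GB/hr
    Estimated compression ratio:       5.4 / 4.5 = 1.2:1 (roughly 17% space savings)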
Another way to estimate compression is to back up your entire data set to a single tape drive while
monitoring the progress. One way to do this is with the STATS = percentage option of the DUMP
DATABASE command, which periodically reports the percentage of data pages dumped (see the
example below). Enough data should be present to fill up the tape, and the amount backed up should
be recorded when the tape runs out of space.
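For example, a dump to tape with progress reporting might look like the following sketch. The
database name and tape device here are hypothetical, and the exact options available depend on your
SQL Server 6.x version:

    DUMP DATABASE sales
        TO TAPE = '\\.\TAPE0'
        WITH INIT, STATS = 10   /* report progress after each 10 percent of data pages */

With STATS = 10, SQL Server prints a message each time another 10 percent of the data pages has
been dumped, which lets you track how much data has been written at any point.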
[22] Our test database compressed approximately 1.13:1 (or ~11.75%) with software data compression,
and approximately 1.18:1 (or ~15.25%) with hardware data compression.
[23] See the later section entitled Calculating Database Dump Throughput.