When a document is backed up repeatedly, its 0s and 1s stay the same because the file is simply being duplicated. Block deduplication can easily identify the similarities between such copies because their sequences of 0s and 1s remain exactly the same. Online data is different: it contains few exact duplicates, although many online files are similar to one another. In addition, the majority of files that contribute to growing data storage requirements arrive already compressed by their native applications, such as:
• Images and video (such as the JPEG, MPEG, TIFF, GIF, and PNG formats)
• Compound documents (such as .zip files, email, HTML, web pages, and PDFs)
• Microsoft Office application documents (including PowerPoint, MS-Word, Excel, and SharePoint)
NOTE: The DR Series system achieves a lower savings rate when the data it ingests has already been compressed by its native data source. It is strongly recommended that you disable compression at the data source, especially for first-time backups. For optimal savings, native data sources should send data to the DR Series system in a raw (uncompressed) state.
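The reduced savings described in this note can be illustrated with a small experiment. The sketch below is a simplified model, not how the DR Series system actually chunks data: it assumes fixed 256-byte blocks identified by SHA-256 digests (the block size and the helper names `block_hashes` and `reuse_fraction` are hypothetical), makes a one-byte edit to a text file, and measures how many blocks of the new version already exist in the old version, first for the raw bytes and then for the same data compressed with zlib.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 256  # hypothetical fixed block size chosen for illustration


def block_hashes(data: bytes, size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 digest of each block."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def reuse_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new version's blocks that already exist in the old version."""
    seen = set(block_hashes(old))
    new_hashes = block_hashes(new)
    return sum(h in seen for h in new_hashes) / len(new_hashes)


# Build a text "document" from a few repeating words, then change its first byte.
rng = random.Random(0)
words = [b"backup", b"storage", b"chunk", b"replica", b"target", b"source"]
original = b" ".join(rng.choice(words) for _ in range(12000))
edited = b"X" + original[1:]

raw = reuse_fraction(original, edited)
packed = reuse_fraction(zlib.compress(original), zlib.compress(edited))
print(f"raw blocks reused:        {raw:.1%}")
print(f"compressed blocks reused: {packed:.1%}")
```

In the raw data only the edited block is new, while in the zlib output the single-byte change typically alters most of the compressed blocks, so very little remains for block deduplication to match.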
Block deduplication is less effective on files that are already compressed, because compression changes the pattern of 0s and 1s from the original format. Data deduplication is a specialized form of data compression that eliminates redundant copies of data. This technique improves storage utilization, and it can also be applied to network data transfers to reduce the number of bytes that must be sent across a link. During analysis, deduplication identifies and stores unique chunks of data (byte patterns). As the analysis continues, other chunks are compared against the stored copies, and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. This reduces the amount of data that must be stored or transferred. Network savings, in particular, are achieved by replicating data that has already been deduplicated.
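The chunk-and-reference process described above can be sketched in a few lines. This is a minimal, hypothetical model, not the DR Series implementation: it assumes fixed 4 KiB chunks identified by SHA-256 digests, and the `ChunkStore` class, its methods, and the file names are illustrative names only.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; production systems may use variable-size chunks


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # digest -> stored chunk bytes
        self.files: dict[str, list[str]] = {}  # file name -> ordered list of chunk references

    def write(self, name: str, data: bytes) -> None:
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only the first time it is seen; afterwards the
            # file just records a reference (the digest) to the stored copy.
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        self.files[name] = refs

    def read(self, name: str) -> bytes:
        # Rebuild the file by following its references back to the stored chunks.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore()
doc = random.Random(1).randbytes(4 * CHUNK_SIZE)  # 16 KiB sample payload (randbytes needs Python 3.9+)
store.write("monday.bak", doc)
store.write("tuesday.bak", doc)                   # identical second backup
print(store.stored_bytes())                       # 16384: the second copy added only references
```

The second backup costs only four small references rather than another 16 KiB of data, which is the saving the paragraph describes.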
Standard file compression tools identify short repeated substrings inside individual files. By contrast, storage-based data deduplication inspects large volumes of data to find large identical sections, such as entire files or large portions of files. Once these sections are identified, the system stores only one copy of the data, and that copy is additionally compressed using single-file compression techniques. For example, an email system may contain 100 or more emails that each carry the same 1 megabyte (MB) attachment. The following shows how this is handled:
• Without data deduplication, each time the email system is backed up, all 100 instances of the same attachment are saved, which requires 100 MB of storage space.
• With data deduplication, only one instance of the attachment is actually stored, and all subsequent instances are referenced back to that saved copy, giving a deduplication ratio of approximately 100 to 1. The unique chunks of data that represent the attachment are deduplicated at the block chunking level.
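The arithmetic of the email example can be reproduced with a short sketch. For simplicity it treats each attachment as a single unit identified by its SHA-256 digest instead of splitting it into blocks, uses a random placeholder payload, and ignores the small metadata cost of each reference (one reason a real ratio would only be approximately 100 to 1).

```python
import hashlib
import random

MB = 1024 * 1024
attachment = random.Random(0).randbytes(MB)  # placeholder 1 MB attachment (randbytes needs Python 3.9+)
emails = [attachment] * 100                  # 100 emails carrying the same attachment

# Without deduplication, every instance of the attachment is saved.
without_dedup = sum(len(a) for a in emails)

# With deduplication, identical content maps to one digest, so one copy is stored.
unique = {hashlib.sha256(a).hexdigest(): len(a) for a in emails}
with_dedup = sum(unique.values())

ratio = without_dedup / with_dedup
print(without_dedup // MB, "MB vs", with_dedup // MB, "MB; ratio", f"{ratio:.0f}:1")
# prints: 100 MB vs 1 MB; ratio 100:1
```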
NOTE: The DR Series system does not support deduplication of encrypted data, so ingesting encrypted data yields no deduplication savings. The system treats data that has already been encrypted as unique and therefore cannot deduplicate it.
When self-encrypting drives (SEDs) are used, data is decrypted by the SED or its encryption layer as the backup application reads it. This works in the same way as opening an MS-Word document that was saved on a SED. As a result, any data stored on a SED can be read and deduplicated. However, if you enable encryption in the backup software, you lose deduplication savings, because each time the data is encrypted, the DR Series system considers it to be unique.
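Why encryption in the backup software defeats deduplication can be illustrated with a toy model. The hypothetical `toy_encrypt` function below is NOT a secure cipher: it XORs the data with a SHA-256-derived keystream seeded by a random nonce. The sketch assumes, as is common in real encryption schemes, that every encryption run uses a fresh random nonce or IV, so encrypting the same backup twice produces different bytes.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """NOT a real cipher. XORs the data with a SHA-256 keystream derived from
    key + a random nonce, mimicking only the property that matters here:
    a fresh nonce makes every encryption of the same data look different."""
    nonce = os.urandom(16)  # new random nonce for every call
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    # Prepend the nonce so the data could be decrypted with the same key.
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


key = b"hypothetical-backup-key"
data = b"same backup data " * 1000  # identical content backed up on two nights

night1 = toy_encrypt(key, data)
night2 = toy_encrypt(key, data)
print(night1 == night2)  # False: identical plaintext, different ciphertext
```

Because the two encrypted streams share no common byte patterns, a deduplicating target sees both as unique data, whereas the unencrypted backups would have been exact duplicates.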
Replication: Replication is the process by which the same key data is saved on multiple storage devices, with the goal of maintaining consistency between redundant resources in data storage environments. Data replication improves fault tolerance, which makes stored data more reliable, and it improves access to the same stored data. The DR Series system uses an active form of replication that lets you configure a primary-backup scheme. During replication, the system processes data storage requests from a specified source to a specified destination (also known as a target) that acts as a replica of the original source data.
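The network savings from replicating deduplicated data can be sketched as follows. This is a hypothetical model, not the DR Series replication protocol: both source and target are represented as dictionaries of 4 KiB chunks keyed by SHA-256 digest, and the illustrative `replicate` function sends only the chunks whose digests the target does not already hold.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def chunks(data: bytes) -> dict[str, bytes]:
    """Split data into fixed-size chunks, keyed by their SHA-256 digest."""
    return {hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest(): data[i:i + CHUNK_SIZE]
            for i in range(0, len(data), CHUNK_SIZE)}


def replicate(source: dict[str, bytes], target: dict[str, bytes]) -> int:
    """Send the target only the chunks it lacks; return the number of bytes sent."""
    # Comparing digests stands in for a "which chunks do you already have?" exchange.
    missing = source.keys() - target.keys()
    for digest in missing:
        target[digest] = source[digest]
    return sum(len(source[d]) for d in missing)


rng = random.Random(7)
day1 = rng.randbytes(4 * CHUNK_SIZE)  # first backup: 4 unique chunks (Python 3.9+)
day2 = day1[:CHUNK_SIZE] + rng.randbytes(CHUNK_SIZE) + day1[2 * CHUNK_SIZE:]  # 1 chunk changed

source: dict[str, bytes] = {}
target: dict[str, bytes] = {}
source.update(chunks(day1))
sent_initial = replicate(source, target)      # initial replication sends everything
source.update(chunks(day2))
sent_incremental = replicate(source, target)  # only the changed chunk crosses the link
print(sent_initial, sent_incremental)         # 16384 4096
```

After the first full replication, each later replication sends only the chunks that are new on the source, which is where the network savings come from.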