Document version: 1.04 Last modified: January 30, 2009
NOTE: WinZip® users do not need to read or understand the information contained on this page. It is intended for developers of Zip file utilities.
Changes since the original version of this document are summarized in the Change History section below.
This document describes the file format that WinZip uses to create AES-encrypted Zip files. The AE-1 encryption specification was introduced in WinZip 9.0 Beta 1, released in May 2003. The AE-2 encryption specification, a minor variant of the original AE-1 specification differing only in how the CRC is handled, was introduced in WinZip 9.0 Beta 3, released in January, 2004. Note that as of WinZip 11, WinZip itself encrypts most files using the AE-1 format and encrypts others using the AE-2 format.
From time to time we may update the information provided here, for example to document any changes to the file formats, or to add additional notes or implementation tips.
Without compromising the basic Zip file format, WinZip Computing has extended the format specification to support AES encryption, and this document fully describes the format extension. Additionally, we are providing information about a no-cost third-party source for the actual AES encryption code--the same code that is used by WinZip. We believe that use of the free encryption code and of this specification will make it easy for all developers to add compatible advanced encryption to their Zip file utilities.
This document is not a tutorial on encryption or Zip file structure. While we have attempted to provide the necessary details of the current WinZip AES encryption format, developers and other interested third parties will need to have or obtain an understanding of basic encryption concepts, Zip file format, etc.
Developers should also review AES Coding Tips page.
WinZip Computing makes no warranties regarding the information provided in this document. In particular, WinZip Computing does not represent or warrant that the information provided here is free from errors or is suitable for any particular use, or that the file formats described here will be supported in future versions of WinZip. You should test and validate all code and techniques in accordance with good programming practice.
Contents
To perform AES encryption and decryption, WinZip uses AES functions written by Dr. Brian Gladman. The source code for these functions is available in C/C++ and Pentium family assembler for anyone to use under an open source BSD or GPL license from the AES project page on Dr. Gladman's web site. The AES Coding Tips page also has some information on the use of these functions. WinZip Computing thanks Dr. Gladman for making his AES functions available to anyone under liberal license terms.
Dr. Gladman's encryption functions are portable to a number of operating systems and can be static linked into your applications, so there are no operating system version or library dependencies. In particular, the functions do not require Microsoft's Cryptography API.
General information on the AES standard and the encryption algorithm (also known as Rijndael) is readily available on the Internet. A good place to start is http://www.nist.gov/public_affairs/releases/g00-176.htm.
AES-encrypted files are stored within the guidelines of the standard Zip file format using only a new "extra data" field, a new compression method code, and a value in the CRC field dependant on the encryption version, AE-1 or AE-2. The basic Zip file format is otherwise unchanged.
WinZip sets the version needed to extract and version made by fields in the local and central headers to the same values it would use if the file were not encrypted.
The basic Zip file format specification used by WinZip is available via FTP from the Info-ZIP group at ftp://ftp.info-zip.org/pub/infozip/doc/appnote-iz-latest.zip.
As for any encrypted file, bit 0 of the "general purpose bit flags" field must be set to 1 in each AES-encrypted file's local header and central directory entry.
Additionally, the presence of an AES-encrypted file in a Zip file is indicated by a new compression method code (decimal 99) in the file's local header and central directory entry, used along with the AES extra data field described below. There is no change in either the version made by or version needed to extract codes.
The code for the actual compression method is stored in the AES extra data field (see below).
For files encrypted using the AE-2 method, the standard Zip CRC value is not used, and a 0 must be stored in this field. Corruption of encrypted data within a Zip file is instead detected via the authentication code field.
Files encrypted using the AE-1 method do include the standard Zip CRC value. This, along with the fact that the vendor version stored in the AES extra data field is 0x0001 for AE-1 and 0x0002 for AE-2, is the only difference between the AE-1 and AE-2 formats.
NOTE: Zip utilities that support the AE-2 format are required to be able to read files that were created in the AE-1 format, and during decryption/extraction of files in AE-1 format should verify that the file's CRC matches the value stored in the CRC field.
Note: see the Zip file format document referenced above for general information on the format and use of extra data fields.
Offset | Size(bytes) | Content |
0 | 2 | Extra field header ID (0x9901) |
2 | 2 | Data size (currently 7, but subject to possible increase in the future) |
4 | 2 | Integer version number specific to the zip vendor |
6 | 2 | 2-character vendor ID |
8 | 1 | Integer mode value indicating AES encryption strength |
9 | 2 | The actual compression method used to compress the file |
Zip utilities that support AE-2 must also be able to process files that are encrypted in AE-1 format. The handling of the CRC value is the only difference between the AE-1 and AE-2 formats.
Value | Strength |
0x01 | 128-bit encryption key |
0x02 | 192-bit encryption key |
0x03 | 256-bit encryption key |
The encryption specification supports only 128-, 192-, and 256-bit encryption keys. No other key lengths are permitted.
(Note: the current version of WinZip does not support encrypting files using 192-bit keys. This specification, however, does provide for the use of 192-bit keys, and WinZip is able to decrypt such files.)
III. Encrypted file storage format
Additional overhead data required for decryption is stored with the encrypted file itself (i.e., not in the headers). The actual format of the stored file is as follows; additional information about these fields is below. All fields are byte-aligned.
Size (bytes) |
Content |
Variable | Salt value |
2 | Password verification value |
Variable | Encrypted file data |
10 | Authentication code |
Note that the value in the "compressed size" fields of the local file header and the central directory entry is the total size of all the items listed above. In other words, it is the total size of the salt value, password verification value, encrypted data, and authentication code.
The "salt" or "salt value" is a random or pseudo-random sequence of bytes that is combined with the encryption password to create encryption and authentication keys. The salt is generated by the encrypting application and is stored unencrypted with the file data. The addition of salt values to passwords provides a number of security benefits and makes dictionary attacks based on precomputed keys much more difficult.
Good cryptographic practice requires that a different salt value be used for each of multiple files encrypted with the same password. If two files are encrypted with the same password and salt, they can leak information about each other. For example, it is possible to determine whether two files encrypted with the same password and salt are identical, and an attacker who somehow already knows the contents of one of two files encrypted with the same password and salt can determine some or all of the contents of the other file. Therefore, you should make every effort to use a unique salt value for each file.
The size of the salt value depends on the length of the encryption key, as follows:
Key size | Salt size |
128 bits | 8 bytes |
192 bits | 12 bytes |
256 bits | 16 bytes |
This two-byte value is produced as part of the process that derives the encryption and decryption keys from the password. When encrypting, a verification value is derived from the encryption password and stored with the encrypted file. Before decrypting, a verification value can be derived from the decryption password and compared to the value stored with the file, serving as a quick check that will detect most, but not all, incorrect passwords. There is a 1 in 65,536 chance that an incorrect password will yield a matching verification value; therefore, a matching verification value cannot be absolutely relied on to indicate a correct password.
Information on how to obtain the password verification value from Dr. Gladman's encryption library can be found on the coding tips page.
This value is stored unencrypted.
Encryption is applied only to the content of files. It is performed after compression, and not to any other associated data. The file data is encrypted byte-for-byte using the AES encryption algorithm operating in "CTR" mode, which means that the lengths of the compressed data and the compressed, encrypted data are the same.
It is important for implementors to note that, although the data is encrypted byte-for-byte, it is presented to the encryption and decryption functions in blocks. The block size used for encryption and decryption must be the same. To be compatible with the encryption specification, this block size must be 16 bytes (although the last block may be smaller).
Authentication provides a high quality check that the contents of an encrypted file have not been inadvertently changed or deliberately tampered with since they were first encrypted. In effect, this is a super-CRC check on the data in the file after compression and encryption. (Additionally, authentication is essential when using CTR mode encryption because this mode is vulnerable to several trivial attacks in its absence.)
The authentication code is derived from the output of the encryption process. Dr. Gladman's AES code provides this service, and information about how to obtain it is in the coding tips.
The authentication code is stored unencrypted. It is byte-aligned and immediately follows the last byte of encrypted data.
For more discussion about authentication, see the authentication code FAQ below.
Beginning with WinZip 11, WinZip makes a change in its use of the AE-1 and AE-2 file formats. The file formats themselves have not changed, and AES-encrypted files created by WinZip 11 are completely compatible with version 1.02 the WinZip AES encryption specification, which was published in January 2004.
WinZip 9.0 and WinZip 10.0 stored all AES-encrypted files using the AE-2 file format, which does not store the encrypted file's CRC. WinZip 11 instead uses the AE-1 file format, which does store the CRC, for most files. This provides an extra integrity check against the possibility of hardware or software errors that occur during the actual process of file compression/encryption or decryption/decompression. For more information on this point, see the discussion of the CRC below.
Because for some very small files the CRC can be used to determine the exact contents of a file, regardless of the encryption method used, WinZip 11 continues to use the AE-2 file format, with no CRC stored, for files with an uncompressed size of less than 20 bytes. WinZip 11 also uses the AE-2 file format for files compressed in BZIP2 format, because the BZIP2 format contains its own integrity checks equivalent to those provided by the Zip format's CRC.
Other vendors who support WinZip's AES encryption specification may want to consider making a similar change to their own implementations of the specification, to get the benefit of the extra integrity check that it provides.
Note that the January 2004 version of the WinZip AE-2 specification, version 1.0.2, already required that all utilities that implemented the AE-2 format also be able to process files in AE-1 format, and should check on decryption/extraction of those files that the CRC was correct.
To reduce Zip file size, it is recommended that non-file entries such as folder/directory entries not be encrypted. This, however, is only a recommendation; it is permissible to encrypt or not encrypt these entries, as you prefer.
On the other hand, it is recommended that you do encrypt zero-length files. The presence of both encrypted and unencrypted files in a Zip file may trigger user warnings in some Zip file utilities, so the user experience may be improved if all files (including zero-length files) are encrypted.
If zero-length files are encrypted, the encrypted data portion of the file storage (see above) will be empty, but the remainder of the encryption overhead data must be present, both in the file storage area and in the local and central headers.
There is no requirement that all files in a Zip file be encrypted or that all files that are encrypted use the same encryption method or the same password.
A Zip file can contain any combination of unencrypted files and files encrypted with any of the four currently defined encryption methods (Zip 2.0, AES-128, AES-192, AES-256). Encrypted files may use the same password or different passwords.
Key derivation, as used by AE-1 and AE-2 and as implemented in Dr. Gladman's library, is done according to the PBKDF2 algorithm, which is described in the RFC2898 guidelines. An iteration count of 1000 is used. An appropriate number of bits from the resulting hash value are used to compose three output values: an encryption key, an authentication key, and a password verification value. The first n bits become the encryption key, the next m bits become the authentication key, and the last 16 bits (two bytes) become the password verification value.
As part of the process outlined in RFC 2898 a pseudo-random function must be called; AE-2 uses the HMAC-SHA1 function, since it is a well-respected algorithm that has been in wide use for this purpose for several years.
Note that, when used in connection with 192- or 256-bit AES encryption, the fact that HMAC-SHA1 produces a 160-bit result means that, regardless of the password that you specify, the search space for the encryption key is unlikely to reach the theoretical 192- or 256-bit maximum, and cannot be guaranteed to exceed 160 bits. This is discussed in section B.1.1 of the RFC2898 specification.
As opposed to using new version made by and version needed to extract values to signal AES encryption for a file, the new compression method is more likely to be handled gracefully by older versions of existing Zip file utilities. Zip file utilities typically do not attempt to extract files compressed with unknown methods, presumably notifying the user with an appropriate message.
In principle, the value of the salt should be different whenever the same password is used more than once, for the reasons described above, but this is difficult to guarantee.
In practice, the number of bytes in the salt (as specified by AE-1 and AE-2) is such that using a pseudo-random value will ensure that the probability of duplicated salt values is very low and can be safely ignored.
There is one exception to this: With the 8-byte salt values used with WinZip's 128-bit encryption it is likely that, if approximately 4 billion files are encrypted with the same password, two of the files will have the same salt, so it is advisable to stay well below this limit. Because of this, when using the same password to encrypt very large numbers of files in WinZip's AES encryption format (that is, files totalling in the millions, for example 2000 Zip files, each containing 1000 encrypted files), we recommend the use of 192-bit or 256-bit AES keys, with their 12- and 16-byte salt values, rather than 128-bit AES keys, with their 8-byte salt values.
Although salt values do not need to be truly random, it is important that they be generated in a way that the probability of duplicated salt values is not significantly higher than that which would be expected if truly random values were being used.
One technique for generating salt values is presented in the coding tips page.
The purpose of the authentication code is to insure that, once a file's data has been compressed and encrypted, any accidental corruption of the encrypted data, and any deliberate attempts to modify the encrypted data by an attacker who does not know the password, can be detected.
The current consensus in the cryptographic community is that associating a message authentication code (or MAC) with encrypted data has strong security value because it makes a number of attacks more difficult to engineer. For AES CTR mode encryption in particular, a MAC is especially important because a number of trivial attacks are possible in its absence. The MAC used with WinZip's AES encryption is based on HMAC-SHA1-80, a mature and widely respected authentication algorithm.
The MAC is calculated after the file data has been compressed and encrypted. This order of calculation is referred to as Encrypt-then-MAC, and is preferred by many cryptographers to the alternative order of MAC-then-Encrypt because Encrypt-then-MAC is immune to some known attacks on MAC-then-Encrypt.
Within the Zip format, the primary use of the CRC value is to detect accidental corruption of data that has been stored in the Zip file. With files encrypted according to the Zip 2.0 encryption specification, it also functions to some extent as a method of detecting deliberate attempts to modify the encrypted data, but not one that can be considered cryptographically strong. The CRC is not needed for these purposes with the WinZip AES encryption specification, where the HMAC-SHA1-based authentication code instead serves these roles.
The CRC has a drawback in that for very small files, such as files with four or fewer bytes, the CRC can be used, independent of the encryption algorithm, to determine the unencrypted contents of the file. And, in general, it is preferable to store as little information as possible about the encrypted file in the unencrypted Zip headers.
The CRC does serve one purpose that the authentication code does not. The CRC is computed based on the original uncompressed, unencrypted contents of the file, and it is checked after the file has been decrypted and decompressed. In contrast, the authentication code used with WinZip AES encryption is computed after compression/encryption and it is checked before decryption/decompression. In the very rare event of a hardware or software error that corrupts data during compression and encryption, or during decryption and decompression, the CRC will catch the error, but the authentication code will not.
WinZip 9.0 and WinZip 10.0 used AE-2 for all files that they created, and did not store the CRC. As of WinZip 11, WinZip instead uses AE-1 for most files, storing the CRC as an additional integrity check against hardware or software errors occurring during the actual compression/encryption or decryption/decompression processes. WinZip 11 will continue to use AE-2, with no CRC, for very small files of less than 20 bytes. It will also use AE-2 for files compressed in BZIP2 format, because this format has internal integrity checks equivalent to a CRC check built in.
Note that the AES-encrypted files created by WinZip 11 are fully compatible with January 2004's version 1.0.2 of the WinZip AES encryption specification, in which both the AE-1 and AE-2 variants of the file format were already defined.
Changes made in document version 1.04, January, 2009: Minor clarification regarding the algorithm used to generate the authentication code.
Changes made in document version 1.03, November, 2006: Minor editorial and clarifying changes have been made throughout the document. The following substantive technical changes have been made:
WinZip's AES encryption specification defines two formats, known as AE-1 and AE-2, which differ in whether the CRC of the encrypted file is stored in the Zip headers. While the file formats themselves remain unchanged, WinZip's usage of them is changing. Beginning with WinZip 11, WinZip uses the AE-1 format, which includes the CRC of the encrypted file, for many encrypted files, in order to provide an additional integrity check against hardware or software errors occurring during the compression/encryption or decryption/decompression processes. Note that AES-encrypted files created by WinZip 11 are completely compatible with the previous version of the WinZip encryption specification, January 2004's version 1.0.2.
Changes made in document version 1.02, January, 2004: The introductory text at the start of the document has been rewritten, and minor editorial and clarifying changes have been made throughout the document. Two substantive technical changes have been made:
Standard Zip files store the CRC of each file's unencrypted data. This value is used to help detect damage or other alterations to Zip files. However, storing the CRC value has a drawback in that, for a very small file, such as a file of four or fewer bytes, the CRC value can be used, independent of the encryption algorithm, to help determine the unencrypted contents of the file.
Because of this, files encrypted with the new AE-2 method store a 0 in the CRC field of the Zip header, and use the authentication code instead of the CRC value to verify that encrypted data within the Zip file has not been corrupted.
The only differences between the AE-1 and AE-2 methods are the storage in AE-2 of 0 instead of the CRC in the Zip file header,and the use in the AES extra data field of 0x0002 for AE-2 instead of 0x0001 for AE-1 as the vendor version.
Zip utilities that support the AE-2 format are required to be able to read files that were created in the AE-1 format, and during decryption/extraction of files in AE-1 format should verify that the file's CRC matches the value stored in the CRC field.
The description of the key generation mechanism has been updated to point out a limitation arising from its use of HMAC-SHA1 as the pseudo-random function: When used in connection with 192- or 256-bit AES encryption, the fact that HMAC-SHA1 produces a 160-bit result means that, regardless of the password that you specify, the search space for the encryption key is unlikely to reach the theoretical 192- or 256-bit maximum, and cannot be guaranteed to exceed 160 bits. This is discussed in section B.1.1 of the RFC2898 specification.
Document version: 1.04
Last modified: January 30, 2009
Copyright© 2003- Corel Corporation.
All Rights Reserved