Introduction

Tarcrypt is a filter that processes an input tar file and outputs an encryption-enhanced tar file containing extended header information related to the encryption.

The data is presented in a tar-compatible format with additional PAX-style extended headers describing items such as the compression and encryption algorithms, encrypted private key, public key, and HMAC information. The purpose of using tar-style headers, instead of encrypting the entire file as one unit, is to maintain a format that works with existing Unix/Linux backup tools that already accept tar input, with minimal modifications to those tools.

Usage

To produce an encrypted tar file, use the following commands.

Creating a key file

tarcrypt genkey -f [keyfile.key] [-c "comment" ]

This will generate an RSA keypair, and prompt for a passphrase to protect the private key. The encrypted private key, public key, an HMAC key, and other information are stored in the key file. Only the HMAC key should be considered secret, as it is used to provide message authentication.

Encrypting a tar file

tar -cf - /some/files | tarcrypt encrypt -k [ keyfile.key ] \
>encrypted_file.tar.enc

This pipes the output of tar into tarcrypt, which produces an output tar file with tarcrypt extensions. This has been tested with GNU tar, and supports tar files with standard GNU extensions as well as files with PAX headers and extended file attributes. The -k option may be specified multiple times with different keys, which produces an encrypted file that can be decrypted using the passphrase for any one of the keys.

The encrypted private key, public key, and a hash of the HMAC key are stored within the generated tar file. Each file contained in the tar archive includes a signature generated by the secret HMAC key and the file contents, which can be used for authenticity verification.

Decrypting a tar file

cat encrypted_file.tar.enc | tarcrypt decrypt >plaintext.tar

You will be prompted for the passphrase protecting the private key embedded in the encrypted tar file’s global header. A standard tar file will be generated as output. If multiple keys were used during encryption, the passphrase for only one of them is needed.

Security

The key file generated by the genkey function contains the following components:

  • Encrypted 2048-bit RSA private key

  • RSA Public key

  • Public key fingerprint

  • HMAC secret key

  • Comment section, including:

    • User ID and host the key was created from

    • Creation date

    • Optional comment specified with the "-c" flag

The HMAC secret key is used to produce a signature for each file in the archive. This key is generated by running the public key through an HMAC-SHA-256 function, keyed with a SHA-256 hash of the private key passphrase. Because the public key is contained in the header of the encrypted tar file, and the user must enter the passphrase when restoring, the HMAC secret key can be re-created during restore and used to authenticate the data contents. A SHA-256 hash of the HMAC secret key is also included in the tar file header, so that tarcrypt can verify that the key was re-generated correctly.
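The derivation described above can be sketched in Python as follows. This is an illustration only, not tarcrypt's actual code: the function names are hypothetical, and the sketch assumes that the public key is fed in as the exact bytes stored in the key file, that the passphrase is UTF-8 encoded, that the raw SHA-256 digest is used as the HMAC key, and that the stored hash is hex encoded. Tarcrypt may differ on any of these points.

```python
import hashlib
import hmac


def derive_hmac_key(public_key: bytes, passphrase: str) -> bytes:
    """Derive the HMAC secret key from the public key and the passphrase.

    Assumption: the HMAC is keyed with the raw SHA-256 digest of the
    UTF-8 encoded passphrase, and the message is the public key bytes.
    """
    passphrase_hash = hashlib.sha256(passphrase.encode("utf-8")).digest()
    return hmac.new(passphrase_hash, public_key, hashlib.sha256).digest()


def hmac_key_hash(hmac_key: bytes) -> str:
    """SHA-256 hash of the HMAC key (hex encoding is an assumption)."""
    return hashlib.sha256(hmac_key).hexdigest()


def verify_regenerated_key(public_key: bytes, passphrase: str,
                           stored_hash: str) -> bool:
    """Re-create the HMAC key and compare its hash with the stored one."""
    candidate = derive_hmac_key(public_key, passphrase)
    return hmac.compare_digest(hmac_key_hash(candidate), stored_hash)
```

With this scheme a wrong passphrase yields a different HMAC key, so the hash comparison fails before any file contents are checked.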

Since the private key is passphrase protected and the public key is used for encryption, it is safer to keep the key file on the server for unattended backup operations. However, the key file should still be protected since it contains the HMAC secret key. A compromise of this key would permit an attacker who has access to the backup server target to send forged data, although the attacker still would not be able to decrypt data on the backup server without the private key passphrase.

The "-k" option can be specified multiple times during the encryption operation — this allows files to be decrypted by supplying the passphrase for any one of the supplied keys. This is useful if you require a secondary key to be used across multiple servers in case the passphrase for the primary key is lost.

Individual files within the tar file are encrypted with AES-256-CTR using a random key. This random key is in turn encrypted with the public key from each key file specified.

Tarcrypt file format

The format of the data produced by tarcrypt follows the standard POSIX PAX tar file layout. Each file data block is preceded by a 512-byte header containing metadata about the file (file name, size, date, owner, etc.). In addition, this 512-byte header can be preceded by a PAX header, which consists of a similar 512-byte header block followed by a data block containing the key-value pairs that make up the PAX extended variables. Tarcrypt places its extended information into this PAX header.
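As an illustration of the key-value data block, the POSIX pax format stores each pair as a record of the form "LENGTH KEYWORD=VALUE" followed by a newline, where LENGTH is the decimal size of the whole record including the length digits themselves. The following Python sketch encodes and decodes such records. The helper names are hypothetical and the "TC.version" value used in the usage line is a made-up example.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Encode one PAX extended header record: b"<len> <keyword>=<value>\\n"."""
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body)
    # The length field counts its own digits, so iterate until it is stable.
    while True:
        total = len(body) + len(str(length))
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body


def parse_pax_records(data: bytes) -> dict:
    """Decode concatenated PAX records (without the NUL block padding)."""
    fields = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        # The record ends with a newline, which is not part of the value.
        record = data[space + 1:pos + length - 1]
        keyword, _, value = record.partition(b"=")
        fields[keyword.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return fields
```

For example, pax_record("TC.version", "1") produces the 16-byte record b"16 TC.version=1\n".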

PAX extended header fields used by tarcrypt

The global header contains the following fields:

TC.eprivkey -- Encrypted RSA private key
TC.pubkey -- Public key matching the above private key
TC.pubkey.fingerprint -- Fingerprint of the public key
TC.hmackeyhash -- SHA-256 hash of the HMAC authentication key
TC.keyfile.comment -- Comment line from the key file used to generate the encrypted tar
TC.version -- version number of the file format

If there is more than one encryption key file used, the above fields (with the exception of TC.version) are appended with a number indicating which key they belong to. For example: TC.pubkey.0 is the public key for the first key file, TC.pubkey.1 is for the second key. Additionally, a grouping field "TC.keygroups" is specified that groups certain keys together. For example:

TC.keygroups=0|1,2|3

This indicates that there are two key groups. The first group, "0|1", means that some files in the archive are encrypted with keys 0 and 1. The second group, "2|3", indicates that other files are encrypted with keys 2 and 3.
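A tool reading the archive could interpret this field as shown in the following Python sketch. The function names are hypothetical. The sketch assumes only the syntax shown above, with groups separated by "," and key numbers within a group separated by "|".

```python
def parse_keygroups(value):
    """Split a TC.keygroups value into a list of key-number lists."""
    return [[int(key) for key in group.split("|")]
            for group in value.split(",")]


def keys_for_group(keygroups_value, group_index):
    """Return the key numbers belonging to one key group (0 indexed)."""
    return parse_keygroups(keygroups_value)[group_index]
```

With the example above, parse_keygroups("0|1,2|3") returns [[0, 1], [2, 3]], and keys_for_group("0|1,2|3", 1) returns [2, 3].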

In addition, each individual file header contains the following:

TC.filters -- which filters were used to process the raw file (e.g., "compression|cipher")
TC.compression -- compression algorithm used prior to encryption
TC.cipher -- cipher algorithm string (e.g., rsa-aes256-gcm)
TC.original.size -- size of the original raw file
TC.hmac -- HMAC hash computed from the hmackey in the key file, and the raw file contents.
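As a rough illustration of how a TC.hmac value could be computed over the raw file contents without loading the whole file into memory, consider the Python sketch below. The hash function is not stated in this section, so the use of HMAC-SHA-256, the hex output encoding, and the function name are all assumptions made for the example.

```python
import hashlib
import hmac
import io


def file_hmac(hmac_key: bytes, stream, chunk_size: int = 1 << 20) -> str:
    """Compute an HMAC over a file-like object, reading it in chunks.

    Assumption: HMAC-SHA-256 with hex output; tarcrypt's actual
    algorithm and encoding may differ.
    """
    mac = hmac.new(hmac_key, digestmod=hashlib.sha256)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        mac.update(chunk)
    return mac.hexdigest()


# Usage with an in-memory stand-in for the raw file contents.
example_tag = file_hmac(b"example-key", io.BytesIO(b"raw file contents"))
```

When verifying, the computed value should be compared with the stored one using hmac.compare_digest rather than the == operator, to avoid leaking timing information.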

If more than one key file is used, then the key group that this file belongs to is specified with:

TC.keygroup -- which key group this file belongs to from TC.keygroups in the global header (0 indexed)

If an input file, after compression and encryption, is larger than the default internal buffer size (10 MB), the encrypted contents are broken into multiple segments. In that case, the following header field is included:

TC.segmented.header=1 -- set to 1 for true, field not present for false.

This indicates that the file is represented as multiple segments in the tar file, in the format:

original_filename/part.[sequence_number]

That is, the original file header is converted to a directory type object, and each segment (of up to 10 MB) is represented as a file under that directory, with an increasing sequence number in the file name.

The final segment is preceded by a PAX header with the fields:

TC.segmented.final=1 -- indicating this is the final segment
TC.hmac -- computed HMAC hash of the raw input file

(Again, TC.hmac may be followed by a numeric digit [0 indexed] indicating the key it belongs to, if more than one key was used)

Segmented files exist because, when a file is compressed prior to encryption, its final size is not known unless two passes are made (which takes time, and can fail if the file is updated during processing). The tar format requires that the 512-byte header block preceding the file data contain the exact size of the data in the blocks that follow (which is now compressed and encrypted). By processing the file and storing the output in internal buffers, tarcrypt can write out each segment's header block when the buffer fills, with a known size for that segment to put in the header.
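The buffering idea can be sketched in Python as a generator that reads one buffer ahead, so that the size of each segment is known before its header is written and the last segment can be flagged as final. This is a simplified illustration rather than tarcrypt's implementation: the function name is hypothetical, the sequence numbers are assumed to start at 0 without padding, 10 MB is taken to mean 10 * 1024 * 1024 bytes, and the stream's read() is assumed to return full buffers until the end of the data.

```python
import io

SEGMENT_SIZE = 10 * 1024 * 1024  # assumed meaning of "10 MB"


def segment_stream(name, stream, segment_size=SEGMENT_SIZE):
    """Yield (member_name, data, is_final) for each segment of a stream.

    The stream holds the already compressed and encrypted output, whose
    total size is unknown in advance. Reading one buffer ahead reveals
    whether the current buffer is the last one.
    """
    sequence = 0
    current = stream.read(segment_size)
    while True:
        upcoming = stream.read(segment_size)
        is_final = not upcoming
        # len(current) is known here, so a tar header can be written now.
        yield f"{name}/part.{sequence}", current, is_final
        if is_final:
            return
        current = upcoming
        sequence += 1


# Usage with a tiny segment size to show the naming scheme.
segments = list(segment_stream("bigfile", io.BytesIO(b"abcdefg"), 3))
```

A caller that receives only one segment, already marked final, could write the file as an ordinary unsegmented member instead.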

Information for existing backup tools

If a backup program supports consuming tar files, it may need to be modified to recognize the above headers and to reproduce them when generating a file restore. If the inbound tar file is not kept whole (i.e., if the file contents are broken down and the metadata is extracted from the headers), then the backup software will need to re-generate the appropriate header information when performing a restore operation, prior to passing the generated tar file back through the tarcrypt program. Note that if a segmented file is ingested, the re-generated tar file does not need to maintain the segmented format on restore. Instead, since the compressed/encrypted file size is known, the file can safely be represented as a single group of file blocks.
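As a starting point for tool authors, the following Python sketch uses the standard tarfile module to collect the TC.* fields from the global header and from each member's extended header. The function names are hypothetical, and the sketch has not been run against real tarcrypt output; it only relies on tarfile's documented pax_headers attributes. Note that tarfile merges the global header fields into the pax_headers of any member that has its own extended header.

```python
import io
import tarfile


def tc_fields(headers):
    """Keep only the TC.* entries of a PAX header dictionary."""
    return {k: v for k, v in headers.items() if k.startswith("TC.")}


def list_tc_headers(fileobj):
    """Return (global_fields, {member_name: fields}) for a tar archive.

    Mode "r:" needs a seekable file object; "r|" could be used instead
    when reading from a pipe.
    """
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        per_file = {m.name: tc_fields(m.pax_headers) for m in tar}
        global_fields = tc_fields(tar.pax_headers)
    return global_fields, per_file


def build_example_archive():
    """Build a small in-memory archive with made-up TC.* values."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={"TC.version": "1"}) as tar:
        data = b"hello"
        info = tarfile.TarInfo("a.txt")
        info.size = len(data)
        info.pax_headers = {"TC.cipher": "rsa-aes256-gcm"}
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

A backup tool that stores file contents and metadata separately could persist these dictionaries at ingest time and write them back through the same pax_headers attributes when rebuilding the tar file for restore.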