Strongbox: Secrets management

Introduction

When compute and applications are distributed across hundreds or thousands of sites, it is critical that data is protected from security breaches, both when in storage, and when transported over the network. Physical theft is a real possibility.

Secrets such as crypto keys, certificates, and third-party access credentials should not be bundled into application images. Secrets should instead be accessed through fully authenticated and audited APIs that allow secrets to be updated, and crypto keys to remain hidden, without updating software images.

An intrusion at one site must not compromise data at other sites, and when a security breach occurs, it must be easy to isolate and remediate.

Strongbox is a system-wide distributed service in Avassa for managing secrets and policies. Secrets are automatically shared to sites where they are required, and policies are applied across the entire system. Sharing only occurs in one direction, from the management point and outward. Local secrets are not propagated upward to avoid poisoning from a potential breach.

Key Strongbox features include:

Cryptographic isolation of secrets between tenants and sites, separate keys for each tenant and site
One-step operation to block a tenant, a site, and a host
Fine-grained control of how secrets are distributed to sites
Local secrets storage is sealed until remotely unsealed
Fully audit-logged access to secrets
Centralized key management: key rotation, revocation, access
Encrypt and decrypt services that allow use without access to actual crypto keys
API for format-preserving encryption/decryption as well as masking of data before logging, storage, and transport
Fully encrypted and authenticated communication both between sites (inter-site) and between hosts at a site (inter-host)
Mount secret as file in the file system for a container

The seal

The Strongbox application consists of a protected core process that handles the plain text data. No data is allowed to leave the core process un-encrypted. The data is protected as long as the memory of the core process cannot be accessed.

The state of the Strongbox application is encrypted using an AES-256-GCM cipher when it leaves the core process.

The key to the Strongbox state cipher (called the sealkey) is not stored locally; it has to be provided by some external means. The process of providing the sealkey is called unsealing. Until it has been unsealed, the Strongbox application is unusable because it cannot access its internal state.

The sealkey is generated when the site is first initialized. It is presented exactly once and must be stored securely outside the system. The sealkey is split up into five parts, using the Shamir Secret Sharing algorithm. To recover the sealkey at least three of the five parts must be provided. It is recommended that the parts are stored separately from each other.
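
The three-of-five split described above can be illustrated with a minimal sketch of Shamir Secret Sharing over a prime field. The prime, the function names, and the share layout are assumptions made for this example and do not describe how Strongbox encodes its parts.

```python
import secrets

# A Mersenne prime larger than any 256-bit secret. The field choice is an
# assumption for this sketch, not Strongbox's actual parameterisation.
PRIME = 2**521 - 1


def split_secret(secret: int, shares: int = 5, threshold: int = 3):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(parts) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(parts):
        num, den = 1, 1
        for j, (xj, _) in enumerate(parts):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


sealkey = secrets.randbits(256)          # stand-in for a 256-bit sealkey
parts = split_secret(sealkey)            # five parts
assert recover_secret(parts[:3]) == sealkey
assert recover_secret([parts[0], parts[2], parts[4]]) == sealkey
```

Any three parts give the same result, which is why storing the parts separately limits the damage if one of them is lost or stolen.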

The Strongbox application runs on the controller nodes in an Avassa site. The nodes are connected using mutual TLS, i.e. TLS with client and server certificates. The entire site is unsealed as a unit. The individual nodes in the site are automatically unsealed as long as one node is unsealed.

Automatic unseal

The Control Tower must be unsealed by some external entity when restarted. Other sites request remote unseal from their parent site. Each site is globally unsealed. Individual tenants do not have to unseal; a site is either fully sealed or fully unsealed.

Automatic unseal is possible provided that a site has stored its sealkey with its parent site, and it has a token that allows it to access the parent site. The unseal secret is not stored in plain text at the parent; instead, it is encrypted using a public-private key pair. The sealkey is encrypted using the public key before it is stored at the parent. In order to unseal, the sealkey needs to be decrypted using the private part of the key. The key pair for this can be stored in a TPM to further secure the remote auto-unseal process.

When a site is created, a site-specific access token is created for that site. The use of this token is limited to storing and retrieving the unseal secret. The token is split up into X parts, where X is the number of controller nodes in the site, using the Shamir Secret Sharing algorithm. Each controller node in the site gets one part of this token secret.

In order for a site to request remote unseal from its parent site it needs to assemble the access token from the different parts. A majority of controller nodes are needed to recover the access token from the parts. This is to prevent the site from being vulnerable to theft of a single node.

Once the access token has been recovered, it can be used to retrieve the unseal secret from the parent. The unseal secret can then be decrypted using the private key, after which the site can be unsealed. It is possible to further tighten security by limiting the IP addresses allowed to access the unseal secret at the parent.

In case an edge site cannot reach its parent and request automatic unseal, it is possible to manually unseal the site. This operation is performed by providing the remote sealkey over a local craft interface. The remote sealkey can be accessed by an administrator using an action at the Control Tower.

Tenant secrets

In addition to encrypting the Strongbox state, each tenant's Strongbox data (e.g. stored secrets) is encrypted with a unique AES-256 key, different from the sealkey. The tenant keys are stored in the Strongbox state and only become available once the state has been unsealed.

Each tenant's data is stored separately, encrypted with a key unique to that tenant. The keys are also unique to each site. If an adversary gains access to the key for one tenant at one site, the data for other tenants remains secure, as does the data for the same tenant at other sites.

When data is transferred between sites it is secured using mutual TLS and, in addition, encrypted using a tenant-specific transfer key. This transfer key is shared between all sites that a tenant has access to. In case of a security breach, the transfer key is rotated to lock out the compromised site.

Tenant data is not kept in plain text in memory when it isn't needed. It is decrypted when needed for an operation, and then re-encrypted and stored.

Distribution

It is important to be able to control how secrets are distributed to different sites. There are two principles at play. First, distribution only occurs from top to bottom of the site tree. For example, a secret is never shared from an edge site to the Control Tower. Secondly, distribution only occurs when explicitly configured and only to sites that have been assigned to the tenant.

How secrets are distributed is configured using the distribute setting. It is one of: a generic setting in the to leaf, which can have the value all (distribute to all sites a tenant is assigned to), none (do not distribute at all), or inherit (inherit the distribution setting from the parent); a list of deployments; or an explicit list of sites.
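
As a rough illustration of how the all, none, and inherit values could be resolved, the sketch below walks a chain of settings from the innermost level outward. The dictionary layout, the `resolve_sites` name, and the fallback to "no distribution" when nothing is configured are assumptions for this example; deployment lists are left out.

```python
def resolve_sites(chain, assigned_sites):
    """Return the set of sites a secret should be distributed to.

    `chain` is a hypothetical list of settings, innermost first, for example
    the setting on a secret followed by the setting on its parent. Each entry
    is either {"to": "all" | "none" | "inherit"} or {"sites": [...]}.
    """
    assigned = set(assigned_sites)
    for setting in chain:
        if "sites" in setting:
            # An explicit list is still limited to sites assigned to the tenant.
            return set(setting["sites"]) & assigned
        to = setting.get("to", "inherit")
        if to == "all":
            return assigned
        if to == "none":
            return set()
        if to != "inherit":
            raise ValueError(f"unknown distribute value: {to!r}")
        # "inherit": continue with the next (parent) setting in the chain.
    # Nothing explicit was configured, so nothing is distributed.
    return set()


sites = {"stockholm", "gothenburg"}
print(resolve_sites([{"to": "inherit"}, {"to": "all"}], sites))
print(resolve_sites([{"sites": ["stockholm", "oslo"]}], sites))
```

The second call drops `oslo` because, in this sketch, a site that is not assigned to the tenant never receives the secret.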

When a site is initially started, or transitions from being blocked to becoming unblocked, it retrieves its initial state from its parent. This initial state contains all data that should be distributed to the site and becomes a starting point for incremental updates.

When a secret is modified (created, updated, or deleted) the changes are distributed according to the distribution setting for the secret. A minimal diff is calculated and sent downstream to the sites that should receive it. The diff is encrypted using the tenant's transfer key.
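
The idea of sending only a minimal diff can be sketched as follows. The diff layout (`set` and `delete`) and the function names are hypothetical, and the encryption with the transfer key is omitted; this is not Strongbox's wire format.

```python
def minimal_diff(old: dict, new: dict) -> dict:
    """Compute the smallest set of changes that turns `old` into `new`."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = sorted(k for k in old if k not in new)
    return {"set": changed, "delete": removed}


def apply_diff(state: dict, diff: dict) -> dict:
    """Apply a diff on the receiving site and return the updated state."""
    result = dict(state)
    result.update(diff["set"])
    for key in diff["delete"]:
        result.pop(key, None)
    return result


upstream_old = {"db-password": "a", "api-key": "b"}
upstream_new = {"db-password": "a", "api-key": "c", "tls-cert": "d"}
diff = minimal_diff(upstream_old, upstream_new)
print(diff)  # only api-key and tls-cert are included
assert apply_diff(upstream_old, diff) == upstream_new
```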

Audit logs

Audit logs are provided for all operations performed by a tenant. The log includes the access token, if provided, the operation, and any parameters provided.

However, all sensitive data is hashed, using a tenant-specific HMAC, before being logged.

To search for some specific sensitive data in the logs, for example, operations performed using a specific access token, a plain text version of the data is hashed in the same way. The audit log HMAC function can be accessed using the strongbox/audit/hmac action.
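
The search procedure can be sketched locally with Python's standard `hmac` module. The use of SHA-256, the log entry layout, and the helper names are assumptions for this example; in a real system the hashing is done by the strongbox/audit/hmac action, because the tenant key is not available to the client.

```python
import hashlib
import hmac


def audit_hmac(tenant_key: bytes, value: str) -> str:
    """Hash a sensitive value the way a log writer might (SHA-256 is assumed)."""
    return hmac.new(tenant_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def find_entries(log, tenant_key: bytes, token: str):
    """Return the log entries whose hashed token matches the given plain text token."""
    needle = audit_hmac(tenant_key, token)
    return [entry for entry in log if hmac.compare_digest(entry["token"], needle)]


key = b"hypothetical-tenant-hmac-key"
log = [
    {"token": audit_hmac(key, "token-1"), "operation": "read"},
    {"token": audit_hmac(key, "token-2"), "operation": "write"},
]
print(find_entries(log, key, "token-1"))
```

Because the same key and input always produce the same HMAC, a plain text token can be matched against the log without the log ever containing the token itself.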

Audit logs are streamed upwards from local sites to aggregation sites higher up in the tree. This allows audit logs to be inspected even if a site is compromised.

Core functionality

Strongbox provides a number of services available through REST APIs:

Vault - Key/Value Store
Crypto Functions (encryption, signing, hmac, etc)
Transformation
SSH CA
SSL CA
One Time Passwords

Most of these services have some state. That state is encrypted with the tenant-specific secret (as described above), and is handled by a separate process for each tenant.

Vault - key/value store

Vault is an encrypted key-value store. The user may have multiple vaults, where each vault may have different settings in terms of how it should be distributed among sites.

Each vault, in turn, may have multiple secrets where each secret stores a separate dictionary of keys and values. These dictionaries are treated as an atomic unit, i.e. they are read, written, updated, and deleted as a unit.

It is possible to mount a vault dictionary as a file image provided the mounting container's hash has been added to the allowed-image-access list of the secret.

Auto-mount

Vaults can be auto-mounted when an application is started, in two different ways:

As files in a volume
As environment variables

When mounting as files, a specific secret in a vault is mounted as a volume with files named after the keys in the secret; the content of each file is the associated value in the secret.

When mounting as an environment variable, a vault, a secret, and a specific key have to be specified.

Versioned vault - key/value store

There is a versioned version of Vault that keeps a history of old values. It is possible to retrieve an old value of a key. There is a configurable maximum number of versions to keep at any given time. Old versions are removed once the max-versions threshold has been reached.

Cooperative locking can be achieved by requiring that the old version value is supplied when storing a new version. If the version does not match the stored value then the storage operation will be rejected.
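
The combination of a version history, a max-versions threshold, and cooperative locking can be sketched with a small in-memory class. The class, method, and parameter names are hypothetical and only model the behaviour described above.

```python
class VersionMismatch(Exception):
    """Raised when the supplied version does not match the stored one."""


class VersionedSecret:
    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        self.versions = {}   # version number -> value
        self.latest = 0      # 0 means nothing has been written yet

    def write(self, value, expected_version=None) -> int:
        # Cooperative locking: reject the write if someone else wrote first.
        if expected_version is not None and expected_version != self.latest:
            raise VersionMismatch(
                f"expected version {expected_version}, stored is {self.latest}"
            )
        self.latest += 1
        self.versions[self.latest] = value
        # Remove the oldest versions once the threshold is exceeded.
        while len(self.versions) > self.max_versions:
            del self.versions[min(self.versions)]
        return self.latest

    def read(self, version=None):
        return self.versions[self.latest if version is None else version]


secret = VersionedSecret(max_versions=2)
v1 = secret.write({"password": "first"})
secret.write({"password": "second"}, expected_version=v1)
try:
    secret.write({"password": "stale"}, expected_version=v1)
except VersionMismatch as err:
    print("rejected:", err)
```

Two writers that both read version 1 cannot silently overwrite each other: the second write is rejected and has to re-read the secret first.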

Versions can be deleted, and later un-deleted. If the latest version is deleted then a read of the secret will return the latest live version.

Using the PATCH or merge operation on a versioned secret will result in a new version based on the previous version.

Writing a new value will result in a new version of the secret.

Crypto functions and transit keys

It is important to avoid including keys and other sensitive information in the applications that are distributed and instead access these functions through a fully access controlled API.

Strongbox provides a number of cryptographic functions under the transit path, many of which require some form of encryption key. The state associated with these functions is always encrypted when stored and distributed.

When a transit instance is created it is possible to import an existing key, or to generate a new one. Depending on the selected cipher type different operations are supported: encryption, decryption, signing, signature verification, key derivation, and convergent encryption. The supported ciphers are:

aes128-gcm96, aes192-gcm96, aes256-gcm96 Supporting:
- encryption
- decryption
- key derivation
- additional auth data
- convergent encryption
chacha20-poly1305 Supporting:
- encryption
- decryption
- key derivation
- additional auth data
- convergent encryption
ed25519 Supporting:
- signing
- signature verification
- key derivation
ecdsa-p256, ecdsa-p384, ecdsa-p521 Supporting:
- signing
- signature verification
rsa-2048, rsa-3072, rsa-4096 Supporting:
- encryption
- decryption
- signing
- signature verification

It is possible to keep a number of versions of cipher keys at the same time. Each ciphertext will be tagged with the key version used to encrypt the data. This makes it possible to smoothly phase in new versions of a key and phase out old ones, i.e. to perform key rotation.

It is possible to specify a minimal version for both encryption and decryption. It might be desirable to phase out an old key by increasing the minimal encryption version, while keeping the minimal decryption version until all data has been migrated to the new version, or become irrelevant.

By default, all versions of a key are kept. The only way to remove keys is by using the trim operation. It allows all keys, up to a given version, to be removed. The specified version must be less than the minimal encryption/decryption versions.
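
The interaction between key rotation, the minimal encryption and decryption versions, and trim can be sketched as below. The class and attribute names are hypothetical, and the sketch assumes that trim removes versions up to and including the given version.

```python
class KeyVersions:
    def __init__(self):
        self.keys = {}                   # version number -> key material
        self.latest = 0
        self.min_encryption_version = 1
        self.min_decryption_version = 1

    def rotate(self, key_material: bytes) -> int:
        """Add a new key version; older versions are kept."""
        self.latest += 1
        self.keys[self.latest] = key_material
        return self.latest

    def can_encrypt(self, version: int) -> bool:
        return version in self.keys and version >= self.min_encryption_version

    def can_decrypt(self, version: int) -> bool:
        return version in self.keys and version >= self.min_decryption_version

    def trim(self, up_to: int) -> None:
        """Remove all versions <= up_to (inclusive bound is an assumption)."""
        limit = min(self.min_encryption_version, self.min_decryption_version)
        if up_to >= limit:
            raise ValueError("trim version must be below the minimal versions")
        for version in [v for v in self.keys if v <= up_to]:
            del self.keys[version]


keys = KeyVersions()
for material in (b"k1", b"k2", b"k3"):
    keys.rotate(material)
keys.min_encryption_version = 3   # new data must use the newest key
keys.min_decryption_version = 2   # version 1 data has been migrated
keys.trim(1)                      # now version 1 can be removed
print(sorted(keys.keys))
```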

Encryption

The encryption service allows a program to encrypt and decrypt data without having the encryption key in clear text in the program. By default the newest available key version is used when encrypting, but it is also possible to specify an earlier key version.

The resulting ciphertext includes information about which key version was used to encrypt the data.

Encrypted data has the format sbox:v<KeyVersion>:<Data>
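
A client that needs to know which key version produced a ciphertext, for example to decide whether it should be re-encrypted, could parse the format as in this sketch. The function name is hypothetical, and the sketch assumes that the data part is an opaque string.

```python
from typing import Tuple


def parse_ciphertext(ciphertext: str) -> Tuple[int, str]:
    """Split 'sbox:v<KeyVersion>:<Data>' into (key version, data)."""
    # maxsplit=2 keeps the data part intact even if it contains ':'.
    prefix, version, data = ciphertext.split(":", 2)
    if prefix != "sbox" or not version.startswith("v") or not version[1:].isdigit():
        raise ValueError(f"not a Strongbox ciphertext: {ciphertext!r}")
    return int(version[1:]), data


version, data = parse_ciphertext("sbox:v3:AAAA")
print(version, data)
```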

There is a specific API operation to re-encrypt data using a new key, latest if not explicitly specified. This operation does not return the data in plaintext and can thus be delegated to non-privileged users.

All data that is encrypted is expected to be base64 encoded, and the result of decrypting a cipher text is base64 encoded plain text.

Signatures

Some key types, see above, can be used for signing and verifying signatures. A base64 encoded signature is created using the transit sign operation.

Derived keys

If the derived option is set for a key then the bcrypt pbkdf2 algorithm is used to calculate the key using the secret key component together with the provided context. This makes it possible to have a large number of keys with only one key definition, and without using any extra space. They are all rotated at the same time.

If the convergent_encryption option is also set then the IV (nonce) will also be calculated using the same bcrypt pbkdf2 algorithm, with the key and context as initial input. The result is that the same plaintext input will always result in the same cipher text. This is useful when it is desirable to be able to compare values without decoding them.
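
The principle behind derived keys and convergent encryption can be illustrated with a deterministic derivation function. Note that this sketch substitutes plain PBKDF2-HMAC-SHA256 from Python's standard library for the bcrypt pbkdf2 algorithm named above; the iteration count, the function names, and the inclusion of the plaintext in the nonce derivation are assumptions made for the example.

```python
import hashlib

ITERATIONS = 10_000  # arbitrary value chosen for this sketch


def derive_key(master_key: bytes, context: bytes, length: int = 32) -> bytes:
    """Derive a per-context key from one stored key (the context acts as salt)."""
    return hashlib.pbkdf2_hmac("sha256", master_key, context, ITERATIONS, dklen=length)


def derive_nonce(derived_key: bytes, context: bytes, plaintext: bytes) -> bytes:
    """Derive a deterministic 96-bit nonce so equal inputs give equal ciphertexts."""
    return hashlib.pbkdf2_hmac(
        "sha256", derived_key, context + plaintext, ITERATIONS, dklen=12
    )


master = b"hypothetical-master-key-material"
key_a = derive_key(master, b"customer-a")
key_b = derive_key(master, b"customer-b")
assert key_a != key_b                                # one definition, many keys
assert key_a == derive_key(master, b"customer-a")    # reproducible from the context
nonce = derive_nonce(key_a, b"customer-a", b"4111-1111")
assert nonce == derive_nonce(key_a, b"customer-a", b"4111-1111")
```

Since both the key and the nonce are functions of their inputs only, encrypting the same plaintext under the same context yields the same ciphertext, which is what allows values to be compared without decrypting them.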

Export, backup and restore

When large amounts of data are to be encrypted or decrypted, using the provided APIs is not recommended. Instead, the key should be configured as exportable (once this setting has been enabled it cannot be revoked), and the application should "check out" the key and use it for bulk encryption and decryption.

Provided that the key has been configured with allow_plaintext_backup a plaintext representation of the entire key state can be extracted using transit backup. This is a base64 encoded internal representation of the state. This state can then be restored using transit restore, possibly under a different key name.

Generate data keys

It is also possible to use Strongbox to generate data keys. They are optionally returned in both plaintext (base64 encoded), and wrapped (encrypted using the indicated transit key). These keys are not stored.

Hash and HMAC

It is possible to use Strongbox to calculate a hash of some data using a specified algorithm. Supported algorithms are: sha (sha1, not recommended), sha224, sha256 (default), sha384, sha512, sha3_224, sha3_256, sha3_384, and sha3_512. The result can be returned in two different formats: base64 and hex.

Also, all key types can be used for calculating an HMAC together with an HMAC algorithm. Supported algorithms are sha (sha1, not recommended), sha224, sha256 (default), sha384, sha512, sha3_224, sha3_256, sha3_384, and sha3_512.
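
For comparison, the same algorithms and output formats can be reproduced locally with Python's standard library, which is useful for checking a result. The `digest` helper and its parameters are hypothetical and are not the Strongbox API; the mapping of sha to SHA-1 follows the list above.

```python
import base64
import hashlib
import hmac

# Algorithm names as listed above, mapped to hashlib constructors.
ALGORITHMS = {
    "sha": hashlib.sha1,
    "sha224": hashlib.sha224,
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
    "sha3_224": hashlib.sha3_224,
    "sha3_256": hashlib.sha3_256,
    "sha3_384": hashlib.sha3_384,
    "sha3_512": hashlib.sha3_512,
}


def digest(data: bytes, algorithm: str = "sha256", output: str = "hex", key=None) -> str:
    """Return a hash of `data`, or an HMAC if `key` is given."""
    constructor = ALGORITHMS[algorithm]
    if key is None:
        raw = constructor(data).digest()
    else:
        raw = hmac.new(key, data, constructor).digest()
    if output == "hex":
        return raw.hex()
    if output == "base64":
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unknown output format: {output!r}")


print(digest(b"abc"))                        # SHA-256, hex encoded
print(digest(b"abc", output="base64"))       # same digest, base64 encoded
print(digest(b"abc", "sha3_256", key=b"k"))  # HMAC with SHA3-256
```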

BCrypt

There is also an action for hashing passwords using the bcrypt password hashing function (version 2y), as well as an action for verifying bcrypt encoded passwords.

Transformation

It is sometimes desirable to hide data (mask or encrypt) in such a way that the original format of the data is preserved. For example, it might be desirable to protect a credit card number by encrypting it, and at the same time be able to provide it to a sub-system that expects a value in the format of a credit card number.

There is a class of encryption algorithms that supports this called Format Preserving Encryption (FPE) algorithms. We use an Erlang implementation by Guilherme Andrade called erlffx of the algorithm described in the 2010 paper The FFX Mode of Operation for Format-Preserving Encryption by Bellare, Rogaway and Spies.

Data that has been encrypted can be restored by decryption. It is also possible to mask the data instead. Masked data cannot be restored, which is sometimes desirable, for example when logging.

Transformation setup

A new transform service is configured on a given path with a specified role name, together with parameters for:

Parameter	Default	Description
`key-length`	16	length of AES key used to perform encryption
`type`	`fpe`	`fpe` or `masking`
`tweak-source`	calculated	`supplied` or `generated`
`masking-character`	`*`	character to use for masking text
`template`	(no default)	`creditcardnumber` or a template specification

A template consists of:

Parameter	Description
`alphabet`	One of `numeric`, `alphalower`, `alphaupper`, `alphanumericlower`, `alphanumericupper`, `alphanumeric`, or a custom alphabet specified as a binary of all alphabet elements.
`pattern`	a regular expression pattern where each group is subject to encryption, for example `\([a-zA-Z0-9]+\)-\([a-zA-Z0-9]+\)`

A tweak is data used together with the data to be encrypted, as a type of salt. Typically the surrounding text is used as tweak data. If you, for example, want to encrypt the middle four digits of a larger number, it would be possible to guess the content of the middle digits by comparing different encrypted strings. However, if the surrounding text is used as the tweak when encrypting the middle digits, the same digits are no longer encrypted to the same string, and become more difficult to guess.

The tweak can be automatically calculated or provided for each encryption/decryption invocation. Note that the same tweak must be supplied when decrypting as was supplied when encrypting.

Encryption/decryption

Data can be encrypted or masked using the transform encrypt command. It has to conform to the pattern and alphabet specified for the transform. A tweak can optionally be specified, or automatically derived from the surrounding text.

Decryption is only possible if the value was encrypted using fpe and not masking. The original value will be restored as long as the same tweak is supplied.
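
The masking variant can be sketched with a regular expression in which each group is replaced by the masking character. This covers masking only, not format-preserving encryption. The `mask` function is hypothetical, and the patterns use Python's regular expression syntax, where groups are written with unescaped parentheses.

```python
import re


def mask(value: str, pattern: str, masking_character: str = "*") -> str:
    """Replace every character matched by a group in `pattern` with the mask."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError("value does not conform to the pattern")
    chars = list(value)
    for group in range(1, match.re.groups + 1):
        start, end = match.span(group)
        if start == -1:       # optional group that did not take part in the match
            continue
        chars[start:end] = masking_character * (end - start)
    return "".join(chars)


# Only the middle block is inside a group, so only it is masked.
print(mask("1234-5678-9012", r"\d{4}-(\d{4})-\d{4}"))
# Both groups are masked; the separator is kept.
print(mask("abc-123", r"([a-zA-Z0-9]+)-([a-zA-Z0-9]+)"))
```

The masked value keeps the length and separators of the original, so it still fits wherever the original format is expected, but the original characters cannot be recovered from it.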

SSH

Strongbox can function as an SSH CA and generate both host (server side) credentials and client credentials. Both SSH keys and SSH certificates can be generated.

It can also generate OTPs that can be used by a client to log on to a service, and by a server to authenticate a user.

SSH setup

When an SSH CA service is created it can be configured to generate a signing key, or a signing key can be supplied. Supported key types are rsa (key length 1024, 2048, 3072, and 4096), ed25519, and ecdsa (curves nistp256, nistp384, and nistp521). The public key is available as state.

SSH host certificates

Host certificates are used to identify a host to a client. Often the public key is used to identify the host. This is a security risk since most users tend to just accept the public key that the host provides and add it to their known_hosts file. A much better way is to use a host certificate. The client then adds an entry in the .ssh/known_hosts file to indicate that it trusts all certificates signed by a given CA. An entry in the client's .ssh/known_hosts file may look like this:

@cert-authority tio.avassa.io ecdsa-sha2-nistp256 AAAAE2VjZHN...

The host needs to configure the host certificate using the HostCertificate setting in /etc/ssh/sshd_config. For example in OpenSSH:

HostCertificate /etc/ssh/host_id_ecdsa-cert.pub

Client certificates

SSH keys may be used to facilitate authentication without passwords towards an SSH host. A problem with this is that clients need to install their public keys on all hosts they want to access. It is difficult to add keys to all hosts and, more importantly, when a client's access is revoked all entries need to be removed on all hosts.

A better solution is to use SSH client certificates. The client provides its public key to the CA, which signs it and returns a signed certificate with a limited validity time. The client can use this certificate to authenticate towards a host as long as the certificate has not expired.

The host needs to be configured to trust certificates signed by the CA. In OpenSSH this is done using the TrustedUserCAKeys setting. For example:

TrustedUserCAKeys /etc/ssh/ca.pub

The Strongbox CA can sign and generate SSH certificates. When generating an SSH certificate both the private and the public key will be generated, and the public key will be signed.

SSH one time passwords

As an alternative to SSH client certificates Strongbox can be used for generating and verifying OTPs. An OTP is issued for a specific user and a specific IP address. The OTP can be validated exactly once.
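
The "exactly once, for a specific user and IP address" behaviour can be modelled with a small in-memory store. The class and method names are hypothetical, and a real service would also need expiry and persistent state, which this sketch leaves out.

```python
import secrets


class OtpStore:
    def __init__(self):
        self._pending = {}   # otp -> (user, ip)

    def issue(self, user: str, ip: str) -> str:
        """Create an OTP bound to one user and one IP address."""
        otp = secrets.token_urlsafe(16)
        self._pending[otp] = (user, ip)
        return otp

    def verify(self, otp: str, user: str, ip: str) -> bool:
        """Validate an OTP; it is consumed by the first attempt, valid or not."""
        issued = self._pending.pop(otp, None)
        return issued == (user, ip)


store = OtpStore()
otp = store.issue("alice", "10.0.0.5")
print(store.verify(otp, "alice", "10.0.0.5"))  # first use succeeds
print(store.verify(otp, "alice", "10.0.0.5"))  # second use fails
```

Removing the OTP on the first verification attempt, even a failed one, is a design choice in this sketch that prevents an attacker from retrying a captured OTP with different user or address values.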

To verify an OTP the server is configured with a PAM module that invokes a program that performs the verification towards Strongbox. This may look like:

auth requisite pam_exec.so quiet expose_authtok log=/tmp/sboxssh.log /usr/local/bin/sbox-ssh-helper -dev -config=/etc/sbox-ssh-helper.d/config.hcl
auth optional pam_unix.so not_set_pass use_first_pass nodelay

The sshd_config is then modified to use PAM, i.e.

ChallengeResponseAuthentication yes
PasswordAuthentication no
UsePAM yes

Roles

Different roles are configured for issuing certificates and OTPs. Each role instance can be configured with different limitations for the certificates and OTPs it can generate.

It is a good idea to create one role per user you want to issue OTPs for.

TLS

Strongbox can be set up to function as an SSL/TLS CA, either with a self-signed root certificate, or with a provided SSL/TLS CA certificate.

This functionality is primarily intended for securing communication between applications using, for example, mutual TLS, but it can also be used to secure Web traffic.

The certificates can be configured to be automatically rotated when they are about to expire.

Intermediate CA certificates

The Strongbox CA can issue intermediate CA certificates, which helps with setting up a distributed trust scheme.

Client and server certificates

Certificates are signed and issued by different TLS roles. Different roles can be configured to be allowed to issue certificates with different restrictions such as allow-client-certificates, allow-server-certificates, allowed-hosts, allowed-domains, ttl, allow-subdomains, etc.

The CA can create and sign RSA (1024, 2048, 3072, and 4096) and ECDSA (secp256r1, secp384r1, and secp521r1) certificates. Certificates can be created from a provided public key, or a proper key pair can be generated.

Revocation lists

The CA functionality can also keep track of revoked certificates, and generate properly signed revocation lists on demand.
