Back

Dark mode

Unicity Distance (OTP)

02-23

Introduction

This post will discuss the concept of unicity distance in cryptography and briefly introduce the one-time pad cipher and it’s relation to unicity distance. An implementation of the one-time pad cipher in Python is also included.

Unicity Distance

Originally defined by Claude Shannon in his 1949 paper "Communication Theory of Secrecy Systems”, unicity distance is the length of ciphertext needed to break the cipher by reducing the number of spurious keys to zero in a brute force attack.

Spurious keys

A spurious key is a key which decrypts the ciphertext to meaningful, yet incorrect, plaintext.

For example take the ciphertext “FHTSNA”.

The key keyakey_a may decrypt this to incoherent plaintext such as “VBSAQU”. Thus keyakey_a is not a possible key as it does not decrypt to meaningful plaintext.

However, keybkey_b may decrypt this ciphertext to “ATTACK” and keyckey_c may decrypt this ciphertext to “RESCUE”.

Thus, without more ciphertext we cannot determine the correct key.

If more ciphertext is intercepted; “FHTSNA TXC”, we can test both keybkey_b and keyckey_c to see if we can determine the correct key:

Ciphertext decrypted with keybkey_b: “ATTACK NOW”

Ciphertext decrypted with keyckey_c: “RESCUE TGX”

With this additional ciphertext we can now discard keyckey_c as a possible key.

The unicity distance gives us the minimum about of ciphertext required before it has only one possible decryption.

Key length

In the example above, there are a possible n=265n = 26^{5} possible keys if we assume a key must be made up of 5 upper-case English letters. The majority of these keys will not decrypt to any meaningful plaintext and can be discarded. However, mm of these keys will decrypt to meaningful plaintext but only one will be the correct key.

It is likely that mm is much smaller than nn and since mn\frac{m}{n} gets smaller as the length of the ciphertext increases, there will be some length of ciphertext, ll, which will reduce the number of spurious keys to zero (i.e reduce mm to 1). ll is the unicity distance.

Redundancy

The unicity distance n0n_0 can be formally defined as:

n0=log2Kρn_0 = \frac{\log_2 \lvert K \rvert}{\rho}

Where n0n_0 is the unicity distance, K\lvert K \rvert is the number of possible keys, and ρ\rho is the redundancy of the plaintext.

For example, the redundancy of the English language has been found to be 3.2 bits. If we consider a simple Caesar cipher (25 possible keys), we can compute the unicity distance for this cipher as:

n0=log2253.2=1.451202n_0 = \frac{\log_2 25}{3.2} = 1.45120 \approx 2

Meaning that with a ciphertext length of just 2 characters, an attacker have enough information to reduce the number of spurious keys to zero in a brute force attack. This is intuituve given how a Caesar cipher functions.

Security

Let nn be the length of the ciphertext.

If n<n0n < n_0 we have an unconditionally secure system as there will exist spurious keys and there is not enough information for an attacker to determine the correct key.

If nn0n \ge n_0 we have an insecure system as there are no false keys and an attacker has enough information to determine the correct key in a brute force attack.

Generally speaking, in a secure cipher n0n_0 should be as large as possible.

One-time pad

The One-time pad (OTP) is a perfect cipher that cannot be cracked. In the OTP the plaintext and key are combined (XOR in binary) to produce the ciphertext. In the OTP cipher, the key must be:

  • Completely random (uniformly distributed in the set of all possible keys and independent of the plaintext)
  • At least as long as the plaintext message
  • Secret
  • Never reused

As noted by Claude Shannon, given the key is completely random and always at least the length of the plaintext, the unicity distance of this cipher will always be longer than the length of the ciphertext. Meaning there will always exist spurious keys and the cipher can never be cracked.

An example of the OTP in Python can be found here.

In practice, this cipher is not feasible to use due to the difficultly in generating a truly random key which is at least the length of the plaintext. Imagine the scenario of encrypting a 10GB file - we would need a 10GB truly random key.

References

Note: these references exclude hyperlinks included throughout the document.

  1. Unicity Distance

Disclaimer: Do not use this to secure any information. This code is purely to be used for educational purposes only.

--- E.O.F ---
Back