Cryptography in Plain English - What Every Developer Must Know

Most developers whom I have come across, lack a solid grasp of the fundamentals of cryptography. When a developer who does not understand crypto needs to use crypto, several things can go wrong:

•    Not understanding the implications of using some crypto technology in the code
•    Not realizing where to use crypto
•    Not implementing crypto correctly and hoping that the implementation is correct
•    Not implementing crypto correctly, but feeling secure because “we have used cryptography”
•    Not using crypto at all

The unfortunate part to this whole situation is that Cryptography is not hard to understand. Most of the perceived complexity is in areas where an application developer would typically never venture: designing an algorithm or a protocol, cryptanalysis, etc. In most cases, one can use one of the modern crypto libraries and go by with a minimal understanding of the principles involved. The aim of this post is to provide that minimum background required to start using crypto effectively in day to day work.

1. Introduction

While Cryptography can serve in multiple ways, its most common usage – which also illustrates most of its principles – is secure communication. To understand the notion of secure communication, let’s bring in our cast:

This story is about two nice, genuine people – Alice and Bob – who are in separate locations and want to have a conversation. The third character in our story is Eve who does not like Alice and Bob and is their adversary. She wants to intercept this conversation, find out what Alice and Bob are talking about, and if possible, disrupt the conversation by modifying the messages between them or by impersonating Alice or Bob.

Now Alice and Bob are aware of Eve. To prevent her from accomplishing her purposes, they set up a pipe which goes from Alice to Bob. Whatever message Alice drops in the pipe at her end would be received by Bob at the other end and vice versa. The pipe is opaque and indestructible so Eve cannot look at the messages nor modify them.

Such a pipe or channel then has the following properties:

•    Confidentiality: The advisory cannot see the contents of messages passing between sender and receiver cannot be seen by the adversary since the pipe is opaque.
•    Integrity: Contents of messages cannot be modified since the pipe cannot be penetrated
•    Authentication: The pipe, being indestructible, does not allow anyone else except Alice and Bob to send messages. So Eve cannot masquerade as Bob.
•    Non-Repudiation: Since only Alice and Bob can send messages on the pipe, when Bob receives a message on the pipe, Alice cannot deny that she had not sent that message.

The purpose of cryptography is to establish this kind of a secure channel over a public network like the Internet.

2. Confidentiality using a Shared Secret

Traditionally the problem of making communication confidential over an insecure transport has been solved by Private-Key Encryption [also called Symmetric Encryption]. The idea is simple: Before exchanging messages, Alice and Bob meet in person and decide on the following:

An Encryption Algorithm (EA)
A Decryption Algorithm (DA)
A Shared Secret – a key (K)

[Note: Alice and Bob do not care if Eve gets to know EA or DA, as long as K remains a secret. This is a very important point in modern-day cryptography: we assume that the Adversary has the knowledge of how the algorithm works. This is called Kerckhoffs’ Principle. Mechanisms that are based on keeping the working of an algorithm secret are obscure, not secure. Such mechanisms are of historical interest only and are inadequate for today’s world.

Sidebar: Security by Obscurity

Some people compare this with government agencies (like NSA in the US) not publishing their mechanisms. While this is true, such non-disclosure does not imply that the agencies like NSA design algorithms on the basis of obscurity. Security agencies in governments (or large IT companies creating security technology) hire top-notch cryptographers who are well aware of the Kirchhoff’s Principle and design with the assumption in mind. Obscurity then is either an extra defense (however weak that maybe), or a way of protecting intellectual property rights on the mechanism. Bottom-line: modern-cryptography assumes that the attacker has knowledge of the working of the algorithm and only the key is secret.]

Now to send messages securely, Alice takes the message (called Plaintext in this context) and generates an encrypted message out of it (called Ciphertext) by using the Encryption Algorithm EA and the Secret Key K. Thus:

Ciphertext = EA(Plaintext, K)

Alice now sends this Ciphertext to Bob over the public network. On receiving the Ciphertext, Bob uses the Decryption Algorithm (DA) and the Secret Key (K) to generate the Plaintext back. Thus:

Plaintext = DA(Ciphertext, K)

Eve does not have the Secret Key (K) and is therefore not able to obtain the Plaintext. We have achieved confidentiality by sharing a secret.

2.1 Substitution Ciphers

The simplest way of understanding how private-key encryption works in practice is to consider an elementary algorithm – The Substitution Cipher. A Substitution Cipher works as follows: There is a message M that needs to be sent. To encrypt the message, characters in blocks of n characters are taken, and substituted by a different set of characters picked up from the Alphabet, differing from the original set by a function of K.

The idea is not new. Its most famous usage was by none other than Julius Caesar who replaced each character in the message by a character three characters ahead (shifting by 3) in the Latin Alphabet. This is called the Caesar-Cipher. Even as late as 1915, the Russian army was using the Caesar cipher since their troops could not learn to use more complicated ones. A trivial modern day implementation is called ROT13 (Rotation 13 – rotation by 13 positions). Since the English alphabet is 26 characters, ROT13 is its own inverse, i.e., the decryption and encryption algorithms are the same rule because of the choice of the key (13). ROT13 is used by people asking questions / puzzles, etc, on the Usenet. The standard Usenet client has ROT13 built-in.

Breaking a Substitution Cipher

What can Eve – the adversary – do about breaking a substitution cipher being used by Alice and Bob? Cryptanalysis – analysis of a Cryptographic Scheme with the goal of breaking it – of a substitution Cipher is rather straightforward, and by analyzing a few instances of Ciphertext produced by a substitution cipher, a cryptanalyst can easily break the cipher.

One of the simplest techniques that Eve can use for this purpose is Frequency Analysis. This is based on the fact that in any language some letters are used more than others. (In English the approximate sequence is ETAOIN SHRDLU for the top 12 characters). So by analyzing the frequency of the alphabets appearing in the Ciphertext, the Plaintext can be figured out. The earliest use of this technique was by the Arabs in the 9th century who realized that certain characters appear more frequently in the Koran than others. Several of the ciphers used in World War II were also broken using frequency analysis.

2.2 Transposition Ciphers

We know that substitution ciphers are not strong enough. So what are our other choices? There is another technique called the Transposition Cipher, where the characters are permutated, i.e., positions of characters are interchanged. Transposition Ciphers have also been known for a very long time and one can find a transposition cipher every day in a newspaper. We know it by the name “Anagram.” An anagram is a very simple form of a transposition cipher, and has more recreational value than cryptographic. Nonetheless, anagrams have been used by linguists and intellectuals throughout history for amusement or for hiding messages, and provided a powerful plot device in Dan Brown’s The Da Vinci Code.

Anagrams aside, the earliest example of a Transposition Cipher being used for a security purpose is a Scytale – Greek for a baton – a device used by the ancient Greeks to communicate messages during a military campaign. The working was really simple: a strip of paper was wound around a cylinder of a known diameter and a message written on the paper. The paper would then be peeled off and sent thru a messenger. The recipient would have a similar cylinder on which it would wind the paper again to read the message. Unless the recipient’s cylinder was of the same diameter, the letters would not come together properly and the message would not make sense. However, manual analysis of the letters is straightforward – letters after a specific distance need to be put together to make sense out of the message, and the strip of paper is a big giveaway.

2.3 Product Ciphers

Transposition Ciphers or Substitution Ciphers are both open to fairly simple Cryptanalysis, but a combination of a Transposition operation with a Substitution operation leads to a much stronger cipher. A Product Cipher, alternates transposition and substitution, and carries out this core function over several iterations. The ciphers used in World War II (ENIGMA by the Germans, PURPLE by the Japanese) were Product Ciphers. Modern-day Product Ciphers include DES as well as AES (a.k.a. Rijndael). DES uses a 64-bit key and alternates 16 substitutions with 15 transpositions. AES uses a 128-bit key and alternates 10 substitutions with 10 transpositions. AES is stronger.

2.4 Modes of Operation

Depending on how a Symmetric encryption algorithm deals with data, we can define two distinct categories for symmetric algorithms: Block Ciphers and Stream Ciphers. In a Block Cipher, data is taken in blocks (typically 64-bit or 128-bit block) and the algorithm encrypts / decrypts the block as a whole. A Stream Cipher operates on a stream of data and encrypts / decrypts the data one-bit at a time.

2.4.1 Stream Ciphers

A Stream Cipher uses the bits in the secret key to perform some operation with each bit in a stream to generate Ciphertext. For example, a simple operation would be to XOR each bit in a key with each bit in the Plaintext stream. However, it is highly unlikely that the key would be as long as the Plaintext (see “One Time Pad” in section 3.1 for a discussion), so what typically happens is that the Secret Key is used to generate a pseudo-random bit-stream called keystream, and the keystream is then used to perform some operation with the Plaintext stream.

The keystream is generated using some internal state in the algorithm. For higher security, this internal state needs to be updated continuously. Based on how this state change takes place, Stream Ciphers can be classified as:

1. Synchronous Stream Ciphers: The internal state of the cipher changes independent of Plaintext or Ciphertext.

2. Self-Synchronizing Stream Ciphers: The internal state changes based on the previous Ciphertext digits.

Since the keystream itself is not really random, the bit sequence in the stream would repeat itself after a period. The longer the period, the more secure the algorithm.

2.4.2 Block Ciphers

A block cipher operates on a pre-defined block-size. Typically the block-size is 64-bits or 128-bits. So for a block cipher, the biggest question is how to deal with messages longer than the block-size. There are various modes of operation which a Block Cipher can employ. The main ones are:

1.    Electronic Codebook (ECB): This is the simplest mode. The message is split into parts and each block is encrypted / decrypted separately. This is insecure and not recommended since it is subject to two simple attacks:
•    Stereotyped Beginnings and Endings: Messages have typical headers and footers and that knowledge can compromise the key.
•    Replay Attack: The adversary does not know the Plaintext or the key, but she simply intercepts the Ciphertext and forwards it to the receiver. An example where this can hurt is authentication.

2. Cipher Block Chaining (CBC): In this mode, each block is XORed with the previous block before encryption. It is very commonly used. A variation on CBC is called Propagating Cipher Block Chaining (PCBC). This mode is used in Kerberos.

3. Cipher Feedback (CFB), Output Feedback (OFB), Counter (CTR): All of these modes convert a block-cipher into a stream-cipher by taking the output from the previous block. CFB converts a CBC into a self-synchronizing stream cipher. OFB and CTR makes a block cipher a synchronous stream cipher.

All modes except ECB use output from the previous block to operate on the next block. This leads us to the question: How is the first block processed? To solve this problem, a dummy block, called an Initialization Vector (IV) is used. The IV need not be secret, but the same IV should never be used with the same key.

3. Need for Cryptography – the Chimera of Perfect Secrecy

I mentioned that Product Ciphers are stronger as compared to Transposition and Substitution Ciphers. But on what basis does one make a statement like that? Is there a way of figuring out how secure an encryption is, or how difficult it is to break it? Till recently, most cryptography was ad-hoc without any solid theory behind it. This changed in 1948 when Claude Shannon – the Father of Information Theory and Modern-Day Cryptography – published A Mathematical Theory of Communication. The key result he proved was that a secure encryption system could exist only if the size of the Secret Key was as large as the total size of the messages that would ever be exchanged using that secret key.

3.1 One Time Pad

An example that illustrates Shannon’s principle is the One Time Pad (OTP). The idea is this: suppose Alice has Plaintext of length N characters. She now generates a key randomly, also of length N and uses it to generate the Ciphertext. The key is used only once, and for the next message, a new key is generated, again equal to the length of the message. This is mathematically proven to be unbreakable. (Since the key can be used only once, it is called “One Time”, and “Pad” because the key used to be on a slip of paper torn from a Pad on which random text was printed).

The problem with OTPs is key distribution – it is logistically tough and prone to security breaches. So while OTPs in themselves are perfectly secure, their implementations are vulnerable because of the insecurity involved in key distribution. OTPs are typically never used, except in highly critical applications where the parties involved are willing to undertake the key exchange challenge.

The reason I brought up OTPs is to point out that perfect security – that can defeat unlimited computing power – is not always practical. Modern-day Cryptography does not chase this chimera. Its goal is not to provide security in the face of infinite computing capacity. Instead, it assumes that the adversary has (reasonably) limited resources. What makes modern-day Crypto work is the gap between the computational ability it takes the legitimate
user to encrypt vs. the computational infeasibility of decryption for the adversary who does not have the secret.

3.2 Cryptography Backgrounder – Computational Complexity

Whether or not a computation is feasible can be found out if we know the cost involved in making the computation. The discipline of computer science which deals with cost of computation is Computational Complexity Theory. The theory measures the cost of making a computation in terms of the space (quantity of information storage) and time it takes to make the computation. Here it is important to note that when we think of computation, we are not thinking about a specific computation model like the Von Neumann Architecture on which modern PCs are based, or a Turing Machine, but any arbitrary model.

3.2.1 Algorithm Efficiency

Before we get deeper into the theory, a quick word about algorithm efficiency: An algorithm’s efficiency is defined as the amount of the time it takes the algorithm to compute the problem, as a function of the problem size. (The problem size measures the size of the input to the program. For different problems, problem sizes are differently defined, and getting it right takes some familiarity with complexity theory.) So for example, the time taken could be:

T(n) = n³ + n² + 1

Where n = problem size.

Now as n gets arbitrarily large, the value of (n² + 1) becomes insignificant as compared to n³. So T(n) = n³ for very large n. Such an algorithm then has complexity of the “order of n cube”, and the algorithm is called a O(n³) algorithm.

An algorithm is considered to run in polynomial time if T(n) = O(n^k) where k is a constant. Example include

•    Constant time: O(1) [k = 0].
•    Linear time: O(n) [k = 1]
•    Quadratic time: O(n²) [k = 2]
•    Cubic time: O(n³) [k = 3]

An algorithm running in polynomial time would be more efficient than an algorithm running in exponential time: T(n) = O(kⁿ)

3.2.2 Complexity Classes

In general, complexity theory looks at the universal set of problems in terms of what are called Complexity Classes. A complexity class is a set of problems that are of related complexity, as in, the algorithm efficiency for solving that set of problems can be expressed by T(n) = O(f(n)). As can be expected, there are a lot of complexity classes, but the ones of biggest interest to us are called P and NP classes.

3.2.3 Class P Complexity

A problem that can be solved in polynomial time is said to have Class P Complexity. A number of problems fall under Class P complexity:

Finding the greatest common divisor
Determining if a number is Prime
Linear Programming
Shortest Path
Sorting
Modular Exponentiation MOD(x, y, z) = x^y MOD z

From the perspective of Complexity theory any algorithm that runs in polynomial time is considered as being efficient (or feasible), and any algorithm that does not run in polynomial time as being intractable or infeasible. So any computation in the class P-Complexity is considered efficient. However, this is only in theory, it may not be so in real life. For example if the value of k is 10000, T(n) = O(n¹⁰⁰⁰⁰) is certainly a very inefficient algorithm.

3.2.4 Class NP Complexity

Put informally, problems of class NP complexity can be defined as ones having a verifier function that can be solved in polynomial time. To understand this better, consider the problem of finding out whether a number is composite or not. Doing this for large numbers is really hard, with T(n) = O(k^{Log n}) for the most efficient algorithms. However, once we have a candidate factor of n (say x), finding if x is actually a factor or not is trivial:

bool IsFactor(int n, int x) { if (n % x != 0 || x == 1 || x == n) return false; else return true; }

Since the verifier function (IsFactor) is a class P algorithm, the problem of whether n is composite or not is a NP problem. All NP problems have a verifier function which accepts an input and a verification value, which can be computed in polynomial time. A number of problems are Class NP Complex, the most common being the Factoring problem and the traveling salesman problem.

The hardest problems in Class NP complexity are said to be NP Complete. An example is the problem of multi-processor scheduling: Find the minimum time required to schedule n jobs (of varying lengths) on m processors such that no two jobs overlap.

3.2.5 Is P = NP?

Consider the case where a P class problem also has a P class verifier. Would this problem be a class NP problem or a class P problem? The simple answer is P is a subset of NP. So every P problem is a NP problem, but the reverse may not be true. However, some researchers believe that any problem whose verifier can be solved in polynomial time, can itself also be solved in polynomial time. In short, P = NP.

Whether P is equal to NP is one of the most important unsolved problems in modern mathematics and if it can be proven that all class NP problems are actually class P problems, the implications would be stunning, with a huge impact on our society (for starters, most of modern-day Cryptography would be rendered useless)

3.3 Information Leakage

While the computation of the Plaintext from the Ciphertext may be infeasible (NP Complete), what if it is feasible (P Complex) to compute part of the Plaintext? For example, what if the length of the Plaintext can be known? Does that compromise the security of the message? Assume a situation where Eve expects Bob to ask Alice whether she would marry him. The answer which Alice would return is a simple yes / no, represented by a single bit. Now if the Eve is looking at the Ciphertext going along the wire, and discovers a single unit moving across, she may deduce that Alice has replied to Bob’s question. What now is the probability of Eve finding out the correct answer? 50%. Much higher than what a secure communication may require.

So from a Cryptography standpoint, when we look at computational feasibility, we need to look at it not just from the perspective of the entire data becoming available, but also how feasible it would be to recover part of the information.

4. Cryptographic Primitives

Modern-day cryptography – based on computational complexity – is built using certain Cryptographic Primitives. What is a primitive? A primitive is an atomic operation (from the perspective of the layer above it), a building block, which does a certain defined task. Cryptography requires primitives that have certain computational hardness. All computational complexity in Cryptography comes from these underlying primitives. Understanding the primitives is the key to understanding Cryptography. While you would not have to typically deal with how Protocols are constructed, a basic understanding is important to get the fundamentals right.

4.1 One Way Functions

The simplest of the primitives is a One-Way Function. A one-way function is any function that is easy to compute but hard to invert. A mathematical function that is hard to invert is multiplication: Given integers P and Q, N = P x Q can be easily calculated, but given a random N, how would one factor it into P and Q? Sure, we can produce all the factors. But getting P and Q specifically is probabilistic. The higher the value of N, the lower is the probability of getting P and Q back quickly and consistently. This is called the factoring problem, and it is a good example of a function that is efficient to compute, but infeasible to invert.

Today we have no rigorous mathematical proof that what we consider one-way functions, are indeed one-way. What that means, is that today we are not sure whether there exist inversion methods that have lower computational complexity than the ones we are aware of. Just because a lower-complexity inversion has not yet been discovered, does not mean that they cannot exist. So cryptography is built on assumptions. So far the assumptions have been valid.

4.2 Hash Functions

One of the most important one-way functions is a Hash Function. A Hash function takes an arbitrary number of bits as input and produces a smaller number of bits based on the input. The algorithm used is deterministic (given the same input, it would always pass thru the same steps and produce the same output), so for the same input bits, it would always generate the same output bits. In context of hash functions, the input is often called a message and the output is called a message digest or a digital fingerprint. So:

Digital Fingerprint / Message Digest = HashFunction(Message)

A hash function is obviously one-way. That’s because a hash function is lossy – the number of output bits is less than the number of input bits so there is not enough information in the output to calculate the input. However, this alone would not make hash functions useful. What makes hash functions useful is that they are collision resistant and create an Avalanche effect.

Collisions in Hash Functions

A collision is a situation where the hash function generates the same output for two different inputs. A good hash function is collision resistant. Once again, if you consider the fact that the number of output bits is less than the number of input bits, it is obvious that there is no way of avoiding a collision. A simple way of approaching this is to think about the input as a 64-bit number and the output as a 16-bit number. Now there are 18,000 trillion 64-bit numbers (264 = 18,446,744,073,709,551,616) 64-bit numbers and 65,536 16-bit numbers. As we keep reducing each number in the 18,000 trillion-range to 65,536 numbers, there are bound to be collisions.

So a hash function cannot be collision resistant. However, we can reduce the probability of collision if we choose a large enough output range and a good algorithm. For cryptographic purposes, we define collision resistance as: It should be computationally hard to find two inputs m1 and m2 so that their output h = HashingFunction(m1) = HashingFunction (m2). The other important property for a cryptographically secure hashing algorithm is that it should be Pre-Image Resistant: Given an output h, it should be computationally hard to find an input m so that h = HashingFunction(m).

Avalanche Effect in Hash Functions

The other desirable property of a good hash function is what is called the Avalanche Effect. The idea is that if the input changes even by a small amount, the output changes dramatically. The Strict Avalanche Criterion states that if one bit is flipped, each of the output bits should change with a probability of 0.5, in other words, half of the output bits should flip.

There are many hash functions in common use today like SHA-1 (aka MD5) and SHA-2. For secure hashing, SHA-2 should be used.

4.3 Random Numbers

The other important family of Cryptographic primitives is Random numbers and their generators. Recall Kerckhoffs’ Principle – the only secret is the key, not the algorithm. Now, we may keep the key a secret, but what if someone guesses the key? So we want to make the key harder to guess. A “guess” can even be an automated attack where the program tries all possible permutations. (This is called a brute-force attack) To reduce the possibility of such an attack from being successful, there are two things that are needed:

1. The key should be long. The longer the key, the stronger the encryption
2. The key should be unpredictable. The more random a key, lesser are the chances of it being predictable.

Point one is easy to understand, but point two is a little ambiguous. What is “more random?” Without getting into a philosophical discussion on randomness, I would just like to say that a random number is one which is generated in such a way that its probability of being generated is as high / low as any other number, and there is no way of determining when the number would be generated.

Generation of random numbers depends on the initial input on which a mathematical calculation would be performed. But unless that initial condition has randomness in itself, the output would also not be in random. This initial randomness is called an Entropy Source. Based on the entropy source, Random Number Generators (RNGs) can be classified as:

1. RNGs based on Deterministic Random-bit Generators (DRBGs):

DRBG means that the source of entropy is deterministic. Modern computers are all deterministic (as compared to quantum computers which are non-deterministic). If the entropy source is deterministic, the numbers generated cannot really be random. This means that a software-based generator does not generate truly random numbers. Hence we call the numbers generated by such a generator Pseudo-Random Numbers, and the generator itself is called a Pseudo-Random Number Generator (PRNG). All software based RNGs are PRNGs.

2. RNGs based on Non-Deterministic Random-bit Generators (NDRBGs):

NDRBG means that the source of entropy is non-deterministic. Such RNGs are typically hardware based and use a physical phenomenon as a source of entropy. Examples of physical phenomena used by NDRBGs include photoelectric effect, thermal noise, or some other quantum phenomena. Quantum phenomena in theory are unpredictable and therefore a reliable source of entropy. We call the numbers generated by such a generator Truly-Random Numbers, and the generator itself is called a True-Random Number Generator (TRNG). TRNGs are rather expensive and are needed only for very high security scenarios.

Cryptographic Random Number Generators

How does one find out if a PRNG is good enough to be used in Cryptography? A random number generator would typically generate numbers in a given range. This leads to two important properties:

• The distribution of values of the numbers generated in the range. The more even the distribution, the better.
• Ease of prediction of numbers based on a given set of numbers from the generator. The more difficult the prediction, the better is the PRNG.

In order to have these properties, a PRNG needs two things:

1. A random bit stream from a source of entropy. There is some entropy even in a deterministic computer. Examples include Clock Tick count, and values of System Handles (like Thread handle, Process handle, Network Card Id, etc.)
2. A mathematical calculation that removes the bias from a bit-stream and makes it more randomized. Examples include hashing algorithms like MD4, SHA, etc.

For the second point, there is a standard by a body called FIPS (Federal Information Processing Standard) which can be found in FIPS 186-2.

A function catering to the above two requirements is called a Cryptographically Secure Pseudo Random Generator (CRNG). Typical random number generators like the C++ rand() are not CRNG and they should not be used in Cryptography.

5. Integrity and Authentication via Cryptography

Going back to our problem of message exchange between Alice and Bob, even if we establish confidentiality using a shared key, there is still another problem. What if an adversary was to modify the messages, or insert their own messages. We need some way to ensure that messages are not tampered with (integrity), and originate from the actual sender (data authentication). There are many ways this is done.

5.1 Message Authentication Codes

A Message Authentication Code (aka MAC / checksum / tag / keyed hash / error detection code) is a small piece of information used to verify that a message has not been modified and indeed originates from the stated sender.

The way the scheme works is that Alice and Bob decide on a shared secret key (K). To send a message Alice uses the key and some MAC algorithm to generate a tag and sends the tag along with the message. Bob uses the tag and the message to verify that the message is not changed. If the message is modified the verification algorithm detects that the message has been tempered with. The key requirement of a MAC system is that it “must resist existential forgery under chosen-message attacks.” Which means that the adversary can create at least one message-tag pair where that specific message-tag was never generated by the legitimate sender.

Note that MACs provide both integrity and authentication. If there is tampering the recipient can detect that. And since the scheme depends on a shared secret, only the entity holding the secret key can send the message, hence authenticating the sender.

There are several MAC algorithms – they can be constructed using both block ciphers using a scheme called CMAC (which gives us CMAC-AES, CMAC-DES, etc.), or using hash functions. A specific type of hash-based MAC called HMAC (with algorithms like HMAC-MD5 and HMAC-SHA256) is particularly interesting because of the additional security properties it enables. Specifically, HMACs do not leak any information about the message as long as the underlying hash function is secure. It is recommended to use HMACs for most cases where a MAC is required.

6. Public Key or Asymmetric Cryptography

Coming back to the problem of message exchange between Alice and Bob, the situation so far is as follows:

Prior to exchanging messages, Alice and Bob agree upon a large random key. The longer and more random the key, the more secure the encryption. The key is private to Alice and Bob – it is a shared secret between them and not disclosed to a third party.
The sender then uses a sophisticated encryption scheme (like AES) to encrypt the message. The strength of the algorithm being determined by how efficient the algorithm is for the sender / receiver to execute, but computationally hard for Eve (the Adversary) to invert. The bigger this difference, the better the algorithm.
To ensure integrity of the message and authenticate that it is indeed coming from the sender, a MAC algorithm like HMAC-SHA256 is used.

The only problem in this situation is the initial condition – the sharing of the secret key. Since the security of the scheme derives from the key (recall Kerckhoffs’ Principle), its sharing in a secure way is a pre-requisite. The question is how does one share the secret in a secure way without meeting in person? There has to be some way of transferring the key in a secure way too. But wouldn’t secure transmission of the key also involve some sort of encryption / MAC? And in that case, wouldn’t we need a key once more? The problem seems to be recursive.

A Revolution in Cryptography

This problem was tackled for the first time in 1976 by Whitfield Diffie and Martin Hellman in their seminal paper New Directions in Cryptography in which they prophetically proclaimed that “we stand today at the brink of a revolution in cryptography.” The paper proposed two approaches:

Public Key Cryptosystem (PKC). “In a public key cryptosystem enciphering and deciphering are governed by distinct keys, E and D, such that computing D from E is computationally infeasible (e.g., requiring 10^100 instructions). The enciphering key E can thus be publicly disclosed without compromising the deciphering key D.”
Public Key Distribution System. “In such a system, two users who wish to exchange a key communicate back and forth until they arrive at a key in common. A third party eavesdropping on this exchange must find it computationally infeasible to compute the key from the information overheard.”

The Diffie-Hellman paper also proposed for the first time the idea of Digital Signatures – a signature that anybody could verify but nobody could forge:

A signed contract serves as legal evidence of an agreement which the holder can present in court if necessary. The use of signatures, however, requires the transmission and storage of written contracts. In order to have a purely digital replacement for this paper instrument, each user must be able to produce a message whose authenticity can be checked by anyone, but which could not have been produced by anyone else, even the recipient.
New Directions in Cryptography

The DH paper described a solution for the Public Key Distribution system which is now called Diffie Hellman key exchange, but left the problem of PKC and Digital Signatures open.

6.1 Diffie-Hellman Key Exchange Algorithm

The DH paper described for the first time a solution for exchanging a secret key without the key being known ahead of time. Here is how it works:

Alice and Bob agree to use a prime number p (say 11) and a ‘generator’ g (say 5). These numbers are well established and everyone (including the adversary) is aware of it.
Hereafter, the following actions take place on Alice’s and Bob’s sides (MOD is the modulus / remainder operator):

Alice	Bob
Alice chooses an integer ‘a’ at random. Say 2. This is Alice’s private key.	Bob chooses an integer ‘b’ at random. Say 3. This is Bob’s private key
Alice computes `(g^a MOD p)` – this is Alice’s public key – and sends it over to Bob. `(5² MOD 11 = 3)` is sent over.	Bob computes `(g^b MOD p)` – this is Bob’s public key – and sends it over to Alice. `(5³ MOD 11 = 4)` is sent over.
Alice now computes `(g^b MOD p) ^a MOD p`. So Alice computes `4² MOD 11 = 5`.	Bob now computes `(g^a MOD p) ^b MOD p`. So Bob computes `3³ MOD 11 = 5`.
Alice obtains 5. This is now the shared secret key	Bob obtains 5. This is now the shared secret key

Diffie Hellman Key Exchange

Note that the only information being publicly shared is the generator g, the prime number p and the public key. The private key is never shared. Once a shared secret key is established, regular symmetric cryptography is be used to encrypt / sign subsequent messages.

The math involves modular exponentiation – raise a number to an exponent and take a modulo: x^y MOD z. While this looks computationally intensive, it is actually doable quite efficiently in O(log y) time. What makes the scheme secure is that computing (g^b MOD p) ^a MOD p or (g^a MOD p) ^b MOD p without knowing the private key takes a very long time, especially if the private keys and the prime number p are large enough. This is called the discrete logarithm problem (DLP).

6.2 RSA Cryptography Algorithm

As mentioned earlier, the DH paper defined the PKC and Digital Signature problems but left them open to be solved. This was solved for the first time by Ron Rivest, Adi Shamir and Leonard Adleman in their paper A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. The scheme is called RSA after the initials of the authors.

The key to devising a solution to the problem lies in finding a function which is easy to compute in one direction, but difficult in the other. Such a function is called a Trapdoor function. For RSA, the problem is factorization. It is easy to multiple large numbers but factoring them is difficult. In RSA, the encrypting (public) key E is generated by multiplying two large prime numbers p and q, and the decrypting (private) key D is generated using a different process involving p and q. Reversing the process would involve factoring which is computationally intensive.

To understand how RSA works, we need to understand two mathematical ideas which are both attributable to Euler.

Euler’s Totient function

This function is a measure of how factorable a number is. For a number n, the totient function Φ(n) is the count of all positive numbers ≤ n which are co-primes with it (do not share a common factor)

n=2, Φ(n) = 1 // 1
n=3, Φ(n) = 2 // 1, 2
n=4, Φ(n) = 2 // 1, 3
n=5, Φ(n) = 4 // 1, 2, 3, 4
n=6, Φ(n) = 2 // 1, 5
n=7, Φ(n) = 6 // 1, 2, 3, 4, 5, 6
n=8, Φ(n) = 4 // 1, 3, 5, 7
n=9, Φ(n) = 5 // 1, 2, 4, 5, 7

Interestingly, Φ(n) is multiplicative, which means that Φ(p * q) = Φ(p) * Φ(q) where p and q are co-prime.

Now of course Φ(n) is very easy to calculate for a prime number. It is always n-1. But for all other numbers, you would need to count the factors.

Therefore if p and q are prime numbers, then Φ(p * q) = (p-1) * (q-1)

This is the first building block of the RSA algorithm. Finding Φ(p * q) is very easy for us. But if you do not know p and q, you will need to count all the factors up to p*q which for large prime numbers will be computationally intensive.

Euler-Fermat Theorem

This theorem provides a relationship between the totient function and modular exponentiation. It says:

If m and n are co-prime then m^Φ(n) is congruent to 1 mod (n) or m^Φ(n) ≡ 1 MOD n

In other words m^Φ(n) MOD n is always 1

For example:

m = 3, n = 5
Φ(n) = 4
m^Φ(n) = 3⁴ = 81
81 MOD 5 = 1

This is the second building block of RSA. From Euler’s theorem, we can do the following:

// Euler's theorem
m^Φ(n) ≡ 1 MOD n 

// multiplying the exponent by constant k does not change the result
// since 1 raised to any exponent is still 1
m^k*Φ(n) ≡ 1 MOD n 

// multiplying 1 by any number gives the same number, so:
m * m^k*Φ(n) ≡ m MOD n

// simplifying:
m^{k*Φ(n) + 1}  ≡ m MOD n

With the previous result we have a way to represent a product of two numbers which depends on the totient function.

let there be an encrypting exponent e and a decrypting exponent d such that:
e*d =  k*Φ(n) + 1 
where 1 < e < Φ(n) and  e is co-prime with Φ(n)

// now we can derive the decryption exponent if Φ(n) is known:
d = (k*Φ(n) + 1) / e 

// and this gives us a very interesting result:
(m^e)^d (MOD n) = (m^d)^e MOD n = m

This is interesting, because we now have a way to recover m. If we compute c = m^e MOD n, then c^d MOD n == m. You get m back!

For plaintext message m, the ciphertext is calculated using the encryption key e as:
c = m^e MOD n 

For the ciphertext c, the plaintext can be recovered by using the decryption key d: 
m = c^d MOD n

RSA Problem

The key to RSA is that by knowing the public key (n, e) and the ciphertext C, it is computationally intensive to find out the plaintext M. This is called the RSA problem. Solving this would involve factoring n (a product of two large primes), which is computationally intensive. If there was an easy way to calculate Φ(n) without knowing the prime factors, the adversary would be able to find out the decrypting key d and thus calculate the plaintext. There could be other ways of solving the RSA problem which do not involve factoring, but there is none which is known.

Let’s walk through an example:

p = 13, q = 17 // take two prime numbers of similar size
n = p * q = 221
Φ(n) = (p-1) * (q-1) = 192 // easy for us to calculate since we know p and q

e = 5 // pick an encrypting exponent < Φ(n) and co-prime with Φ(n)
d = (k * Φ(n) + 1) / 5
d = (2 * 192 + 1) / 5 = 77

Public key  = (n=221, e=5)
Private key = (n=221, d=77)


To encrypt m=12:
c = m^e MOD n 
c = 12⁵ MOD 221 
c = 207

Now to decrypt c=207
m = c^d MOD n
m = 207⁷⁷ MOD 221
m = 12

To verify this, you can use an online calculator such as this one.

6.3 Digital Signatures

Recall, that for a digital signature scheme, “a user must be able to produce a message whose authenticity can be checked by anyone, but which could not have been produced by anyone else, even the recipient.” With RSA’s system of public and private keys, addressing the problem of digital signatures is easy.

Suppose Alice wants to send a signed message to Bob. To do this, she uses her private key to derive a signature from the hash of the message:

h = hash(m) // take a hash of a message using a regular hashing function
s = h^d MOD n

To verify, Bob can do the same thing with Alice’s public key and compare the signature:

h = hash(m)
s = h^e MOD n // which is the same as h^d MOD n

Thus RSA can be used for two separate purposes: 1) for confidentiality – use the public key to encrypt and send the message which can only be decrypted by the private key, and 2) for authentication – use the private key to sign a message which can be verified by anyone with the public key.

6.4 Public Key Infrastructure (PKI) Cryptography

With asymmetric cryptography, the problems of confidentiality, integrity and authentication are solved in theory. However, there is still the problem of a man in the middle attack – someone can intercept the public key of the sender and send their own key instead. In other words, how do you authenticate the public key itself? This is solved via the concept of a digital certificate – a document that includes the public key, information about the owner of the key, and the digital signature of the entity (typically called a CA – Certificate Authority) which has verified that the owner indeed owns the key.

Now obviously the problem is recursive – how do you trust the certificate of a CA? The most well established system is a hierarchy of CAs with a set of root CAs which are typically large global companies whose certificates are verified and trusted by web-browsers / web-servers / operating-systems. This is called a certificate chain – it starts with the public key certificate and ends with the root CA certificate. Which is why if you want to host a server that can establish secure communication with a web-browser, you need to put up a digital certificate obtained from a CA. If you put up a certificate issued by you, or if the CA issued certificate has expired, the web-browser would show a security warning.

6.5 Secure Transport Protocol – HTTPS/TLS

There are several ways of employing cryptography to provide secure communications. We will focus on the most widely used protocol which is used to protect communications over the Internet – HTTPS/TLS. Websites employ HTTPS which is essentially HTTP made secure using a protocol called TLS (formerly called SSL). It provides confidentiality, integrity and authentication using a variety of protocols. At a high level, the key steps are:

Figure out the protocol version both parties can use
Decide on the choice of algorithms and size of keys (called a cipher suite)
Authenticate the server via its public key + digital certificate issued by the CA
Use asymmetric crypto (both DH and RSA can be used) to establish a session key (shared secret)
Use the session key for securing the rest of the communication using symmetric cryptography

This is how TLS sets up a secure channel:

Client	Server
sends a “hello” message with the protocol version, a “client random,” and a list of cipher suites it can support.	server responds with its selected cipher suite, its digital certificate and the “server random.”
client verifies the digital certificate client sends a pre-master secret encrypted with the server’s public key	server decrypts the pre-master secret with its private key
client computes a session key using client-random, server-random and pre-master secret and sends a “finished” message indicating that it is ready for secure data communication.	server computes the same session key using the same information and sends the same “finished” message.

TLS 1.2 handshake

There are more nuances to TLS which mitigate against potential attacks, provide extensibility (for example for mutual authentication) and optimize for efficiency. That does not mean that TLS 1.2 is perfect. There are several known deficiencies in TLS, and subsequent versions of the protocol are expected to fix them.

6.6 Elliptic Curve Cryptography

As discussed earlier, RSA is dependent on the fact that multiplying two numbers is easy, but factoring the product to find the two numbers is computationally hard – a Trapdoor function. But is factoring the perfect Trapdoor? Hardly! There are specialized algorithms which are used to tackle the problem of prime factorization such as the quadratic sieve, and number field sieve (NFS) and these actually are more efficient at factoring larger numbers. In the long term it is fair to expect that over time as computing resources increase, the the bit-lengths of the keys would grow and the gap between multiplying and factoring difficulty would reduce. There is a need for a better Trapdoor and that’s where Elliptic Curve Cryptography (ECC) come in.

ECC algorithms are the same as typical public key algorithms like Diffie-Hellman and RSA, except that the underlying math is on points on an elliptic curve instead of regular integers. Thus you have ECDH (EC Diffie-Helmann) and ECRSA. Without getting into the math of elliptic curves, let’s focus on what they mean for cryptography. The underlying hard problem here is called the elliptic curve discrete logarithm (ECDLP), and it is 50x harder to break as compared to factoring or DLP. The best known algorithm for factoring – NFS – runs in sub-exponential time, while the ECDLP runs in full exponential time, which means that the key lengths can be much smaller and lesser computing resources are required. This means that as mobile device adoption increases, the need for ECC would increase and one would expect it to become the default for most cryptographic services over time.