In this article I am going to introduce you to TLS, explain what it is, why we need it and why some people don’t like it. After the basics are laid out, in the next post, I will introduce mcTLS and give a rough overview of why I think it is a bad idea.
This topic comes up in the blog because I have to give a seminar on it and think it is interesting enough to share with a broader audience.
Part 1: What is mcTLS and why is it a bad idea?
First off, what on earth is TLS?
TLS is the abbreviation for Transport Layer Security and it tries to do exactly what the name suggests: It was made to secure data on the transport layer. Before I explain why there are layers and against whom we are securing the data, I want to detail how important and widespread TLS is.
You are using TLS every time your browser displays a (hopefully green) lock at the very left of the address-bar when you visit a website. Every time you type in
https:// in your browser, you use TLS. When you are using WhatsApp, email or another messenger, you use TLS for at least parts of the communication. Even when reading this blog, your connection is secured with TLS (see the lock in the address bar?).
What we want
Now to the explanation of what TLS tries to achieve. Security researchers have defined many nuances of what ‘security’ is, but I want to use the three most basic ones here: confidentiality, authenticity and integrity. If you are not familiar with them, don’t worry, they will be explained in detail in an instant. Although TLS can be used without the internet, I am going to explain the security properties with regard to internet communication.
If your network knowledge is a bit rusty, worry not: For the first section, it is sufficient to know that ordering something from amazon is basically nothing more than communicating with one of their servers over an internet connection by sending messages back and forth. I will spare you the gory details of how this works.
In the following I use the same example multiple times: You work in an office for some company and use your employer’s internet connection. You want to order something from amazon and your employer wants to spy on you. For instance, the employer wants to identify employees with a higher risk of getting pregnant, using drugs, being kinky or looking for another work opportunity. Although this is illegal (at least in Germany), it is not that unrealistic. Furthermore, your boss does not necessarily have to be evil; maybe it is just one guy from the IT department who is too curious and wants to see the naughty images your spouse sends you. Or your employer gets hacked and the nasty hackers try to earn some dollars by threatening to sell sensitive information. This scenario is not limited to the employer attacking your connection: perhaps the airport or hotel Wi-Fi you are currently connected to is doing nasty stuff, or your Internet Service Provider (ISP) is evil or compromised.
The 3 basic security properties
Let’s start with the most obvious, confidentiality: Confidentiality ensures that if you communicate with someone, no other party is able to read your communication. Imagine you send your friend Bob an email. In that case confidentiality means that no one other than Bob can read the contents of the email. You don’t send emails? Well, then think about ordering stuff on Amazon. Confidentiality means here that no party other than you and Amazon knows what you ordered (at least based on the order process; the seller, the postman or even your spouse might get to know this in other ways). If you order at work, especially your employer, whose internet connection you are using, should not be able to tell if you order a new book for your spouse or some nice dessous for her (or him).
At least as important as confidentiality is authenticity: You make sure that you actually talk to the party you think you are talking to. Assume you can establish confidentiality on a communication: what does it help with keeping secrets when you cannot be sure whether you are sending your order to amazon or to someone just pretending to be amazon (e.g. your employer)? It is important to understand the difference between authenticity and confidentiality here. Your (hypothetically) evil employer might impersonate amazon, communicate with you over a confidential channel and then forward your order to amazon. In that case, your coworker cannot eavesdrop on you, but your employer nonetheless gets to know whether you are buying something to read or to wear.
The last security property we discuss here is integrity. Imagine again ordering something from the internet. It would be a shame if your new book were delivered to a nasty hacker instead of you because someone was able to change your order on the fly, even without being able to read it. Integrity checks can prevent this.
So, if we can achieve all three mentioned security properties, an attacker (e.g. the evil employer) can neither read your internet traffic you exchange with amazon, nor impersonate amazon unnoticed, nor can they change any part of the secured communication in their favor unnoticed.
Does this make you anonymous on the internet? No. Here’s why:
Your employer can’t read what you communicate with your communication partner (e.g. amazon) but they can see with whom you are communicating. This information might be enough to get you fired, for example if you connect to
phncdn or something comparable.
The endpoint can see who you are. If you log into amazon, they know who you are. Even if you don’t log in anywhere, chances are high they can identify you anyway, e.g. by matching your IP address or using browser fingerprinting. This is enough to fill a whole other article.
Anonymity is not the goal of TLS. There are other mechanisms that try to achieve this but they will not be discussed here. Look into Tor, VPNs and problems with DNS if you are interested.
Some notes on internet communication
To understand the further discussion, we need to clarify a few concepts of the internet. If you already have a rough understanding of the most popular protocols, you can skip this section.
We will discuss TLS (and after that mcTLS) with HTTP as an example. Keep in mind that we can easily exchange the protocol while keeping TLS in place.
Let’s explore what happens if you access
amazon.com (At least the parts relevant for the article): Your browser sends a
GET request to a server and receives a response. This is done through HTTP, the Hypertext Transfer Protocol. Protocols like this define how computers or computer programs communicate. The protocols used for internet communication are layered on top of each other. This means that one protocol is conveyed by another; they don’t know of each other and hence can be swapped out for an equivalent protocol without anyone noticing.
In the case of accessing
amazon.com, the most important protocols layered on top of each other are IP, the Internet Protocol, TCP, the Transmission Control Protocol, and on top of all those the aforementioned HTTP. As you can see, in this basic setup there is no TLS in the stack. Nonetheless, this stack is perfectly valid and is most likely used when you access a website and your browser shows the
http:// prefix that indicates an unencrypted connection.
To illustrate the composability of the protocol stack, let me resort to an analogy: transport of cargo. Imagine a large container ship transporting a load of standardized containers. Neither does the content of a container care if the container is on a ship, a train or a truck, nor does the ship care about the contents of the containers. In this analogy we can easily exchange the content of the containers or the means of transportation without anyone noticing (unless we want to ship dangerous goods or something that needs to be handled differently).
The ‘raw’ http request looks something like
GET / HTTP/1.1
This means something like: “I want to
GET the resource
/ (indicating the content root) and the requestor supports
HTTP/1.1 as its protocol.” The rest of the request is omitted for simplicity. As a response we would get the website itself or a redirect that indicates where to find it. The underlying TCP only knows the request as ‘payload’, not what it contains or whether it is secured.
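To make the idea of a plain-text ‘payload’ a bit more tangible, here is a minimal Python sketch that assembles such a request as bytes. The host name example.org and the helper name build_get_request are my own placeholders, and the extra Host and Connection headers are just the usual minimum for HTTP/1.1, not something taken from a real amazon request.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource, protocol version
        f"Host: {host}",         # HTTP/1.1 requires a Host header
        "Connection: close",
        "",                      # an empty line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# "example.org" is a placeholder host for this sketch.
request = build_get_request("example.org")

# These bytes are all that TCP gets to see: readable text, no protection at all.
print(request.decode("ascii"))
```

Everything the browser wants to say is right there in readable form, which is exactly the problem the rest of this article deals with.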
Besides interchangeability, stacking protocols provides further benefits when designing protocols: one can abstract away underlying protocol implementations and just assume that their guarantees hold. This way HTTP can assume that message loss is taken care of by an underlying protocol. The HTTP request may further contain an authorization header, e.g.: (when signing in to amazon this is a bit more complex, but in the end they need to check your password and username)
Authorization: Basic Ym9iOnNlY3JldA==
Ym9iOnNlY3JldA== is the Base64 encoded string
bob:secret, in this example representing the username and the corresponding password. Keep in mind that Base64 encoding and decoding adds no security, it just eases handling the data in transit (like wrapping a stack of boxes in transparent foil eases transport but does not obscure their content).
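To see how little protection Base64 offers, the following short Python sketch decodes the header from above using only the standard library. It is just an illustration of what anyone who can read the request could do; the variable names are my own.

```python
import base64

header = "Authorization: Basic Ym9iOnNlY3JldA=="

# Split off the header name, then separate the scheme from the encoded token.
scheme, token = header.split(": ", 1)[1].split(" ", 1)

# Base64 is an encoding, not encryption: decoding needs no key at all.
username, password = base64.b64decode(token).decode("utf-8").split(":", 1)

print(scheme, username, password)  # prints: Basic bob secret
```

No secret is needed for this step, so anyone who sees the header also sees the password.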
Let’s illustrate why this poses a security problem: Imagine you are at a workstation at work and try to log in to amazon. Everyone on the path your request takes from your workstation to amazon will be able to read your password and impersonate you. You might think that this is really hard and only nasty hackers can do this, but most likely your entire IT department would be able to do so, and depending on the setup even your co-workers. Further, if any component or company on the way is compromised by an attacker, the password can be stolen there. Most likely your employer is even required to store these details without being allowed to look at them.
In the best case, your stolen password is only used for your amazon account, but what happens if you used the same password (or a simple variation) for your emails, your bank, the social security system and so on? (If you got sweaty palms because you feel guilty, better change those passwords now.) Even if your password is safe, it can be a problem if an attacker changes the content of such a request. Think of an online banking scenario in which you transfer a considerable amount of money to your friend for buying his boat, but an attacker changes the recipient on the fly without you noticing it.
In the next chapter I introduce some cryptographic primitives that help us secure the aforementioned scenarios.
Some cryptographic primitives
This chapter gives an overview of some cryptographic primitives but does not explain them in depth, nor will I discuss any algorithms here. If you are interested in the topic, feel free to look up the details online, the Wikipedia entries are quite good in this regard.
Encryption is the act of modifying a message in such a way that it can only be restored if you know or possess a given secret. There exists an encryption function
E and decryption function
D that, if applied in sequence, yield the original message:
D(E(m))=m (this means that we can decrypt an encrypted message and get back the original). There is symmetric encryption, which requires the same key to encrypt and decrypt messages, and asymmetric encryption, which employs a public and a private key. Think of symmetric encryption like a box where you can put messages in and then use a key to lock it, and your communication partner can unlock the box with a copy of the same key. You can think of asymmetric encryption like a post box where everyone can encrypt a message (throw it in the post box) but only the owner with the matching key can decrypt it (unlock the post box to retrieve its contents). In general, symmetric encryption is way faster than asymmetric encryption. For most purposes, an encrypted message should be indistinguishable from a random sequence. This means that if I provide you with a sequence
e5af405f4fec536014eb048a you must not be able to tell if it is a random number or your name encrypted, even if you (obviously) know your name.
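As a small illustration of D(E(m))=m for the symmetric case, here is a toy sketch in Python. It uses a one-time pad (XOR with a random key that is as long as the message), which I picked only because it fits into the standard library. It is not what TLS uses, and the key must never be reused.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every message byte with the corresponding key byte."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


# For a one-time pad, encryption and decryption are the same operation.
encrypt = xor_bytes
decrypt = xor_bytes

message = b"Transfer 10$ to Alice"
key = secrets.token_bytes(len(message))  # the shared secret, used only once

ciphertext = encrypt(message, key)
print(ciphertext.hex())                   # looks like a random sequence

assert decrypt(ciphertext, key) == message  # D(E(m)) = m
```

Both sides need the same key here, which is the defining property of symmetric encryption and also its main practical headache: the key has to get to the other side somehow.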
Checksums enable us to verify that a message was not altered in transit. A checksum is like a fingerprint of a message, tiny compared to the message in size but quite hard to fake. Let’s compute the checksum of the following two bank transactions: ‘Transfer 10$ to Alice’ and ‘Transfer 90$ to Alice’:
➜ ~ md5sum -- Transfer 10$ to Alice
ad34d5d2c8e0fa98984f5a3b860c92bf -
➜ ~ md5sum -- Transfer 90$ to Alice
9ed0a1db9be8d4a17fc5a5059d9a37b2 -
As you can see, even similar messages produce completely different checksums.
If you are provided with a checksum and a message, you can compute the checksum of the message again and compare it with the provided one to ensure the message was not altered. But how can you know when receiving a message, that the checksum itself was not modified? The next section provides a way to do this.
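The recompute-and-compare step can be sketched in a few lines of Python. Note that I use SHA-256 from the standard library here instead of MD5 as in the shell session above; the function names checksum and is_unaltered are my own.

```python
import hashlib


def checksum(message: str) -> str:
    """Return the SHA-256 fingerprint of a message as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def is_unaltered(message: str, provided_checksum: str) -> bool:
    """Recompute the checksum and compare it with the one we were given."""
    return checksum(message) == provided_checksum


original = "Transfer 10$ to Alice"
provided = checksum(original)

print(is_unaltered("Transfer 10$ to Alice", provided))  # True
print(is_unaltered("Transfer 90$ to Alice", provided))  # False
```

This only helps as long as the attacker cannot replace the checksum as well, which is the gap the following primitives try to close.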
Digital signatures are a way to ensure the authenticity of a message. Like signing a letter on paper lets the recipient verify that it was actually you who wrote the letter, a digital signature enables us to do the same for electronic messages. One problem persists, as with real-world letters: if you communicate with someone for the first time, how do you know their signature is legit? Again, the next section provides you with a (partial) solution for this.
A certificate authority (CA) is a trusted third party that attests ownership. In our letter example, think of a common friend of you and your new communication partner whom both of you trust. If you both send that friend your signatures, he can in turn sign them to verify that they are legit, and you can then send this ‘certificate’ to your communication partner as a proof of your identity. On the internet, your relation to the CA is not like that to a friend that you both know, but rather like that to an official bureau controlled by the state that issues an official document containing your signature. I like the friend example better, though, since you don’t have to pay your friend an insane amount of money. An internet CA normally signs a public key. Since you are the only one possessing the private key, we can assume that if someone can read messages encrypted with the public key, it has to be you (or a hacker who stole your key).
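To give a feeling for how signing and verifying fit together, here is a deliberately simplified Python sketch of a ‘textbook RSA’ signature. Everything in it is an assumption made for illustration: the two primes are well-known Mersenne primes, far too small and too structured for real use, and there is no padding scheme. Real systems use vetted libraries and much larger keys.

```python
import hashlib

# Toy key material: two known Mersenne primes. NOT secure, illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and map the digest into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the owner of the private exponent D can compute this value."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone who knows the public values (N, E) can check the signature."""
    return pow(signature, E, N) == digest_as_int(message)


letter = b"Dear Bob, the boat is yours."
signature = sign(letter)

print(verify(letter, signature))                           # True
print(verify(b"Dear Bob, the boat is mine.", signature))   # False
```

The important asymmetry is that signing needs the private part while verifying only needs the public part, so the recipient never has to know your secret.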
With the aforementioned primitives, we can ensure secure communication between you and the online shop or your bank:
- You check whether the signed certificate of the bank is valid and signed by a trusted CA. You also check whether it matches the public key presented.
- Compute a shared key (we will get to how to do this)
- Each and every message from now on gets encrypted with the shared key
- You can use checksums to verify that your communication has not been tampered with.
This is, in a nutshell, what we do with TLS. The next chapter describes how TLS works in more detail.
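In practice you would not implement these steps yourself but let a TLS library do them. As a rough sketch, this is how a client could open a TLS connection with Python’s standard ssl module. The function names are my own, and the host passed to fetch_tls_version is up to you. The default context checks the certificate chain against the trusted CAs of the system and verifies that the certificate matches the host name.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    """Create a context that requires a valid certificate and matching host name."""
    return ssl.create_default_context()


def fetch_tls_version(host: str, port: int = 443) -> Optional[str]:
    """Connect to host, run the TLS handshake and return the negotiated version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # wrap_socket performs the handshake; it raises an ssl.SSLError
        # subclass if the certificate is not trusted or does not match the host.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Calling fetch_tls_version with a host name of your choice needs a network connection, which is why the sketch itself only inspects the context settings.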
The basics of TLS
Before sending secured payload data over a connection, TLS establishes the connection with a so-called handshake. If you are interested in the details of a TLS handshake, have a look at this website that explains every byte transferred. If you want to wade through the gory details, you can have a look at the official RFC.
In the following I summarize how the handshake works but leave out many details like cipher specs that enable a whole class of attacks but are not relevant for this article. We will only take a look at the server-authenticated mode, where the client ensures the authenticity of the server, and simply ignore the mode supporting mutual authentication.
In short, the following happens:
- The client generates a random number that it sends to the server alongside the name of the server it wants to connect to
- The server also creates a random number and sends it to the client
- The server also sends its public key and a digital signature, that shows that a CA has attested that the public key belongs to the service/name/company/whatever
- The server computes an ephemeral key pair (a public and a private key) that is used for agreeing on a common key later on. The ephemeral public key is signed, together with the random numbers of the client and server, using the private key that matches the “real” public key of the server, to assure it is legit. The server then sends the ephemeral public key to the client.
- The client does exactly the same except for signing (since it has no trusted public key in this mode) and sends it to the server
- Now both parties can compute the session keys used for the fast, symmetric encryption. They do this using some crypto-magic that enables them to compute the same key in a way that an attacker who can read every message cannot compute it. It is important that the initially exchanged random numbers are used in addition to the own private key and the other party’s public key to compute the session keys. This makes each session unique, given that the random numbers are actually random.
- Now they switch to the newly computed session keys and send each other some initialization vector and the encrypted Message Authentication Code (MAC). This MAC is a digital fingerprint computed over all exchanged messages and a secret. Thereby both of them can assure that they have the same view of the conversation and that no messages were changed.
- As a last step they send a
pong to signal that they are both ready.
It is important to note that the checking of the MACs is absolutely crucial to the security of the whole system since otherwise they cannot guarantee end-to-end encryption.
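The ‘crypto-magic’ for agreeing on a key can be sketched with a toy Diffie-Hellman exchange in Python. Please read this as an illustration under loud assumptions: the prime 2**127 - 1 and the generator 3 are toy parameters I picked, the key derivation is a plain SHA-256 instead of the derivation functions TLS actually specifies, and the signing of the server’s ephemeral public key is left out entirely.

```python
import hashlib
import secrets

# Toy group parameters, chosen for readability. NOT suitable for real use.
PRIME = 2**127 - 1
GENERATOR = 3


def ephemeral_keypair():
    """Generate a fresh private/public key pair for one session."""
    private = secrets.randbelow(PRIME - 3) + 2   # value in [2, PRIME - 2]
    public = pow(GENERATOR, private, PRIME)
    return private, public


def session_key(own_private, peer_public, client_random, server_random):
    """Combine the shared secret with both random numbers into a session key."""
    shared = pow(peer_public, own_private, PRIME)
    material = shared.to_bytes(16, "big") + client_random + server_random
    return hashlib.sha256(material).digest()


# Random numbers exchanged in the clear at the start of the handshake.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)

# Each side keeps its private key and sends only the public key.
client_private, client_public = ephemeral_keypair()
server_private, server_public = ephemeral_keypair()

client_key = session_key(client_private, server_public, client_random, server_random)
server_key = session_key(server_private, client_public, client_random, server_random)

print(client_key == server_key)  # True: both sides derived the same key
```

An eavesdropper sees both public keys and both random numbers, but without one of the private keys the shared value stays out of reach (for properly chosen parameters, which these are not).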
Why use that many secrets?
Let’s briefly discuss why each part exchanged is in place before we close this part of the series.
- Random numbers: The initially exchanged random numbers are used to guarantee different session keys (and MACs and ephemeral keys) for each session. This ensures that an attacker cannot replay messages captured in the past or re-establish old sessions.
- Ephemeral keys: Although we could use the signed public key of the server, we generate an ephemeral key so that if one communication and its key gets compromised, all other communications are safe (depending on the sort of compromise)
- Signed Certificate: The CA assures the client that the signed public key belongs to the server, company or service.
- MACs ensure the integrity of all messages passed in the handshake.
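How a MAC over the handshake catches a manipulated message can be sketched with Python’s hmac module. The message contents and the way I concatenate them (length prefix plus message) are assumptions for this sketch and not the format TLS uses; the session key here is just random bytes standing in for the key both sides derived.

```python
import hashlib
import hmac
import secrets
from typing import Iterable


def transcript_mac(session_key: bytes, messages: Iterable[bytes]) -> bytes:
    """Compute a keyed fingerprint over all handshake messages in order."""
    mac = hmac.new(session_key, digestmod=hashlib.sha256)
    for message in messages:
        # The length prefix keeps message boundaries unambiguous.
        mac.update(len(message).to_bytes(4, "big"))
        mac.update(message)
    return mac.digest()


session_key = secrets.token_bytes(32)  # stand-in for the derived session key

# What the client sent and received (placeholder contents).
client_view = [b"client hello", b"server hello", b"server key", b"client key"]
# What the server saw after an attacker modified one message in transit.
server_view = [b"client hello", b"server hello", b"server key", b"evil key"]

client_mac = transcript_mac(session_key, client_view)

# Same view of the conversation: the MACs match.
print(hmac.compare_digest(client_mac, transcript_mac(session_key, client_view)))  # True
# Different view: the mismatch reveals the tampering.
print(hmac.compare_digest(client_mac, transcript_mac(session_key, server_view)))  # False
```

Because the MAC depends on the secret session key, an attacker who changes a message cannot simply recompute a matching MAC.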
Part 2 of the series will show some attack scenarios on TLS and describe why they won’t work. In that part we will also discuss mcTLS and show why people might want it, especially in view of the existing alternatives, and why most of the alternatives and mcTLS give a false sense of security and thereby endanger the user.