SHA stands for the secure hash algorithm. Which is interesting given that it has just kind of been broken But I'm not going to talk specifically about the attack on SHA today that's for a different video, but What I wanted to do was talk in a little bit more detail about hash functions and what they are And how SHA in particular works so you can get an idea of a kind of way these things are implemented let's have a quick look at what a hash function is briefly because Tom's got somebody covered it in this video and He's gone into a lot of the reasons you might use a hash function.
The kind of hash functions I'm talking about are not the ones that we've been talking about for Hashing passwords. Those ones have to be quite slow in some sense because you need them to be secure We're going to talk mostly about the hash functions that are used routinely in cryptography for things like message authentication Digital signatures, and so on. So they need to be fairly quick, both to verify and compute. A hash function Take some string, right, let's say "abc" and it turns it into some fixed length string (that's not usually three long) of random So we know a bit string right, but so so you, 011001011… Go forward this way for however many bits that hash function is. Now, there's a few important properties of Hash functions That we care about for cryptography, but the most important one perhaps. Is that it's essentially pseudo-random So that means that we put in "abc" And we get out something that is in no way like "abc" and appears completely random to us and if we change this even slightly Because it's appearing random this has completely changed So let's have a quick look at SHA-1 as an example just so we can see this in action I'm on some page that has a script that calculates hashes on the fly so I can put in "abc" and You can see that the hash is a9993e and so on all the way up to d, right? This is the SHA1 hash.
A SHA1 hash is 160 bits long. If I change this C to a D the hash is completely changed. So there's the appearance of randomness – the idea that This hash is actually not related to this at all Even though it is and we know it is because if I put C back again We're back to a9993 So we can use this to verify messages haven't been changed or verify signatures on certificates And we can do it knowing that we have the appearance of randomness , but actually it's not random at all.
Today We're going to talk a bit about how you actually write a hash function to do this How do we take something that essentially isn't random with a very known structure and turn it into something that looks like nonsense Such that we can use it. Now, There'll be people raising a few eyebrows that I'm using SHA1 as an example to do this But actually there's fairly reasonable reason to do so. First of all you know we might also talk about the weaknesses at some point but also SHA-1 bears a striking similarity in structure to MD4 and MD5 which is see a lot of use historically and SHA-256 and SHA-512 which is a SHA2 Which currently is in some sense a standard that everyone uses right SHA3 is quite different And that's is something else for another day So SHA1 was developed by the NSA And released and published in 1995 now a lot of people don't trust things that the NSA do sort of by default Which might be fair, but in this case actually SHA1 was quite good for a long long time when there were some concerns …
Recently much more serious concerns, but Originally the NSA weren't doing it as a back door and stuff the NSA need cryptography Just like everyone else and this is a good function MD5 had a lot of problems and so what they basically did was extend it and make it better SHA1 takes any length of string and outputs a 160 bit value. Alright, so that's 160 zeros and ones. The question then becomes: I've got a string but could be "abc" or it could be an incredibly long file or You know a whole movie right, and I want to calculate 160-bit signature of that How do I even get started doing that well the answer is that I basically have a loop that takes 512 bit blocks of data one at a time until the file's expended.
Let's look at an example of how SHA1 works on just a single message of exactly the right length And then we'll just talk briefly about what you do when inevitably isn't the right length Which is almost always, right? So SHA1 takes a message of n blocks of 512 bits in length, and If it's only one block – if the message is exactly 512 bits in length, then we only run it once, in essence And then we out put the hash at the end so SHA-1 Starts with an internal state then we bring in bits of our message one at a time And we change that internal state and after we've done that at the very end when there's no more message left We just read what the internal state is and that's our hash alright so we're going to basically be taking the internal state and updating it with the message until We run out of message, and then as soon as that process stops we can read the results That's how the SHA family of hash functions works, so our internal state we call H so I'm going to say H0 H1 H2 H3 and H4 Now the internal state of SHA1 is exactly the same length as the hash that it produces.
Which is 160 bits. Which is five 32-bit words four bytes each you know for 32-bit machine this would be an int So this is initialized based on some constants. Which is defined in the standard We might talk about that in a bit but it's not very important and what we're going to do is we're update these h's as We bring in our message and then at the end.
We'll see what the Hs are and that's our hash function So how do we do this? Well? We have something [called] a compression function? It's going to take in this data and a bit of message Turn it into another set of h values and that's going to repeat as we have message But that's only going to happen once this time because my message is exactly 5 12, which is very handy So this is our compression function And I'm going to rename these slightly just to confuse everyone to ABc dearly so at the beginning of our compression function We copy B's the internal state into a b c d and e We then perform 80 rounds of char compression function, right? Which is like this so x 80 now what that's going to do is take in words from our 512 bit block of our message, so if this is our message here that message is 512 bits this is going to come in at this point and be mixed in with this ABc and D so well for now We won't talk about exactly what's going on in this compression function But the idea is that the bits of abcde are being combined together? They're being shuffled They're being commuted around to make it look more and more random as we go and at the same time We're bringing in bits from this message to further increase the appearance of mandamus But also to make sure that this shower function is calculating a digest on this specific message rather than just a general one That's the same every time for this message.
We're always going to perform the exact same algorithm So if we put in this message a second time the shower function will produce exactly the same result Now once we've done this and we shuffled up abcde will be left with a new Abcde so a b c d and e And then we finish this block by bringing our h values down here and adding [them] to these values here to create a new H naught H1 H2 H3 H4 the State is now been updated by whatever we did in this compression function by just adding to it all right now all addition In Char is done without any overflow modulo 2 to the 32 well that means is that if you ever go over? The maximum value allowed by a 4-byte integer you lap back around again Right which is one of the reasons why shark arm in reverse because you might lose information that way This is not encryption.
We don't need to be able to reverse it, so this state is Finished now if our message is exactly 512 bits We need these 18 or h1 h2 h3 h4 values out that is our hash, so for short messages. We're done I could just you know go home in actual fact the the principle of extending this to arbitrary length messages right in increments of 512 Bits is We copy this state back up to the top and we repeat the process and then we copy back up and we repeat the process For as many blocks as we have of our message 512 bits at a time of our message We feed it in we alter the state using this approach here, and then we lead off a state when we're done, right? That's basically how it works So be the security of SHA is all in this compression function and what it's doing.
If it's shorter than that, what happens there? If it's not a multiple of 512 bits. We're going to have to do some padding right? Char only works with 512 bit blocks, so what we do is if we have our message. Which is let's say 1 0 1 1 0 1 it's not a very long message. If we want to pad that up to 512 bits We start with [a] one. We pad with 11 [alright], so I'm going to sort of mark, de-mark the padding here so we know we go with 0000 and then we finish off the message with the length of the actual message in it So we know where to sort of remove the padding which in this case is 7 so in Binary 1 1 1 so 1 11? Obviously would a lot more bits for your length than I have done.
You get the idea now this padding scheme Ensures that messages of the same length and messages that end in the same way or in very similar ways don't share the same padding And don't end up being the same that's very important, so this approach to SHA [its] repetitive Updating of the internal state with a compression function in essence is called a merkel down guard construction This was sort of independently proved by murkland damn. [God] But essentially what's good about it is if your compression function is good and has good pSeudo-random properties then so does your shower function Which is of course very useful right? The problem is that the compression function of char one is not so good that the attacks are To the 18 they can be reduced somewhat to about 2 to the 60 something like this which it becomes Into the realm of possibility for people with a lot of money.