Blockchain Basics Part 1: Cryptographic Hash Functions
I try to parse the technical foundations upon which blockchains (especially Solana) are built.
Introduction
I first heard about Bitcoin in 2012. I was in the first year of college and I remember browsing obscure internet forums to figure out how to create my bitcoin wallet and the mathematics behind how it works. The technology has come a long way since then. Cryptocurrencies are now a real market with huge trading volumes, hundreds of different coins, and notoriously volatile prices. In addition, modern blockchains are also capable of executing arbitrary logic - called “smart contracts” that in my opinion have the same revolutionary potential today as the internet had in the early 1990s.
The first half of 2022 has not been kind to the world of crypto. Firms are announcing layoffs and several famous bankruptcies and wallet hacks have soured the public view of crypto. Despite these setbacks, I think the technology has enormous potential and is worth learning about. Even if cryptocurrencies fall out of favor, the technology behind blockchains will likely endure in some form for the foreseeable future.
The mathematics of cryptography is the foundation upon which modern blockchain systems are built. A public, decentralized blockchain that nobody controls raises some questions:
If I send a request to transfer some of my bitcoin to my friend, how can the network of computers know that I was the one who sent the request without having a human verify my ID?
How can a smart contract that logs your attendance verify confidently that you are who you claim to be?
How does Solana know that the person who sent a transaction sent it at a specific time?
How does a bitcoin miner prove that they did the work to verify a block of transactions?
The answer to all these questions lies in the math of cryptography. Cryptographic systems let us encrypt data, prove the origin of documents and provide several mechanisms for blockchains to reach consensus. In this article series, I’ll do a deep dive into many of the components of the cryptographic systems that make blockchains tick.
Part 1 - Cryptographic Hash Functions
Cryptographic Hash Functions
Cryptographic hash functions map a sequence of input bits (that represents a file such as a picture or an email) into a sequence (usually smaller than the original file) of bits that can be thought of as a “digital fingerprint” of the file. This is because cryptographic hashes have some special properties.
Easy to Calculate: Takes a few seconds on a modern computer.
Pre-Image Resistance: It would take billions upon billions of years to compute what the input was given the hash.
Collision Resistance: It would take billions upon billions of years to find two inputs that have the same hash value (called a hash collision).
Hashes for Error Checking
One useful application of these properties of hash functions is error checking. For instance, you can check if the file that you downloaded to your computer is the same as the file on a server. The server can publish a large file along with its hash. Once you download the file, you can check if there were any errors during the download by locally calculating the hash of the file and checking it against the hash on the server.
In fact, if you’ve ever downloaded Ubuntu, you’ll know that the ISO images usually come with a hash. See for yourself at https://releases.ubuntu.com/22.04/! There should be a file there called SHA256SUMS that specifies the hashes of the files available to download. This is useful because, without hashes, you’d have to re-download the file to check if there are any errors, and even the newly-downloaded file may have different errors. The hash is much shorter than the original file and can be compared easily with the hash that is calculated on your computer.
Storing Passwords
In the early days of the internet, there was a time when websites stored their user’s passwords on their server with little to no protection. Every time a user enters a password, the server would compare the password with the actual password stored on the server. However, a breach of the server would mean that the user’s passwords were available for anyone to see.
Hashing is the solution to securely storing passwords. Secure modern websites usually take the user’s password, add a random but deterministic string to it (called a salt), and then hash the password before storing the hash on the server. Usually, this is done on the user’s computer even before the password is transmitted to the server. When the user tries to log in later, the same procedure is repeated, and the hashes are compared. This way, the server never needs to know the user’s password and the pre-image resistance of hashing functions means that an attacker could never guess the original password from the hash, even if they stole all the data from a website’s server.
Finding a Specific Hash is Proof of Work
One prominent application of cryptographic hashing is Bitcoin’s proof of work consensus mechanism. For the bitcoin network, verifying a block of transactions involves finding a very specific number called the nonce that, when added to the block, makes the hash value of the block start with a certain number of leading zeroes. The nonce is very difficult to find. Indeed, it usually takes some of the most powerful mining computers around 15 minutes to find the nonce. However, anyone can verify in less than a second that the hash of the nonce and the block does indeed have the required number of leading zeroes. So the existence of the nonce is proof that someone has done the work to find it.
Chaining Hashes Prove Passage of Time
The Solana blockchain uses cryptographic hashes to prove the passage of time and synchronize different validators on the network. Solana calls this the Proof of History mechanism.
Proof of history works by starting with a random seed phrase and recursively feeding the output of the previous hash as input to the hash function for producing the next hash. The pre-image resistance property means that it is impossible to predict what the hash of a value will be without first calculating it. This means that the value of the hash at a specific iteration cannot be predicted in advance without actually calculating the hashes for that many iterations. The chaining of the hashes means that this calculation cannot be sped up by parallelization.
So if you tell me the hash value of the chain at two different iterations, I can be sure that a certain amount of time has passed. You’ve either calculated the hash from the first observation till the second or you’ve actually observed the blockchain.
Summary
Cryptographic hash functions produce digital fingerprints of data. The hash has properties that make it useful for many applications, including checking errors in files, securely storing passwords on a server, and proving the passage of time on blockchains.
Nice one. Enjoyed reading the post. Also I just asked a question to myself about cryptographic property like is it possible for any quantum computing strategy to crack the hashed version of the information as it can deal with such exponential problems however it is secure.
Waiting for part2.