Hash Functions

Introduction to Hash Functions

Hashing is a technique in which an algorithm (called a hash function) is applied to a piece of data to create a digital “fingerprint”: a fixed-size value that is, for practical purposes, unique to that data. If anyone changes the data by so much as one bit, the hash function will produce a different output (called the hash value or message digest), and the recipient will know that the data has been changed. Hashing can therefore help ensure integrity and provide authentication. A good hash function should not return the same result for two different inputs (returning the same result for two different inputs is known as a collision). Because inputs can be arbitrarily long and outputs are fixed in size, collisions must exist in principle, so the output space of the function should be large enough to make a collision extremely unlikely.

All of the encryption algorithms studied so far, both symmetric and asymmetric, are reversible. Hashing algorithms have no practical inverse: you cannot use the hash value to discover the original data that was hashed. For this reason, they are commonly referred to as one-way hash functions. Although the original material cannot be recovered, one-way functions are useful for verifying data integrity and authentication. It is easy to generate a hash value from input data and easy to verify that the data matches the hash, but hard to craft malicious data that matches a given hash value. Pretty Good Privacy (PGP) relies on this principle for data validation.
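To make the "one changed bit" claim concrete, here is a minimal sketch using Python's standard hashlib module. The choice of SHA-256, the sample message, and the helper names (sha256_hex, flip_lowest_bit, differing_bits) are assumptions made for illustration and are not part of the original discussion.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of its first byte flipped."""
    return bytes([data[0] ^ 0x01]) + data[1:]


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical message; any byte string would do.
original = b"Transfer $100 to Alice"
tampered = flip_lowest_bit(original)

digest_original = sha256_hex(original)
digest_tampered = sha256_hex(tampered)

print(digest_original)
print(digest_tampered)
print("digests match:", digest_original == digest_tampered)
print("bits that differ:", differing_bits(digest_original, digest_tampered), "of 256")
```

The two inputs differ in a single bit, yet the digests should look unrelated, with roughly half of the 256 output bits changed.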


Sometimes it is not necessary or even desirable to encrypt a complete set of data. Suppose someone wants to transmit a large amount of data, such as a CD image. If the data on the CD is not sensitive, they may not care that it is transmitted openly, but when the transfer is complete, they want to make sure the image the recipient has is identical to the original. The easiest way to make this comparison is to calculate a hash value for both images and compare the results. If the images differ by even a single bit, their hash values will be radically different. Provided a suitable hashing function is used, it is extremely unlikely that two different inputs will produce an identical output, or collision. The resulting hashes, often referred to as digital fingerprints, are of a small, easily readable fixed size. They are sometimes called secure checksums, because they serve a similar purpose to normal checksums but are inherently more resistant to tampering.
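The image comparison described above might be sketched as follows. This is only an illustration: SHA-256, the 1 MiB chunk size, and the function names file_sha256 and same_contents are assumptions, not something the original text prescribes.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large image never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a: str, path_b: str) -> bool:
    """Return True when both files produce the same SHA-256 digest."""
    return file_sha256(path_a) == file_sha256(path_b)
```

In practice the sender would publish the output of file_sha256 for the original image, and the recipient would run the same function on the downloaded copy and compare the two hexadecimal strings.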

Passwords are often stored as hashes rather than in plaintext. When a password is set for a system, it is generally passed through a hashing function and only the resulting hash is stored. When a person later attempts to authenticate, the password they enter is hashed and that hash is compared to the stored hash. If the two match, the person is authenticated; otherwise, access is rejected. In theory, if someone were to obtain the password list for a system, it would be useless, since it is infeasible to recover the original information from its hashed value. However, attackers can mount dictionary and brute-force attacks by methodically hashing known input strings and comparing each output to the stolen hash. If they match, the password has been cracked. Thus, proper password length and selection are crucial.
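The store-then-compare scheme and the dictionary attack against it can both be sketched in a few lines. The example below deliberately mirrors the simple scheme described above (a single unsalted SHA-256 hash) so the attack is easy to see; production systems typically add a per-user salt and a deliberately slow hashing function. The function names, the sample password, and the tiny word list are all hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def hash_password(password: str) -> str:
    """Hash a password with a single unsalted SHA-256 pass (illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def authenticate(attempt: str, stored_hash: str) -> bool:
    """Hash the attempt and compare it to the stored hash."""
    # compare_digest avoids leaking, through timing, where the strings differ.
    return hmac.compare_digest(hash_password(attempt), stored_hash)


def dictionary_attack(stolen_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one matching the stolen hash."""
    for candidate in candidates:
        if hash_password(candidate) == stolen_hash:
            return candidate
    return None


# The system stores only the hash, never the password itself.
stored = hash_password("sunshine")

print(authenticate("sunshine", stored))
print(authenticate("Sunshine", stored))

# An attacker holding the stolen hash simply tries likely passwords.
wordlist = ["123456", "password", "sunshine", "qwerty"]
print(dictionary_attack(stored, wordlist))
```

The attacker never reverses the hash; a common password falls simply because it appears in the word list, which is why password length and selection matter.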

There are several different types of hashing, including division-remainder, digit rearrangement, folding, and radix transformation. These classifications refer to the mathematical process used to obtain the hash value. Here are two popular families of hashing algorithms:

  • Message Digest 4/Message Digest 5 (MD4/MD5): The message digest (MD) class of algorithms was developed by Ron Rivest for use with digital signatures. MD4 and MD5 both have a fixed 128-bit hash length, but the MD4 algorithm is flawed and MD5 was adopted as its replacement. However, the security of MD5 has itself been severely compromised: a collision attack exists that can find collisions within seconds on a computer with a 2.6 GHz Pentium 4 processor. MD6 was introduced in 2008 and is designed to take advantage of future hardware with tens to thousands of cores instead of conventional single-core systems.
  • Secure Hash Algorithm (SHA): This family of hashing algorithms is published by the U.S. government (NIST; SHA-1 and SHA-2 were designed by the National Security Agency [NSA]), and SHA-1 and SHA-2 operate similarly to the MD algorithms. The most common is SHA-1, which is typically used in IPsec installations and has a fixed hash length of 160 bits. SHA-2 is a family of functions with hash lengths of up to 512 bits; its SHA-512 variant has a word size of 64 bits, while SHA-256 uses 32-bit words. SHA-3 also offers hash lengths of up to 512 bits and works on 64-bit words.
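The hash lengths quoted in the list above can be checked against Python's standard hashlib module. This is a small sketch, assuming a Python build in which MD5 is enabled (some hardened or FIPS-mode builds disable it); the ALGORITHMS list and the digest_bits helper are names chosen for this example.

```python
import hashlib

# Algorithm names as hashlib spells them; MD4 is omitted because
# hashlib does not guarantee that it is available.
ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "sha3_256", "sha3_512"]


def digest_bits(name: str) -> int:
    """Return the fixed output length, in bits, of the named hash function."""
    return hashlib.new(name).digest_size * 8


for name in ALGORITHMS:
    print(f"{name:>9}: {digest_bits(name)} bits")
```

Whatever the size of the input, each function always returns a digest of the length shown, which is what "fixed hash length" means in the descriptions above.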


External Links:

Cryptographic hash function at Wikipedia


© 2013 David Zientara. All rights reserved.