The other day I was reading Jos Wetzels' post on the Full Disclosure mailing list regarding vulnerabilities in the open source social networking kit HumHub. One of the issues he pointed out was a PHP 'type juggling' attack: an attacker can force a password reset for a HumHub user over and over until a specific value is selected that reduces the password entropy (uniqueness), allowing her to access the account without authorization.
I have not worked with HumHub before, but the illustrative code Jos pointed out was intriguing (after typing the closing PHP ?> tag, press CTRL+D to end the cat input):
$ cat >bahhumhubbug.php
<?php
if (md5('240610708') == md5('QNKCDZO')) {
    print "Yes, these are the same values.\n";
}
?>
$ php bahhumhubbug.php
Yes, these are the same values.
Huh? No, those are not the same values:
$ echo -n 240610708 | md5sum
0e462097431906509019562988736854
$ echo -n QNKCDZO | md5sum
0e830400451993494058024219903391
The issue here is related to how PHP compares values of different types. Being the friendly language it is, PHP converts ("juggles") variables of different types so that comparisons and arithmetic work the way the programmer might expect*. For example, consider this "math":
$ cat >math.php
<?php
print('5' + 8);
?>
$ php math.php
13
Here, PHP adds the string '5' to the integer 8. This is intuitive for a lot of people, since 5+8 is indeed 13. The magic of PHP is that these two values are of different types, but PHP accommodates the addition anyway. Contrast this with a strongly typed language like Python:
$ python -c "print '5' + 8"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
PHP is convenient in this way: if you accept numeric input in an HTTP POST value, for example, you can treat it as an integer as long as it fits PHP's loose typing rules for integers. This lets beginning programmers sidestep the type conversion routines otherwise needed to force data into a specific type.
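For contrast, here is a minimal Python sketch of the explicit conversion a strongly typed language asks you to do yourself. The function name parse_quantity and the plain dict standing in for POST data are hypothetical; a real web framework would hand you its own request object.

```python
def parse_quantity(form: dict) -> int:
    """Return form['quantity'] as an int, raising ValueError otherwise."""
    # int() only accepts a valid integer literal; it never guesses.
    return int(form.get('quantity', ''))

print(parse_quantity({'quantity': '5'}) + 8)  # 13

try:
    parse_quantity({'quantity': '5abc'})
except ValueError as exc:
    print('rejected:', exc)
```

The conversion is one extra call, but the programmer decides where the string becomes a number, and malformed input is rejected instead of silently coerced.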
This is all well and good... until it's not. Let's revisit Jos' code example from before. Why does PHP treat the two MD5 output values as equal? Because PHP thinks they are numbers:
$ echo -n 240610708 | md5sum
0e462097431906509019562988736854
$ echo -n QNKCDZO | md5sum
0e830400451993494058024219903391
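PHP sees a number (0), followed by the letter "e" and then only digits, so it interprets each MD5 string as a number in exponential (scientific) notation: 0e462097431906509019562988736854 means 0 × 10^462097431906509019562988736854. When both operands of "==" are numeric strings, PHP compares them as numbers, and zero times any power of ten is still zero. Because both MD5 hashes start with "0e" followed only by digits, they both evaluate to 0, making them numerically equivalent.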
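To make the comparison rule concrete, here is a rough Python emulation of what PHP's "==" does with two strings. The NUMERIC pattern and php_loose_equal() are my own approximation, not PHP's actual implementation: PHP's numeric-string rules cover more cases (leading whitespace, for example) and have shifted a little between versions, but this subset is enough for hex digests.

```python
import hashlib
import re

# Approximation of a PHP numeric string: optional sign, digits with an
# optional fraction, and an optional exponent such as "e462...".
NUMERIC = re.compile(r'[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?')

def php_loose_equal(a: str, b: str) -> bool:
    """Approximate PHP's `a == b` when both operands are strings."""
    if NUMERIC.fullmatch(a) and NUMERIC.fullmatch(b):
        # Both look like numbers, so they are compared by numeric value.
        return float(a) == float(b)
    # Otherwise it is an ordinary string comparison.
    return a == b

h1 = hashlib.md5(b'240610708').hexdigest()
h2 = hashlib.md5(b'QNKCDZO').hexdigest()
print(h1 == h2)                 # False: the strings differ
print(php_loose_equal(h1, h2))  # True: both parse as 0.0
```

The same rule explains why, in PHP, "1e3" == "1000" is true: both strings are read as the number one thousand.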
In the case of HumHub, an attacker can force a password reset without authentication. HumHub selects a new password using a variety of techniques (which is also problematic, as Jos pointed out) and MD5 hashes it. If the resulting password hash starts with one or more zeros, followed by "e", and the rest of the characters are digits, then an attacker can log in with the password "0". If the password hash does not match these criteria, the attacker resets it again and keeps trying until they can log in.
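How many resets might that take? Assuming the hash of each new password behaves like a uniformly random lowercase hex string, this sketch computes the chance that a digest has exactly the shape the attack needs (one or more zeros, an "e", then only digits). It ignores other strings PHP would also read as numbers, such as all-digit hashes, so treat it as a rough estimate. The function name magic_probability is hypothetical.

```python
from fractions import Fraction

def magic_probability(hex_len: int) -> Fraction:
    """Probability that a uniformly random lowercase hex string of length
    hex_len looks like 0...0e<digits>: k >= 1 zeros, one 'e', and at least
    one trailing decimal digit. Different k give disjoint patterns."""
    one_hex = Fraction(1, 16)    # chance of one specific hex character
    is_digit = Fraction(10, 16)  # chance a hex character is 0-9
    total = Fraction(0)
    for k in range(1, hex_len - 1):        # k leading zeros
        trailing = hex_len - k - 1         # digits after the 'e' (>= 1)
        total += one_hex ** k * one_hex * is_digit ** trailing
    return total

for name, hex_len in (("MD5", 32), ("SHA1", 40)):
    p = magic_probability(hex_len)
    # A geometric number of attempts has mean 1 / p.
    print(f"{name}: p = {float(p):.2e}, roughly {round(1 / p):,} resets expected")
```

Because every extra hex character must also land on a digit (a 10-in-16 chance), a 40-character SHA1 digest is considerably less likely to match than a 32-character MD5 digest.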
I posted a note about this on Twitter the other day, and one response, from @nicoduck, reminding people to stop using MD5 was awesome.
Since SHA1 hashes are longer than MD5 hashes, it is less likely that the hash of a randomly selected value will meet the 0 + e + digits rule for PHP to evaluate it as an exponential value. I left this Python code running on my Mac overnight to test SHA1 hashes for this pattern:
>>> import hashlib, string, itertools, re
>>> for word in itertools.imap(''.join, itertools.product(string.uppercase + string.lowercase + string.digits, repeat=8)):
...     if re.match(r'0+e\d+$', hashlib.sha1(word).hexdigest()):
...         print word
...
The itertools.imap() call returns a Python iterator that yields every string of exactly 8 characters built from upper- and lowercase letters and digits. Each "word" is SHA1 hashed and compared to a regular expression to see if it meets the 0 + e + digits requirement. This routine is only single-threaded, but it was late at night and I was ready for bed.
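For readers on Python 3, here is a sketch of the same search. itertools.imap() does not exist there (the built-in map() is already lazy), and hashlib wants bytes rather than str. The names MAGIC, ALPHABET, is_magic(), candidate_words(), and find_magic() are my own; the search logic mirrors the session above.

```python
import hashlib
import itertools
import re
import string

# Lowercase hex digest shaped like 0...0e<digits>, which PHP reads as zero.
MAGIC = re.compile(r'0+e[0-9]+')

# Same character set and order as the Python 2 session: A-Z, a-z, 0-9.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits

def is_magic(hexdigest: str) -> bool:
    return MAGIC.fullmatch(hexdigest) is not None

def candidate_words(length: int, alphabet: str = ALPHABET):
    """Lazily yield every string of exactly `length` characters from `alphabet`."""
    return map(''.join, itertools.product(alphabet, repeat=length))

def find_magic(words, algorithm: str = 'sha1'):
    """Yield each word whose hex digest under `algorithm` is 'magic'."""
    for word in words:
        if is_magic(hashlib.new(algorithm, word.encode()).hexdigest()):
            yield word

# The overnight run (slow: 62**8 candidates on a single core):
# for word in find_magic(candidate_words(8)):
#     print(word)

# Quick sanity check with the MD5 pair from the start of the post:
print(list(find_magic(['240610708', 'QNKCDZO', 'hello'], 'md5')))
# ['240610708', 'QNKCDZO']
```

Keeping the word generator separate from the filter makes it easy to feed find_magic() a slice of the keyspace, which is one way to split the work across several processes.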
In the morning, I awoke to this output:
AAJd1x3j
AAPkbYlH
AAZlIwOZ
Sure enough, all these strings SHA1 hash to what PHP would consider exponential numbers:
$ echo -n AAPkbYlH | sha1sum
0e51223820731210116366152413868569204545  -
$ echo -n AAJd1x3j | sha1sum
00e6811279456694288001763399976992804485  -
$ echo -n AAZlIwOZ | sha1sum
0e13965443605273185827757762777509208778  -
$ cat p.php
<?php
var_dump(sha1('AAPkbYlH') == sha1('AAJd1x3j') && sha1('AAJd1x3j') == sha1('AAZlIwOZ'));
?>
$ php p.php
bool(true)
The Fix
My good friend Tom Hessman pointed out that this problem does not occur with PHP's strict comparison operator, "===":
$ sed -i.bak 's/==/===/' bahhumhubbug.php
$ cat bahhumhubbug.php
<?php
if (md5('240610708') === md5('QNKCDZO')) {
    print "Yes, these are the same values.\n";
}
?>
$ php bahhumhubbug.php
$
The "===" operator returns true only when the values are equal and of the same type, so the two hash strings are compared as strings rather than as numbers. That said, I'd be willing to bet that other PHP projects are vulnerable to similar bugs, given how widely "==" is used in PHP code.
As an instructor, I know a lot of pen testers out there get to a point in their careers where they are great at using tools, but lack coding skills. Coding isn't for everyone**, but if you want to be a great penetration tester, you need to be able to read code from time to time. I'm not recommending you print out the source code to a popular web app, sit down with a good cup of coffee, and read it end-to-end. However, you should be able to make judicious use of grep (or findstr.exe, I won't judge) to search through source for interesting bits of information that are worth a second look. Understanding tidbits about the language you are looking at (like the difference between "==" and "===" in PHP) can be extremely useful:
$ egrep -R "md5|sha" * | grep '=='
helpdesk/helpdesk-admin/login.php:    if(sha1($_POST['password']) == $admin_password) {
Uh-oh.
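If you want something a bit more selective than that egrep pipeline, here is a small Python sketch that walks a source tree and flags lines mentioning md5 or sha that use a loose "==" or "!=", while skipping the strict "===" and "!==" operators. The names scan(), HASH_WORDS, and LOOSE_EQ are hypothetical, and like grep it only points at code worth a second look; it does not prove a bug.

```python
import os
import re

HASH_WORDS = re.compile(r'md5|sha')              # same terms as the egrep above
LOOSE_EQ = re.compile(r'(?<![=!<>])[=!]=(?!=)')  # == or != not part of === or !==

def scan(root: str, exts=('.php',)):
    """Yield (path, line_number, line) for lines under `root` that mention
    md5/sha and use a loose comparison operator."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='replace') as fh:
                for lineno, line in enumerate(fh, start=1):
                    if HASH_WORDS.search(line) and LOOSE_EQ.search(line):
                        yield path, lineno, line.rstrip()
```

Run from the same directory as the egrep, for hit in scan('helpdesk'): print(hit) should flag the same login.php line, now with its line number.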
-Joshua Wright
@joshwr1ght
*As "some programmers" might expect. This programmer finds this weak typing thing bizarre.
**Coding isn't for everyone, unless you sign up for Mark Baggett's SANS SEC573: Python for Penetration Testers class. You'd be amazed at how much you can get done with a little Python!