Grammatical rules and software security
by Jeffrey Goldberg on
Failure to strictly adhere to rules of grammar creates surprising security holes that come back to bite you much later.
It’s been proven: accommodation of invalid data in web server software leads to security vulnerabilities. This very issue led to a recently exploited security bug, but it happens all the time. In the security notes for the iOS 12.4 update, you’ll find seven instances where some security vulnerability was “addressed through improved input validation.” What does this all mean? And why does it matter? I’m so glad you asked.
When one computer system communicates with another, they talk in a language that has a grammar. A formal grammar is a set of rules that tells us how to form strings — single continuous lines of characters. I’ll use a very simplified version of an email address as an example, and we’ll play a little game.
According to the grammar above, which of these are email addresses?
If you answered 1 (firstname.lastname@example.org) and 4 (email@example.com), you are correct! The other four strings above are invalid in some way. If you included 2 (wenDy@hotmail.com) and/or 5 (firstname.lastname@example.org), you are the perfect kind of incorrect.
You probably reasoned that wenDy@hotmail.com is a perfectly valid email address, and that the capital D makes no difference, but capital letters were not defined in the grammar. Developers think the same way. They want to build systems that accommodate these little differences and have this built-in flexibility. It’s clear what is intended, and they don’t want to break compatibility with what is already out there. But permitting this flexibility is exactly what causes the trouble.
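To make the difference concrete, here is a minimal sketch in Python. It assumes a toy grammar that is not necessarily the one used in the game above: an address is one or more dot-separated words, an “@”, and then two or more dot-separated words, where a word is made only of lowercase ASCII letters. The names `is_valid_strict` and `is_valid_lenient` are made up for this illustration.

```python
import re

# Hypothetical toy grammar (an assumption for illustration only):
#   address := local "@" domain
#   local   := word ("." word)*
#   domain  := word ("." word)+
#   word    := one or more lowercase ASCII letters
_WORD = r"[a-z]+"
STRICT = re.compile(rf"{_WORD}(?:\.{_WORD})*@{_WORD}(?:\.{_WORD})+")


def is_valid_strict(s: str) -> bool:
    """Accept only strings the toy grammar can generate, and nothing else."""
    return STRICT.fullmatch(s) is not None


def is_valid_lenient(s: str) -> bool:
    """The tempting 'flexible' variant: quietly normalize, then validate."""
    return is_valid_strict(s.strip().lower())


if __name__ == "__main__":
    for candidate in ["wendy@hotmail.com", "wenDy@hotmail.com"]:
        print(candidate, is_valid_strict(candidate), is_valid_lenient(candidate))
```

Under these assumptions the strict check rejects wenDy@hotmail.com while the lenient one accepts it. Neither answer is dangerous on its own; the trouble starts when both checks live in the same system.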
The accommodation of an unofficial standard can be very hard to undo and creates additional incompatibilities because not every system will do so, and that’s the best-case scenario. More seriously, it leads to security bugs, particularly as different parts of some complex systems deal with the malformed data differently.
It turns out that the resulting inconsistency can be used to trick some parts of a system into ignoring incorrect grammar while others respect it. Using our email addresses above as an example, a front end might handle wenDy@gmail.com “properly” and reject it, while the back end might handle it “improperly” and accept it. Attackers deliberately seek out the inconsistencies in how things are interpreted and exploit them.
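Here is a rough sketch of how such a disagreement becomes exploitable. Everything in it is hypothetical: the reserved address, the function names, and the split of responsibilities between the two components are assumptions chosen for illustration, not a description of any real system. The front end compares addresses exactly, while the back end “helpfully” folds case before acting.

```python
# Hypothetical reserved address that ordinary users must not act as.
RESERVED = {"admin@example.com"}


def front_end_allows(address: str) -> bool:
    """Front end: exact, case-sensitive comparison against the reserved list."""
    return address not in RESERVED


def back_end_account(address: str) -> str:
    """Back end: folds case before looking up the account."""
    return address.lower()


def handle(address: str) -> str:
    """Pass the request through both components, as a real pipeline would."""
    if not front_end_allows(address):
        return "rejected"
    return f"acting on account {back_end_account(address)}"


if __name__ == "__main__":
    print(handle("admin@example.com"))  # rejected
    print(handle("Admin@example.com"))  # acting on account admin@example.com
```

The two components disagree about what the string Admin@example.com means, and that gap is all an attacker needs. If both had rejected anything outside the agreed grammar, there would be nothing to slip through.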
The attacks work because the target systems don’t know that something has been trying to trick them.
There is a method of communication in which this flexible approach is explicit: human language. When it comes to speech, humans are very accommodating.
If someone uses “who” where we might use “whom”, most of us barely notice. We’re also able to identify the intended meaning when what we hear is ambiguous. When someone asks, “Can you pass the salt?” we know they are probably not asking if we are capable of passing the salt. We even speak our native languages in slightly different ways, with accents or just by pronouncing things differently. This only rarely leads to any communication problems, and when it does we can usually sort them out.
But computing systems don’t have the luxury of saying, “hey, you tricked me! That’s not what I meant at all.”
The analysis by PortSwigger Security does a far better and more thorough job of explaining how getting a system to interpret a string differently can be turned into a security exploit.
Perhaps the bug was a simple programming error and not a deliberate choice to accommodate input that doesn’t conform to the standards. But even if it was an unintentional bug rather than a design choice, shouldn’t we be looking for ways to prevent such bugs?
The answer, of course, is “yes”.
It would be nice to make sure that handling of input data to some system or function follows the specification precisely. Our very own XORceror, Pilar Garcia, will describe some of our work on that. You’ll just have to wait for a future blog post.