When dealing with security, software QA deserves its own mention. QA, or Quality Assurance, can be considered its own field of study. Most people believe software QA means checking that a program works. That is actually a very small part of QA. Most of QA is making something fail before a user can, and a good QA specialist can break software with remarkable ease. This is also why strong QA skills are essential for programmers. A programmer who is adept at breaking things knows how to write code that resists being broken, and that is essential for software security.
Software security stops hackers, so what exactly is a hacker? The term hacker has multiple connotations, depending on who you ask. To some, a hacker is a great coder. To others, a hacker is a criminal. It's an ambiguous term, so to clarify things let's define a hacker as a malicious actor: someone who accesses computers or information they aren't supposed to. Any sufficiently skilled programmer can learn to be a hacker, and the skill most essential to hacking is actually QA, not programming.
Most hackers exploit program flaws to gain access to computers or data. So how would a hacker start? Most program hacking begins with the code. The hacker studies the source code for a particular program and thoroughly QAs it, analyzing any bugs to see whether they can be used maliciously. Open source makes this very convenient, since all of the source code is publicly available. While this also means the code can be reviewed by anyone, in reality people rarely sit down and QA someone else's open source project; QA takes a lot of time and effort, after all. With closed source products, if you can get at a binary then you can disassemble or decompile it to see the code. Tools also exist to view the code and memory as the program runs (important if the code is self-mutating or decrypts itself at run time). Where even that is not available, such as the server side of a client-server system, you can examine the communication between the two ends and experiment with it.
Not all bugs lead to security vulnerabilities. In fact, most bugs are completely harmless as far as security goes. One type of security bug is a backdoor: a backdoor lets a user access something they aren't supposed to. String conversion issues, another common class of bug, can also be exploitable. Finally, buffer overruns and underruns can lead to security issues.
A buffer over/under run refers to reading or writing memory outside the bounds of an allocated buffer. Programs often use pointers, such as with traditional arrays. A traditional array is an allocated section of memory: the program can read and write different memory positions, and each memory position corresponds to a position in the array. Suppose an array has been allocated for ten items (a[0] through a[9]). Accessing a[10] or a[-1] doesn't always cause an error, since the memory just outside the array is often legitimate, allocated memory belonging to the program. However, doing this can read or alter memory the array shouldn't be able to touch. This is a buffer over or under run.
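To make the mechanics concrete, here is a minimal C sketch. It is deliberately buggy, the guard variables are invented purely for illustration, and the out-of-bounds writes are undefined behavior, so the exact result depends on the compiler and how it lays out the stack:

#include <stdio.h>

int main(void) {
    int guard_low  = 0x11111111;   /* adjacent variables, purely illustrative   */
    int a[10]      = {0};          /* legitimate indices are a[0] through a[9]  */
    int guard_high = 0x22222222;

    a[10] = 42;    /* overrun: writes one element past the end of the array     */
    a[-1] = 99;    /* underrun: writes one element before the start             */

    /* Depending on how the compiler laid out the stack, these writes may have
       silently clobbered the neighboring variables instead of crashing.        */
    printf("guard_low  = 0x%x\n", guard_low);
    printf("guard_high = 0x%x\n", guard_high);
    return 0;
}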
Most of the time, buffer over/under runs do nothing other than crash the program. This is because arrays are allocated on the heap rather than in the data or code segment. In a segmented architecture, different segments of the program have different purposes: the code segment contains the program's instructions, and the data segment contains its static data. In some operating systems the code and data segments are protected, meaning you can't write to them, so a buffer overrun that reaches a protected segment just crashes the program. Some operating systems, however, allow the code and/or data segments to be modified while the program runs. Buffer over/under runs on those operating systems carry significantly more security risk. Always pay close attention to buffer over/under runs.
String conversion issues abound, mostly because unskilled programmers forget about them. When programming, remember that a string is not a string. A regular string is not an HTML string, is not a database string, is not an XML string, is not a JSON string, and so on. Even regular strings come in various encodings, such as 1-byte, 2-byte, and 4-byte characters, UTF-8, UTF-16, etc. To help combat conversion issues, computer languages now offer automated string conversion; for example, nearly all languages now offer parameterized queries for databases. This is the wrong approach because it makes the programmer lazy and obfuscates string conversion. Additionally, you can't always know exactly how a string is converted. Does it expect UTF-16? What happens if you give it UTF-8 instead? Automated conversion ends up making a mess of things, and there will always be areas where automated conversion is impossible.
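As a small illustration of how much the encoding matters, here is a C sketch (the u8 and u prefixes require C11) that stores the same four characters two different ways; the byte counts are the point:

#include <stdio.h>
#include <uchar.h>

int main(void) {
    /* The same four characters ("café") in two different encodings. */
    const char     utf8[]  = u8"caf\u00e9";  /* UTF-8: the é takes two bytes   */
    const char16_t utf16[] = u"caf\u00e9";   /* UTF-16: every unit is 2 bytes  */

    printf("UTF-8:  %zu bytes\n", sizeof utf8 - 1);                 /* 5 bytes */
    printf("UTF-16: %zu bytes\n", sizeof utf16 - sizeof(char16_t)); /* 8 bytes */

    /* A routine that expects UTF-16 but receives UTF-8 (or vice versa)
       misreads every non-ASCII character -- exactly the kind of conversion
       bug described above. */
    return 0;
}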
Consider parameterized queries in database APIs. Relational databases run on SQL, or Structured Query Language. A program has to generate SQL and send it to the database, and SQL is encoded as a string. So many programmers made conversion mistakes while building those strings that database APIs introduced parameterized queries. A parameterized query is an SQL statement with placeholders where variables should be inserted. The variables are supplied separately and the whole thing is converted to SQL before being sent to the database. This would seem like a good solution, except it doesn't handle all cases. One problem is that SQL allows multi-record inserts, which parameterized queries can't handle. The second problem is that SQL requires different kinds of encoding depending on where the data is used. Specifically, the LIKE operator performs wildcard matching, so wildcard characters in the data need their own form of escaping that placeholders don't provide. Parameterized queries are an incomplete solution: while they stop some common mistakes made by unskilled programmers, they also keep programmers from learning good conversion practices.
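Here is a sketch of the LIKE problem using SQLite's C API. The table, the data, and the escape_like helper are invented for the example, and error checking is omitted; the point is that the ? placeholder stops injection, but the wildcard characters in the user's text still have to be escaped by hand:

/* Build with: cc example.c -lsqlite3 */
#include <sqlite3.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Escape %, _ and the escape character itself so user text is matched
   literally inside a LIKE pattern. The caller frees the result. */
static char *escape_like(const char *s) {
    char *out = malloc(2 * strlen(s) + 1), *p = out;
    for (; *s; s++) {
        if (*s == '%' || *s == '_' || *s == '\\')
            *p++ = '\\';
        *p++ = *s;
    }
    *p = '\0';
    return out;
}

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    const char *user_input = "50%_off";   /* user text containing LIKE wildcards */

    sqlite3_open(":memory:", &db);
    sqlite3_exec(db,
        "CREATE TABLE items(name TEXT);"
        "INSERT INTO items VALUES('50%_off'),('500 offers');",
        NULL, NULL, NULL);

    /* The ? placeholder protects against injection, but it does not escape
       LIKE wildcards -- that still has to be done by hand. */
    char *escaped = escape_like(user_input);
    char pattern[256];
    snprintf(pattern, sizeof pattern, "%%%s%%", escaped);

    sqlite3_prepare_v2(db,
        "SELECT name FROM items WHERE name LIKE ? ESCAPE '\\'",
        -1, &stmt, NULL);
    sqlite3_bind_text(stmt, 1, pattern, -1, SQLITE_TRANSIENT);

    while (sqlite3_step(stmt) == SQLITE_ROW)
        printf("match: %s\n", (const char *)sqlite3_column_text(stmt, 0));

    sqlite3_finalize(stmt);
    free(escaped);
    sqlite3_close(db);
    return 0;
}

With the escaping in place only the row containing a literal "50%_off" matches; without it, the wildcards would also match unrelated rows.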
Another string conversion example is HTML. I often add unusual text to input fields for fun, such as: "><script>alert("Hello.");</script> Display the page and you often get a popup Hello message. You can also add Unicode characters to a string, to see if it saves and then displays properly afterwards. The first example checks for proper character encoding: if the page properly encodes the data, the text appears in the page exactly as typed; if it doesn't, the page has character encoding issues. The second example checks for end-to-end character set handling. The first example is the real security problem. If I can enter JavaScript and have it execute on a page, that's a code injection bug. If another user sees this information on his page, then that JavaScript runs under his account.
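A sketch of the fix in C: encode the HTML special characters before the user's text ever reaches the page, so the injected script renders as harmless text. The html_escape helper is hypothetical, written here just to show the idea:

#include <stdio.h>

/* Hypothetical helper: write a string into an HTML page with the special
   characters encoded, so input like "><script>...</script> renders as
   visible text instead of executing. */
static void html_escape(FILE *out, const char *s) {
    for (; *s; s++) {
        switch (*s) {
        case '&':  fputs("&amp;",  out); break;
        case '<':  fputs("&lt;",   out); break;
        case '>':  fputs("&gt;",   out); break;
        case '"':  fputs("&quot;", out); break;
        case '\'': fputs("&#39;",  out); break;
        default:   fputc(*s, out);
        }
    }
}

int main(void) {
    const char *user_input = "\"><script>alert(\"Hello.\");</script>";
    fputs("<input value=\"", stdout);
    html_escape(stdout, user_input);   /* encoded: the script tag never forms */
    fputs("\">\n", stdout);
    return 0;
}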
String types themselves cause issues as well. Most strings are null terminated, meaning a 0 byte marks the end of the string, but this is not always the case: strings in some applications are length based and can contain embedded nulls. While this can sometimes cause security bugs, most of the time it just messes things up.
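A short C sketch of the mismatch, with made-up data: a length-based buffer contains an embedded null, so a null-terminated routine sees a much shorter string than the one actually handed over:

#include <stdio.h>
#include <string.h>

int main(void) {
    /* 11 bytes handed over by a length-based API, with an embedded null. */
    char data[] = { 'a', 'd', 'm', 'i', 'n', '\0', 'e', 'x', 't', 'r', 'a' };
    size_t length = sizeof data;

    /* A null-terminated routine stops at the first 0 byte, so the two views
       of "the same string" silently disagree. */
    printf("length-based view:    %zu bytes\n", length);                        /* 11 */
    printf("null-terminated view: %zu bytes (\"%s\")\n", strlen(data), data);   /* 5, "admin" */
    return 0;
}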
Backdoors are also common, and most of them are unintentional. For example, consider a web page that serves icons. It has a simple interface where the caller requests a certain icon from a server script. The server script checks the requested icon: if it's a certain type, it generates a custom icon and returns it; otherwise it tries to send the matching file. The server code uses the provided string directly as a file name on the hard drive. The programmer's intention is innocent enough, but this is a huge security vulnerability. A hacker can use it to request any file on the system, or at least any file the script code can access (such as requesting the file ../../index.php). A hacker can use this to retrieve the system password file, database files and more. Backdoors like this happen when user input is used directly to access a file. To avoid these kinds of issues, never use raw user input in file names. Another similar backdoor uses system commands instead of the file system; likewise, never use user input in any system command.
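Here is a hypothetical version of that icon server in C, showing the vulnerable pattern next to a safer one. The directory name and checks are invented for illustration; a real fix would go further, such as mapping requests onto a fixed list of known icon names instead of using the raw input at all:

#include <stdio.h>
#include <string.h>

/* Hypothetical icon server: the requested name comes straight from the user. */

/* UNSAFE: a request like "../../index.php" walks out of the icon directory. */
static FILE *open_icon_unsafe(const char *requested) {
    char path[512];
    snprintf(path, sizeof path, "icons/%s", requested);
    return fopen(path, "rb");
}

/* Safer: refuse anything that isn't a plain file name. */
static FILE *open_icon_checked(const char *requested) {
    if (strchr(requested, '/') || strchr(requested, '\\') ||
        strstr(requested, ".."))
        return NULL;                         /* reject traversal attempts */
    char path[512];
    snprintf(path, sizeof path, "icons/%s", requested);
    return fopen(path, "rb");
}

int main(void) {
    const char *attack = "../../index.php";
    printf("unsafe:  %s\n", open_icon_unsafe(attack)  ? "opened" : "refused/missing");
    printf("checked: %s\n", open_icon_checked(attack) ? "opened" : "refused/missing");
    return 0;
}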
Unlike unintentional backdoors, intentional backdoors are harder to detect. They originate from malicious programmers. If done properly, their code can be scattered among many different functions, making them difficult to locate. They are also difficult to find through QA. A very thorough peer review can find these, but that tends to be costly. The best way to avoid intentional backdoors is to only employ programmers you trust.
Backdoors, string conversion issues and buffer over/under runs account for most bug-related security vulnerabilities. It's a good idea to test programmers on these issues during the hiring process.
One last QA-related security issue involves communication hacks. In any client-server application, the client sends data to the server, and ideally the server verifies all of that data before using it. However, if the client has validation built in, the server-side check is easy to overlook. It can also be missed in QA, because QA typically doesn't try hacking the system. Any validation performed in the client must also be performed on the server, because a hacker can bypass the client entirely and talk to the server directly.
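A closing C sketch of what that means in practice: a hypothetical server-side check that re-validates a field even though the client's form already enforced the same rule, since nothing forces a hacker to use the client at all:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical server-side check: even if the client's form only allows
   usernames of 3 to 16 letters and digits, the server must enforce the same
   rule, because requests can be sent to the server directly. */
static int username_is_valid(const char *s) {
    size_t len = strlen(s);
    if (len < 3 || len > 16)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!isalnum((unsigned char)s[i]))
            return 0;
    return 1;
}

int main(void) {
    const char *from_client = "bob'; DROP TABLE users;--";   /* never trusted */
    printf("%s\n", username_is_valid(from_client) ? "accepted" : "rejected");
    return 0;
}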