Jump into digital forensics and you will quickly learn that SQLite is one of the most prevalent and useful sources of information or evidence for almost any examination. SQLite is open source and free to use for any purpose, which has made it the most used database technology in the world. Chat applications, web browser data, operating system settings, application tracking, and much more all use SQLite databases.
Introduction to SQLite
Many forensic examinations may require a deeper look into the structure of the database, such as the recovery of deleted data still stored within the SQLite database. This article will be the first in a series of articles that will break down important features necessary for examiners to understand when working with SQLite databases and will help them explain how various processes work when called upon to defend their work.
Variable-length integers or VARINTs (for anyone slightly familiar with SQLite) may seem like an odd place to start. However, much of the primary information related to how the data is stored within a SQLite database uses this format to define locations, sizes, and even values. It is important to understand this data structure before we start to break down how to read the database structure itself.
What Is a Variable-Length Integer?
A variable-length integer is a method of storing an integer value using the least number of bytes possible. SQLite does this by using a clever compression method which utilizes the most significant bit in a byte.
VARINTs can store information such as the ROWID, the size of the entry, column data, column data size, etc. This is structured data that the database needs to know to properly store information and recall that information when necessary.
Why Use Variable-Length Integers?
SQLite databases are designed to be lightweight while at the same time able to hold massive amounts of data (140 terabytes). Part of being lightweight is being as efficient with space as possible. VARINTs help reduce that space by not having a set number of bytes reserved to store data which may frequently NOT use all the space reserved.
To better understand the concept of VARINTs let's look at an example outside of databases or computer language.
A restaurant has found that the maximum amount of time that a table is occupied is one hour. They have 10 tables, and are open for 8 hours. This means that if they allowed only one person to sit at a table within the one-hour time frame whether or not that person ACTUALLY used the full hour, they would be limited to 80 customers per day.
However, if most people only use 20 minutes of the hour, and then they immediately put another person on that table, they dramatically increase the number of customers they can serve per day, potentially up to 240 customers per day! It’s the same amount of space, but the utilization is much higher.
VARINTs function in the same way, by storing structured data in the smallest space possible. If the ROWID requires only 2 bytes of data, then SQLite will only use 2 bytes to indicate the ROWID, and the VARINT which indicates the size of the entry will immediately follow. If the ROWID required 4 bytes of data, then it would store 4 bytes and then the size of the entry. While this may be efficient, it does lead us into the next section.
How Do We Read a Variable-Length Integer?
Storage is cheap, and on computers it is easy to expand storage. Because of this, most of the time structured data has a set number of bytes which that data can be stored, whether it fills that space or not. This is easier to store data and it is easier to read data. SQLite was built for smaller, lower powered devices, such as mobile phones where storage was not as cheap and so efficiency was king. A VARINT requires that the database analyze the most significant bit of each byte to determine the length of the VARINT and then parse the information stored.
In SQLite, a VARINT can be between one and nine bytes in length. When analyzing a VARINT there are two parts: the most significant bit, and the value. The most significant bit determines whether or not to include the next byte in the calculation. If the most significant bit is a 1, the next byte must be considered. If the most significant byte is a 0, the next byte is not considered.
The value portion of the VARINT is the remaining 7 bits. To determine the value in decimal, convert the 7-bit binary value into decimal. Let’s look at a few examples:
For the first example, let’s look at one byte with a hexadecimal value of 0x74. First, we need to look at the most significant bit, or the left most bit in the byte. Determine the most significant bit by converting 0x74 to binary, which is 01110100. The most significant bit is 0. 0 means that we have reached the end of our VARINT which indicates that this VARINT is one byte in length. We then ignore the most significant bit and determine the 7-byte value. For a one-byte VARINT, the value is always the hex value of the original hexadecimal value. In this example, 0x74 or 116 in decimal.
In the second example we will look at two-bytes with a value of 0xA102. Look at the first byte, and determine the value of the most significant bit. 0xA1 converts to 10100001. Because the most significant bit of the first byte is one, we also need to look at the next byte. Converting 0x02 to binary gives us 00000010. The most significant bit is 0, indicating we have reached the end of our VARINT. We then take each of the remaining 7-bit binary values, combine them together to get our result. 01000010000010 binary equals 0x1082 or 4226 in decimal.
While we won't work out completely the last example, a set of bytes with a value of 0x9A81FF618B would be a VARINT of how many bytes? This would be a VARINT of four bytes. Identifying each significant bit for each byte would be 0x9A = 1, 0x81 = 1, 0xFF = 1, 0x61 = 0. The 0 indicates that we have reached the end of our VARINT so 0x8B is not considered.
A trick to quickly determining the length of a VARINT, is if the hexadecimal of a byte starts with a value of 8 through F, then the most significant bit is 1. If the hexadecimal value starts with a value of 0-7 the most significant bit is 0.
Remember, the maximum size of a SQLite VARINT is nine bytes. If these three examples were part of the same data within a database, by using a VARINT instead of a flat 9-byte reservation, the database saved 20 bytes. That may not sound like much, but think about your own text messaging app, and how many messages you have sent or received. Think about a 20-byte savings for every message sent or received. It adds up fast!
A Note on VARINTs
Variable-length integers are important for defining key information within a SQLite database. This includes ROWID, size of the cell (how an entry is stored), and how data is organized within an entry. It is important to understand how to decipher these values to properly unpack SQLite data. It is also critical for those performing database data recovery.