PortaBase Encryption
Starting with version 1.6, PortaBase provides the option of creating encrypted data files. This allows you to create databases of sensitive information (like passwords, ATM PINs, or whatever) with reasonable confidence that if a malicious person gains access to your data files (steals your phone, gains access to your PC files, etc.), they will not be able to read the content of these encrypted files without an amount of effort that would be difficult even for large organizations (at least with the computing power expected to be available for the next 5-10 years).
File size
For security and implementation reasons, the entire content of encrypted files must be held in memory at once; thus encrypted files cannot scale to large sizes as well as non-encrypted files. Files of a few hundred or a few thousand rows should still perform well, but files containing many thousands of rows of data probably won't (at least on Maemo; desktop computers with lots of memory can handle quite large encrypted files).
Password guidelines
Each encrypted file is accessed by providing a password specified by the person who created the file. (This password can be changed later, provided that the previous correct password has already been given.) Because this password must be relatively easy to remember, this is the weakest point in the encryption scheme; therefore, it is important to choose a good password. No amount of encryption technology in the world can protect your data if a malicious person can easily guess the password. To make it more difficult for even programs using a dictionary or listing of your personal data to break the encryption, file passwords should meet the following guidelines:
- At least 6 characters long; the longer, the better (no upper limit)
- Contain a combination of upper and lower case letters, numbers, and punctuation symbols
- Should not consist of data (or misspellings thereof) that would be likely to be in personal records (friend/relative/pet names, contact information, etc.)
- Easy to remember; if you choose a password that is "good" in the above senses and forget it, your data will be virtually impossible to retrieve
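The first three guidelines can be checked mechanically. The following is a minimal sketch of such a check, not something PortaBase itself is known to do; the function name check_password and the personal_terms parameter are hypothetical, and the substring test for personal data is a simplification that will not catch misspellings.

```python
import string


def check_password(password, personal_terms=()):
    """Return a list of guideline violations; an empty list means none were found.

    Hypothetical helper for illustration only, not part of PortaBase.
    """
    problems = []

    # Guideline 1: at least 6 characters long.
    if len(password) < 6:
        problems.append("shorter than 6 characters")

    # Guideline 2: a mix of upper case, lower case, numbers, and punctuation.
    classes = {
        "an upper case letter": any(c.isupper() for c in password),
        "a lower case letter": any(c.islower() for c in password),
        "a number": any(c.isdigit() for c in password),
        "a punctuation symbol": any(c in string.punctuation for c in password),
    }
    for name, present in classes.items():
        if not present:
            problems.append("missing " + name)

    # Guideline 3 (simplified): no personal data embedded in the password.
    lowered = password.lower()
    for term in personal_terms:
        if term and term.lower() in lowered:
            problems.append("contains personal data: " + term)

    return problems
```

For example, check_password("Rex2020!!", ["rex"]) reports only the personal-data problem. The fourth guideline (being easy to remember) cannot be checked by a program.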
Technical details
For people with significant knowledge of cryptography who want to assure themselves that there aren't any security holes in the implementation, or for people who are merely curious as to how it works, a summary of how PortaBase encrypts the data is presented here. I won't attempt to define the cryptography terms used here, because there are better sources for such definitions; you can learn more about them via web searches or a good book on cryptography. (I'm using "Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition", by Bruce Schneier.)
PortaBase normally accesses data via seeks directly to address locations in files. For encrypted files, all the data is stored in memory instead; the data is only written to disk after the in-memory data structure has been encrypted. Encrypted files are still Metakit data files, but only contain a generic version of the standard global view and a special "_crypto" view containing the encrypted data and associated information. The in-memory storage object is encrypted as follows:
- The object is serialized to an in-memory byte array.
- The user-entered password is converted to a 160-bit key by applying the SHA-1 hash function to it.
- A random 64-bit initialization vector (used in a later step) is created using an ISAAC pseudo-random number generator, seeded with the best source of entropy available on the system (on Linux/UNIX systems, this is typically the /dev/urandom device).
- The data to be encrypted is padded so that its length is a multiple of the Blowfish block cipher's block size (64 bits).
- A 160-bit hash is generated for the padded data using SHA-1; this is used at decryption time to see if the data has been decrypted correctly.
- The data is encrypted using Blowfish in CBC mode, using the key derived from the password and the random initialization vector generated earlier.
- The encrypted data, initialization vector, and hash of the padded data are stored in the data file.
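The preparation steps above (key derivation, initialization vector, padding, and integrity hash) can be sketched with Python's standard library. This is an illustration under stated assumptions, not PortaBase's actual code: os.urandom stands in for the ISAAC generator, the password is assumed to be encoded as UTF-8 before hashing, and the padding is assumed to be PKCS#7-style, since the scheme is not specified here. The Blowfish step itself is left out because the standard library does not provide that cipher.

```python
import hashlib
import os

BLOCK_SIZE = 8  # Blowfish block size: 64 bits = 8 bytes


def derive_key(password):
    """Hash the password with SHA-1 to get a 160-bit (20-byte) key.

    Assumption: the password is encoded as UTF-8 before hashing.
    """
    return hashlib.sha1(password.encode("utf-8")).digest()


def make_iv():
    """Create a random 64-bit initialization vector.

    Assumption: os.urandom is used here in place of an ISAAC generator.
    """
    return os.urandom(BLOCK_SIZE)


def pad(data):
    """Pad data to a multiple of the block size.

    Assumption: PKCS#7-style padding (n bytes, each with value n).
    """
    n = BLOCK_SIZE - len(data) % BLOCK_SIZE
    return data + bytes([n]) * n


def prepare(password, serialized):
    """Return the key, IV, padded data, and SHA-1 hash of the padded data."""
    key = derive_key(password)
    iv = make_iv()
    padded = pad(serialized)
    digest = hashlib.sha1(padded).digest()
    return key, iv, padded, digest
```

The padded data would then be encrypted with Blowfish in CBC mode using the key and IV, and the ciphertext stored together with the IV and the hash.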
The following procedure is used to decrypt the data again when the file is opened (after the user has been prompted for the password):
- The user-entered password is converted to a 160-bit key by applying the SHA-1 hash function to it.
- The data is decrypted using the key just calculated and the stored initialization vector, again using Blowfish in CBC mode.
- A 160-bit hash is generated for the decrypted data using SHA-1; if this does not match the hash value that was saved for the padded data, an incorrect password has been entered; stop and inform the user.
- Remove the padding that was added before encryption.
- Use the resulting byte array as the input stream for the in-memory storage object.
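The decryption procedure, including the hash check that detects a wrong password, can be sketched as follows. This is a hedged illustration rather than the PortaBase implementation: the CBC chaining is written out by hand, and the block cipher is a deliberately trivial placeholder (XOR with the first 8 key bytes) so that the example runs without a Blowfish library. The placeholder is not Blowfish and offers no security. All function names, the UTF-8 password encoding, and the PKCS#7-style padding are assumptions.

```python
import hashlib

BLOCK_SIZE = 8  # Blowfish block size in bytes


def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key, block):
    # Placeholder only: NOT Blowfish and not secure. A real implementation
    # would call a Blowfish block-encryption routine here.
    return xor_bytes(block, key[:BLOCK_SIZE])


def toy_decrypt_block(key, block):
    # Inverse of the placeholder above (XOR is its own inverse).
    return xor_bytes(block, key[:BLOCK_SIZE])


def cbc_encrypt(key, iv, padded, encrypt_block=toy_encrypt_block):
    """CBC mode: XOR each plaintext block with the previous ciphertext block
    (the IV for the first block) before encrypting it."""
    out, prev = [], iv
    for i in range(0, len(padded), BLOCK_SIZE):
        prev = encrypt_block(key, xor_bytes(padded[i:i + BLOCK_SIZE], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key, iv, ciphertext, decrypt_block=toy_decrypt_block):
    """CBC mode: decrypt each block, then XOR with the previous ciphertext
    block (the IV for the first block)."""
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK_SIZE):
        block = ciphertext[i:i + BLOCK_SIZE]
        out.append(xor_bytes(decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


def unpad(padded):
    """Remove PKCS#7-style padding (an assumed padding scheme)."""
    return padded[:-padded[-1]]


def open_encrypted(password, iv, ciphertext, stored_hash):
    """Follow the decryption steps; raise ValueError on a wrong password."""
    key = hashlib.sha1(password.encode("utf-8")).digest()
    padded = cbc_decrypt(key, iv, ciphertext)
    if hashlib.sha1(padded).digest() != stored_hash:
        raise ValueError("incorrect password")
    return unpad(padded)
```

Because the stored hash covers the padded plaintext, the check happens before the padding is removed, matching the order of the steps above.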
Algorithm implementations
PortaBase uses the SHA-1 implementation from Qt, a Blowfish implementation slightly refactored from the Beecrypt library, and the ISAAC implementation from the RandomKit library. Encryption algorithms are complicated enough that it's easy to get them wrong in subtle but important ways, so I chose to use existing code that's been in public use for a while and already been through public scrutiny and debugging. Other libraries (like Crypto++ and Botan) are more popular and/or capable, but they are also much larger and/or aren't actively maintained on all of PortaBase's target platforms.
Disclaimers
I am not a professional cryptologist. I also do not have the financial resources to compensate anybody for damages due to the theft of confidential information. I've spent many days researching cryptography for the purpose of implementing this encryption scheme, but there is a chance that a bug (or several bugs) in the code or the reasoning behind it makes the encrypted data less secure than I believe it to be. If you discover such an error, please let me know so I can correct it in future versions.