I wrote a little program called xormydata. It encrypts and decrypts files. As a medical statistician, I often had to receive highly sensitive files of people’s medical records for analysis from doctors, hospital administrators etc. They generally were anxious about transferring such data (rightly so) and often didn’t know how to do it properly.
I saw a few recurring problems.
- Their employing organisation refused any transmission; I had to turn up in person with a memory stick. This is OK, but often their local IT department had blocked encrypted USB drives, so it had to be carried across the country in my pocket on an unencrypted drive. This is very risky and you shouldn’t do it!
- They were forbidden from using email and had to use something like Google Drive instead. Well, you just gave those medical records to Google, who are under no obligation to delete them when you click delete on your remote copy. They also share stuff with intelligence agencies, which may be for good reasons, but you are responsible for the protection of sensitive data and you can’t bury your head in the sand about this sharing.
- They added a password to an Excel spreadsheet and sent me that. Remember that they are often using old versions of Office, so the encryption is very poor and easily cracked. Then they’d send the password in another email. I don’t know where folk wisdom like that comes from. Emails reside indefinitely on a whole chain of servers from thee to me, which might criss-cross the world in doing so, getting tapped along the way. So Johnny Hacker can just pick up both emails and type in your password.
- They asked my advice, and I sent them some links for OpenSSL, but that can be confusing for people who aren’t total computer nerds, hard to install on Windows networked machines, and so on.
To add to that, some people doubt whether an encryption algorithm like AES might already have been cracked in secret, although no practical break has been made public.
To be clear, if you use commercial encryption software, you are probably discharging your duties and won’t get in trouble. One day, that encryption will be broken, so it’s a question of whether you feel that you’ve done what’s required of you and the future is not your problem (go commercial) or that you have to take personal responsibility for posterity too (use xormydata).
The point of xormydata is to make it easy to send and receive data files, securely, without any of that silly stuff like sending passwords in a separate email. It doesn’t use a conventional encryption algorithm; it takes your data file at a binary level:
and combines it with a “code file”, which acts like a password:
using exclusive or (XOR). This is a logical operation, like OR and AND. It works like this: if the bit in the data file and the corresponding bit in the code file are the same, the result is 0, if they are different, it is a 1. With the bytes above, you would get:
You can read more musing about the process, and how you should use it, on the Github page. You should also read the warning there about how it can get you in serious trouble.
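To make the XOR rule described above concrete, here is a minimal C++ sketch. The function names `xor_byte` and `bits`, and the example byte values, are made up for illustration; they are not taken from xormydata.cpp or from the bytes shown in the original post.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// XOR one data byte with one code byte.
// Each output bit is 1 only where the two input bits differ.
std::uint8_t xor_byte(std::uint8_t data, std::uint8_t code) {
    return static_cast<std::uint8_t>(data ^ code);
}

// Render a byte as eight '0'/'1' characters, most significant bit first.
std::string bits(std::uint8_t b) {
    return std::bitset<8>(b).to_string();
}
```

For example, with a data byte of 0x48 (01001000) and a code byte of 0xB5 (10110101), `bits(xor_byte(0x48, 0xB5))` gives 11111101. Applying the same code byte a second time returns the original 01001000, which is why the same operation both encrypts and decrypts.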
You download xormydata.cpp from Github, or clone the repository.
That is a C++ source file. You need to compile it so it can run on your computer. Typically, your computer might have a compiler such as “clang” or “g++”. If you have Linux or Mac, you can just go straight to the terminal, cd to the folder where you saved xormydata.cpp, and type:
g++ xormydata.cpp -o xormydata
This should produce a new, executable file in the same folder, called xormydata. If you are using Windows, you probably need to install a C++ compiler first, and if you are networked and controlled by a central admin, you’ll probably need their help to get permissions to do this. They will be suspicious. One compromise might be to use an old, unwanted laptop for this encryption and decryption, though obviously that’s a bit of a pain.
Now you are ready to go. You need a collection of code files (see the Github page), and your recipient needs xormydata and their own code files. Crucially, there is no need for you and your recipient to communicate about the code files you are using (like sending passwords).
Alice has a data file (patient_HIV_status.xls), which she wants to send to researcher Bob. They both install xormydata and away they go! Alice is going to use a music mp3 file (Schools_Out.mp3) as the code file, so she types
./xormydata patient_HIV_status.xls Schools_Out.mp3 data_for_Bob.xor 118309
The order of this command is
- the command itself, ./xormydata in Linux/Mac and xormydata.exe in Windows
- the input data file
- the code file
- the name of the desired output file
- optionally, a number indicating where (in bytes) to start using the code file’s 1s and 0s. I strongly recommend you include this because there can be predictable sections of metadata at the start of certain file types.
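The role of the start offset can be sketched in a few lines of C++ working on bytes already loaded into memory. This is an illustration only, not the code in xormydata.cpp: the name `xor_with_code` is hypothetical, and the choice to throw an error when the code file is too short is my assumption, since how xormydata itself handles that case is not described here.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: XOR each data byte with the code byte found
// `start` bytes further into the code file.
Bytes xor_with_code(const Bytes& data, const Bytes& code, std::size_t start) {
    // Assumption: refuse to run if the code file cannot cover all the data.
    if (start > code.size() || code.size() - start < data.size()) {
        throw std::invalid_argument("code file too short for this start offset");
    }
    Bytes out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(data[i] ^ code[start + i]);
    }
    return out;
}
```

Calling the function a second time with the same code bytes and the same start offset gives back the original data, so both parties must remember the offset as well as the code file.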
Now, she sends Bob “data_for_Bob.xor”.
Bob is using video files as codes. He types
./xormydata data_for_Bob.xor Go_Pro_commuting.avi data_back_to_Alice.xor 7199003
The file is now doubly encrypted, with both Alice’s code file and Bob’s code file applied. Alice removes her code thus:
./xormydata data_back_to_Alice.xor Schools_Out.mp3 final_data_to_Bob.xor 118309
Now, it just has Bob’s code applied, and she sends it back to him. He types:
./xormydata final_data_to_Bob.xor Go_Pro_commuting.avi patient_HIV_status.xls 7199003
and the original file is revealed. This is a triple-pass system, which is simple (at the cost of sending stuff three times) and requires no handing over of passwords and such, but it is not perfect. Charles can intercept Alice’s emails and pretend to be Bob (a man-in-the-middle attack), or Charles can just go snooping on their email servers afterwards; by xor’ing the three encrypted files together, even without knowing the code files, he can get the original data back. So, if you are worried about people intercepting your stuff and trying hard to break into it, then you probably shouldn’t be using xormydata. I suggest you don’t just use vanilla email to send your xor’d files, but maybe an end-to-end encrypted service like ProtonMail. That will ensure that your data-transmitting messages are indistinguishable from the ones where you discuss where to go for your colleague’s leaving party.
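The three passes, and the snooping weakness just mentioned, can be sketched in C++ on in-memory bytes. Everything here (`xor_bytes`, `ThreePass`, `run_three_pass`, `eavesdrop`) is a hypothetical illustration rather than part of xormydata, and it assumes the data and both code byte strings have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// XOR two equal-length byte strings (illustrative helper).
Bytes xor_bytes(const Bytes& a, const Bytes& b) {
    Bytes out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return out;
}

// The three messages that travel between Alice and Bob.
struct ThreePass {
    Bytes msg1;  // Alice -> Bob: data with Alice's code applied
    Bytes msg2;  // Bob -> Alice: Bob's code applied on top
    Bytes msg3;  // Alice -> Bob: Alice's code removed, Bob's remains
};

ThreePass run_three_pass(const Bytes& data, const Bytes& alice_code,
                         const Bytes& bob_code) {
    ThreePass t;
    t.msg1 = xor_bytes(data, alice_code);
    t.msg2 = xor_bytes(t.msg1, bob_code);
    t.msg3 = xor_bytes(t.msg2, alice_code);
    return t;
}

// What Charles can do with all three intercepted messages and no code files:
// msg1 ^ msg2 equals Bob's code, and XORing that into msg3 leaves the data.
Bytes eavesdrop(const ThreePass& t) {
    return xor_bytes(xor_bytes(t.msg1, t.msg2), t.msg3);
}
```

Bob recovers the data with `xor_bytes(t.msg3, bob_code)`, but `eavesdrop(t)` returns exactly the same bytes, which is why the transport itself should also be protected.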
Also, this is intended as a one-time pad, which means you use that code file once only (or, at least, that code file at that start byte). You should keep track of the pairings of code files and data files so you can get them back later, and of course, don’t store that list somewhere where people can get at it. Does it need to be digital at all? Can you just write it in a notebook?
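A short C++ sketch shows why a code file should not be reused at the same start byte. The helper name `xor_bytes` and the example values are made up for illustration, and the inputs are assumed to be of equal length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// XOR two equal-length byte strings (illustrative helper).
Bytes xor_bytes(const Bytes& a, const Bytes& b) {
    Bytes out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return out;
}
```

If two data files `d1` and `d2` are both encrypted with the same code bytes, then XORing the two encrypted files gives `xor_bytes(d1, d2)`: the code cancels out completely. Anyone holding both encrypted files then has the two originals combined with each other, with no secret left in the way, and that is much easier to pick apart than a file protected by an unused code.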
Some questions you might have:
How do I know I can trust you, Mr Grant? You don’t; that’s life, kid. But you can read the source code, it’s only 120 lines.
If this is so simple, and you’re, like, not even a professional programmer, how come nobody else is doing it already? I don’t know. The crypto world generally went off triple-pass systems decades back, because of the risk of a man-in-the-middle attack. It’s not cool.
This is still hard work … isn’t there something with just a one-click option? Not if you want it to be secure into the future, and secure from even the big guys. There’s no free lunch.
My organisation wants to use a trusted commercial package instead; what can I do? Not a lot in my experience, though I suppose you could xormydata it and then put it through the commercial package.
Isn’t this going to be used by bad guys too? I hope not, but potentially, yes. The same way that you can use a hammer to build a hospital or whack someone over the head. This is technology; if we avoided risk of abuse we would not even have adopted the flint hand axe.