PURBs Reducing Metadata Leakage from Encrypted Files and Communications

PURBs is an encrypted format for ciphertexts that leaks minimal information to non-recipients: only its size, and even then, leakage is minimized. This is important because researchers repeatedly show how this metadata is useful in breaking the encryption.

PURBs stands for Padded Uniform Random Blobs [1], and the name stems from the fact that a PURB file (or network packet), without the decryption key, is indistinguishable from random bits from start to end; it has no plaintext marker or metadata, yet can be decrypted efficiently in diverse scenarios (single- and multi-cryptographic groups and keys).

Read the paper : PETS 2019 Publication
View the presentation : PETS 2019 Slides
Get PURBs : github.com/dedis/purb

PURBs is a joint work between Kirill Nikitin*, Ludovic Barman*, Wouter Lueks, Bryan Ford, and Jean-Pierre Hubaux.

The Metadata Problem

Most encrypted formats (e.g., PGP, TLS) leak substantial metadata from their plaintext headers; in other word, the payload is typically well-protected with encryption, yet plaintext information remains in the header, readily available to any third party.

Here's an example with a symmetrically-encrypted PGP packet:

plaintext information:
  • PGP packet, Symmetrically-Encrypted
  • PGP Version
  • Hash Algorithm
  • Encryption Algorithm
  • Salt + Iterations
  • Plaintext length
pgp packet:
8c0d 0402 0302 0db7  3eb4 1ac0 7379 60a4
4883 5b13 1f13 af59  4daf e960 b6ee 0ee3
0c4b 28f5 e969 d5e4  9c5b fbcf 7dea aa4c
5e87 dfd5 dc7e 5d63  9404 9b8c ce12 d513
2c23 abeb 49f4 2c4c  f34a 8948 a46b c04d
13bc 19cb c466 ca18  2e

Moreover, asymmetrically-encrypted PGP blobs (ciphertexts) also typically embed the list of recipients in the header:

plaintext information:
  • F019751EAEB57B0Eludovic.barman@protonmail.com
  • 0ACC4ADD2E76A582ludovic.barman@gmail.com
  • ...
pgp packet:
0185 030c 19f0 1e75 b5ae 0e7b 0701 2cfe
...
e0c7 62f0 68c0 7698 cc0a dd4a 762e 82a5
...

This might look like harmless information at best. Researchers have repeatedly shown, however, how metadata can be exploited. In particular, the attacker might be able to fingerprint users [2,3] and the applications used [3]. Using traffic analysis [4], an attacker might be able to infer user-visited websites [4, 5, 6, 7, 8] or to identify the videos that a user watches [9, 10, 11]. On VoIP, metadata can be used to infer the geo-location [12], the spoken language [13], or the voice activity of users [14]. The side-channel leaks of data compression [15] make several attacks on SSL possible [16, 17, 18]. The lack of proper padding might enable an active attacker to learn the length of the user’s password from TLS or QUIC traffic [19]. In social networks, metadata can be used to draw conclusions on users’ actions [20], whereas telephone metadata has been shown to be sufficient for user re-identification and for determining home locations [21].

Efficient Decryption

The kind of metadata presented above is typically embedded in the header for efficient decryption: Otherwise, how would the legitimate recipient know what encryption algorithm is used ? In the case of a multi-key server (e.g., which has one key per cryptographic group), how does the server know under which key the ciphertext has been encrypted ?

One inefficient solution is to ask the recipient to try every combination of parameters the recipient knows. Although time-consuming, at some point, the recipient will find the correct combination of cryptographic group / hash algorithm / encryption algorithm and successfully decrypt the ciphertext.

Therefore, a naïve approach for hiding metadata would be to remove it entirely from the header; then nothing would leak (except the length, which is another topic), but then the decryption would be prohibitively inefficient.

PURBs

PURBs is an encrypted ciphertext format (we do not re-invent a cryptographic scheme, we simply re-use existing ones). At its heart, it uses efficient trial decryption on the header to enable a recipient with a key to quickly retrieve the relevant parameters for decryption. A third party without a key is unable to perform this step hence cannot obtain any information from the ciphertext.

A PURB is a file (or network message) structured as follows (in the case of an asymmetrically-encrypted blob for 4 recipients in 2 different cryptographic groups):

[nonce]
[public key A]
[public key B]
[entrypoint 1]
[entrypoint 2]
[entrypoint 4]
[entrypoint 3]
[payload]
[mac]
allowed positions for public keys
growing hash tables for entrypoints

Despite the structure being shown here, a third-party does not see the "boxes" (and colors) because every field is indistinguishable from random bits. In particular, the following PURB encrypted for 1 recipient in 1 suite is indistinguishable from the previous one:

[nonce]
[public key A]
[entrypoint 1]
[payload]
[mac]
allowed positions for public keys
growing hash tables for entrypoints

... or, in other terms, a third party cannot differentiate between a long payload with few recipients and suites, and a short payload with many suites and recipients. Without decryption keys, a third party is unable to determine which structure is being used. This is also the case when a PURB is symmetrically encrypted.

A legitimate recipient can efficiently recover the structure in one public-key operation (deriving a shared secret from a [public key], that can be found in O(1) due to an additional encoding trick not shown here) and log(N) symmetric, fast trial decryptions (attempting to decrypt the [entrypoints] ).

For more details, we refer you to our paper.

References

  1. Nikitin, K.*, Barman, L.*, Underwood, M., Lueks, W. & Ford, B. Reducing metadata leakage from encrypted files and communication with PURBs. arXiv preprint arXiv:1806.03160
  2. Pang, J., Greenstein, B., Gummadi, R., Seshan, S., & Wetherall, D. 802.11 user fingerprinting. In Proceedings of the 13th annual ACM international conference on Mobile computing and networking (pp. 99-110). ACM.
  3. Zhang, F., He, W., Liu, X., & Bridges, P. G. Inferring users' online activities through traffic analysis. In Proceedings of the fourth ACM conference on Wireless network security (pp. 59-70). ACM.
  4. Danezis, G., & Clayton, R. Introducing traffic analysis. In Digital Privacy (pp. 117-138). Auerbach Publications.
  5. Panchenko, A., Niessen, L., Zinnen, A., & Engel, T. Website fingerprinting in onion routing based anonymization networks. In Proceedings of the 10th annual ACM workshop on Privacy in the electronic society (pp. 103-114). ACM.
  6. Dyer, K. P., Coull, S. E., Ristenpart, T., & Shrimpton, T. Peek-a-boo, i still see you: Why efficient traffic analysis countermeasures fail. In 2012 IEEE symposium on security and privacy (pp. 332-346). IEEE.
  7. Wang, T., & Goldberg, I. Improved website fingerprinting on Tor. In Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society (pp. 201-212). ACM.
  8. Wang, T., & Goldberg, I. On realistically attacking Tor with website fingerprinting. Proceedings on Privacy Enhancing Technologies, 2016(4), 21-36.
  9. Reed, A., & Klimkowski, B. Leaky streams: Identifying variable bitrate dash videos streamed over encrypted 802.11n connections. In 2016 13th IEEE Annual Consumer Communications & Networking Conference (CCNC) (pp. 1107-1112). IEEE.
  10. Schuster, R., Shmatikov, V., & Tromer, E Beauty and the burst: Remote identification of encrypted video streams. In 26th {USENIX} Security Symposium ({USENIX} Security 17) (pp. 1357-1374).
  11. Reed, A., & Kranch, M. Identifying https-protected netflix videos in real-time. In Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy (pp. 361-368). ACM.
  12. Le Blond, S., Zhang, C., Legout, A., Ross, K., & Dabbous, W. I know where you are and what you are sharing: exploiting P2P communications to invade users' privacy. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference (pp. 45-60). ACM.
  13. Wright, C. V., Ballard, L., Coull, S. E., Monrose, F., & Masson, G. M. Spot me if you can: Uncovering spoken phrases in encrypted voip conversations. In 2008 IEEE Symposium on Security and Privacy (sp 2008) (pp. 35-49). IEEE.
  14. Chang, Y. C., Chen, K. T., Wu, C. C., & Lei, C. L. Inferring speech activity from encrypted Skype traffic. In IEEE GLOBECOM 2008-2008 IEEE Global Telecommunications Conference (pp. 1-5). IEEE.
  15. Kelsey, J. Compression and information leakage of plaintext. In International Workshop on Fast Software Encryption (pp. 263-276). Springer, Berlin, Heidelberg.
  16. Juliano Rizzo and Thai Duong. The CRIME Attack. online.
  17. Gluck, Y., Harris, N., & Prado, A. BREACH: reviving the CRIME attack. Unpublished manuscript.
  18. Be’ery, T, and Shulman, A. A perfect CRIME ? Only TIME Will Tell. online.
  19. Guido Vranken. HTTPS Bicycle Attack. online.
  20. Ring-road: Leaking sensitive data in securityprotocols. online
  21. Mayer, J., Mutchler, P., & Mitchell, J. C. Evaluating the privacy properties of telephone metadata. Proceedings of the National Academy of Sciences, 113(20), 5536-5541.