Wait, wait, wait: browsers allow websites to store junk on my drive? They take up gigabytes of memory and still write to disk on top of this? Without even asking whether the site can use local storage?
Years and years back when laptops still had HDDs, I had a script to put the Firefox profile &c on a ramdisk and sync it on reboots so that it didn't spin up the drive constantly. I guess I should have kept doing it.
Still don't really understand how it works - I put the reddit logo into your local storage and it only took 20ms to take it out again instead of 50ms so therefore you have reddit open in another tab?
Attacking website periodically makes random reads from a large file in localStorage. Other tabs and websites open have Javascript running that periodically performs operations that will result in SSD traffic. For example, GMail has a certain polling interval to check for new mail, and each request is going to result in a cache write that makes the SSD busy and delays other conflicting IO operations. Reddit checks for new chat messages. Large memory-heavy websites get paged out of RAM.
The pattern of IO operations that a website makes creates a fingerprint of interference with the IO ops that the attacking website is doing, showing up as differing amounts of latency as the SSD is periodically busy. This fingerprint can then be reconstructed to a specific website by training a CNN on it, basically using a neural net to classify a certain pattern of delays to the IO ops that other websites are doing.
In theory it makes sense, but it seems very noisy. Anything that makes absolutely zero requests or IO operations in the background (like say HN, or most old-school text sites) wouldn't show up, and would be indistinguishable from any other zero-request site. And having other sources of IOps on the same computer - say you're running an Ethereum client that's perpetually updating the blockchain, or you're downloading a bunch of torrents, or you've got DropBox and it's syncing your directory - would introduce noise that throws off the classifier.
Thats a good explination. It does seem extremely noisy and not at all practical for fingerprinting a user compared to other methods. If you have javascript enabled assume you can be fingerprinted.
That’s timing the cache, that’s old stuff by know. As I understand, this writes a relatively large file („Gigabytes“) using this OPFS api, which is different from the „localStorage“ api.
This seems to use actual filesystem storage on the client, instead of living completely in memory (which may be reasonable given the size of files supported). This allows to actually time SSD IOPS latency by doing random reads.
Collected enough of these samples, together with the information of what else runs on the host, put that in the ML-Blender and the result will be able to tell you, with some accuracy, from a given set of samples, what’s running on the host.
I am sure i misunderstood some things because there are so many caches and unknowns in that setup that I struggle to understand how there could be any correlation, but that’s my understanding so far.
Wait, wait, wait: browsers allow websites to store junk on my drive? They take up gigabytes of memory and still write to disk on top of this? Without even asking whether the site can use local storage?
Years and years back when laptops still had HDDs, I had a script to put the Firefox profile &c on a ramdisk and sync it on reboots so that it didn't spin up the drive constantly. I guess I should have kept doing it.
It's a sad day when Arch users are right (again) https://wiki.archlinux.org/title/Firefox/Profile_on_RAM
Still don't really understand how it works - I put the reddit logo into your local storage and it only took 20ms to take it out again instead of 50ms so therefore you have reddit open in another tab?
I assume it's something like this:
Attacking website periodically makes random reads from a large file in localStorage. Other tabs and websites open have Javascript running that periodically performs operations that will result in SSD traffic. For example, GMail has a certain polling interval to check for new mail, and each request is going to result in a cache write that makes the SSD busy and delays other conflicting IO operations. Reddit checks for new chat messages. Large memory-heavy websites get paged out of RAM.
The pattern of IO operations that a website makes creates a fingerprint of interference with the IO ops that the attacking website is doing, showing up as differing amounts of latency as the SSD is periodically busy. This fingerprint can then be reconstructed to a specific website by training a CNN on it, basically using a neural net to classify a certain pattern of delays to the IO ops that other websites are doing.
In theory it makes sense, but it seems very noisy. Anything that makes absolutely zero requests or IO operations in the background (like say HN, or most old-school text sites) wouldn't show up, and would be indistinguishable from any other zero-request site. And having other sources of IOps on the same computer - say you're running an Ethereum client that's perpetually updating the blockchain, or you're downloading a bunch of torrents, or you've got DropBox and it's syncing your directory - would introduce noise that throws off the classifier.
Thats a good explination. It does seem extremely noisy and not at all practical for fingerprinting a user compared to other methods. If you have javascript enabled assume you can be fingerprinted.
That’s timing the cache, that’s old stuff by know. As I understand, this writes a relatively large file („Gigabytes“) using this OPFS api, which is different from the „localStorage“ api. This seems to use actual filesystem storage on the client, instead of living completely in memory (which may be reasonable given the size of files supported). This allows to actually time SSD IOPS latency by doing random reads.
Collected enough of these samples, together with the information of what else runs on the host, put that in the ML-Blender and the result will be able to tell you, with some accuracy, from a given set of samples, what’s running on the host.
I am sure i misunderstood some things because there are so many caches and unknowns in that setup that I struggle to understand how there could be any correlation, but that’s my understanding so far.
{first.last}@tugraz.at
For a more technical read: https://news.ycombinator.com/item?id=48345822
I laugh at your spying attempts from my HD-equipped laptop, ...