The fake storage size scam just went mainstream, and many people are going to lose data

Released on: 2022-06-25

Putting aside the various controversies like employee rights, I’ve always considered Amazon Prime to be one of the safest places to buy stuff online, especially when compared to somewhere like wish.com. While I still take care to read the reviews and do my research before buying something, it has been incredibly rare that I’ve needed to return something, for any reason. And the few times that I have needed to, it has been quick and painless.

So imagine my curiosity when I came across an external USB storage drive with a combination of specs and price that, while being on the edge of what’s plausible, seemed unlikely.

Above: Screenshot of one of the many scam listings.

For those who would like a quick overview, I’ve created a video (6:50) about that here:

If you’d like to dive deeper, read on.

Was it a scam?

Yes. I was really hoping to be wrong about this. But it is, and the chance of it being an accident is very, very low. It’s the old scam where the drive pretends to be a different size from what it actually is. And then when it runs out of space it does one of two things:

  • It over-writes existing data.
    • Of all the instances I’ve found of other people reporting this scam, this is by far the most common implementation.
    • The symptoms of this are:
      • The drive will slow down when it runs out of storage, because it needs to overwrite existing data.
      • Recently written data is likely to be accessible.
      • The first data to be written is most likely to be the first to be corrupted (overwritten).
  • It simply discards data beyond the end of the drive.
    • This is how the device that I had functioned.
    • The symptoms of this are:
      • The drive will likely speed up when it runs out of storage, because it doesn’t need to spend any time interacting with the underlying storage.
      • The first data to be written to the drive is likely to always be accessible.
      • Recently written data beyond the end of the drive will likely be accessible because of the Operating System’s disk cache. But as soon as that cache expires, is cleared, or the device removed from the computer, that data will be gone.
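
To make the two behaviours concrete, here is a toy Python model of a controller that advertises more blocks than it really has. It is only a sketch: the class name `FakeDrive`, the wrap-around rule used for the overwrite method, and the zero-filled reads used for the black hole method are assumptions for illustration, not something recovered from a real device.

```python
class FakeDrive:
    """Toy model of a controller that advertises more blocks than it has."""

    def __init__(self, real_blocks, advertised_blocks, mode):
        if mode not in ("overwrite", "black_hole"):
            raise ValueError("mode must be 'overwrite' or 'black_hole'")
        self.real_blocks = real_blocks
        self.advertised_blocks = advertised_blocks
        self.mode = mode
        self.storage = [b""] * real_blocks

    def _check(self, block):
        if not 0 <= block < self.advertised_blocks:
            raise IndexError("block is outside the advertised capacity")

    def write(self, block, data):
        self._check(block)
        if block < self.real_blocks:
            self.storage[block] = data
        elif self.mode == "overwrite":
            # Assumption: wrap around and clobber whatever was stored earlier.
            self.storage[block % self.real_blocks] = data
        # black_hole: accept the write and silently discard it.

    def read(self, block):
        self._check(block)
        if block < self.real_blocks:
            return self.storage[block]
        if self.mode == "overwrite":
            return self.storage[block % self.real_blocks]
        # Assumption: every read past the real capacity returns the same filler.
        return b"\x00"


def surviving_blocks(mode):
    """Write 8 distinct blocks to a drive with 4 real blocks; list the ones that read back."""
    drive = FakeDrive(real_blocks=4, advertised_blocks=8, mode=mode)
    for n in range(8):
        drive.write(n, f"block-{n}".encode())
    return [n for n in range(8) if drive.read(n) == f"block-{n}".encode()]


print(surviving_blocks("overwrite"))   # [4, 5, 6, 7]: the oldest data is lost
print(surviving_blocks("black_hole"))  # [0, 1, 2, 3]: the newest data is lost
```

The model leaves out the OS disk cache, which is what makes the black hole method look fine until the drive is removed.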

I’ve actually put together a playlist of the same scam being done in different devices. I’ve stopped actively looking for these, but if I passively come across more, I’ll add them to this list.

How do you know?

Before buying

Suspicions were triggered by all of the inconsistencies in the listing:

The price vs the capacity

The price was unusually good for the capacity at current pricing. It’s not inconceivable. Larger capacities exist, and industry pricing is all over the place depending on what you’re looking for. But this combination is certainly pushing it, especially since the form-factor implies M.2, which is considerably more expensive than this for a given capacity.

Above: A screenshot of one of the scam listings. Showing the price along with a few details of the product.

Also notice that it is listed as a “Hard Drive External HDD”. There are hard drives that might fit in an enclosure of this size (leaving half the thing empty). But they would struggle to get anywhere close to 16TB.

The specs in the photos didn’t match the text.

  • USB 2 vs USB 3.1.
  • Speed of 100MB/s vs 400MB/s.

Above: A screenshot of one of the photos. Showing USB 3.1, and a speed of 100MB/s.

Above: A screenshot of the detailed specifications of the product.

The technologies and specs didn’t match.

USB 2 has a maximum speed of

480 Mbit/s (maximum theoretical data throughput 53 MByte/s)

53MB/s is nowhere close to 100MB/s, let alone 400MB/s.

Above: A screenshot of the detailed specifications of the product.

The combination of technologies was unlikely, but not impossible.

Speed vs Capacity

If you are working with a large amount of storage, like 16TB, you are likely wanting to put a large amount of data on to it. You probably don’t want to wait for huge amounts of time to do it. Let’s put that in perspective:

  • USB 2 has a maximum theoretical speed of 53MB/s.
  • USB 3.0 has a maximum theoretical speed of 400MB/s.
  • USB 3.1 has a maximum theoretical speed of 800MB/s.

Very approximate time to fill 16TB using

  • USB 2: 83.9 Hours (3.5 days).
  • USB 3.0: 11.1 Hours.
  • USB 3.1: 5.6 Hours.

And that’s before we take into account whether the storage is able to sustain that speed for long periods of time.
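
The fill times come from simple division, which can be reproduced with a few lines of Python. This is a sketch that assumes decimal units (1TB = 1,000,000MB) and takes the three speed figures from the list as given; the function name `hours_to_fill` is just for illustration.

```python
def hours_to_fill(capacity_tb, speed_mb_per_s):
    """Hours needed to write capacity_tb at a sustained speed, using decimal units."""
    capacity_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return capacity_mb / speed_mb_per_s / 3600


for name, speed in [("USB 2", 53), ("USB 3.0", 400), ("USB 3.1", 800)]:
    print(f"{name}: {hours_to_fill(16, speed):.1f} hours")
```

Swapping in the 1-2MB/s that a drive like this actually sustains makes the same point far more bluntly.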

For most use-cases, USB 2 is unlikely to be what you want. It could still be useful as a background backup, so it could have been a cost cutting measure for that target audience, although it certainly isn’t marketed that way.

Above: A screenshot of the spreadsheet that I used to calculate these.

You can download the spreadsheet in OpenOffice, and Excel (untested) formats.

Above: A screenshot of the detailed specifications of the product.

USB 2 vs USB C

USB C has only been around for a few years. So having the USB C connector tells you a bit about the age of the device, and therefore what assumptions come with it.

Devices that have a USB C connector but use USB 2 do exist (cheap, modern phones without fast charging, for example). But they sacrifice both speed, and the amount of power that can be used. See “Speed vs Capacity” above for more information on why that is unlikely in a device like this.

The deep dive

Some of the above issues could have been mistakes in the listing, or weird cost cutting choices. So there was still some plausibility to it.

When I received it, the USB cable was faulty. Not really relevant, but it wasn’t a good start.

I did a lot of testing, and actually wrote a tool to help me with this so that I would really understand what was going on, and what failures looked like. I ran the tests over exfat, and vfat (fat32). I wanted to also try ext4, but the drive simply didn’t work well enough to even mount it, let alone test it.

USB C

Correct: Here’s a photo of it.

Above: Photo of the device showing the USB C connector.

USB 2

Correct.

Above: Screenshot showing the device within USB tree via lsusb.

Above: Screenshot showing the root hub for the USB bus that the device is attached to.

For comparison, this is what a USB 3 flash drive looks like on the same physical socket:

Above: Screenshot showing a USB 3 flash drive plugged into the same physical socket.
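
On Linux, the negotiated speed that lsusb shows is also exposed as a `speed` file (in Mbit/s) for each device under `/sys/bus/usb/devices`, so the same check can be scripted. This is a minimal sketch rather than the method used for the screenshots; the function name `usb_device_speeds` is hypothetical. A USB 2 device normally reports 480, and a USB 3 device 5000 or more.

```python
from pathlib import Path


def usb_device_speeds(sysfs_root="/sys/bus/usb/devices"):
    """Return {device_name: (product, speed_mbit_s)} for entries that expose a speed file."""
    root = Path(sysfs_root)
    results = {}
    if not root.is_dir():
        return results  # not Linux, or sysfs is not mounted
    for dev in sorted(root.iterdir()):
        try:
            speed = float((dev / "speed").read_text().strip())
        except (OSError, ValueError):
            continue  # interface entries such as 1-1:1.0 have no speed file
        try:
            product = (dev / "product").read_text().strip()
        except OSError:
            product = "unknown"
        results[dev.name] = (product, speed)
    return results


if __name__ == "__main__":
    for name, (product, speed) in usb_device_speeds().items():
        print(f"{name}: {product} at {speed:g} Mbit/s")
```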

Speed

Not even close. Here’s a quick recap for what we’re expecting:

  • Listed in the promotional pictures: 100MB/s.
  • Listed in the specifications: 400MB/s.
  • Maximum of USB 2 (ie nothing can get to/from the device faster than this limit): 53MB/s.
    • Given that the marketing lists either 100MB/s or 400MB/s, the drive should have no trouble sustaining 53MB/s all of the time.
      • NOTE 1: It’s normal for many forms of SSD to contain their own cache, which helps them be faster for a short period of time. This is very different from the sustained speed of the drive. When the specifications/marketing material do not differentiate, they are talking about the sustained speed.
      • NOTE 2: The operating system’s disk cache will interfere with initial measurement at the application level. So care needs to be taken when measuring drive performance to make sure that you are measuring the thing that you think you are. Specifications/marketing material are never talking about this measurement, because it has nothing to do with the device, and everything to do with the computer that the device is being measured from.

Reality:

  • Most of the time: 1-2MB/s.
  • Goes up to 40MB/s when writing to space beyond the physical capacity that is actually there. Ie it is sending everything into a black hole (simply discarding the data).

Above: Screenshot of typical performance of the drive. 1139KB/s, which is about 1.1MB/s.

Above: Screenshot of performance when the drive is writing into a black hole. 40.4MB/s.

All of this testing is taking into account the OS-provided disk cache, which masks the performance the applications see until the cache runs out. Note that the nmon screenshots are not affected by this because they are showing what the OS is actually writing to the storage, not what is being written to the cache. Meanwhile rsync is showing what it is able to write to the cache. So it is affected.

You can see the cache running out and the rsync averages starting to go down here:

Above: Screenshot of the OS-provided disk cache skewing early writing speeds.
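
One way to keep the disk cache out of an application-level measurement is to include the flush to the device in the timing. The sketch below is an illustration of that idea and not the tool used for the screenshots; the function name `synced_write_speed` and its default sizes are assumptions. It reports MiB/s, and it can only measure what the device acknowledges: a black hole drive will happily acknowledge data that it has thrown away.

```python
import os
import time


def synced_write_speed(path, total_mb=64, chunk_mb=4):
    """Write about total_mb MiB of random data to path; return MiB/s including the flush."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    written_mb = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written_mb < total_mb:
            f.write(chunk)
            written_mb += chunk_mb
        f.flush()             # push Python's own buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the device
    elapsed = max(time.monotonic() - start, 1e-9)
    return written_mb / elapsed
```

Point `path` at a new file on the mounted drive, and use a `total_mb` large enough to outlast any cache on the device itself.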

Capacity

No. But it really looks like it unless you take time to dig a little deeper.

The normal ways of checking it

By all the methods that a user might check, it looks like it has the advertised 16TB. When the drive is plugged in, it reports itself as 16TB to the OS:

Above: A screenshot of the device reporting its capacity to the OS.

The partition layout is set up for 16TB (15.3T):

Above: A screenshot of the partition layout.

The filesystem also reports 16TB:

Above: A screenshot of the filesystem reporting 16TB.

What was I looking for?

Exactly what will happen depends on the filesystem being used, and the exact implementation of the scam.

The most important thing is file corruption, but also speed changes when the capacity runs out. If it speeds up, it’s probably the black hole method. If it slows down, it’s probably the overwrite method.

Here is what a failed file looked like on my drive:

Above: A screenshot showing two files that should be identical, but are not.

My drive used the black hole method, which leads to recent files being corrupted/absent.

For comparison, here is a video of someone testing for the same scam on a different device. On that device, it uses the over-write method, which corrupts previously stored files.

If it’s not happening already, I expect that scammers will use inline compression to make the device last a little longer before it gets detected. I haven’t seen anyone talking about this yet, and I haven’t seen any evidence that it is being used. But it would totally make sense for them to do at some point, if they aren’t already. The main reasons why they might not are:

  • It would massively complicate the controller firmware to make addressing work.
  • There are probably limitations with the controller on legitimate devices that would need to be overcome to implement compression on the scam devices. Assumption: One of the speculations floating around is that these devices are simply legitimate smaller-capacity devices that have been reprogrammed to show a significantly larger capacity out of hours in the factory. Compression may negate the viability of that type of source.

Also note how various tools reported the corrupt/missing Alternate GPT at the end of the drive:

Above: Screenshot of dmesg showing errors about the Alternate/backup GPT.

Above: Screenshot of fdisk showing errors about the Alternate/backup GPT.

Test structure

My tool would copy from a known location on another drive to a unique directory (using a timestamp) on the suspect drive. It would repeat this into unique directories until it was told to stop, or a specified iteration limit was reached.

Since the expected state was easy to measure, it periodically did a complete check of the contents of the directory to make sure that it wasn’t corrupted. However, while this is exact, it’s very slow.
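
The copy-and-check loop can be sketched in Python as follows. This is not the code of the actual tool: the function names, the directory naming, and the `check_every` parameter are assumptions for illustration. The complete check compares every byte of every file, which is why it is slow. Reads may still be served from the OS disk cache, so a clean result is only trustworthy after the drive has been removed and reinserted.

```python
import shutil
import time
from pathlib import Path


def same_bytes(a, b, chunk_size=1024 * 1024):
    """Compare two files byte for byte."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk_size), fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:
                return True


def full_check(source, copy):
    """Return the relative paths that are missing from, or differ in, the copy."""
    source, copy = Path(source), Path(copy)
    bad = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = copy / rel
        if not dst_file.is_file() or not same_bytes(src_file, dst_file):
            bad.append(str(rel))
    return bad


def fill_and_check(source, target_root, max_iterations, check_every=5):
    """Copy source into fresh timestamped directories, periodically re-checking every copy."""
    copies = []
    for i in range(max_iterations):
        dest = Path(target_root) / f"{time.strftime('%Y%m%d-%H%M%S')}-{i:04d}"
        shutil.copytree(source, dest)
        copies.append(dest)
        if (i + 1) % check_every == 0:
            for copy in copies:
                bad = full_check(source, copy)
                if bad:
                    return copy, bad  # first corrupted copy and its bad files
    return None, []
```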

So I added a fast check method, which compares a small section of the beginning, and end of the files. This gives a definite-fail, and a probably-pass. Ie if you see a fail, you know it has failed. If you see a pass, it is probably a pass, but might actually be a fail. This was plenty good enough considering how regularly fails happen once the physical capacity has been exceeded.
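
A fast check along these lines might look like the sketch below. The function name `fast_check` and the 4096-byte sample size are assumptions, and the file-size comparison is an extra step added here because it costs almost nothing. `False` means a definite fail; `True` only means a probable pass.

```python
import os


def fast_check(original, copy, sample_bytes=4096):
    """Compare only the start and end of two files. False is definite, True is probable."""
    try:
        size = os.path.getsize(original)
        if os.path.getsize(copy) != size:
            return False
        with open(original, "rb") as a, open(copy, "rb") as b:
            if a.read(sample_bytes) != b.read(sample_bytes):
                return False
            tail_start = max(size - sample_bytes, 0)
            a.seek(tail_start)
            b.seek(tail_start)
            return a.read(sample_bytes) == b.read(sample_bytes)
    except OSError:
        return False  # a missing or unreadable copy is a definite fail
```

Corruption that sits entirely in the middle of a file slips past this check, which is the trade-off described above.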

exFAT specific characteristics

  • All early content remained intact.
  • All content after the physical capacity has been exceeded would be lost, but may be in the OS-provided disk cache, which could mask the problem until that cache expired, or the drive was removed.
  • Directory listings in directories that were created before running out of space, remained intact.
    • Directories that were created within these directories, but after the drive ran out of space, would remain. But their contents would be missing.
    • Any files that were written before running out of space would be intact.
    • Any files that were written after running out of space would return exactly the same data as other failed files.
  • After removing and reinserting the drive, the filesystem was still usable (except for the corrupt/missing data).

vFAT specific characteristics

  • All early content remained intact.
  • All content after the physical capacity has been exceeded would be lost, but may be in the OS-provided disk cache, which could mask the problem until that cache expired, or the drive was removed.
  • Directory listings in directories that were created before running out of space, remained intact.
    • Directories that were created within these directories, but after the drive ran out of space, would remain. But would be inaccessible.
    • Any files that were written before running out of space would be intact.
    • Any files that were written after running out of space would return exactly the same data as other failed files.
  • After removing and reinserting the drive, the filesystem was corrupt enough that Linux would force it to be read-only (expecting you to do a filesystem repair).

How did the advertised specs match up to reality?

FEATURES:

  • Type: “External hard drive”. Actually a Flash/Solid state drive.
  • Material: Aluminum Alloy, Matte, Chip (didn’t check, but it looks like it.)
  • Ports: Type-C/USB 2.0
  • Speed: “400MB/S”. Actually 1-40MB/s (most of the time 1-2MB/s).
  • Capability Available: “16TB”. Pretended to be 16TB, actually somewhere between 8GB-120GB.
  • Drive Specification: 10x3.2x0.9cm/3.9x1.26x0.35in (didn’t check, but it looks like it.)
  • Drive Weight: 37g/1.3oz (didn’t check)
  • Colors Available: Black, Red, Blue, Silver,Gold (didn’t check, but it looks like it.)

What’s Included:

  • 1x External solid state drive (advertised as a hard drive)
  • 1x faulty USB Cable

How does it work?

When you connect the device to your computer, your computer talks to a controller on the device. That controller tells your computer about the device, including things like the capacity. It also talks to the storage on your behalf.

The scam works by reprogramming the controller to misrepresent the storage size, and then doing something dodgy when it’s asked to store something in an area of the storage that it says exists, but doesn’t.

Above: Diagram showing how the computer connects to the storage.

Why is this a big deal?

Hopefully we already agree that getting scammed is bad. But this is worse than that. The drive is actively designed to hide that it is failing. And it will definitely fail. It’s not a far off statistical probability. It will fail. And the nature of the drive is such that people are most likely to use it for backups, or for storing stuff that they don’t need to access often.

While it’s expected that people should test their backups from time to time, that’s a heavy process and gets heavier depending on how thorough you are with the testing. Furthermore, if you test your backup by reading a sample of random files, there’s a high chance that you won’t notice the problem.

Add to that, if the drive is formatted as exfat, which is likely to be the case for most drives now and was the default for this drive, files will silently go missing when the drive is disconnected from the computer. So any files you do actually see and test are likely to be fine when you test them, but not actually fine when you need them.

Writing a new backup to the drive isn’t an obvious test either, although it will likely need to do some time-consuming work to replace the missing data. So if you’re expecting it to be quick (because nothing should need to be done), this might let you know. But then what assumptions do you make next? Do you accept that it’s now up-to-date? Do you test it one more time? If you test it one more time, the OS disk cache may mask the fact that the data needs to be written again.

Basically, there are multiple ways that someone could miss that their data has gone missing, even if they have explicitly checked. There’s a high chance of people losing data that they thought was backed up and therefore safe. Imagine whatever you specifically don’t want to lose, and what the consequences of losing it would be. How bad would that be now? How bad would that be in a few years when you don’t remember what was on the drive?

What should you look out for?

Before buying

This is hard. The best numbers I can give you today will be blurry in a few weeks, at best. These scams work because they are just plausible enough to be believable, but just fantastic enough to be very attractive. And they last just long enough that you’re unlikely to notice before it’s too late. So here are some things to look out for. Any of them could be a mistake (certainly not uncommon on Amazon), but as you see more of them, it will give you more confidence that it might be a scam:

  • The price is amazing for the specs. Consider the everyday price. Ie if it’s heavily discounted, maybe this isn’t a point against it. But then it’s easy for a scammer to set a higher price, and permanently discount it.
  • The specs don’t match between the photos and the specs section of the listing. Eg:
    • 100MB/s in the photos vs 400MB/s in the specs section.
    • USB3.1 in the photos vs USB2 in the specs section.
  • The specs don’t match between the description and the specs section.
  • The specs combine in an unusual way. Eg:
    • USB 2 vs USB C. Usually when USB C is used, the protocol will be USB 3.x (as at 2022-06-23). It’s not impossible for it to be USB 2, but it is unlikely, especially for a large drive where you would want to be able to access the data quickly.
  • The specs don’t match each other. Eg:
    • A speed of 400MB/s (or even 100MB/s) vs USB 2, which has a theoretical maximum of 53MB/s under ideal conditions.
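
The last two kinds of mismatch are mechanical enough to check in code. The sketch below is only an illustration: the function name `listing_red_flags` is hypothetical, and the ceilings are the rough figures used earlier in this post, not values read from the USB specifications.

```python
# Rough ceilings in MB/s, as used earlier in this post.
INTERFACE_MAX_MB_S = {"USB 2": 53, "USB 3.0": 400, "USB 3.1": 800}


def listing_red_flags(interface, claimed_speeds_mb_s):
    """Return human-readable red flags for the speeds a listing claims on one interface."""
    flags = []
    distinct = sorted(set(claimed_speeds_mb_s))
    if len(distinct) > 1:
        joined = ", ".join(f"{s}MB/s" for s in distinct)
        flags.append(f"the listing gives conflicting speeds: {joined}")
    ceiling = INTERFACE_MAX_MB_S[interface]  # KeyError for an interface not in the table
    for speed in distinct:
        if speed > ceiling:
            flags.append(f"{speed}MB/s exceeds the {interface} ceiling of {ceiling}MB/s")
    return flags


for flag in listing_red_flags("USB 2", [100, 400]):
    print(flag)
```

One flag on its own could still be a listing mistake. It is the accumulation that should make you suspicious.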

I’ve always been an advocate for trying new brands, particularly when they are doing something better than the big brands. But this is one area where this strategy is more likely to bring pain. Your storage is incredibly important. It’s worth having as much trust in the transaction as possible. Here are some things that might help build that trust:

  • Is it an established brand?
  • Is the platform one that removes scams when they are reported?
  • How good are the reviews/ratings?
  • Do those reviews look trustworthy?
  • Are there reviews?!
  • Does the platform make it easy to return products?

What can you do to be more confident in your storage?

Test it.

Understand that for any rule that I give you here, scammers will find a way to hide the scam from it. Therefore this information will age. But it will hopefully give you a feel for how to build your confidence.

Tools

This will be your most time-efficient way to test the drive, and will help avoid human error/bias when testing.

I’ve written one called testForFakeStorageSize. This is in very early stages, and is aimed at a highly technical user. Therefore I’d suggest trying one of these instead:

  • F3 - Open source, and appears to be active.
  • H2Testw.
  • ChipGenius.

I have not tried any of these, yet. So I can’t tell you how good they are.

No matter what tools you try, make sure that they are up-to-date to give you the best chance of detecting the latest scam techniques.

What should you look for when manually testing the drive?

Note that different file systems behave differently. The different scam methods behave differently. And new scam methods will likely appear over time. So some items in the list below may match your drive, while others may not. You’ll need to use your judgement as to whether the drive is healthy, and further whether it’s a scam, or just broken:

  • Files disappear when you safely remove the drive, and re-insert it. Even though they successfully got written to the drive, and could be read after writing.
  • Files become corrupted:
    • Early files get overwritten with newer files (Overwrite method). - Files will not be the same. Or
    • Newer files that survived unplugging and reinserting all contain the same garbage (Black hole method).
  • You get an error about a missing/corrupt Alternate GPT. (Only visible in certain contexts.)
  • It’s incredibly slow. This isn’t necessarily a sign by itself. But it does make it much harder to test the drive, so scammers may slow the drive down intentionally. More likely, they are just using a really cheap controller or storage.
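
The “all contain the same garbage” symptom can be looked for by grouping files by a hash of their contents. This is a minimal sketch with a hypothetical function name, `identical_content_groups`. Genuine duplicates are normal, so a large group is only a warning sign when those files should have been different from each other.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def identical_content_groups(root, min_group=2):
    """Group files under root by the SHA-256 of their contents; return the larger groups."""
    groups = defaultdict(list)
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        groups[digest.hexdigest()].append(path)
    return [paths for paths in groups.values() if len(paths) >= min_group]
```

Run it after safely removing and reinserting the drive, so that it reads from the device and not from the OS disk cache.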

Things to do:

  • Make sure you put a lot of data on to it. Ie fill up most of the drive.
  • Safely eject the drive, and then plug it back in.
  • Make sure that you can still access the most recently added files.
  • Make sure that you can still access the oldest files.
  • Sample some in the middle to gain more confidence.
  • Make sure that the statistics about how much space is used on the drive approximately matches the total size of the files on the drive. It will be off by a little bit, but shouldn’t be off by a lot. Eg if your file system says that 1.7TB is used, but only 8GB of files exist, that should ring an alarm bell.
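
The last item in that list can be automated with the standard library. This is a sketch under assumptions: the function names are hypothetical, the 10% tolerance is an arbitrary starting point, and `mount_point` needs to be the top of the drive’s filesystem because `shutil.disk_usage` reports on the whole filesystem.

```python
import os
import shutil


def space_accounting(mount_point):
    """Return (used bytes reported by the filesystem, total bytes found in files)."""
    used = shutil.disk_usage(mount_point).used
    in_files = 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            try:
                in_files += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # unreadable entries are worth investigating on their own
    return used, in_files


def looks_suspicious(used, in_files, tolerance=0.10):
    """True when the two figures differ by more than tolerance of the larger one."""
    larger = max(used, in_files)
    return larger > 0 and abs(used - in_files) / larger > tolerance


print(looks_suspicious(1.7e12, 8e9))  # True: 1.7TB used, but only 8GB of files
```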

A note about the specifications that I have highlighted

This scam can be done with any capacity. I have chosen the ones that I have because they are on the edge of what’s likely right now, so they are easier to spot.

Is this on other Amazon stores?

Here is a small sample of the available localisations as at: 2022-06-24.

Who in the chain is doing this?

I don’t know. It could be anywhere from a disgruntled/dodgy employee in the factory who had the knowledge to reprogram the controller, through to a vendor on Amazon specifically seeking the drives to make a quick buck, or anywhere in between. It’s completely plausible that anyone closer to the buyer than where the scam took place, had no idea that it was a scam. But I will give you a couple of data points:

  • Before releasing any of this information, I tried to contact Amazon to report the scam so that they could get ahead of it. I was unable to find any appropriate method to report multiple listings that I had not purchased (there are a lot of them that are all the same, but with minor variations like colour).
  • The listings then disappeared, so I proceeded with the release. Eg:
  • New listings have since appeared, and are once again being promoted. Eg:
  • There are similar listings that I have not tested, but I suspect are also scams.
  • The listings are popping up under different vendors on Amazon.

What would I like to change?

  • I’d really like an easy way to report these scams to Amazon without having to buy and then return the product. If this is already available, I’d like it to be easy to find, and included with the other forms of scam that can be reported.
  • I’d like the scams to be removed more quickly. People will fall for these, and most people probably won’t notice. And many of those who do, will probably assume that they got a faulty unit.

This is not a case of “you get what you pay for”. This is a scam, and not poor quality.

Summary

It sucks that this sort of scam is being so heavily promoted on a platform that used to feel pretty safe. Fortunately, for the purposes of this conversation, Amazon’s returns process is excellent, which takes away a lot of the stress of discovering that you have a scam unit. It’s just a massive shame that many people aren’t going to notice that they have a scam unit until they’ve exceeded the return window, but more importantly, lost potentially important data.

Amazon needs to get on top of this, but this is likely to be a game of whack-a-mole, and is a really hard problem to solve. Any solution will likely create some friction somewhere else. But in the meantime, it should be really easy to report a scam product.

For consumers… We’re probably going to have to accept that these scams will be commonplace for the foreseeable future.

Updates and corrections

  • 2023-11-10: Added a link to F3 in the tools section. It’s an open source tool for checking storage size and speed.
