Chris’s Wiki :: blog/tech/SSDWritePerfMetricsWish

Modern CPUs have an impressive collection of performance counters
for detailed, low level information on things like cache misses,
branch mispredictions, various sorts of stalls, and so on; on Linux
you can use ‘perf list’ to see them all. Modern SSDs (NVMe, SATA,
and SAS) are all internally quite complex, and their behavior under
load depends on a lot of internal state. It would be nice to have
CPU performance counter style metrics to expose some of those
details. For a relevant example that’s on my mind, it certainly would be interesting
to know how often flash writes had to stall while blocks were hastily
erased, or the current erase rate.

Having written this, I checked some of our SSDs (the ones I’m most
interested in at the moment), and I see that our SATA SSDs do expose
some of this information as (vendor-specific) SMART attributes, with
things like ‘block erase count’ and ‘NAND GB written’ to TLC or SLC
(as well as the host write volume and the other things you’d expect).
NVMe does this differently: its standard SMART / Health Information
log page has a fixed set of fields, without the easy flexibility of
SMART attributes, so a random NVMe drive of ours that I checked
doesn’t seem to provide this sort of lower-level information.
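As a rough illustration of pulling such attributes out programmatically, here is a minimal Python sketch that reads the JSON that smartctl (smartmontools 7.0 and later) prints with `--json`. It assumes the key layout I understand smartctl to use (`ata_smart_attributes.table` for SATA drives, `nvme_smart_health_information_log` for NVMe ones). The function names and the sample reports, including an attribute called `Block_Erase_Count`, are hypothetical, since real attribute IDs and names vary by vendor. The NVMe conversion follows the spec's definition of a "data unit" as 1,000 blocks of 512 bytes.

```python
import json


def ata_attributes(report: dict) -> dict:
    """Map SMART attribute name -> raw value from `smartctl --json -A` output (SATA)."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {attr["name"]: attr["raw"]["value"] for attr in table}


def nvme_host_bytes_written(report: dict) -> int:
    """Host bytes written according to the NVMe SMART / Health Information log.

    NVMe reports 'data units' of 1,000 512-byte blocks, hence the factor 512,000.
    """
    log = report["nvme_smart_health_information_log"]
    return log["data_units_written"] * 512_000


# Hypothetical, heavily trimmed reports; real attribute IDs and names vary by vendor.
SATA_REPORT = json.loads("""
{"ata_smart_attributes": {"table": [
  {"id": 100, "name": "Block_Erase_Count", "raw": {"value": 48213, "string": "48213"}},
  {"id": 241, "name": "Total_LBAs_Written", "raw": {"value": 9876543, "string": "9876543"}}
]}}
""")
NVME_REPORT = json.loads('{"nvme_smart_health_information_log": {"data_units_written": 2000}}')

if __name__ == "__main__":
    print(ata_attributes(SATA_REPORT)["Block_Erase_Count"])  # 48213
    print(nvme_host_bytes_written(NVME_REPORT))              # 1024000000
```

On a real machine the report would come from something like `smartctl --json -A /dev/sda`. smartctl's exit status is a bit mask, so a non-zero status doesn't by itself mean the JSON is unusable.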

It’s understandable that SSD vendors don’t necessarily want to
expose this sort of information, but it’s quite relevant if you’re
trying to understand unusual drive performance. For example, for
your workload do you need to TRIM your drives
more often, or do they have enough pre-erased space available when
you need it? Since TRIM has an overhead, you may not want to blindly
do it on a frequent basis (and its full effects aren’t entirely
predictable since they depend on how much the drive decides to
actually erase in advance).

(Having looked at SMART ‘block erase count’ information on one of
our servers, it’s definitely doing something when the server is
under heavy fsync() load, but I need to cross-compare the numbers
from it to other systems in order to get a better sense of what’s
exceptional and what’s not.)
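Cross-comparing raw counters between machines is easier once they are turned into rates. A minimal sketch, assuming you can take two timestamped readings of a drive's erase count, host bytes written and (where the drive reports it) NAND bytes written; the `Sample` class and `interval_metrics()` function are hypothetical names. Write amplification here is the usual ratio of flash writes to host writes.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a drive's counters (hypothetical fields, e.g. filled from SMART)."""
    t: float          # seconds, e.g. from time.monotonic()
    erases: int       # vendor-specific 'block erase count' raw value
    host_bytes: int   # bytes written by the host
    nand_bytes: int   # bytes written to flash, if the drive reports it


def interval_metrics(a: Sample, b: Sample) -> dict:
    """Rates between two samples of the same drive, taken in order a then b."""
    dt = b.t - a.t
    d_host = b.host_bytes - a.host_bytes
    if dt <= 0 or d_host <= 0:
        raise ValueError("need a later sample with some host writes in between")
    d_erase = b.erases - a.erases
    d_nand = b.nand_bytes - a.nand_bytes
    return {
        "erases_per_sec": d_erase / dt,
        # normalizing by host writes makes busy and idle machines comparable
        "erases_per_gib_host": d_erase / (d_host / 2**30),
        # write amplification: flash bytes written per host byte written
        "write_amplification": d_nand / d_host,
    }


if __name__ == "__main__":
    GiB = 2**30
    before = Sample(t=0.0, erases=1000, host_bytes=0, nand_bytes=0)
    after = Sample(t=60.0, erases=1120, host_bytes=3 * GiB, nand_bytes=6 * GiB)
    print(interval_metrics(before, after))
    # {'erases_per_sec': 2.0, 'erases_per_gib_host': 40.0, 'write_amplification': 2.0}
```

Since raw SMART values are vendor-defined, the per-GiB figure is probably only meaningful between drives of the same model.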

I’m currently more focused on write related metrics, but there’s
probably important information that could be exposed for reads and
for other operations. I’d also like SSDs to provide counts of the
various sorts of operations they see, because while your operating
system can in theory provide this, it often doesn’t (or doesn’t
provide it at the granularity of, say, how many writes used ‘Force
Unit Access’ or how many ‘Flush’ operations were done).
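For comparison, here is roughly what Linux itself exports per device. As I understand the kernel's block-layer statistics documentation, `/sys/block/<dev>/stat` holds up to 17 counters, with discard and then flush counts appended on newer kernels, while FUA writes are simply counted as ordinary writes. A minimal Python sketch that names those fields; the helper names and the sample line are hypothetical.

```python
# Field order as I understand the kernel's block stat documentation;
# older kernels print only the first 11 (or 15) fields.
FIELDS = (
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue",
    "discard_ios", "discard_merges", "discard_sectors", "discard_ticks",
    "flush_ios", "flush_ticks",
)


def parse_block_stat(text: str) -> dict:
    """Parse the contents of /sys/block/<dev>/stat into named counters.

    If the kernel prints fewer fields, zip() simply leaves the missing names out.
    """
    return dict(zip(FIELDS, (int(v) for v in text.split())))


def read_block_stat(dev: str) -> dict:
    """Read and parse the stat file for a device name such as 'sda' or 'nvme0n1'."""
    with open(f"/sys/block/{dev}/stat") as f:
        return parse_block_stat(f.read())


if __name__ == "__main__":
    # Made-up sample line in the 17-field format.
    sample = "  1200 30 96000 500  8000 400 640000 9000  0 7000 9500  0 0 0 0  2500 800"
    stats = parse_block_stat(sample)
    print(stats["write_ios"], stats["flush_ios"])  # 8000 2500
```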

(In Linux, I think I’d have to extract this low-level operation
information in an ad-hoc way with eBPF tracing.)
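If one does go the tracing route, the kernel's block-layer tracepoints (such as `block:block_rq_issue`) describe each request with a short "rwbs" string. Below is a minimal sketch that tallies those strings. It assumes the layout that I understand the kernel's `blk_fill_rwbs()` to produce: an optional leading 'F' for a preflush, one operation letter, then flag letters in which a further 'F' means FUA. Details may vary between kernel versions. How the strings are collected (bpftrace, perf, trace_pipe) is left out, and the function names are hypothetical.

```python
from collections import Counter

# Operation letters in an rwbs string (assumed from blk_fill_rwbs()).
OPS = {"R": "read", "W": "write", "D": "discard", "F": "flush", "N": "other"}


def classify_rwbs(rwbs: str) -> list:
    """Turn one rwbs string (e.g. 'FWFS') into labels for counting."""
    labels = []
    rest = rwbs
    # A leading 'F' followed by an operation letter is a preflush, not the op.
    if len(rest) >= 2 and rest[0] == "F" and rest[1] in OPS:
        labels.append("preflush")
        rest = rest[1:]
    if not rest or rest[0] not in OPS:
        return ["unknown"]
    op = OPS[rest[0]]
    labels.append(op)
    # After the op letter, an 'F' marks Force Unit Access; only counted for writes here.
    if op == "write" and "F" in rest[1:]:
        labels.append("fua_write")
    return labels


def tally(rwbs_values) -> Counter:
    """Count labels over an iterable of rwbs strings."""
    counts = Counter()
    for rwbs in rwbs_values:
        counts.update(classify_rwbs(rwbs))
    return counts


if __name__ == "__main__":
    seen = ["WS", "FWFS", "FF", "R", "RA"]  # made-up sample
    print(tally(seen))
```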

Posted on 2025-10-19 (Unix timestamp 1760898532)
