As you can see in the sensor graph (teal) and the min/max columns of the table, the composite and temp1 sensors sometimes shoot straight to exactly 84°C and back down again. These are generally short spikes, but 84°C is well beyond the specified operating temperature (0°C to 70°C) for the 980.
The temp2 sensor stays between about 28°C and 55°C.
I have a Thermal Grizzly M.2 SSD Cooler installed on the SSD. I removed all 4 of the plastic strips from the thermal pads.
Is this detrimental to my SSD's longevity and performance? And what could be causing this?
I'm on firmware 2B4QFXO7.
I'm seeing the same temperature reports in nvme-cli and sensors.
Interestingly, the drive itself is definitely not at 84°C. I can press my finger against the drive heatsink without any discomfort, and an IR thermometer reports only about 35°C.
I'm seeing the same thing on Debian. Dumping values from lm-sensors, smartctl or nvme-cli every second shows that temperature 1 is steady in the 33-37 range, then spikes to exactly 84 for 1-5 seconds and returns to normal the second after that. No transitions.
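For anyone who wants to reproduce the once-per-second polling described above, here is a minimal Python sketch. It assumes the kernel exposes the drive through hwmon (a `name` file containing `nvme` and a `temp1_input` file in millidegrees Celsius); the function names, the 60-second window and the 1-degree tolerance are my own choices, not anything from the tools mentioned.

```python
import glob
import os
import time

SPIKE_C = 84.0  # the exact value reported in this thread


def find_nvme_temp_files():
    """Return temp1_input paths for hwmon devices whose name is 'nvme'."""
    paths = []
    for name_file in glob.glob("/sys/class/hwmon/hwmon*/name"):
        with open(name_file) as f:
            if f.read().strip() == "nvme":
                candidate = os.path.join(os.path.dirname(name_file), "temp1_input")
                if os.path.exists(candidate):
                    paths.append(candidate)
    return paths


def read_celsius(path):
    """hwmon reports the temperature as integer millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def find_spikes(samples, spike_c=SPIKE_C, tolerance=1.0):
    """Return (start_index, length) for each run of samples near spike_c."""
    spikes = []
    start = None
    for i, value in enumerate(samples):
        is_spike = abs(value - spike_c) <= tolerance
        if is_spike and start is None:
            start = i  # a new run of spike readings begins here
        elif not is_spike and start is not None:
            spikes.append((start, i - start))
            start = None
    if start is not None:  # the run was still going at the last sample
        spikes.append((start, len(samples) - start))
    return spikes


def poll(path, seconds=60, interval=1.0):
    """Collect one reading per interval for the given number of samples."""
    samples = []
    for _ in range(seconds):
        samples.append(read_celsius(path))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    for path in find_nvme_temp_files():
        print(path, find_spikes(poll(path)))
```

With one sample per second, the length of each run is roughly the spike duration in seconds, which makes it easy to compare against the 1-5 seconds seen here.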
Every time it happens "Thermal Temp. 2 Transition Count" is increased by one. If it wasn't for that, I would write this off as a Linux specific reporting bug.
I've only seen it when the drive is idling and essentially no I/O is happening. When I actually do write or uncompress large files the temp will go up a little bit and slowly, but not these insane spikes.
The drive is brand new and I've only written 67 GB to it. The "Thermal Temp. 2 Total Time" is 58 seconds, which is absurd. It wouldn't take much longer than that to write the entire 67 GB, so it would imply the drive has been throttling the entire time.
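To put a number on that: if every one of the 67 GB had been written during those 58 throttled seconds, the average write rate would have had to be a bit over 1 GB/s. A quick check of the arithmetic (the function name is just for illustration):

```python
def implied_write_rate_gb_per_s(total_written_gb, throttled_seconds):
    """Average rate needed to write everything inside the throttled time."""
    return total_written_gb / throttled_seconds


rate = implied_write_rate_gb_per_s(67, 58)
print(f"{rate:.2f} GB/s")  # 1.16 GB/s
```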
I run a high-performance gaming machine with 960, 970 Pro, 980 and 980 Pro drives in a Windows 11 environment and observe no issue like the one all your Linux-based machines are showing you.
Any temperature fluctuation in Windows shows up as a ramped increase or decrease over the period being measured, all well within the operating specs of each drive.
Not these erratic, instantaneous jumps shown in your Linux graphs. Your NVMe graphing is blowing smoke at you.
While I hope you are right, ddaniel51, there is one reason I think you're not:
Our tools are also showing the total time that the drive has been throttling because it's above 82 degrees C. That is a lifetime total and it increases during these spikes.
I can boot into Windows from a different drive, run Samsung Magician and see that exact same number in the S.M.A.R.T. section. It's called Warning Composite Temperature Time.
If the spikes are only in the imagination of the Linux tools, why would Samsung Magician in Windows show how many minutes of throttling they have caused?
EDIT: To be clear, I do think the 84 degree readings are bogus and not the actual temperature. But they come from the controller on the drive and it is throttling as a result.
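If you want to watch those lifetime counters move yourself, a rough Python sketch is to take two snapshots of the SMART log and print what changed. This assumes nvme-cli is installed and that `nvme smart-log` with JSON output uses the key names listed below (that is what I believe it emits, but check against your version); the device path `/dev/nvme0` and the function names are placeholders.

```python
import json
import subprocess

# Assumed JSON key names for the thermal counters; verify on your nvme-cli.
THERMAL_KEYS = (
    "warning_temp_time",
    "thm_temp1_trans_count",
    "thm_temp2_trans_count",
    "thm_temp1_total_time",
    "thm_temp2_total_time",
)


def read_smart_log(device="/dev/nvme0"):
    """Run nvme-cli and parse the SMART log as a dict (usually needs root)."""
    result = subprocess.run(
        ["nvme", "smart-log", device, "--output-format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def counter_deltas(before, after, keys=THERMAL_KEYS):
    """Return only the watched counters that changed, with their increase."""
    return {
        key: after[key] - before[key]
        for key in keys
        if key in before and key in after and after[key] != before[key]
    }
```

Take one snapshot with `read_smart_log()` before a spike and one after, then pass both to `counter_deltas()`. If the spikes were purely a display problem, the result should stay empty.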
Happens to me too. I have 2 of these drives, and 3 times within the past month one of them has gone to 84 degrees: twice on one drive and once on the other. In the last recording it went from 24 to 84 to 24 within 5 seconds. That is not physically possible.
Found this discussion through Google as I have the same problem (AlmaLinux 8 with a self-compiled 5.16.4 kernel). What I think happens is a bit shift. I think that 84 degrees is in reality 43, as I have never seen 43 in my logs. The reason I suspect a bit shift is that 42 degrees in binary is 101010 and 84 is 1010100. So instead of adding one to 42, the bits get shifted left by one. Does anyone know where this can be reported, to suggest that Samsung take a look?
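The arithmetic behind that guess is easy to check in Python. This only confirms that 84 is 42 shifted left by one bit, using the Celsius values as displayed; it says nothing about how the firmware stores the temperature internally, and the helper name is made up for the example.

```python
def looks_like_left_shift(previous_c, reported_c):
    """True if the reported value equals the previous one shifted left by 1."""
    return reported_c == previous_c << 1


print(bin(42))   # 0b101010
print(bin(84))   # 0b1010100
print(42 << 1)   # 84
print(looks_like_left_shift(42, 84))  # True
```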
Sounds like the issue!! Hopefully a new firmware update will do the trick.
Also, 'power on hours' from smartctl is buggy too. It shows far fewer hours than have actually passed since the drive was first powered on.
From my observation, the power on hours only seem to go up while the drive is being accessed. I plugged one of these drives into my server and at one point never mounted it. The power on hours never went up until I actually started using the drive. Under a very light, inconsistent workload the power on hours only went up a few hours a day.
I have exactly the same problem with Debian Bullseye / OMV (OpenMediaVault).
I tried updating the firmware from 1B4QFXO7 to 2B4QFXO7, no difference.
The problem is really annoying, because I receive high temperature warnings several times a day, and if I turn off the warnings the magnetic drives are not monitored as well.
I find it strange that only newer Linux versions seem to have the issue. Others have reported that under Windows there is no such problem, yet the problem really occurs inside the SSD, since it also increases its failure counters every time the 84°C is reported. A firmware update should fix this! A workaround in Linux to avoid this situation would be a rather bad solution in my opinion.