Original topic:

SSD 980 heat spikes to 84°C/ 183°F

(Topic created: 01-21-2022 06:10 PM)
user4ZMk19cGny
Astronaut
Monitors and Memory

980temp.jpg

As you can see in the sensor graph (teal color) and the min/max values in the table, the composite and temp1 sensors sometimes shoot straight to exactly 84°C and back down again. Generally these are small spikes, but 84°C is well beyond the specified operating temperature (0°C to 70°C) for the 980.

The temp2 sensor stays around 28 to 55°C

I have a Thermal Grizzly M.2 SSD Cooler installed on the SSD. I removed all 4 of the plastic strips from the thermal pads.

Is this detrimental to my SSD's longevity and performance? And what could be causing this?

userTmFW5W1NS7
Astronaut

I'm on firmware 2B4QFXO7.

Seeing the same temp reports in nvme-cli and sensors

Interestingly, the drive itself is definitely not at 84°C. I'm able to press my finger against the drive heatsink without any discomfort, and an IR thermometer only reports about 35°C.

userhePoBUWX5P
Constellation

I'm seeing the same thing on Debian. Dumping values from lm-sensors, smartctl or nvme-cli every second shows that temperature 1 is steady in the 33-37 range, then spikes to exactly 84 for 1-5 seconds, and then returns to normal the very next second. No transitions.
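To make "no transitions" concrete, here is a minimal sketch of how one might flag such jumps in a list of once-per-second readings. The function name `find_spikes`, the `max_step` threshold of 10 degrees, and the sample values are all hypothetical; the readings are made up to resemble the pattern described above and are not real logs from any drive.

```python
from typing import List, Tuple


def find_spikes(samples: List[int], max_step: int = 10) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) pairs for runs of samples that begin
    with a jump of more than max_step degrees over the previous sample and
    end just before a drop of more than max_step degrees."""
    spikes = []
    start = None
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if start is None and delta > max_step:
            start = i  # abrupt rise: a spike begins here
        elif start is not None and delta < -max_step:
            spikes.append((start, i - 1))  # abrupt fall: the spike ended at i - 1
            start = None
    return spikes


# Hypothetical one-second samples in degrees C (illustrative, not measured data).
readings = [34, 35, 33, 84, 84, 84, 36, 35, 37, 84, 34]
print(find_spikes(readings))  # [(3, 5), (9, 9)]
```

A gradual ramp such as 33, 36, 40, 45 would not be flagged, because no single step exceeds the threshold.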

Every time it happens "Thermal Temp. 2 Transition Count" is increased by one. If it wasn't for that, I would write this off as a Linux specific reporting bug.

I've only seen it when the drive is idling and essentially no I/O is happening. When I actually do write or uncompress large files the temp will go up a little bit and slowly, but not these insane spikes.

The drive is brand new and I've only written 67 GB to it. The "Thermal Temp. 2 Total Time" is 58 seconds, which is absurd. It wouldn't take much longer than that to write the entire 67 GB, so it would imply the drive has been throttling the entire time.
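As a rough check of that argument, the sketch below computes the average write rate that would be needed for all 67 GB to have been written within the 58 seconds of recorded throttle time. The function name is hypothetical, and nothing here assumes a particular rated speed for the drive.

```python
def implied_write_rate_gb_per_s(written_gb: float, seconds: float) -> float:
    """Average rate needed to write written_gb gigabytes in the given time."""
    return written_gb / seconds


# Figures taken from the post above: 67 GB written, 58 s of "Thermal Temp. 2 Total Time".
rate = implied_write_rate_gb_per_s(67, 58)
print(f"{rate:.2f} GB/s")  # 1.16 GB/s
```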

ddaniel51
Comet

I run a high-performance gaming machine with 960, 970 Pro, 980, and 980 Pro drives in a Windows 11 environment and observe no issue like the one all your Linux-based machines are showing you.

Any temp fluctuations in Windows show as a ramped increase or decrease over the time being measured, all well within the operating specs of each drive.

Not the erratic, instantaneous jumps shown in your Linux graphs. Your NVMe graphing is blowing smoke at you.

userhePoBUWX5P
Constellation

While I hope you are right, ddaniel51, there is one reason I think you're not:

Our tools are also showing the total time that the drive has been throttling because it's above 82 degrees C. That is a lifetime total and it increases during these spikes.

I can boot into Windows from a different drive, run Samsung Magician and see that exact same number in the S.M.A.R.T. section. It's called Warning Composite Temperature Time.

If the spikes are only in the imagination of the Linux tools, why would Samsung Magician in Windows show how many minutes of throttling they have caused?

EDIT: To be clear, I do think the 84 degree readings are bogus and not the actual temperature. But they come from the controller on the drive and it is throttling as a result.

usergEyCey7MnW
Astronaut

Happens to me too. I have 2 of these drives, and 3 times within the past month one of them has gone to 84 degrees: twice on one drive and once on the other. In the last recording it went from 24 to 84 to 24 within 5 seconds. That is not physically possible.

chrome_AhZRN7YLh3.png

userO1Csr8746s
Constellation

Found this discussion through Google as I have the same problem (AlmaLinux 8 with a self-compiled 5.16.4 kernel). What I think is happening is a bit shift. I think that 84 degrees is in reality 43, as I have never seen 43 in my logs. The reason I suspect a bit shift is that 42 degrees is 101010 in binary and 84 is 1010100. So instead of one being added to 42, the bits get shifted left by one. Does anyone have an idea where to report this and suggest that Samsung take a look?
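The arithmetic behind that guess can be checked with a few lines of Python. This only shows that the numbers are consistent with a left shift by one bit; it says nothing about what the drive firmware actually does, and the helper name `shift_left_once` is made up for illustration.

```python
def shift_left_once(value: int) -> int:
    """Shift the bits of value left by one position (equivalent to doubling)."""
    return value << 1


normal = 42
print(bin(normal))                   # 0b101010
print(bin(normal + 1))               # 0b101011  (43, the expected next reading)
print(bin(shift_left_once(normal)))  # 0b1010100 (84, the reported spike value)
```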

useruBpXDnA8iS
Constellation

Sounds like the issue!! Hopefully a new firmware update will do the trick.

Also, 'power on hours' from smartctl is buggy. It shows far fewer hours than have actually passed since the drive was first powered on.

usergfWxAWQAic
Constellation

From my observation, the power-on hours seem to go up only while the drive is being accessed. I plugged one of these drives into my server and, for a while, never mounted it. The power-on hours never went up until I actually started using the drive. When there was a very light, inconsistent workload, the power-on hours only went up a few hours a day.

gaaf
Constellation

Same issue here with Samsung SSD 980 1TB on firmware 1B4QFXO7.

userq0JkbuvryR_0-1646903813748.png

 

Kalerzknopf
Constellation

I have exactly the same problem with Debian Bullseye / OMV (OpenMediaVault).
I tried updating the firmware from 1B4QFXO7 to 2B4QFXO7, but it made no difference.

The problem is really annoying, because I receive high temperature warnings several times a day, and if I turn off the warnings, the magnetic drives are no longer monitored either.

I find it strange that only newer Linux versions seem to have the issue. Others have reported that under Windows there is no such problem, yet the problem really does occur inside the SSD, since it also increases its failure counters every time the 84°C is reported. A firmware update should fix this! A workaround in Linux to avoid this situation would be a rather bad solution in my opinion.
