How I Met My GPU
posted on Mon, Sep 02 2024
For the last couple of years, perhaps even longer, my GPU (a trusty 970) has been crashing at random intervals. It would crash, and a couple of seconds later the power light would go off too, with no way of powering it back on other than restarting the whole machine. Sometimes it would go months without a single hiccup; sometimes it’d keep crashing as soon as it was powered on. Since I wasn’t primarily using it anymore, I was quite hesitant to fix it, but I finally gave in, as I thought tinkering with such an old machine would give me a solid dose of dopamine.
I always thought that the culprit was either the PCI-e power cable or the 6 + 8 power connectors on the PCB, since the only way for me to recreate the issue was to bend the power cable slightly. Sometimes after crashing, it wouldn’t boot due to a loose PCI-e power cable (or so my BIOS thought). This led me to assume that the cable would sometimes find its sweet spot, which is why the card could go months on end without crashing.
Since it also used to crash at idle, I decided to check the power draw of the card. Using MSI Afterburner, I noticed it was drawing around 80 watts at idle. This exceeds the PCI-e slot’s power limit of around 75 watts. So maybe if I could reduce the idle wattage, I could at least use the card without it crashing at idle, since I thought that it would be drawing power solely from the PCI-e slot. Checking the Nvidia Control Panel revealed that the power profile was set to “Prefer maximum performance”. Luckily, changing it to “Optimal power” easily fixed the high idle power draw. But the crashing continued. Even though it wouldn’t crash on its own as often as before, I could still reliably recreate the crash by bending the cable. Sometimes even a soft touch on the cable would suffice.
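For anyone who wants to repeat this idle check without a GUI, here is a minimal Python sketch. It is not what I used (I read the number off MSI Afterburner); it assumes `nvidia-smi` is installed and that the card reports `power.draw`, which some older GeForce cards show as `[N/A]`. The function names and the 75 W constant are my own, chosen for illustration.

```python
import subprocess

# Approximate PCI-e slot power limit mentioned above (assumed constant for this sketch).
PCIE_SLOT_LIMIT_W = 75.0


def parse_power_draw(text):
    """Parse the first line of nvidia-smi output (e.g. '80.12') into watts.

    Returns None when the card does not report a number (e.g. '[N/A]').
    """
    lines = text.strip().splitlines()
    if not lines:
        return None
    try:
        return float(lines[0].strip())
    except ValueError:
        return None


def exceeds_slot_limit(watts, limit=PCIE_SLOT_LIMIT_W):
    """True when the reported draw is above the slot limit."""
    return watts > limit


def read_power_draw():
    """Ask nvidia-smi for the current power draw of the first GPU, in watts."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_power_draw(result.stdout)


if __name__ == "__main__":
    try:
        watts = read_power_draw()
    except (FileNotFoundError, subprocess.CalledProcessError):
        watts = None  # nvidia-smi missing or failed
    if watts is None:
        print("power draw not available on this system")
    else:
        status = "above" if exceeds_slot_limit(watts) else "within"
        print(f"{watts:.1f} W, {status} the ~{PCIE_SLOT_LIMIT_W:.0f} W slot limit")
```

Run it once with the desktop idle under each power profile to see whether the setting makes a difference on your card.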
Then I decided to check the cable, obviously. The PSU is a VS550 with no modularity, so if it really was the cable, I’d have to get a brand new PSU. I started by bending the cable at different angles and in different spots to see where it started crashing. It would only crash when I bent it near the 6 + 8 power connectors on the GPU. Hoping that it wasn’t the GPU, I changed the only cable I could change. The VS550 comes with 2x 6 + 2 PCI-e power cables, so one of those 2-pin headers had been dangling unused for maybe 10 years. I replaced the one in use with the other 2-pin header and tried to recreate the crash. However hard I bent it or yanked it, it wouldn’t crash anymore. Yay! I could use this for another ten years!
Or so I thought. Despite the cable fix, the card kept on crashing. I was certain that I had to get a new GPU. Even though I was already going to get a new one, I didn’t want the old one to be faulty, as I could use it in another experiment rig. I was considering all the possibilities. It probably wasn’t the video memory, as it wasn’t artifacting and lowering the memory clock didn’t help. Then I came across my backup files on my HDD. There were modded BIOS files for the GPU. I remembered that I had extracted the BIOS, completely altered the voltage, power, and clock tables to overclock it, and flashed it back. I had never thought about it, since the card was running perfectly when I flashed my custom BIOS years ago. So I thought the card might have degraded over the years and might need extra voltage for stability.
I extracted the current BIOS via GPU-Z and opened it up with Maxwell BIOS Tweaker. I had set it up at 1405 MHz on core and 3600 MHz on memory at 1.181 V. I increased the voltage one step to 1.187 V and decreased the core clock to 1396 MHz, then flashed it back with nvflash. The crashing was gone. It had been the voltage/clock speed all along. My happiness is immeasurable and my day is saved. Now I can actually run this card for another ten years, albeit in a different setup.
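To put the size of that tweak in perspective, here is a small Python sketch that works out the change in each setting. The numbers are the ones above; the dictionary keys and the `delta` helper are just names I made up for the example.

```python
# Core clock and voltage before and after the BIOS edit (values from the post).
old = {"core_mhz": 1405, "voltage_v": 1.181}
new = {"core_mhz": 1396, "voltage_v": 1.187}


def delta(old_value, new_value):
    """Return (absolute change, percent change relative to the old value)."""
    change = new_value - old_value
    return change, 100.0 * change / old_value


clock_change, clock_pct = delta(old["core_mhz"], new["core_mhz"])
volt_change, volt_pct = delta(old["voltage_v"], new["voltage_v"])

print(f"core clock: {clock_change:+d} MHz ({clock_pct:+.2f}%)")
print(f"voltage:    {volt_change * 1000:+.0f} mV ({volt_pct:+.2f}%)")
```

Both changes come out at under one percent, which is roughly how thin the stability margin of the old overclock had become.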
And this is how I met my GPU again, ten years later.