May 12, 2013

Bug Check 0x116: VIDEO_TDR_ERROR Troubleshooting Tips

Over the years I have come across a lot of 0x116 errors, and most of the time I was able to fix them. In this article I’ll share some tips that might help you resolve such errors. I would like to thank JCGriff, Usasma, Ken, H2SO4, VirGnarus and all the other BSOD analysts who helped me gather this information.

Unfortunately there is no easy fix for this; we have to do a lot of troubleshooting to narrow down the issue and finally fix it. Bug Check 0x116: VIDEO_TDR_ERROR mostly happens due to graphics card (display card) related issues. So first, a little explanation of what TDR is.

According to Microsoft

Windows Vista and later operating systems use the timeout detection and recovery (TDR) process to detect and recover from these seemingly frozen situations. The following sections describe the TDR process and registry keys that enable TDR debugging. Windows Vista/7/8 attempts to detect these problematic hang situations and recover a responsive desktop dynamically. In this process, the Windows Display Driver Model (WDDM) driver is reinitialized and the GPU is reset. No reboot is necessary, which greatly enhances the user experience. The only visible artifact from the hang detection to the recovery is a screen flicker, which results from resetting some portions of the graphics stack, causing a screen redraw. Some older Microsoft DirectX applications may render to a black screen at the end of this recovery. The end user would have to restart these applications. Starting with Windows 8, GPU timeout detection and recovery (TDR) behavior has changed to allow parts of individual physical adapters to be reset, instead of requiring an adapter-wide reset.

So TDR is a 3 step process:

  • Timeout detection
  • Preparation for recovery
  • Desktop recovery

If you want to know more, please refer to Microsoft’s documentation on timeout detection and recovery (TDR).

One of the common symptoms you might see is this error message in the notification area:

[Image: notification balloon reading “Display driver stopped responding and has recovered”]

So when does the error message turn into a full-blown BSOD? If the operating system detects that six or more GPU hangs and subsequent recoveries occur within 1 minute, it bug-checks the computer on the next GPU hang.
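The threshold can be pictured as a sliding-window count. The sketch below is only an illustrative model of the rule as worded in this article, not how Windows actually implements it; the function name `next_hang_bugchecks` and its parameters are made up for this example.

```python
def next_hang_bugchecks(recovery_times, now, limit=6, window_seconds=60.0):
    """Return True if the next GPU hang would bug-check under the rule above.

    recovery_times: timestamps (in seconds) of past hang-and-recover events.
    limit / window_seconds: taken from the wording in this article
    (six or more recoveries within 1 minute), not read from Windows.
    """
    # Keep only the recoveries that happened inside the time window.
    recent = [t for t in recovery_times if 0 <= now - t <= window_seconds]
    return len(recent) >= limit


# Five recoveries in the last minute: the next hang is still recoverable.
print(next_hang_bugchecks([1, 10, 20, 30, 40], now=45))      # False
# Six recoveries in the last minute: the next hang ends in a 0x116 BSOD.
print(next_hang_bugchecks([1, 10, 20, 30, 40, 44], now=45))  # True
```

Recoveries older than the window simply drop out of the count, which is why an occasional driver reset never escalates to a blue screen.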

One of the first tips, as suggested by H2SO4, is:

If playing with video driver versions hasn’t helped, make sure the box is not overheating. Try removing a side panel and aiming a big mains fan straight at the motherboard and GPU. Run it like that for a few hours or days – long enough to ascertain whether cooler temperatures make a difference.

Yes, overheating can cause this error, so it’s recommended to try some alternative methods to cool your CPU and GPU and see whether you still get the same error message. It might be as simple as dust buildup and the resulting inadequate cooling.

Here are some articles that might help you understand the problem

 

There are a few things we can do to figure out whether it’s software or hardware. When it comes to software, things you can try are:

  • First and foremost, STOP OVERCLOCKING.
  • Completely clean up the drivers and install fresh ones. Refer to this guide for more assistance; even though it’s for nVidia, most of the steps will work for other brands as well.
  • Try different TDR registry settings for debugging. The relevant keys you can play with are described under TDR Registry Keys.
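Before changing any TDR registry settings it helps to see what is currently set. Here is a minimal sketch that reads the documented TDR values from `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers` using Python's standard `winreg` module. The helper names `read_tdr_values` and `format_report` are hypothetical, and the list covers only some of the documented value names. On a non-Windows machine the reader simply reports everything as not set.

```python
import sys

# Registry location and value names as documented by Microsoft for TDR.
TDR_KEY_PATH = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_VALUE_NAMES = ["TdrLevel", "TdrDelay", "TdrDdiDelay",
                   "TdrLimitTime", "TdrLimitCount"]


def read_tdr_values():
    """Return {name: value or None}; None means the value is not set."""
    if sys.platform != "win32":
        # No registry here; treat every value as unset.
        return {name: None for name in TDR_VALUE_NAMES}
    import winreg
    result = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY_PATH) as key:
        for name in TDR_VALUE_NAMES:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # value absent, so Windows uses its default
            result[name] = value
    return result


def format_report(values):
    """Build a one-line-per-value text report."""
    lines = []
    for name in TDR_VALUE_NAMES:
        value = values.get(name)
        if value is None:
            lines.append(f"{name}: not set (Windows default applies)")
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(read_tdr_values()))
```

The sketch only reads. Writing new values needs administrator rights and a reboot, and is best done by hand while following the Microsoft documentation.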

The following software issues are examples of things that can cause a TDR event:

  • Incompatible drivers
  • Corrupted registry
  • Known Vista issues surrounding multiple displays, Aero, DreamScene, and various display drivers.
  • Known Vista issues that sometimes cause corrupt information to be sent to the video card from system memory (to be addressed completely in SP2; various fixes have been applied since Vista’s release)
  • Corrupted DirectX files
  • Corrupted System Files

Now, if it’s not software, then it’s time to replace your card.

The following hardware issues are examples of things that can cause a TDR event:

  • Failing overclock on CPU or GPU
  • Bad sector in memory resulting in corrupt data being communicated between the GPU and the system (either video or system memory)
  • Corrupt hard drive or Windows install resulting in corruption of the system registry or the page file
  • Overheating of the GPU or CPU, again resulting in corrupt data being communicated
  • GPU failure due to any sort of issue, from insufficient power (VERY common) to heat

To prove that, there are a lot of stress tests you can run. I’ll suggest a few here.

Furmark is an application designed to stress the GPU by maximizing power draw well beyond any real world application or game. In some cases, this could lead to slowdown of the graphics card due to hitting over-temperature or over-current protection mechanisms. These protection mechanisms are designed to ensure the safe operation of the graphics card. Using Furmark or other applications to disable these protection mechanisms can result in permanent damage to the graphics card and void the manufacturer’s warranty.

According to John (usasma)

  • If you have more than one GPU, select Multi-GPU during setup
  • In the Run mode box, select “Stability Test” and “Log GPU Temperature” (BurnIn test in newer versions).
  • Click “Go” to start the test
  • Run the test until the GPU temperature maxes out – or until you start having problems (whichever comes first).
  • Click “Quit” to exit
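To judge whether the temperature has "maxed out", you can look at whether the last few readings have stopped climbing. The sketch below assumes you have copied the temperature readings into a plain Python list by hand. It does not parse any real FurMark log format, and the function name and both thresholds are arbitrary choices for illustration.

```python
def temperature_summary(readings_c, window=5, tolerance_c=1.0):
    """Summarize GPU temperature readings (degrees Celsius, oldest first).

    Returns (peak, plateaued). 'plateaued' is True when the last `window`
    readings differ by at most `tolerance_c`, i.e. the temperature has
    stopped climbing. Both thresholds are illustrative assumptions.
    """
    if not readings_c:
        raise ValueError("no readings")
    peak = max(readings_c)
    tail = readings_c[-window:]
    plateaued = len(tail) == window and max(tail) - min(tail) <= tolerance_c
    return peak, plateaued


# Hypothetical readings noted down during a stress-test run.
print(temperature_summary([45, 58, 67, 74, 79, 82, 83, 83, 84, 84, 84]))
# (84, True)
```

If the peak is reached and the run stays stable, heat alone is less likely to be the cause; if problems start while the readings are still rising, cooling deserves a closer look.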

The next one is Prime95.

According to John (usasma)

  • extract the contents of the zip file to a location of your choice
  • double click on the executable file
  • select “Just stress testing”
  • select the “Blend” test. If you’ve already run MemTest overnight you may want to run the “Small FFTs” test instead.
  • “Number of torture test threads to run” should equal the number of CPUs, times 2 if you’re using hyperthreading.
  • The easiest way to figure this out is to go to Task Manager…Performance tab – and see the number of boxes under CPU Usage History
  • Then run the test for 6 to 24 hours – or until you get errors (whichever comes first).
  • The Test selection box and the stress.txt file describe which components the program stresses
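The thread-count rule from the steps above is simple arithmetic, sketched here in Python. The function name `torture_threads` is made up for this example. `os.cpu_count()` is a real standard-library call that reports logical processors, which is normally the same number as the boxes Task Manager shows under CPU Usage History.

```python
import os


def torture_threads(physical_cores, hyperthreading):
    """Thread count per the rule above: cores x 2 with hyperthreading."""
    return physical_cores * 2 if hyperthreading else physical_cores


print(torture_threads(4, hyperthreading=True))   # 8
print(torture_threads(4, hyperthreading=False))  # 4

# Logical processor count as seen by the operating system.
print(os.cpu_count())
```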

The next one you can try is Video Memory Stress Test.

This utility lets you thoroughly test your video RAM for errors and faults. Video Memory Stress Test includes more than 40 tests, doesn’t change your current video mode, and has a lot of test options and a logging feature.

By doing all of this you can narrow things down and figure out whether it’s a software or a hardware problem. If it’s software, you can reinstall the operating system as a last resort. If it’s hardware, start looking into other display cards, and make sure your PSU is powerful enough to handle the new one. If you have any questions, please ask in the comments section.

About the Author: Shyam Sasindran

Shyam is the founder and editor of Captaindbg.com. He is a Microsoft MVP under Windows Expert (IT-Pro). He used to write technical articles for a few websites on tips and tricks for Windows and other technologies.
