High values for System\Context Switches/sec

Jan 28, 2014 at 12:56 AM
I have a Windows Server 2012 Essentials server where the user is complaining about intermittent slowness.

I'm trying to correlate the time of their complaints with slowness in one of the subsystems...

One thing I did notice is a consistently high context switch rate. I have an average of 40,000/sec and frequent spikes to 90,000/sec, all day, every day.

This is a single-processor, 4-core Xeon server with Hyper-Threading enabled. Average CPU utilization across all cores is about 27%, and no cores are triggering utilization alerts.

Given that the threshold is 20,000, is this definitely an abnormal value? If so, how do I look further into the cause?
Jan 29, 2014 at 9:21 PM
Having read Clint's writeup, yes, it does look like a definite problem, but I'm still not sure how to proceed.
http://blogs.technet.com/b/clinth/archive/2009/10/28/the-case-of-the-2-million-context-switches.aspx

More info:
All firmware and drivers are up to date.

PAL's quick system overview doesn't show any alerts except for the context switching alerts (alerting all day long). Privileged CPU is around 20-30% (enough to trigger the Context Switches/sec threshold), but no other subsystems show high utilization.

I can see with procmon that the context switches delta for the 'Interrupts' process is usually 7,000 but spikes to 90,000 once every four seconds, which seems to confirm where the majority of context switches are coming from.
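
To pin down how regular the spikes are, the gap between spike onsets can be measured from per-second samples. The following is only a rough sketch: the function name `spike_period`, the 20,000 cutoff, and the sample list are all assumptions for illustration, and the data is synthetic (shaped like the 7,000 / 90,000 pattern described above), not a capture from this server.

```python
from statistics import median


def spike_period(samples, threshold):
    """Return the median gap (in samples) between spike onsets, or None.

    A spike onset is a sample above `threshold` whose predecessor was not.
    """
    onsets = [
        i for i, value in enumerate(samples)
        if value > threshold and (i == 0 or samples[i - 1] <= threshold)
    ]
    gaps = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    return median(gaps) if gaps else None


# Synthetic one-second samples, NOT measurements from this server.
synthetic = [7_000, 7_000, 7_000, 90_000] * 5

print(spike_period(synthetic, threshold=20_000))
```

A steady gap between onsets would suggest something timer-driven rather than load-driven, which narrows down what to look for in a trace.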

Last night I tried doing a profile with the Windows Performance Toolkit, but I can't see any charts that expose which modules are causing the high context switches.

Any suggestions on next steps?

Can high context switches be caused by faulty hardware?

Would opening a support case with MS, Lenovo, or LSI be fruitful? How do I convince them that the issue is with their product?
Feb 3, 2014 at 7:23 PM
I've opened a product support case with MS, but it hasn't made any progress yet.

Some stats in various startup modes:

Safe mode with networking: the context switches/sec avg is 1,000.

msconfig startup with all non-MS (3rd party) services disabled: the context switches/sec avg is 12,000, with spikes to 35,000 once every four seconds.

Normal startup: the context switches/sec avg is 16,000, with spikes to 70,000 once every four seconds.

After running for 24 hours: the context switches/sec avg is 35,000, with spikes once every four seconds.
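
The step between each pair of startup modes can be worked out from the averages above. This is just the arithmetic on the reported numbers; reading each step as "caused by" whatever differs between the modes is an assumption, since the modes differ in more than one way (safe mode also leaves out many Microsoft drivers and services). The `deltas` helper is a made-up name for this sketch.

```python
# Average context switches/sec reported above for each startup mode,
# in the order they were listed.
averages = {
    "safe mode with networking": 1_000,
    "non-MS services disabled": 12_000,
    "normal startup": 16_000,
    "normal startup, after 24 hours": 35_000,
}


def deltas(averages):
    """Return (mode, increase over the previous mode) for each later mode."""
    names = list(averages)
    return [
        (cur, averages[cur] - averages[prev])
        for prev, cur in zip(names, names[1:])
    ]


for name, added in deltas(averages):
    print(f"{name}: +{added:,} switches/sec over the previous mode")
```

On these numbers, 3rd party services account for only about 4,000 of the average; the larger steps are going from safe mode to a Microsoft-only startup (11,000) and the growth over 24 hours of uptime (19,000).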
Coordinator
Feb 4, 2014 at 5:54 AM
Hi jwf1776,

Sorry for the delay in my response. I've been heads down working on my upcoming book on Windows performance analysis.

A context switch is where a processor switches from one thread to another. This is similar to a human working on a task and then switching to work on another task. The CPU usage can be inefficient if the processors are doing more switching between threads than doing real work. This commonly happens when there are a large number of ready threads in the processor queues.

Context switching is only a problem if there is high privileged (kernel mode) processor usage. Kernel mode processor usage is considered high when it is greater than 10%, either overall or on any individual processor, but this depends greatly on whether the system commonly does driver activity such as network or disk I/O.

Can high context switches be caused by faulty hardware? This is certainly possible, but a better metric of faulty hardware is % DPC or % Interrupt Time of greater than 5% on any processor.
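
As a rough sketch, the rules of thumb above can be written down as a small triage function. The thresholds are the ones quoted in this thread (20,000 context switches/sec, 10% privileged time, 5% DPC or interrupt time); the `Sample` class, its field names, and the wording of the findings are hypothetical, and none of this is how PAL itself evaluates counters.

```python
from dataclasses import dataclass
from typing import List

# Rules of thumb quoted in this thread; not hard limits.
CONTEXT_SWITCH_THRESHOLD = 20_000   # context switches/sec alert level
PRIVILEGED_TIME_THRESHOLD = 10.0    # % Privileged Time considered high
DPC_INTERRUPT_THRESHOLD = 5.0       # % DPC Time or % Interrupt Time considered high


@dataclass
class Sample:
    """Averaged counter readings; the field names are hypothetical."""
    context_switches_per_sec: float
    privileged_time_pct: float
    dpc_time_pct: float
    interrupt_time_pct: float


def triage(sample: Sample) -> List[str]:
    """Apply the thread's rules of thumb and return a list of findings."""
    findings = []
    high_switching = sample.context_switches_per_sec > CONTEXT_SWITCH_THRESHOLD
    high_privileged = sample.privileged_time_pct > PRIVILEGED_TIME_THRESHOLD

    if high_switching and high_privileged:
        findings.append("high context switching with high privileged time: investigate")
    elif high_switching:
        findings.append("high context switching but low privileged time: likely benign")

    if (sample.dpc_time_pct > DPC_INTERRUPT_THRESHOLD
            or sample.interrupt_time_pct > DPC_INTERRUPT_THRESHOLD):
        findings.append("high DPC or interrupt time: suspect hardware or drivers")

    return findings


# Illustrative values only, not readings from a real server.
print(triage(Sample(40_000, 25.0, 1.0, 1.0)))
```

The point of keeping the two checks separate is that context switching and DPC/interrupt time answer different questions: the first is only interesting alongside privileged time, while the second is the better hint at hardware or driver trouble.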

To move forward with this, I recommend getting an ETW trace using Windows Performance Recorder (WPR) to trace the processor usage, thread usage, and stacks. Otherwise, we can only speculate. Do a search on xperf.exe or Windows Performance Recorder and you should find some articles on how to analyze processor usage with Windows Performance Analyzer. Here is a quick overview: http://msdn.microsoft.com/en-us/library/windows/hardware/jj679884.aspx
Feb 6, 2014 at 9:09 PM
Sure, so I opened an ETL file from the server.

I go to CPU Usage (Precise), then switch the graph to context switch count by process:
(there should be a screenshot below this)
Image

Every 5 seconds there is a context switch spike; the biggest culprits are System and javaw.exe.

What do I do from here?
Feb 6, 2014 at 9:30 PM
Coordinator
Feb 26, 2014 at 1:45 AM
The side effect of high context switching is high privileged mode CPU. I'm not seeing any high CPU usage, so context switching shouldn't be a problem. I think in this case PAL is raising a false alarm on context switching. When users are waiting on a system that is idle, it is very common for that system to be waiting on other network resources. Have you done any network analysis? If this is urgent, please open a support case with Microsoft. By the way, ETL traces are very interesting. I wouldn't mind taking a quick look at yours if you share it out.