Is it OK to run perfmon on production servers? And why?

Solution 1:

SQL Server, like most other products, generates its counters all the time, whether or not anything is listening (ignoring the -x startup option). Counter collection is completely transparent to the application being monitored. There is a shared memory region to which the monitored application writes and from which monitoring sessions read the raw values at the specified interval. So the only costs associated with monitoring are the cost of the monitoring process itself and the cost of writing the sampled values to disk. With a decent collection interval (I usually choose 15 seconds), a moderate number of counters (50-100), and a binary file format, collection usually has no noticeable impact on the monitored system.

But I'd recommend against using Perfmon (as in perfmon.exe). Instead, familiarize yourself with logman.exe; see Description of Logman.exe, Relog.exe, and Typeperf.exe Tools. This way you don't tie the collection session to your logon session. Logman, being a command-line tool, can be used in scripts and scheduled jobs to start and stop collection sessions.
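
As a rough sketch of the logman approach, a session could be created, started, stopped, and converted like this from an elevated PowerShell prompt. The collector name (SqlBaseline), the output location under C:\PerfLogs, and the two counters are assumptions made for illustration only.

```powershell
# Hypothetical collector name and output location; swap in your own counters.
$name = 'SqlBaseline'
$out  = 'C:\PerfLogs\SqlBaseline'

# Create a counter collector: 15-second sample interval, binary (.blg) output.
logman create counter $name -o $out -f bin -si 00:00:15 -c '\Processor(_Total)\% Processor Time' '\Memory\Available MBytes'

logman start $name   # begin collecting; the session is not tied to your logon
# ... let it run for as long as you need ...
logman stop $name    # stop collecting

# Convert the newest binary log to CSV with relog for analysis elsewhere.
# The exact .blg file name can vary with versioning settings, hence the wildcard.
$blg = Get-ChildItem "$out*.blg" | Sort-Object LastWriteTime | Select-Object -Last 1
relog $blg.FullName -f csv -o "$out.csv"
```

Keeping the log in binary format during collection and converting it afterwards with relog keeps the write cost on the monitored server low.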

Solution 2:

There's nothing wrong with running perfmon on production boxes. It's relatively low key, and can gather a lot of good info for you. And how would you accurately simulate production loads if you didn't run some analysis on the production server? From Brent Ozar in your own link:

Let Perfmon run for a day or two to gather a good baseline of the server’s activity. It’s not that invasive on the SQL Server being monitored, and the in-depth results will pay off. The more data we have, the better job we can do on analyzing the Perfmon results.

I've run perfmon on a number of production Exchange boxes with no adverse effects.


Solution 3:

Ever since I heard Clint Huffman, who wrote PAL (a utility for analyzing Perfmon logs), on a podcast, I have set up what I call the Flight Recorder on all of our production application servers. This practice has come in very handy for diagnosing problems and monitoring trends.

Below is the script I use to set up an auto-starting Perfmon Collector, with log purging. If desired, it can be fed a file listing performance counters to collect (one per line) or a PAL Threshold XML file. I like to use the PAL Threshold files.

<#
Install-FlightRecorder.ps1
.SYNOPSIS
Installs or sets up the pieces necessary to create PerfMon Collector 
snapshots, one per minute, to a file located in C:\FlightRecorder.

.DESCRIPTION
Installs or sets up the pieces necessary to create PerfMon Collector 
snapshots, one per minute, to a file located in C:\FlightRecorder.

.PARAMETER Path
File listing performance counters to collect, one per line. 
Or a PAL Threshold XML file.

#>
[CmdletBinding()]
param (
    [string]$Path
)

#Requires -RunAsAdministrator
$ScriptDir = { Split-Path $MyInvocation.ScriptName -Parent }
$DeleteTempFile = $False

function Main {
    if (-not $Path) { $Path = DefaultFile $Path }
    if (-not (Test-Path $Path)) {
        Write-Warning "Path does not exist or is inaccessible: $Path"
        Exit 1
    }
    if ($Path -like '*.xml') { $Path = PALFile $Path }

    Install-FlightRecorder
    if ($Path.startswith($env:TEMP)) {Remove-Item $Path}
    Write-Verbose 'Installation Successful.'
}

function Install-FlightRecorder {
    Write-Verbose 'Setting up the Flight Recorder.'
    if (-not (Test-Path c:\FlightRecorder\)) {
        mkdir c:\FlightRecorder | out-null 
    }
    if ((LOGMAN query) -match 'FlightRecorder') {
        Write-Verbose 'Removing former FlightRecorder PerfMon Collector.'
        LOGMAN stop FlightRecorder | out-null
        LOGMAN delete FlightRecorder | Write-Verbose
    }
    Write-Verbose 'Creating FlightRecorder PerfMon Collector.'
    LOGMAN create counter FlightRecorder -o "C:\FlightRecorder\FlightRecorder_$env:computername" -cf $Path -v mmddhhmm -si 00:01:00 -f bin | Write-Verbose
    SCHTASKS /Create /TN FlightRecorder-Nightly /F /SC DAILY /ST 00:00 /RU SYSTEM /TR 'powershell.exe -command LOGMAN stop FlightRecorder; LOGMAN start FlightRecorder; dir c:\FlightRecorder\*.blg |?{ $_.LastWriteTime -lt (Get-Date).AddDays(-3)} | del' | Write-Verbose
    SCHTASKS /Create /TN FlightRecorder-Startup /F /SC ONSTART /RU SYSTEM /TR "LOGMAN start FlightRecorder" | Write-Verbose
    SCHTASKS /Run /TN FlightRecorder-Startup | Write-Verbose
}

function DefaultFile {
    Write-Warning 'Counter or PAL file not specified, using default configuration.'
    $DeleteTempFile = $True
    $Path = [System.IO.Path]::GetTempFileName()
    Set-Content -Encoding ASCII $Path @'
\LogicalDisk(*)\Avg. Disk sec/Read
\LogicalDisk(*)\Avg. Disk sec/Write
\LogicalDisk(*)\Disk Transfers/sec
\LogicalDisk(C:)\Free Megabytes
\Memory\% Committed Bytes In Use
\Memory\Available MBytes
\Memory\Committed Bytes
\Memory\Free System Page Table Entries
\Memory\Pages Input/sec
\Memory\Pages/sec
\Memory\Pool Nonpaged Bytes
\Memory\Pool Paged Bytes
\Memory\System Cache Resident Bytes
\Network Interface(*)\Bytes Total/sec
\Network Interface(*)\Output Queue Length
\Paging File(*)\% Usage
\Paging File(*)\% Usage Peak
\PhysicalDisk(*)\Avg. Disk sec/Read
\PhysicalDisk(*)\Avg. Disk sec/Write
\Process(_Total)\Handle Count
\Process(_Total)\Private Bytes
\Process(_Total)\Thread Count
\Process(_Total)\Working Set
\Processor(*)\% Interrupt Time
\Processor(*)\% Privileged Time
\Processor(*)\% Processor Time
\System\Context Switches/sec
\System\Processor Queue Length
'@
    $Path
}

function PalFile {
    $DeleteTempFile = $True
    $InputPath = $Path
    $Path = [System.IO.Path]::GetTempFileName()
    $script:filesRead = @()  # reset the script-scoped list that Read-PalFile checks
    Read-PalFile $InputPath | Select -Unique | sort | Set-Content -Encoding ASCII $Path
    $Path
}

$script:filesRead =@()
function Read-PalFile ([string]$path) {
    if (-not (Test-Path $path)) {
        Write-Warning "PAL Threshold file not found: $path"
        return
    }
    if ($script:filesRead -contains $path) {return}
    $script:filesRead += @($path)
    Write-Verbose "Reading PAL Threshold file: $path"
    $xml = [XML](Get-Content $path)
    $xml.SelectNodes('//DATASOURCE[@TYPE="CounterLog"]') | select -expand EXPRESSIONPATH
    $xml.SelectNodes('//INHERITANCE/@FILEPATH') | select -expand '#text' | where {$_ } | ForEach {
        $newpath = Join-Path (Split-Path -parent $path) $_
        Write-Debug "Inheritance file: $newpath"
        Read-PalFile $newpath
    }
}

. Main
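
For reference, here is how the script might be invoked from an elevated PowerShell prompt, assuming it was saved as Install-FlightRecorder.ps1 in the current directory. The counter file path is a hypothetical example.

```powershell
# Install with the built-in default counter list.
.\Install-FlightRecorder.ps1 -Verbose

# Or install with your own counter list, one counter per line.
# C:\FlightRecorder\counters.txt is a hypothetical file name.
.\Install-FlightRecorder.ps1 -Path C:\FlightRecorder\counters.txt -Verbose

# Check that the collector exists and see its current status.
logman query FlightRecorder
```

A full path is used for the counter file because logman is a separate process and may resolve relative paths differently than PowerShell does.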

Solution 4:

We do it quite frequently. It is also essential for establishing a baseline in the real environment, so you can compare later if there are issues or you need to perform a capacity study.

I recommend not going below a 10-second interval though. If you are collecting many objects/counters and the interval is too short, it may impact operations.

Microsoft has a PerfMon Wizard that will set up the task for you.

http://www.microsoft.com/downloads/details.aspx?FamilyID=31FCCD98-C3A1-4644-9622-FAA046D69214&displaylang=en


Solution 5:

In an ideal world where a production server exactly mirrors what a dev server does, and is also an exact duplicate of the dev server, perfmon should never be required on the production server because the results would be the same as those on the dev server. Of course that mythical situation never happens, so we do need to run perfmon on production servers, and there is absolutely nothing wrong with that. Among other things, we may need to use perfmon and other tools to learn why the production server isn't behaving the same as the dev server.