SQL Server clears plan cache and execution stats periodically

First, get the exact times when the plan cache is being cleared. Here's the easiest way to do it - it should run nearly instantly, and won't block anyone:

SELECT TOP 1 creation_time
FROM sys.dm_exec_query_stats WITH (NOLOCK)
ORDER BY creation_time;

If that date/time seems older to you than you'd expected, then only part of the plan cache is being cleared. For example, maybe someone is doing an index rebuild or update stats job, which would flush the plan cache for the specific objects affected - but other objects will still stick around. I see this a lot when system queries (like DMV queries) stick around, but user database plans clear out.

If that date/time updates at specific intervals, like if it seems to update exactly every 2 hours to say 6:00, 8:00, 10:00, etc, then someone's probably running a job or query that causes the plan cache to clear out. Once you know the exact frequency, then you can:

  • Look at your job schedules to see what runs at that interval
  • Run a Profiler trace or Extended Events trace during that timespan to figure out the mystery (I'm not usually a fan of tracing in production, but if you know exactly when the killer is going to strike, it's easy enough to fire up a low-overhead sample of what's running)
  • Log sp_WhoIsActive to a table during that time (the easiest method, but the least likely to narrow it down to the exact query causing it)

If that date/time keeps changing each time you run the query, then your server is probably under memory pressure. Run this to generate basic health check info, and then you can copy/paste it into your Stack question so we can diagnose it:

sp_Blitz @OutputType = 'markdown', @CheckServerInfo = 1, @CheckUserDatabaseObjects = 1

(Disclosure: I'm one of the authors of sp_Blitz.)

Updated 2017/08/25 with your sp_Blitz data - thanks for running sp_Blitz and adding it to your question, and it really helps show a few things. You're running SQL Server 2016 Enterprise Edition on a VM with 2 cores and 16GB of RAM. First, a quick note on licensing: if you're licensing by the guest, the minimum purchase requirement is 4 cores, not 2. (See the SQL Server Licensing Guide for more details.) 4 cores of Enterprise Edition is about $28K USD, and it's fairly unusual to see that much licensing money spent on just 16GB RAM. If you're licensing SQL Server Enterprise Edition at the host level, you can ignore that and run smaller VMs, though.

It looks like your SQL Server is coming under external memory pressure. You've got 16GB RAM, and you've set max server memory at 15GB. Unfortunately, 1GB isn't enough left over for the operating system (plus whatever else you're going to run on there, like backup software and SSMS.) In our SQL Server Setup Guide, we suggest leaving 4GB or 10% free, whichever is greater - in your case, that would be 4GB, so your max server memory setting should be 12GB rather than 15GB.

More evidence shows up in your current memory allocations: you have locked pages in memory (LPIM) turned on, but you've only got 12.02GB of pages locked in memory. That likely (but not guaranteed) means that some other application needed memory, so Windows sent out a memory pressure notification, and SQL Server gave up the other 3GB of memory to let the other app do its thing. That's more proof that you can't really go with a 15GB max - you need memory for other stuff.

When your SQL Server comes under that external memory pressure and needs to free up memory for other apps, your plan cache will suffer.

So you've got a few options:

  • Set max memory appropriately - say, 12GB (or even lower if you're going to run other apps on the server.) That way, SQL Server won't have to have a fire sale on memory and flush stuff out just because some other app needs 2-3GB of RAM - it'll already be available
  • Stop running other apps on the server - this can be tough if it's other sysadmins remote desktopping and running stuff like SSMS, though. I've set up Perfmon counter alarms for the number of RDP sessions open, and alerted when it's anything other than 0 - that can help catch the culprit in action.
  • Add more memory to the VM - but I don't think you really need it. Some evidence is shown by the sp_Blitz report of "no significant waits detected." I don't think you're under frequent memory pressure, especially since you report that it only happens every now and then. This is the least cost-effective option.

OK, OP here, I finally fixed this problem by updating SQL Server 2016 to the latest version. I had SP1 and yesterday I installed Cumulative Update 6.

I also set "max memory" appropriately like Brent's answer suggests. Great answer by the way, I urge everyone to upvote it.

It's been 36 hours and counting, plans not being reset.

Brent Ozar also has a very nice website here: https://sqlserverupdates.com/ to help determine which updates you need.

One other thing that helped was detecting and resolving "untrusted foreign keys" issue. Brent has a very nice article (hahah, yeah, Brent again, I know right) on how to resolve it, just google, he's the #1 result