Alternatives to running a query for rarely changed data every time on a large table

I am assuming from the DISTINCT that PortName values are duplicated in your table and that there are not 10 million different port names being returned.

The minimal effort solution is to just place an index on that column:

CREATE INDEX IX_Ports_PortName ON Ports(PortName);

Of course this still incurs some database load and storage overhead, so you may want a more sophisticated solution such as caching, which Aaron Bertrand covers quite well in his answer.

You could also employ more normalization: if port names are duplicated and knowing them distinctly is important, then you could make a [PortNames] table and use a PortNameID in the [Ports] table. That way you could just scan the [PortNames] table, which would presumably be much smaller and faster. Of course that may have additional costs and considerations of its own.
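For illustration, a minimal sketch of that layout, assuming a dbo schema (the names and types here are hypothetical):

CREATE TABLE dbo.PortNames
(
    PortNameID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    PortName   nvarchar(128)     NOT NULL UNIQUE
);

-- [Ports] then carries the surrogate key instead of repeating the string:
-- ALTER TABLE dbo.Ports ADD PortNameID int NOT NULL
--     CONSTRAINT FK_Ports_PortNames REFERENCES dbo.PortNames (PortNameID);

-- The distinct list becomes a scan of the (presumably tiny) lookup table:
SELECT PortName FROM dbo.PortNames;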


For data that doesn't change often, you can use a caching layer to serve those queries. There are many alternatives, such as [memcached], and many discussions already exist:

  • Cache Comparison: Memcached vs. SQL Server (wait…, what?)
  • can I get Memcached running on a Windows (x64) 64bit environment?

You can also do this quite easily yourself, and depending on the scope and size of the data, you can do it on the cheap. I did this kind of thing in a previous life, where I placed a SQL Server Express instance on each app/web server, and wrote my own scripts to swap out the data in those instances periodically with minimal disruption. This kept all that heavy read activity off the primary instance and also offered the flexibility of how stale those cached copies of the data could get (simply by changing the frequency of the refresh jobs). I wrote about this process here:

  • Schema Switch-a-Roo
  • Schema Switch-a-Roo : Part 2
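As a very rough sketch of the swap itself (the shadow/holder schema names and the cross-database pull are hypothetical; the posts above cover the real details and caveats):

-- Repopulate the staging copy first; readers keep using dbo.Ports meanwhile.
TRUNCATE TABLE shadow.Ports;
INSERT shadow.Ports (PortName) SELECT PortName FROM SourceDb.dbo.Ports;

-- Then swap the two copies with near-instant, metadata-only operations:
BEGIN TRANSACTION;
    ALTER SCHEMA holder TRANSFER dbo.Ports;    -- park the live copy
    ALTER SCHEMA dbo    TRANSFER shadow.Ports; -- promote the fresh copy
    ALTER SCHEMA shadow TRANSFER holder.Ports; -- recycle the old copy
COMMIT TRANSACTION;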

Another thing you can do is use log shipping to implement a poor man's Availability Group. Basically you have a set of log shipped targets, cycle through them restoring the latest logs on a schedule, and a dynamic app that knows which target to use for the next read request it gets. I wrote about that process here:

  • Readable Secondaries on a Budget
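The key ingredient is restoring each log backup WITH STANDBY, which leaves the target database readable between restores. A minimal sketch of one restore step, with made-up names and paths (the app must be routed to a different target while this runs, which is what the post above automates):

-- Apply the next log backup but keep the database readable;
-- the undo file preserves uncommitted work so later restores can continue.
RESTORE LOG ReadOnlyCopy
    FROM DISK = N'\\backupshare\logs\MyDb_log.trn'
    WITH STANDBY = N'D:\SQL\ReadOnlyCopy_undo.dat';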

If your data is larger than 10GB, or will exceed that in the future, then Express won't work, and you'll have to use at least Standard Edition. But this type of operation, where you scale OUT reads onto commodity hardware, is much less expensive than increasing cores/memory/disk on the primary server to scale UP.

If isolating reads from writes isn't the primary goal, then for this very specific case you can use other local solutions like indexed views. Just remember they add overhead, and they don't offer the same flexibility, such as adjusting how often the data is refreshed (and therefore how stale the read copies are). Other query scenarios won't lend themselves to indexed views.


Somehow, nobody has mentioned an indexed view. A very brief intro to indexed views can be found at What You Can (and Can't) Do With Indexed Views.

In essence, it is a cache that the engine maintains automatically behind the scenes. An indexed view is stored on disk and updated automatically when the underlying table changes.

So inserts, updates, and deletes on the main table become somewhat slower, but querying the indexed view is very fast, because it does not scan the 10M rows of the main table. And when the main table is updated, the engine is smart enough to adjust the values stored in the indexed view without rescanning all 10M rows.

Besides, the question title says "Alternatives to running a query for rarely changed data", so I assume this large table doesn't change often anyway. I think an indexed view would be a perfect fit here.

You can't have DISTINCT in an indexed view, but your query can be rewritten without it like this:

SELECT PortName, COUNT_BIG(*) AS cc 
FROM Ports 
GROUP BY PortName

If an indexed view contains a GROUP BY, it must include COUNT_BIG(*), so I added it.
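To make that concrete, here is a sketch of the materialized version, assuming the table lives in the dbo schema (the view and index names are mine). An indexed view requires SCHEMABINDING, two-part object names, and a unique clustered index, which is what actually stores it on disk:

CREATE VIEW dbo.PortNameCounts
WITH SCHEMABINDING
AS
SELECT PortName, COUNT_BIG(*) AS cc
FROM dbo.Ports
GROUP BY PortName;
GO

-- This index materializes the view:
CREATE UNIQUE CLUSTERED INDEX IX_PortNameCounts ON dbo.PortNameCounts (PortName);

On editions other than Enterprise, query it as SELECT PortName FROM dbo.PortNameCounts WITH (NOEXPAND) to guarantee the index is used; Enterprise Edition can match the original query to the indexed view automatically.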