SQL Server performance with many concurrent, long-running queries

I would start with the execution plans to see whether the queries can be optimized. Also be aware that long-running queries may hold locks on rows or tables for the duration of their execution. To answer this question you need to take into account how much memory and CPU your SQL Server instance can handle. Testing on a development environment will give you a good read on what is going to happen: the longer the queries run, the more resources they consume, and they can become a bottleneck for your entire system.
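As a starting point, here is a rough sketch of how you could list the currently running requests, longest first (the session_id > 50 filter is only a common heuristic to skip most system sessions):

    -- Currently executing requests, longest first; CROSS APPLY pulls the batch text.
    SELECT r.session_id,
           r.status,
           r.command,
           r.wait_type,
           r.blocking_session_id,
           r.total_elapsed_time / 1000 AS elapsed_seconds,
           t.text AS batch_text
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.session_id > 50   -- rough filter to skip most system sessions
    ORDER BY r.total_elapsed_time DESC;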


CPU

Each request coming to the server (ie. each 'batch') will be associated with a 'task', see sys.dm_os_tasks. Tasks are queued up on a 'scheduler', which is roughly speaking a CPU core, see sys.dm_os_schedulers. Each scheduler has several 'workers' (ie. threads or fibers, see sys.dm_os_workers), and a free worker will pick up the next task from the scheduler's queue and 'run away' with it, executing it until the task is finished (ie. the request has completed). This scheduling mechanism applies to everything inside SQL Server, including system tasks, CLR code and so on and so forth.
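For illustration, a diagnostic sketch that ties the three DMVs together, one row per user task showing the worker running it and the scheduler it lives on:

    -- Task -> worker -> scheduler association for user sessions.
    SELECT t.session_id,
           t.task_state,
           s.scheduler_id,
           s.cpu_id,
           w.state AS worker_state,
           w.last_wait_type
    FROM sys.dm_os_tasks AS t
    LEFT JOIN sys.dm_os_workers AS w
           ON w.worker_address = t.worker_address
    LEFT JOIN sys.dm_os_schedulers AS s
           ON s.scheduler_id = t.scheduler_id
    WHERE t.session_id IS NOT NULL
    ORDER BY s.scheduler_id, t.session_id;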

The number of tasks that can be created is limited by available memory. Requests ('batches') do not map one-to-one to tasks, since some requests, once started, schedule more tasks to be executed, parallel queries being the typical example. The number of workers in the system is dynamic, but capped by the 'max worker threads' configuration setting. If the worker cap is reached, new tasks are queued up on the schedulers but not picked up until a worker frees up (finishes a task) and becomes available. When this condition is reached it is called 'worker starvation', and it results in an unresponsive server, since new client login handshakes require login tasks to be executed (the server appears to reject connections) and new requests from existing clients queue up behind the waiting tasks (the server takes a long time to respond to trivial requests).
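You can compare the configured cap with the number of workers currently alive; a minimal sketch (sys.dm_os_sys_info exposes the effective 'max worker threads' value):

    -- Configured worker cap vs. workers currently created.
    SELECT (SELECT max_workers_count FROM sys.dm_os_sys_info) AS max_workers,
           (SELECT COUNT(*) FROM sys.dm_os_workers)           AS current_workers;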

So if you have a large number of parallel, long-running queries, you will consume a large number of workers doing many long-running tasks. This reduces the size of the free worker pool, leaving fewer workers available to service the other, short tasks coming into the server (like OLTP requests, login handshakes etc). The server appears unresponsive because tasks are piling up in the scheduler queues (this can be seen in the work_queue_count column of the sys.dm_os_schedulers DMV). In extreme cases you can effectively starve the system of workers, making the server completely unresponsive until some of the workers free up.
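A sketch of how to watch those queues, per visible scheduler:

    -- Tasks waiting for a worker (work_queue_count) and tasks waiting for CPU (runnable_tasks_count).
    -- A work_queue_count that stays above zero is the worker starvation symptom described above.
    SELECT scheduler_id,
           current_tasks_count,
           runnable_tasks_count,
           current_workers_count,
           active_workers_count,
           work_queue_count
    FROM sys.dm_os_schedulers
    WHERE status = 'VISIBLE ONLINE';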

Memory

A query plan containing parallel operations is usually associated with full scans of large indexes (large tables). Scanning an index is done by traversing its leaf pages, and reading all the leaf pages of a large table means that all those pages have to be present in memory at one time or another during the execution of the query. This in turn creates a demand for free pages from the buffer pool to house the scanned pages. The demand for free pages produces memory pressure, which results in caches being notified to start evicting old entries and in older data pages being evicted from the buffer pool. The cache notifications can be witnessed in sys.dm_os_memory_cache_clock_hands. The data page evictions can be monitored with the good ole' Page Life Expectancy performance counter.
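As a rough sketch, both symptoms can be checked from inside the server (the Page Life Expectancy value is also exposed through sys.dm_os_performance_counters, so you don't need perfmon for a quick look):

    -- Page Life Expectancy: a sustained drop while the scans run means data pages
    -- are being evicted from the buffer pool faster than usual.
    SELECT object_name, counter_name, cntr_value AS ple_seconds
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Manager%'
      AND counter_name = 'Page life expectancy';

    -- Cache clock hands that have started sweeping entries under memory pressure.
    SELECT name, type, clock_hand, rounds_count, removed_all_rounds_count
    FROM sys.dm_os_memory_cache_clock_hands
    WHERE rounds_count > 0;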

Evicting cache entries has the effect that the next time an evicted entry is needed (be it a compiled plan, a permission token, or whatever) it has to be created from scratch, resulting in more CPU, memory and IO being consumed, an effect that can manifest itself even after the long-running queries have finished.

Now it may be the case that your system has such gargantuan amounts of RAM installed that scanning a few large tables makes no difference, because your RAM can accommodate your entire database with room to spare. In that case there is no problem. But most of the time this is not the case.

IO

This is related to the point above (Memory). All those pages read to satisfy the index scan have to be transferred into memory, which means a (potentially large) portion of the IO bandwidth is consumed by the long-running queries. Also, all dirty data pages that are evicted from the buffer pool have to be written to disk, resulting in more IO. And the clean pages that were evicted are likely to be needed back some time in the future, so even more IO.
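A sketch of how to see where that IO lands, per database file (the numbers are cumulative since startup, so sample twice and diff while the long queries run):

    -- Read/write volume and IO stall time per database file.
    SELECT DB_NAME(vfs.database_id)            AS database_name,
           vfs.file_id,
           vfs.num_of_bytes_read / 1048576     AS mb_read,
           vfs.num_of_bytes_written / 1048576  AS mb_written,
           vfs.io_stall_read_ms,
           vfs.io_stall_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    ORDER BY vfs.io_stall_read_ms DESC;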

If the IO generated by the scans exceeds the bandwidth of your system, the IO operations start queuing up on the disk controller(s). This can easily be checked with the Physical Disk / Avg. Disk Queue Length performance counters.
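The same saturation can also be spotted from inside SQL Server; a sketch listing the IOs the server has issued that the OS/disk has not completed yet:

    -- Pending IO requests, joined back to the file they target.
    SELECT DB_NAME(vfs.database_id)  AS database_name,
           vfs.file_id,
           pio.io_type,
           pio.io_pending,
           pio.io_pending_ms_ticks
    FROM sys.dm_io_pending_io_requests AS pio
    JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      ON vfs.file_handle = pio.io_handle
    ORDER BY pio.io_pending_ms_ticks DESC;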

Contention

And finally, the biggest problem: lock contention. As explained, parallel queries almost always imply table scans, and table scans take a shared lock on each row they visit. It's true that in normal operation mode the lock is released as soon as the record is read, but you are still guaranteed to request an S lock on every row in the table. This pretty much guarantees that these scans will hit a row that is locked X by an update. When this happens the scan has to stop and wait for the X lock to be released, which happens when the updating transaction finally commits. The result is that even moderate OLTP activity on the table blocks the long-running queries. Ideally that is all that happens, and the result is just poor performance. But things can get ugly quickly if the long-running query does anything fancy, like acquiring page locks instead of row locks. Since these scans traverse the indexes end-to-end and are guaranteed to conflict with updates, the coarser-granularity locks acquired by these queries no longer just conflict with the update locks, they can actually lead to deadlocks. Explaining how this can happen is beyond the scope of this reply.
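A sketch of how to catch that blocking as it happens (any request with a non-zero blocking_session_id is sitting behind someone else's lock):

    -- Blocked requests, the session blocking them, and the resource being fought over.
    SELECT r.session_id          AS blocked_session,
           r.blocking_session_id AS blocking_session,
           r.wait_type,
           r.wait_time           AS wait_ms,
           r.wait_resource,
           t.text                AS blocked_batch
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.blocking_session_id <> 0;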

To eliminate the contention when the queries are legitimately doing full scans, the best alternative is to use the magic of snapshots: either database snapshots created for reporting, or the snapshot isolation levels. Note that some may recommend using dirty reads; I have yet to find a case where that was actually acceptable.
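A minimal sketch of the snapshot isolation route (MyDb and dbo.BigTable are placeholder names for your database and the table being scanned):

    -- Allow SNAPSHOT isolation; readers see a consistent version of the data
    -- (row versions kept in tempdb) and neither block nor get blocked by writers.
    ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- In the reporting session:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRANSACTION;
        SELECT COUNT(*) FROM dbo.BigTable;  -- placeholder for the long-running scan
    COMMIT;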