How many threads can I run concurrently?

Each thread consumes additional memory (kernel stack, thread environment block, thread-local storage, user-mode stack, ...). AFAIK there is no explicit limit in Windows, so the constraint will be memory (probably the stack reserved for each thread).
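To make the stack point concrete, here is a minimal sketch, assuming .NET, that starts a thread with a smaller maximum stack size through the `Thread(ThreadStart, int)` constructor. The class and method names are made up for illustration, and the runtime may round or adjust the size you ask for.

```csharp
using System;
using System.Threading;

public static class SmallStackDemo
{
    // Runs `work` on a dedicated thread whose maximum stack size is
    // `stackBytes` rather than the default (commonly 1 MB on Windows).
    public static void RunWithStack(Action work, int stackBytes)
    {
        var thread = new Thread(() => work(), stackBytes);
        thread.Start();
        thread.Join(); // block until the work has finished
    }
}
```

Calling `SmallStackDemo.RunWithStack(() => Console.WriteLine("hi"), 256 * 1024)` would run the delegate on a thread with roughly a 256 KB stack; smaller stacks let more threads fit in the same address space, at the risk of a stack overflow in deeply recursive code.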

In Linux, threads are more like processes (with shared memory) and you're constrained by the system-wide limit:

cat /proc/sys/kernel/threads-max

A pretty good rule of thumb when running CPU-intensive tasks is to run the same number of them as your physical core count.

Yes, you can run more tasks, but they will wait for resources (or for threads in a thread pool), and your box, regardless of size, can't dedicate 100% of a CPU core to one thread all the time because of background and other processes. So as the tasks you instantiate and the threads you spawn exceed the number that can actually run concurrently (1 per core), more resource management, queuing and swapping will occur.

A test we did where I work, using a viral pattern to launch additional tasks, found that the optimal cap was pretty close to the CPU count. Tasks launched at a one-to-one ratio with the physical core count took about 1 minute each to complete. With the cap set at double the CPU count, average task time went from about 1 minute to about 5 minutes. It gets geometrically slower the more tasks you start past the core count.

So for example, if you have 8 physical cores, 8 tasks (and, using the TPL, essentially 8 concurrently active threads) should be the fastest. There is also your main thread or process, which creates the other tasks, plus other background processes, but if the box is pretty isolated for your resource exploitation pleasure, those will be fairly minimal.

The upside of basing your task cap on the core count as you chew tasks off a queue or list is that when you deploy the application on different sized boxes, it adjusts itself automatically.

To determine this programmatically, we use

var CoreCount = System.Math.Max(1, System.Environment.ProcessorCount / 2); // never drop below 1 on a single-core box

Why divide by two, you ask? Because ProcessorCount reports logical processors, and many modern processors use hyperthreading, exposing two logical cores per physical core. You should find with your own testing that if you use the logical count, your overall speed per task, and thus of the whole process, will drop significantly. Physical cores are the key. We couldn't see a quick way to distinguish physical from logical cores, but a quick survey of our boxes found the two-to-one ratio to hold consistently. YMMV, but this might get you pretty far pretty fast.
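Putting the cap to work could look like the sketch below. It assumes the TPL's `Parallel.ForEach` with `ParallelOptions.MaxDegreeOfParallelism`, which is one way to enforce a cap and not necessarily how the test described above was written. `CappedRunner`, `EstimatePhysicalCores` and `ProcessAll` are illustrative names, and the two-logical-cores-per-physical-core ratio remains an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CappedRunner
{
    // Assumption: two logical cores per physical core. Not true on every CPU.
    public static int EstimatePhysicalCores()
    {
        return Math.Max(1, Environment.ProcessorCount / 2);
    }

    // Processes every item with at most `cap` concurrent workers
    // and returns how many items were processed.
    public static int ProcessAll(IEnumerable<int> items, Action<int> work, int cap)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = cap };
        int done = 0;
        Parallel.ForEach(items, options, item =>
        {
            work(item);
            Interlocked.Increment(ref done); // thread-safe counter
        });
        return done;
    }
}
```

A call such as `CappedRunner.ProcessAll(queue, DoWork, CappedRunner.EstimatePhysicalCores())` (with `queue` and `DoWork` being your own) then adapts to whatever box it is deployed on.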


Typically, the number of threads that run truly concurrently is determined by the number of CPUs and CPU cores (including hyper-threading) you have. That is to say that at any given time the number of threads running (in the operating system) is at most equal to the number of "cores".

How many threads you can run concurrently in your app depends on a large number of factors. The best (layman's) number would be the number of cores on the machine, but of course that's like pretending no one else (no other application) exists :).

Frankly, I'd say do a lot more study on multi-threading in .NET/Windows because one tends to do more "damage" than good when one doesn't have a really solid understanding. .NET has the concept of a thread pool and you need to know how that works in addition to Windows.

In .NET 4.0 and later you should be looking at Tasks (Task Parallel Library), as the library does a much better job of determining how many threads (if any) to spawn. With the TPL the thread pool got a major overhaul, and it is a lot smarter about spawning threads, work stealing etc. But you typically work with Tasks and not threads.
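As a minimal sketch of what working with Tasks instead of threads looks like (the `TaskDemo` class and its method are illustrative names, not part of any library): one task is queued per input, and the thread pool decides how many threads actually execute them.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class TaskDemo
{
    // Queues one task per input and lets the thread pool decide how many
    // threads run them. Returns the sum of the squared inputs.
    public static long SumOfSquares(int[] inputs)
    {
        Task<long>[] tasks = inputs
            .Select(n => Task.Factory.StartNew(() => (long)n * n))
            .ToArray();
        Task.WaitAll(tasks);             // block until every task has completed
        return tasks.Sum(t => t.Result); // results are already available here
    }
}
```

Note that nothing in this code says how many threads to create; that decision is left to the runtime, which is the point being made above.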

This is a complex area and, as a result, the .NET framework introduced Tasks to abstract threads away from programmers, allowing the runtime to be smart about this while the programmer just says what she wants and not so much how to do it.


It depends on your hardware: you're (probably) not using a theoretical computer but a physical one, so you have limited resources.

Read: Does Windows have a limit of 2000 threads per process?

Furthermore, even if you could run 5000+ threads, depending on your hardware, that could run much slower than an equivalent 10-thread program. I think you should take a look at thread pooling.
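For a rough idea of what thread pooling looks like, here is a sketch assuming .NET's shared `ThreadPool`. The `PoolDemo` class and its method names are made up for illustration. Thousands of work items can be queued, but only a pool-managed number of threads execute them, instead of one thread per item.

```csharp
using System;
using System.Threading;

public static class PoolDemo
{
    // Upper bound on worker threads the pool is currently allowed to create.
    public static int MaxWorkerThreads()
    {
        int worker, io;
        ThreadPool.GetMaxThreads(out worker, out io);
        return worker;
    }

    // Queues `count` work items on the shared pool, waits for all of them,
    // and returns how many actually ran.
    public static int RunOnPool(int count)
    {
        int completed = 0;
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Increment(ref completed);
                    done.Signal(); // one fewer item outstanding
                });
            }
            done.Wait(); // returns once every item has signalled
        }
        return completed;
    }
}
```

`PoolDemo.RunOnPool(5000)` completes 5000 units of work without ever creating 5000 threads.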