Algorithm for finding the busiest period?

I would start by thinking of the busy-ness of a point x as the number of activations to the left of x, minus the number of deactivations to the left of x. I would sort the activations and deactivations by the time at which they occur (in O(nlog(n)) time). Then you can traverse the list, tracking the number active (y), incrementing and decrementing that number with activations and deactivations passed. The busiest period will be the points at which y is at its maximum. I can't think of a solution off the top of my head that is better than O(nlog(n)). The brute force would be O(n^2).


I thought you could perhaps use a set() for this, and it would work if your assured that all periods intersect at at least one point.

However, this does not work as soon as a period does not intersect. You may be able to add additional logic to cover this, so I'll post what I was thinking:

>>> periods = [(2, 10), (3, 15), (4, 9), (8, 14), (7, 13), (5, 10),]
>>> intersected = None
>>> for first, second in periods:
...     if not intersected:
...         intersected = set(range(first, second + 1))
...     else:
...         intersected = intersected.intersection(set(range(first, second + 1)))
...
>>> intersected
set([8, 9])

Note: this does not include the 11-15 period. Your probably best off just creating bin pairs as mentioned by R.K.


        1     2     3     4     5     6     7     8     9     10     11     12     13     14     15
1             |--------------------------------------X---------|
2                   |--------------------------------X--------------------------------------------|
3                         |--------------------------X---|
4                                                  |-X-------------------------------------|
5                                           |--------X------------------------------|
6                               |--------------------X----------|
7                                                                     |---------------------------|

             +1    +1     +1   +1           +1     +1    -1    -2     +1           -1     -1     -2
              1     2     3     4           5       6    5      3     4             3      2      0
                                                     ^^^^

Get it?

So you need to transform this:

1: 2 - 10
2: 3 - 15
3: 4 - 9
4: 8 - 14
5: 7 - 13
6: 5 - 10
7: 11 - 15

into:

[(2,+), (3,+), (4,+), (5,+), (7,+), (8,+), (9,-), (10,-), (10,-), (11,+), (13,-), (14,-), (15,-), (15,-)]

and then you simply iterate through, counting up when you see a + and counting down on -. The busiest interval will be when the count is maximum.

So in code:

intervals = [(2, 10), (3, 15), (4, 9), (8, 14), (7, 13), (5, 10), (11, 15)]
intqueue = sorted([(x[0], +1) for x in intervals] + [(x[1], -1) for x in intervals])
rsum = [(0,0)]
for x in intqueue: 
    rsum.append((x[0], rsum[-1][1] + x[1]))
busiest_start = max(rsum, key=lambda x: x[1])
# busiest_end = the next element in rsum after busiest_start 

# instead of using lambda, alternatively you can do:
#     def second_element(x):
#         return x[1]
#     busiest_start = max(rsum, key=second_element)
# or:
#     import operator
#     busiest_start = max(rsum, key=operator.itemgetter(1))

runtime complexity is (n+n)*log(n+n)+n+n or O(n*log(n))

It is also possible to convert this idea into an online algorithm if you don't have the complete list of intervals at the start of the program but is guaranteed that incoming intervals will never be scheduled for a past point. Instead of sorting you will use a priority queue, each time an interval comes, you push in two items, the start point and the end point, each with a +1 and -1 respectively. And then you pop off and count and keep track of the peak hour.