Find the majority element in array

Majority Element:

A majority element in an array A[] of size n is an element that appears more than n/2 times (and hence there is at most one such element).

Finding a Candidate:

The algorithm for first phase that works in O(n) is known as Moore’s Voting Algorithm. Basic idea of the algorithm is if we cancel out each occurrence of an element e with all the other elements that are different from e then e will exist till end if it is a majority element.

findCandidate(a[], size)
1.  Initialize index and count of majority element
     maj_index = 0, count = 1
2.  Loop for i = 1 to size – 1
    (a)If a[maj_index] == a[i]
        count++
    (b)Else
        count--;
    (c)If count == 0
        maj_index = i;
        count = 1
3.  Return a[maj_index]

Above algorithm loops through each element and maintains a count of a[maj_index], If next element is same then increments the count, if next element is not same then decrements the count, and if the count reaches 0 then changes the maj_index to the current element and sets count to 1. First Phase algorithm gives us a candidate element. In second phase we need to check if the candidate is really a majority element.

Second phase is simple and can be easily done in O(n). We just need to check if count of the candidate element is greater than n/2.

Read geeksforgeeks for more details


// returns -1 if there is no element that is the majority element, otherwise that element

// funda :: if there is a majority element in an array, say x, then it is okay to discard 
// a part of that array that has no majority element, the remaining array will still have
// x as the majority element

// worst case complexity :  O(n)

int findMajorityElement(int* arr, int size) { 
    int count = 0, i, majorityElement;
    for (i = 0; i < size; i++) {
        if (count == 0)
            majorityElement = arr[i];
        if (arr[i] == majorityElement) 
            count++;
        else
            count--;
    }
    count = 0;
    for (i = 0; i < size; i++)
        if (arr[i] == majorityElement)
            count++;
    if (count > size/2)
        return majorityElement;
    return -1;
}

It is sad to see that in 5 years no one has written a proper explanation for this problem.

This is a standard problem in streaming algorithms (where you have a huge (potentially infinite) stream of data) and you have to calculate some statistics from this stream, passing through this stream once.


Clearly you can approach it with hashing or sorting, but with a potentially infinite stream you can clearly run out of memory. So you have to do something smart here.


The majority element is the element that occurs more than half of the size of the array. This means that the majority element occurs more than all the other elements combined. That is, if you count the number of times the majority element appears, and subtract the number of occurrences of all the other elements, you will get a positive number.

So if you count the occurrences of some element, and subtract the number of occurrences of all other elements and get the number 0 - then your original element can't be a majority element. This is the basis for a correct algorithm:

Declare two variables, counter and possible_element. Iterate the stream, if the counter is 0 - your overwrite the possible element and initialize the counter, if the number is the same as possible element - increase the counter, otherwise decrease it. Python code:

def majority_element(arr):
    counter, possible_element = 0, None
    for i in arr:
        if counter == 0:
            possible_element, counter = i, 1
        elif i == possible_element:
            counter += 1
        else:
            counter -= 1

    return possible_element

It is clear to see that the algorithm is O(n) with a very small constant before O(n) (like 3). Also it looks like the space complexity is O(1), because we have only three variable initialized. The problem is that one of these variables is a counter which potentially can grow up to n (when the array consists of the same numbers). And to store the number n you need O(log (n)) space. So from theoretical point of view it is O(n) time and O(log(n)) space. From practical, you can fit 2^128 number in a longint and this number of elements in the array is unimaginably huge.

Also note that the algorithm works only if there is a majority element. If such element does not exist it will still return some number, which will surely be wrong. (it is easy to modify the algorithm to tell whether the majority element exists)

History channel: this algorithm was invented somewhere in 1982 by Boyer, Moore and called Boyer–Moore majority vote algorithm


The majority element (if it exists) will also be the median. We can find the median in O(n) and then check that it is indeed a valid majority element in O(n). More details for implementation link