Check if all vowels exist in pairs of strings

Java openjdk8

This uses bit arrays: each word is encoded as an int in which the lowest bit is 1 if the word contains a, the second bit is 1 if it contains e, and so on.

For example, password contains a and o, so its code is 1|8 == 9 == 0b01001. Since we only need the number of pairs, it is enough to store the number of words for each code in an array of size 32.
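As a minimal sketch of the encoding described above (the class name VowelMask and the sample words are only illustrative, and the lookup via indexOf is one possible way to write it, not the code used in this answer):

```java
class VowelMask {
    // Hypothetical helper: maps a word to a 5-bit code,
    // with a = 1, e = 2, i = 4, o = 8, u = 16.
    static int code(String word) {
        int mask = 0;
        for (int k = 0; k < word.length(); k++) {
            // Position of the character in "aeiou", or -1 if it is not a vowel.
            int bit = "aeiou".indexOf(word.charAt(k));
            if (bit >= 0) {
                mask |= 1 << bit;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        // buckets[c] = number of words whose code is c
        int[] buckets = new int[32];
        for (String w : new String[] {"password", "unique", "rhythm"}) {
            buckets[code(w)]++;
        }
        System.out.println(code("password")); // prints 9 (a | o = 1 | 8)
        System.out.println(code("unique"));   // prints 22 (e | i | u = 2 | 4 | 16)
        System.out.println(buckets[0]);       // prints 1 (only "rhythm" has no vowel)
    }
}
```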

Then, for each pair of distinct codes below 31 (first less than second) whose bitwise OR is 31, we add the product of the two word counts. Code 31 is handled separately, because it is the only code whose words can pair with a word of the same code, and in fact with any word: it contributes the number of such words times the number of all words with a lower code, plus the number of pairs among the code-31 words themselves.

import java.util.Scanner;

public class Main {

    public static void main( String[] args ) {
        // a[ c ] = number of words whose vowel code is c
        int[] a = new int[ 32 ];
        // number of words missing at least one vowel (code < 31)
        int count = 0;
        try (Scanner scan = new Scanner( System.in )) {
            while ( scan.hasNext() ) {
                String s = scan.next();
                a[ code( s ) ]++;
            }
        }

        // long, so that the products cannot overflow on large inputs
        long t = 0;
        for ( int i = 0 ; i < 31 ; ++i ) {
            count += a[ i ];
            for ( int j = i + 1 ; j < 31 ; ++j ) {
                if ( ( i | j ) == 31 ) {
                    t += (long) a[ i ] * a[ j ];
                }
            }
        }

        // words with all vowels pair with every other word and with each other
        t += (long) a[ 31 ] * count + (long) a[ 31 ] * ( a[ 31 ] - 1 ) / 2;
        System.out.println( t );
    }

    // a e i o u
    // 1 2 4 8 16
    public static int code( String s ) {
        int r = 0;
        for ( char c : s.toCharArray() ) {
            r |= c == 'a' ? 1 : c == 'e' ? 2 : c == 'i' ? 4 : c == 'o' ? 8 : c == 'u' ? 16 : 0;
        }
        return r;
    }
}

TIO
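One way to convince yourself that the bucket formula is right is to compare it against a direct check of every pair on a small input. The following is only a sketch: the class name PairCountCheck, the method names and the six sample words are made up for illustration.

```java
class PairCountCheck {
    // Same encoding as above: a = 1, e = 2, i = 4, o = 8, u = 16.
    static int code(String word) {
        int mask = 0;
        for (char c : word.toCharArray()) {
            int bit = "aeiou".indexOf(c);
            if (bit >= 0) {
                mask |= 1 << bit;
            }
        }
        return mask;
    }

    // Counts pairs from the 32 buckets, following the rule described above.
    static long countByBuckets(String[] words) {
        long[] a = new long[32];
        for (String w : words) {
            a[code(w)]++;
        }
        long pairs = 0;
        long notFull = 0; // words with a code below 31
        for (int i = 0; i < 31; i++) {
            notFull += a[i];
            for (int j = i + 1; j < 31; j++) {
                if ((i | j) == 31) {
                    pairs += a[i] * a[j];
                }
            }
        }
        return pairs + a[31] * notFull + a[31] * (a[31] - 1) / 2;
    }

    // Reference implementation: tests every unordered pair directly.
    static long countBruteForce(String[] words) {
        long pairs = 0;
        for (int i = 0; i < words.length; i++) {
            for (int j = i + 1; j < words.length; j++) {
                if ((code(words[i]) | code(words[j])) == 31) {
                    pairs++;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] words = {"sequoia", "education", "aei", "ou", "you", "rhythm"};
        System.out.println(countByBuckets(words));  // prints 11
        System.out.println(countBruteForce(words)); // prints 11
    }
}
```

Here the two all-vowel words give 1 pair between themselves and 2 × 4 pairs with the other words, and aei pairs with both ou and you, for 11 in total.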

C++

Translating to C++ is straightforward.

Test with the first 100 words

Test on TIO with 10000 words of the dataset

1968503

Real time: 0.419 s
User time: 0.315 s
Sys. time: 0.099 s
CPU share: 98.83 %
Exit code: 0

APL (Dyalog Unicode)

Two solutions designed by my esteemed colleague Marshall, master of speeding up APL. The formulas used are both in '70s APL, so they work in APL\360 on an IBM/360 mainframe. They are anonymous prefix functions taking a character matrix as argument. N is not needed but may be supplied as an optional (and ignored) left argument.

The first solution is O(n²)

A naive approach:

{0.5×(+/,(⍉i)∧.∨i)-+/∧⌿i←∨/'aeiou'∘.=⍵}

Try it online!

Processes* the large data set in about 4.5 ms. TIO takes about twice that.

{} "dfn"; the argument is ⍵:

'aeiou'∘.=⍵ equality 3D array with vowels along the first axis, words along the middle axis and characters along the last axis

∨/ OR-reduction along the last axis; this gives us a 5-row table with one column per word

i← store that in i (for in)

∧⌿ AND-reduction along the first axis; this gives us a mask of words that contain all vowels

+/ sum; count of words containing all vowels (we need to discount their "self-pairs")

()- subtract that from the following:

  (⍉i)∧.∨i the Boolean matrix where the OR of corresponding masks is all-true.

  , ravel (flatten)

  +/ sum; this gives us the count of ordered pairs

0.5× halve; this gives us the count of unordered pairs
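For readers who do not read APL, the same steps can be sketched in Java. This is a loop-based paraphrase under my own naming (OrderedPairCount, countPairs), not a translation of how the APL interpreter evaluates the expression: it counts all ordered pairs including self-pairs, subtracts the self-pairs of words containing all vowels, and halves the result.

```java
class OrderedPairCount {
    // Vowel signature of a word as a 5-bit mask (a = 1, e = 2, i = 4, o = 8, u = 16).
    static int code(String word) {
        int mask = 0;
        for (char c : word.toCharArray()) {
            int bit = "aeiou".indexOf(c);
            if (bit >= 0) {
                mask |= 1 << bit;
            }
        }
        return mask;
    }

    static long countPairs(String[] words) {
        int n = words.length;
        int[] masks = new int[n];
        for (int k = 0; k < n; k++) {
            masks[k] = code(words[k]);
        }
        long ordered = 0;   // all (x, y) whose OR is all-true, including x == y
        long selfPairs = 0; // words that contain all vowels on their own
        for (int x = 0; x < n; x++) {
            if (masks[x] == 31) {
                selfPairs++;
            }
            for (int y = 0; y < n; y++) {
                if ((masks[x] | masks[y]) == 31) {
                    ordered++;
                }
            }
        }
        // Discount self-pairs, then halve to go from ordered to unordered pairs.
        return (ordered - selfPairs) / 2;
    }

    public static void main(String[] args) {
        String[] words = {"sequoia", "education", "aei", "ou", "you", "rhythm"};
        System.out.println(countPairs(words)); // prints 11
    }
}
```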

The second solution is O(n)

Breaks even with the above O(n²) solution at about 150 words of 10 evenly distributed characters. It requires 0-based indexing, and a pre-computed 32-by-32 Boolean vowel signature pairing table (p in the code below):

{
    a←0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    a[2⊥∨/'aeiou'∘.=⍵]+←1
    (0.5×lׯ1+l←a[31])++/,p×a∘.×a
}
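The construction of the pairing table p is not shown here. The Java sketch below rests on an assumption of mine: that p[i][j] is true exactly when i < j and the OR of the two signatures is 31, which is what makes the separate l×(l−1)/2 term for index 31 necessary. The class and method names are hypothetical, and the bit order of the signature (a = 1 here) does not affect the count.

```java
class PairingTable {
    // Vowel signature as a 5-bit index into the count vector.
    static int code(String word) {
        int mask = 0;
        for (char c : word.toCharArray()) {
            int bit = "aeiou".indexOf(c);
            if (bit >= 0) {
                mask |= 1 << bit;
            }
        }
        return mask;
    }

    // ASSUMPTION: p[i][j] is true when i < j and the signatures i and j
    // together cover all five vowels. The source does not show how p is built.
    static boolean[][] buildTable() {
        boolean[][] p = new boolean[32][32];
        for (int i = 0; i < 32; i++) {
            for (int j = i + 1; j < 32; j++) {
                p[i][j] = (i | j) == 31;
            }
        }
        return p;
    }

    static long countPairs(String[] words) {
        long[] a = new long[32]; // a[s] = number of words with signature s
        for (String w : words) {
            a[code(w)]++;
        }
        boolean[][] p = buildTable();
        long l = a[31];
        long total = l * (l - 1) / 2; // pairs among the all-vowel words
        for (int i = 0; i < 32; i++) {
            for (int j = 0; j < 32; j++) {
                if (p[i][j]) {
                    total += a[i] * a[j]; // masked outer product of the counts
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] words = {"sequoia", "education", "aei", "ou", "you", "rhythm"};
        System.out.println(countPairs(words)); // prints 11
    }
}
```

After the single pass that fills a, the remaining work is a fixed 32 × 32 loop, so the running time grows linearly with the number of words.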

Processes* the large data set in 0.09 ms. TIO's timing is too inconsistent to draw any conclusions, but it is likely that it takes about twice the time here too.


* 64-bit Dyalog APL 17.0 on 2.6 GHz i7-4720HQ