Generating integers in ascending order using a set of prime numbers

The basic idea is that 1 is a member of the set, and for each member of the set n so also 2n and 5n are members of the set. Thus, you begin by outputting 1, and push 2 and 5 onto a priority queue. Then, you repeatedly pop the front item of the priority queue, output it if it is different from the previous output, and push 2 times and 5 times the number onto the priority queue.

Google for "Hamming number" or "regular number" or go to A003592 to learn more.

----- ADDED LATER -----

I decided to spend a few minutes on my lunch hour to write a program to implement the algorithm described above, using the Scheme programming language. First, here is a library implementation of priority queues using the pairing heap algorithm:

(define pq-empty '())
(define pq-empty? null?)

(define (pq-first pq)
  (if (null? pq)
      (error 'pq-first "can't extract minimum from null queue")
      (car pq)))

(define (pq-merge lt? p1 p2)
  (cond ((null? p1) p2)
        ((null? p2) p1)
        ((lt? (car p2) (car p1))
          (cons (car p2) (cons p1 (cdr p2))))
        (else (cons (car p1) (cons p2 (cdr p1))))))

(define (pq-insert lt? x pq)
  (pq-merge lt? (list x) pq))

(define (pq-merge-pairs lt? ps)
  (cond ((null? ps) '())
        ((null? (cdr ps)) (car ps))
        (else (pq-merge lt? (pq-merge lt? (car ps) (cadr ps))
                            (pq-merge-pairs lt? (cddr ps))))))

(define (pq-rest lt? pq)
  (if (null? pq)
      (error 'pq-rest "can't delete minimum from null queue")
      (pq-merge-pairs lt? (cdr pq))))

Now for the algorithm. Function f takes two parameters, a list of the numbers in the set ps and the number n of items to output from the head of the output. The algorithm is slightly changed; the priority queue is initialized by pushing 1, then the extraction steps start. Variable p is the previous output value (initially 0), pq is the priority queue, and xs is the output list, which is accumulated in reverse order. Here's the code:

(define (f ps n)
  (let loop ((n n) (p 0) (pq (pq-insert < 1 pq-empty)) (xs (list)))
    (cond ((zero? n) (reverse xs))
          ((= (pq-first pq) p) (loop n p (pq-rest < pq) xs))
          (else (loop (- n 1) (pq-first pq) (update < pq ps)
                      (cons (pq-first pq) xs))))))

For those not familiar with Scheme, loop is a locally-defined function that is called recursively, and cond is the head of an if-else chain; in this case, there are three cond clauses, each clause with a predicate and consequent, with the consequent evaluated for the first clause for which the predicate is true. The predicate (zero? n) terminates the recursion and returns the output list in the proper order. The predicate (= (pq-first pq) p) indicates that the current head of the priority queue has been output previously, so it is skipped by recurring with the rest of the priority queue after the first item. Finally, the else predicate, which is always true, identifies a new number to be output, so it decrements the counter, saves the current head of the priority queue as the new previous value, updates the priority queue to add the new children of the current number, and inserts the current head of the priority queue into the accumulating output.

Since it is non-trivial to update the priority queue to add the new children of the current number, that operation is extracted to a separate function:

(define (update lt? pq ps)
  (let loop ((ps ps) (pq pq))
    (if (null? ps) (pq-rest lt? pq)
      (loop (cdr ps) (pq-insert lt? (* (pq-first pq) (car ps)) pq)))))

The function loops over the elements of the ps set, inserting each into the priority queue in turn; the if returns the updated priority queue, minus its old head, when the ps list is exhausted. The recursive step strips the head of the ps list with cdr and inserts the product of the head of the priority queue and the head of the ps list into the priority queue.

Here are two examples of the algorithm:

> (f '(2 5) 20)
(1 2 4 5 8 10 16 20 25 32 40 50 64 80 100 125 128 160 200 250)
> (f '(2 3 5) 20)
(1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36)

You can run the program at http://ideone.com/sA1nn.


Removing a number and reinserting all its multiples (by the primes in the set) into a priority queue is wrong (in the sense of the question) - i.e. it produces correct sequence but inefficiently so.

It is inefficient in two ways - first, it overproduces the sequence; second, each PriorityQueue operation incurs extra cost (the operations remove_top and insert are not usually both O(1), certainly not in any list- or tree-based PriorityQueue implementation).

The efficient O(n) algorithm maintains pointers back into the sequence itself as it is being produced, to find and append the next number in O(1) time. In pseudocode:

  return array h where
    h[0]=1; n=0; ps=[2,3,5, ... ]; // base primes
    is=[0 for each p in ps];       // indices back into h
    xs=[p for each p in ps]        // next multiples: xs[k]==ps[k]*h[is[k]]
    repeat:
      h[++n] := minimum xs
      for each ref (i,x,p) in (is,xs,ps):
        if( x==h[n] )
          { x := p*h[++i]; }       // advance the minimal multiple/pointer

For each minimal multiple it advances its pointer, while at the same time calculating its next multiple value. This too effectively implements a PriorityQueue but with crucial distinctions - it is before the end point, not after; it doesn't create any additional storage except for the sequence itself; and its size is constant (just k numbers, for k base primes) whereas the size of past-the-end PriorityQueue grows as we progress along the sequence (in the case of Hamming sequence, based on set of 3 primes, as n2/3, for n numbers of the sequence).


The classic Hamming sequence in Haskell is essentially the same algorithm:

h = 1 : map (2*) h `union` map (3*) h `union` map (5*) h

union a@(x:xs) b@(y:ys) = case compare x y of LT -> x : union  xs  b
                                              EQ -> x : union  xs  ys
                                              GT -> y : union  a   ys

We can generate the smooth numbers for arbitrary base primes using the foldi function (see Wikipedia) to fold lists in a tree-like fashion for efficiency, creating a fixed sized tree of comparisons:

smooth base_primes = h   where       -- strictly increasing base_primes  NB!
    h = 1 : foldi g [] [map (p*) h | p <- base_primes]
    g (x:xs) ys = x : union xs ys

foldi f z []     = z
foldi f z (x:xs) = f x (foldi f z (pairs f xs))
 
pairs f (x:y:t)  = f x y : pairs f t
pairs f t        = t

It is also possible to directly calculate a slice of Hamming sequence around its nth member in O(n2/3) time, by direct enumeration of the triples and assessing their values through logarithms, logval(i,j,k) = i*log 2+j*log 3+k*log 5. This Ideone.com test entry calculates 1 billionth Hamming number in 1.12 0.05 seconds (2016-08-18: main speedup due to usage of Int instead of the default Integer where possible, even on 32-bit; additional 20% thanks to the tweak suggested by @GordonBGood, bringing band size complexity down to O(n1/3)).

This is discussed some more in this answer where we also find its full attribution:

slice hi w = (c, sortBy (compare `on` fst) b) where  -- hi is a top log2 value
  lb5=logBase 2 5 ; lb3=logBase 2 3                  -- w<1 (NB!) is (log2 width)
  (Sum c, b) = fold                                  -- total count, the band
      [ ( Sum (i+1),                                 -- total triples w/this j,k
          [ (r,(i,j,k)) | frac < w ] )               -- store it, if inside the band
        | k <- [ 0 .. floor ( hi   /lb5) ],  let p = fromIntegral k*lb5,
          j <- [ 0 .. floor ((hi-p)/lb3) ],  let q = fromIntegral j*lb3 + p,
          let (i,frac) = pr  (hi-q)      ;       r = hi - frac    -- r = i + q
      ]    -- (sum . map fst &&& concat . map snd)
  pr = properFraction 

This can be generalized for k base primes as well, probably running in O(n(k-1)/k) time.


(see this SO entry for an important later development. also, this answer is interesting. and another related answer.)