Scala GroupBy preserving insertion order?

The following gives you a groupByOrderedUnique method that behaves as you sought. It also adds a groupByOrdered that preserves duplicates, as others have asked for in the comments.

import collection.immutable.ListSet
import collection.mutable.{LinkedHashMap => MMap, Builder}

// Implicit classes must be nested in an object; this is the one imported in the usage further down
object GroupByOrderedImplicit {
  implicit class GroupByOrderedImplicitImpl[A](val t: Traversable[A]) extends AnyVal {
    // Groups keep insertion order; duplicates within a group are dropped
    def groupByOrderedUnique[K](f: A => K): Map[K, ListSet[A]] =
      groupByGen(ListSet.newBuilder[A])(f)

    // Groups keep insertion order; duplicates are kept
    def groupByOrdered[K](f: A => K): Map[K, List[A]] =
      groupByGen(List.newBuilder[A])(f)

    def groupByGen[K, C[_]](makeBuilder: => Builder[A, C[A]])(f: A => K): Map[K, C[A]] = {
      val map = MMap[K, Builder[A, C[A]]]()
      for (i <- t) {
        // Reuse the builder for this key, creating one on first sight of the key
        map.getOrElseUpdate(f(i), makeBuilder) += i
      }
      map.mapValues(_.result()).toMap
    }
  }
}

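When I use that code like this (the entries are sorted by key before printing, because the immutable Map returned by toMap does not guarantee key order):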

import GroupByOrderedImplicit._
  
val range = 0.until(40)
val in = range ++ range.reverse
  
println("With dupes:")
in.groupByOrdered(_ % 10).toList.sortBy(_._1).foreach(println)
  
println("\nUnique:")
in.groupByOrderedUnique(_ % 10).toList.sortBy(_._1).foreach(println)

I get the following output:

With dupes:
(0,List(0, 10, 20, 30, 30, 20, 10, 0))
(1,List(1, 11, 21, 31, 31, 21, 11, 1))
(2,List(2, 12, 22, 32, 32, 22, 12, 2))
(3,List(3, 13, 23, 33, 33, 23, 13, 3))
(4,List(4, 14, 24, 34, 34, 24, 14, 4))
(5,List(5, 15, 25, 35, 35, 25, 15, 5))
(6,List(6, 16, 26, 36, 36, 26, 16, 6))
(7,List(7, 17, 27, 37, 37, 27, 17, 7))
(8,List(8, 18, 28, 38, 38, 28, 18, 8))
(9,List(9, 19, 29, 39, 39, 29, 19, 9))

Unique:
(0,ListSet(0, 10, 20, 30))
(1,ListSet(1, 11, 21, 31))
(2,ListSet(2, 12, 22, 32))
(3,ListSet(3, 13, 23, 33))
(4,ListSet(4, 14, 24, 34))
(5,ListSet(5, 15, 25, 35))
(6,ListSet(6, 16, 26, 36))
(7,ListSet(7, 17, 27, 37))
(8,ListSet(8, 18, 28, 38))
(9,ListSet(9, 19, 29, 39))

Here's one without maps:

import scala.annotation.tailrec

def orderedGroupBy[T, P](seq: Traversable[T])(f: T => P): Seq[(P, Traversable[T])] = {
  @tailrec
  def accumulator(seq: Traversable[T], f: T => P, res: List[(P, Traversable[T])]): Seq[(P, Traversable[T])] = seq.headOption match {
    case None => res.reverse
    case Some(h) =>
      val key = f(h)
      // Take the leading run of elements that share the key of the head
      val subseq = seq.takeWhile(f(_) == key)
      accumulator(seq.drop(subseq.size), f, (key -> subseq) :: res)
  }
  accumulator(seq, f, Nil)
}

This can be useful if you only need to read the results sequentially (no random access by key) and you want to avoid the overhead of creating and using Map objects. Note: I didn't compare its performance against the other options; it could actually be worse.

EDIT: Just to be clear, this assumes your input is already ordered by the group key; otherwise a key that shows up again later starts a separate group each time. My use case is a SELECT ... ORDER BY.
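
To make the "already ordered by the group key" assumption concrete, here is a minimal sketch of the same run-grouping idea, written script-style (REPL or .sc file) and restricted to List so that span can split off each run in one step. The name groupRuns is made up for this illustration and is not part of the answer above.

```scala
import scala.annotation.tailrec

// Hypothetical helper (not from the answer above): groups consecutive
// runs of elements that map to the same key.
def groupRuns[T, P](xs: List[T])(f: T => P): List[(P, List[T])] = {
  @tailrec
  def loop(rest: List[T], acc: List[(P, List[T])]): List[(P, List[T])] = rest match {
    case Nil => acc.reverse
    case h :: _ =>
      val key = f(h)
      // span splits off the leading run with this key and returns the remainder
      val (run, tail) = rest.span(f(_) == key)
      loop(tail, (key -> run) :: acc)
  }
  loop(xs, Nil)
}

// Input ordered by key: one group per key
val sortedInput = groupRuns(List(1, 1, 2, 2, 3))(identity)
// List((1,List(1, 1)), (2,List(2, 2)), (3,List(3)))

// Input not ordered by key: key 1 appears in two separate groups
val unsortedInput = groupRuns(List(1, 2, 1))(identity)
// List((1,List(1)), (2,List(2)), (1,List(1)))
```

If the input is not ordered by key, sorting it by f first (for example with sortBy, which is stable) gives one group per key, at the cost of losing first-occurrence order of the keys.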


groupBy as defined on TraversableLike produces an immutable.Map, so you can't make this method produce something else.

The order of the elements in each entry is already preserved, but not the order of the keys. The keys are the result of the function supplied, so they don't really have an order.

If you want to order the keys by the first occurrence of each key, here's a sketch of how you might do it. Say we want to group integers by their value / 2:

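import scala.collection.mutable.LinkedHashMap

// Pair each element with its index, then group by key (value / 2)
val m = List(4, 0, 5, 1, 2, 6, 3).zipWithIndex groupBy (_._1 / 2)
// Sort the groups by the index of their first element and keep that order
val lhm = LinkedHashMap(m.toSeq sortBy (_._2.head._2): _*)
// Drop the indices again
lhm mapValues (_ map (_._1))
// Map(2 -> List(4, 5), 0 -> List(0, 1), 1 -> List(2, 3), 3 -> List(6))
// Note order of keys is same as first occurrence in original list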
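
The same first-occurrence ordering can also be built in a single pass, without the index bookkeeping or the sort. The following is only a sketch under that assumption, written script-style (REPL or .sc file). The name groupByFirstSeen is invented here, and returning a Vector of pairs rather than a Map is a choice made so that the key order is explicit in the result type.

```scala
import scala.collection.mutable

// Hypothetical helper: groups elements by key, keeping keys in order of
// first occurrence and elements in their original order within each group.
def groupByFirstSeen[A, K](xs: Iterable[A])(f: A => K): Vector[(K, Vector[A])] = {
  // LinkedHashMap iterates in insertion order, i.e. first occurrence of each key
  val groups = mutable.LinkedHashMap.empty[K, mutable.Builder[A, Vector[A]]]
  for (x <- xs) groups.getOrElseUpdate(f(x), Vector.newBuilder[A]) += x
  groups.iterator.map { case (k, b) => (k, b.result()) }.toVector
}

val grouped = groupByFirstSeen(List(4, 0, 5, 1, 2, 6, 3))(_ / 2)
// Vector((2,Vector(4, 5)), (0,Vector(0, 1)), (1,Vector(2, 3)), (3,Vector(6)))
```

If map-style lookup is needed as well, the mutable LinkedHashMap could be returned instead of the Vector, with the usual caveats about exposing mutable state.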