Extract duplicate objects from a List in Java 8

If you could implement equals and hashCode on Person you could then use a counting down-stream collector of the groupingBy to get distinct elements that have been duplicated.

List<Person> duplicates = personList.stream()
  .collect(groupingBy(identity(), counting()))
  .entrySet().stream()
  .filter(n -> n.getValue() > 1)
  .map(n -> n.getKey())
  .collect(toList());

If you would like to keep a list of sequential repeated elements you can then expand this out using Collections.nCopies to expand it back out. This method will ensure repeated elements are ordered together.

List<Person> duplicates = personList.stream()
    .collect(groupingBy(identity(), counting()))
    .entrySet().stream()
    .filter(n -> n.getValue() > 1)
    .flatMap(n -> nCopies(n.getValue().intValue(), n.getKey()).stream())
    .collect(toList());

List<Person> duplicates = personList.stream()
  .collect(Collectors.groupingBy(Person::getId))
  .entrySet().stream()
  .filter(e->e.getValue().size() > 1)
  .flatMap(e->e.getValue().stream())
  .collect(Collectors.toList());

That should give you a List of Person where the id has been duplicated.


To indentify duplicates, no method I know of is better suited than Collectors.groupingBy(). This allows you to group the list into a map based on a condition of your choice.

Your condition is a combination of id and firstName. Let's extract this part into an own method in Person:

String uniqueAttributes() {
  return id + firstName;
}

The getDuplicates() method is now quite straightforward:

public static List<Person> getDuplicates(final List<Person> personList) {
  return getDuplicatesMap(personList).values().stream()
      .filter(duplicates -> duplicates.size() > 1)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());
}

private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
  return personList.stream().collect(groupingBy(Person::uniqueAttributes));
}
  • The first line calls another method getDuplicatesMap() to create the map as explained above.
  • It then streams over the values of the map, which are lists of persons.
  • It filters out everything except lists with a size greater than 1, i.e. it finds the duplicates.
  • Finally, flatMap() is used to flatten the stream of lists into one single stream of persons, and collects the stream to a list.

An alternative, if you truly identify persons as equal if the have the same id and firstName is to go with the solution by Jonathan Johx and implement an equals() method.