Java variable type Collection for HashSet or other implementations?

Since your example uses a private field, hiding the implementation type doesn't matter all that much. You (or whoever is maintaining this class) can always just go look at the field's initializer to see what it is.

Depending on how it's used, though, it might be worth declaring a more specific interface for the field. Declaring it to be a List indicates that duplicates are allowed and that ordering is significant. Declaring it to be a Set indicates that duplicates aren't allowed and that ordering is not significant. You might even declare the field to have a particular implementation class if there's something about it that's significant. For example, declaring it to be LinkedHashSet indicates that duplicates aren't allowed but that ordering is significant.
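As a rough illustration of how the declared field type communicates intent, here is a minimal sketch. The class and member names (FieldTypeDemo, register, and so on) are hypothetical and not taken from the question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class; all names are illustrative only.
class FieldTypeDemo {
    // List: duplicates are allowed and order is significant.
    private final List<String> asList = new ArrayList<>();
    // Set: no duplicates, iteration order is unspecified.
    private final Set<String> asSet = new HashSet<>();
    // LinkedHashSet: no duplicates, insertion order is preserved.
    private final LinkedHashSet<String> asLinkedSet = new LinkedHashSet<>();

    void register(String name) {
        asList.add(name);
        asSet.add(name);
        asLinkedSet.add(name);
    }

    int listSize() { return asList.size(); }

    int setSize() { return asSet.size(); }

    String linkedOrder() { return String.join(",", asLinkedSet); }

    public static void main(String[] args) {
        FieldTypeDemo d = new FieldTypeDemo();
        d.register("b");
        d.register("a");
        d.register("b");
        System.out.println(d.listSize());    // 3: the duplicate "b" is kept
        System.out.println(d.setSize());     // 2: the duplicate "b" is dropped
        System.out.println(d.linkedOrder()); // b,a: insertion order
    }
}
```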

The choice of whether to use an interface, and which interface to use, becomes much more significant if the type appears in the public API of the class, and it also depends on what the compatibility constraints on this class are. For example, suppose there were a method

public ??? getRegisteredListeners() {
    return ...
}

Now the choice of return type affects other classes. If you can change all the callers, maybe it's no big deal; you just have to edit other files. But suppose the caller is an application that you have no control over. Now the choice of interface is critical, as you can't change it without potentially breaking those applications. The rule here is usually to choose the most abstract interface that supports the operations you expect callers to want to perform.

Many Java SE APIs return Collection. This provides a fair degree of abstraction from the underlying implementation, but it also provides the caller a reasonable set of operations. The caller can iterate, get the size, do a contains check, or copy all the elements to another collection.

Some code bases use Iterable as the most-abstract interface to return. All it does is allow the caller to iterate. Sometimes this is all that's necessary, but it might be somewhat limiting compared to Collection.

Another alternative is to return a Stream. This is helpful if you think the caller might want to use stream operations (such as filter, map, findFirst, etc.) instead of iterating or using collection operations.

Note that if you choose to return Collection or Iterable, you need to make sure that you return an unmodifiable view or make a defensive copy. Otherwise, callers could modify your class's internal data, which would probably lead to bugs. (Yes, even an Iterable can permit modification! Consider getting an Iterator and then calling the remove() method.) If you return a Stream, you don't need to worry about that, since you can't use a Stream to modify the underlying source.
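To make the three options concrete, here is a minimal sketch. The ListenerRegistry class, its method names, and the use of String as the listener type are all assumptions for illustration; List.copyOf requires Java 10 or later.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical registry; String stands in for a real listener type.
class ListenerRegistry {
    private final Set<String> listeners = new LinkedHashSet<>();

    void register(String listener) {
        listeners.add(listener);
    }

    // Unmodifiable view: reflects later registrations, but mutators
    // (including Iterator.remove) throw UnsupportedOperationException.
    Collection<String> viewOfListeners() {
        return Collections.unmodifiableCollection(listeners);
    }

    // Defensive copy: an unmodifiable snapshot taken at call time.
    Collection<String> copyOfListeners() {
        return List.copyOf(listeners);
    }

    // Stream: the caller can filter/map, but cannot modify the source through it.
    Stream<String> streamOfListeners() {
        return listeners.stream();
    }
}
```

The view stays in sync with the registry, while the copy does not, so which one you want depends on whether callers should observe later changes.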

Note that I turned your question about the declaration of a field into a question about the declaration of method return types. There is this idea of "program to the interface" that's quite prevalent in Java. In my opinion it doesn't matter very much for local variables (which is why it's usually fine to use var), and it matters little for private fields, since those (almost) by definition affect only the class in which they're declared. However, the "program to the interface" principle is very important for API signatures, so those cases are where you really need to think about interface types. Private fields, not so much.

(One final note: there is a case where you need to be concerned about the types of private fields, and that's when you're using a reflective framework that manipulates private fields directly. In that case, you need to think of those fields as being public -- just like method return types -- even though they're not declared public.)


As with all things, it's a question of tradeoffs. There are two opposing forces.

  • The more generic the type, the more freedom the implementation has. If you use Collection you're free to use an ArrayList, HashSet, or LinkedList without affecting the user/caller.

  • The more generic the return type, the fewer features there are available to the user/caller. A List provides index-based lookup. A SortedSet makes it easy to get contiguous subsets via headSet, tailSet, and subSet. A NavigableSet provides efficient O(log n) binary search lookup methods. If you return Collection, none of these are available. Only the most generic access functions can be used.

Furthermore, the sub-types guarantee special properties that Collection does not: Sets hold unique items. SortedSets are sorted. Lists have an order; they're not unordered bags of items. If you use Collection then the user/caller can't necessarily assume that these properties hold. They may be forced to code defensively and, for instance, handle duplicate items even if you know there won't be duplicates.
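The tradeoff can be sketched with the same data exposed under two return types. The class and method names here are made up for the example, and TreeSet is just one possible NavigableSet implementation.

```java
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical example; names are illustrative only.
class ReturnTypeTradeoff {
    // Specific return type: callers get sorted-set operations.
    static NavigableSet<Integer> scores() {
        return new TreeSet<>(List.of(40, 10, 30, 20));
    }

    // Generic return type: the implementation is free to change,
    // but callers can only iterate, check size, and call contains.
    static Collection<Integer> scoresAsCollection() {
        return scores();
    }

    public static void main(String[] args) {
        NavigableSet<Integer> s = scores();
        System.out.println(s.headSet(30)); // [10, 20]
        System.out.println(s.ceiling(25)); // 30

        Collection<Integer> c = scoresAsCollection();
        // c.headSet(30) would not compile: Collection has no such method.
        System.out.println(c.contains(20)); // true
    }
}
```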

A reasonable decision process might be:

  1. If O(1) indexed access is guaranteed, use List.
  2. If elements are sorted and unique, use SortedSet or NavigableSet.
  3. If element uniqueness is guaranteed and order is not, use Set.
  4. Otherwise, use Collection.

It really depends on what you want to do with the collection object.

Collection<String> cSet = new HashSet<>();
Collection<String> cList = new ArrayList<>();

In this case, both variables have the same declared type, so you can assign one to the other:

cSet = cList;

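But if you declare the variable with the more specific type:

Set<String> cSet = new HashSet<>();

then the assignment above no longer compiles, because a Collection (or a List) is not a Set. You can still construct a new list from the set using the copy constructor: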

Set<String> set = new HashSet<>();
List<String> list = new ArrayList<>();
list = new ArrayList<>(set);

So, depending on the usage, you can declare the variable as either Collection or Set.
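As a small sketch of converting between the two kinds of collection with copy constructors (the CopyDemo class and its method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
class CopyDemo {
    // Copying into a HashSet collapses duplicates.
    static Set<String> toSet(Collection<String> source) {
        return new HashSet<>(source);
    }

    // Copying into an ArrayList keeps every element the source iterates over.
    static List<String> toList(Collection<String> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        List<String> list = List.of("x", "x", "y");
        Set<String> set = toSet(list);
        // set = list; would not compile: a List is not a Set.
        System.out.println(set.size());         // 2
        System.out.println(toList(set).size()); // 2
    }
}
```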