Choosing the number of clusters in hierarchical agglomerative clustering with scikit

Wikipedia is simply making an extreme simplification which has nothing to do with real life. Hierarchical clustering does not avoid the problem of choosing the number of clusters. Simply put, it constructs a tree spanning all samples, which shows which samples (and later on, which clusters) merge together to form a bigger cluster. This happens recursively until just two clusters remain (this is why the default number of clusters is 2), which are finally merged into the whole dataset. You are then left with "cutting" through the tree to get an actual clustering. Once you fit AgglomerativeClustering you can traverse the whole tree and analyze which clusters to keep, for example:

import numpy as np
from sklearn.cluster import AgglomerativeClustering
import itertools

# Two well-separated groups: 3 samples near 0 and 2 samples near 100
X = np.concatenate([np.random.randn(3, 10), np.random.randn(2, 10) + 100])
clustering = AgglomerativeClustering()
clustering.fit(X)

# children_ records the two nodes merged at each step; leaves are numbered
# 0..n_samples-1, and each merge creates a new node id starting at n_samples
ii = itertools.count(X.shape[0])
[{'node_id': next(ii), 'left': x[0], 'right': x[1]} for x in clustering.children_]
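To turn the tree into an actual partition, you re-fit with the number of clusters you want, or (in scikit-learn >= 0.21) cut by merge distance via distance_threshold. A minimal sketch using the same X as above; the threshold value of 50.0 is an arbitrary choice that happens to separate the two groups in this toy data:

# Cut the tree at a fixed number of clusters and read per-sample labels
clustering = AgglomerativeClustering(n_clusters=2)
labels = clustering.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1]; the label numbering itself is arbitrary

# Alternatively, cut wherever the merge distance exceeds a threshold
# (scikit-learn >= 0.21; 50.0 is an assumed value chosen for this toy data)
clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=50.0)
labels = clustering.fit_predict(X)

If you want to inspect the tree before deciding where to cut, scipy can build and draw the same kind of dendrogram directly:

# Build the linkage matrix with Ward's method (the scikit-learn default
# linkage) and plot it; long vertical gaps suggest natural places to cut
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

Z = linkage(X, method='ward')
dendrogram(Z)
plt.show()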