Longest common prefix (LCP) of a list of strings

First, let's start with something related, but much simpler.

:- set_prolog_flag(double_quotes, chars).  % "abc" = [a,b,c]

prefix_of(Prefix, List) :-
   append(Prefix, _, List).

commonprefix(Prefix, Lists) :-
   maplist(prefix_of(Prefix), Lists).

?- commonprefix(Prefix, ["interview", "integrate", "intermediate"]).
   Prefix = []
;  Prefix = "i"
;  Prefix = "in"
;  Prefix = "int"
;  Prefix = "inte"
;  false.

(See this answer, how printing character lists with double quotes is done.)

This is the part that is fairly easy in Prolog. The only drawback is that it doesn't give us the maximum, but rather all possible solutions including the maximum. Note that all strings do not need to be known, like:

?- commonprefix(Prefix, ["interview", "integrate", Xs]).
   Prefix = []
;  Prefix = "i", Xs = [i|_A]
;  Prefix = "in", Xs = [i, n|_A]
;  Prefix = "int", Xs = [i, n, t|_A]
;  Prefix = "inte", Xs = [i, n, t, e|_A]
;  false.

So we get as response a partial description of the last unknown word. Now imagine, later on we realize that Xs = "induce". No problem for Prolog:

?- commonprefix(Prefix, ["interview", "integrate", Xs]), Xs = "induce".
   Prefix = [], Xs = "induce"
;  Prefix = "i", Xs = "induce"
;  Prefix = "in", Xs = "induce"
;  false.

In fact, it does not make a difference whether we state this in hindsight or just before the actual query:

?- Xs = "induce", commonprefix(Prefix, ["interview", "integrate", Xs]).
   Xs = "induce", Prefix = []
;  Xs = "induce", Prefix = "i"
;  Xs = "induce", Prefix = "in"
;  false.

Can we now based on this formulate the maximum? Note that this effectively necessitates some form of extra quantor for which we do not have any direct provisions in Prolog. For this reason we have to limit us to certain cases we know will be safe. The easiest way out would be to insist that the list of words does not contain any variables. I will use iwhen/2 for this purpose.

maxprefix(Prefix, Lists) :-
   iwhen(ground(Lists), maxprefix_g(Prefix, Lists)).

maxprefix_g(Prefix, Lists_g) :-
   setof(N-IPrefix, ( commonprefix(IPrefix, Lists_g), length(IPrefix, N ) ), Ns),
   append(_,[N-Prefix], Ns).   % the longest one

The downside of this approach is that we get instantiation errors should the list of words not be known.

Note that we made quite some assumptions (which I hope really hold). In particular we assumed that there is exactly one maximum. In this case this holds, but in general it could be that there are several independent values for Prefix. Also, we assumed that IPrefix will always be ground. We could check that too, just to be sure. Alternatively:

maxprefix_g(Prefix, Lists_g) :-
   setof(N, IPrefix^ ( commonprefix(IPrefix, Lists_g), length(IPrefix, N ) ), Ns),
   append(_,[N], Ns),
   length(Prefix, N),
   commonprefix(Prefix, Lists_g).

Here, the prefix does not have to be one single prefix (which it is in our situation).

The best, however, would be a purer version that does not need to resort to instantiation errors at all.

Here is how I would implement this:

:- set_prolog_flag(double_quotes, chars).

longest_common_prefix([], []).
longest_common_prefix([H], H).
longest_common_prefix([H1,H2|T], P) :-
    maplist(append(P), L, [H1,H2|T]),
    (   one_empty_head(L)
    ;   maplist(head, L, Hs),
        not_all_equal(Hs)
    ).

one_empty_head([[]|_]).
one_empty_head([[_|_]|T]) :-
    one_empty_head(T).

head([H|_], H).

not_all_equal([E|Es]) :-
    some_dif(Es, E).

some_dif([X|Xs], E) :-
    if_(diffirst(X,E), true, some_dif(Xs,E)).

diffirst(X, Y, T) :-
    (   X == Y -> T = false
    ;   X \= Y -> T = true
    ;   T = true,  dif(X, Y)
    ;   T = false, X = Y
    ).

The implementation of not_all_equal/1 is from this answer by @repeat (you can find my implementation in the edit history).

We use append and maplist to split the strings in the list into a prefix and a suffix, and where the prefix is the same for all strings. For this prefix to be the longest, we need to state that the first character of at least two of the suffixes are different.

This is why we use head/2, one_empty_head/1 and not_all_equal/1. head/2 is used to retrieve the first char of a string; one_empty_head/1 is used to state that if one of the suffixes is empty, then automatically this is the longest prefix. not_all_equal/1 is used to then check or state that at least two characters are different.

Examples

?- longest_common_prefix(["interview", "integrate", "intermediate"], Z).
Z = [i, n, t, e] ;
false.

?- longest_common_prefix(["interview", X, "intermediate"], "inte").
X = [i, n, t, e] ;
X = [i, n, t, e, _156|_158],
dif(_156, r) ;
false.

?- longest_common_prefix(["interview", "integrate", X], Z).
X = Z, Z = [] ;
X = [_246|_248],
Z = [],
dif(_246, i) ;
X = Z, Z = [i] ;
X = [i, _260|_262],
Z = [i],
dif(_260, n) ;
X = Z, Z = [i, n] ;
X = [i, n, _272|_274],
Z = [i, n],
dif(_272, t) ;
X = Z, Z = [i, n, t] ;
X = [i, n, t, _284|_286],
Z = [i, n, t],
dif(_284, e) ;
X = Z, Z = [i, n, t, e] ;
X = [i, n, t, e, _216|_224],
Z = [i, n, t, e] ;
false.

?- longest_common_prefix([X,Y], "abc").
X = [a, b, c],
Y = [a, b, c|_60] ;
X = [a, b, c, _84|_86],
Y = [a, b, c] ;
X = [a, b, c, _218|_220],
Y = [a, b, c, _242|_244],
dif(_218, _242) ;
false.

?- longest_common_prefix(L, "abc").
L = [[a, b, c]] ;
L = [[a, b, c], [a, b, c|_88]] ;
L = [[a, b, c, _112|_114], [a, b, c]] ;
L = [[a, b, c, _248|_250], [a, b, c, _278|_280]],
dif(_248, _278) ;
L = [[a, b, c], [a, b, c|_76], [a, b, c|_100]] ;
L = [[a, b, c, _130|_132], [a, b, c], [a, b, c|_100]];
…

Here's the purified variant of the code proposed (and subsequently retracted) by @CapelliC:

:- set_prolog_flag(double_quotes, chars).

:- use_module(library(reif)).

lists_lcp([], []).
lists_lcp([Es|Ess], Ls) :-
   if_((maplist_t(list_first_rest_t, [Es|Ess], [X|Xs], Ess0),
        maplist_t(=(X), Xs))
       , (Ls = [X|Ls0], lists_lcp(Ess0, Ls0))
       , Ls = []).

list_first_rest_t([], _, _, false).
list_first_rest_t([X|Xs], X, Xs, true).

Above meta-predicate maplist_t/3 is a variant of maplist/2 which works with term equality/inequality reification—maplist_t/5 is just the same with higher arity:

maplist_t(P_2, Xs, T) :-
   i_maplist_t(Xs, P_2, T).

i_maplist_t([], _P_2, true).
i_maplist_t([X|Xs], P_2, T) :-
   if_(call(P_2, X), i_maplist_t(Xs, P_2, T), T = false).

maplist_t(P_4, Xs, Ys, Zs, T) :-
   i_maplist_t(Xs, Ys, Zs, P_4, T).

i_maplist_t([], [], [], _P_4, true).
i_maplist_t([X|Xs], [Y|Ys], [Z|Zs], P_4, T) :-
   if_(call(P_4, X, Y, Z), i_maplist_t(Xs, Ys, Zs, P_4, T), T = false).

First here's a ground query:

?- lists_lcp(["a","ab"], []).
false.                                % fails (as expected)

Here are the queries presented in @Fatalize's fine answer.

?- lists_lcp(["interview",X,"intermediate"], "inte").
   X = [i,n,t,e]
;  X = [i,n,t,e,_A|_B], dif(_A,r)
;  false.

?- lists_lcp(["interview","integrate",X], Z).
   X = Z, Z = []
;  X = Z, Z = [i]
;  X = Z, Z = [i,n]
;  X = Z, Z = [i,n,t]
;  X = Z, Z = [i,n,t,e]
;  X = [i,n,t,e,_A|_B], Z = [i,n,t,e]
;  X = [i,n,t,_A|_B]  , Z = [i,n,t]  , dif(_A,e)
;  X = [i,n,_A|_B]    , Z = [i,n]    , dif(_A,t)
;  X = [i,_A|_B]      , Z = [i]      , dif(_A,n)
;  X = [_A|_B]        , Z = []       , dif(_A,i).

?- lists_lcp([X,Y], "abc").
   X = [a,b,c]      , Y = [a,b,c|_A]
;  X = [a,b,c,_A|_B], Y = [a,b,c]
;  X = [a,b,c,_A|_B], Y = [a,b,c,_C|_D], dif(_A,_C)
;  false.

?- lists_lcp(L, "abc").
   L = [[a,b,c]]
;  L = [[a,b,c],[a,b,c|_A]]
;  L = [[a,b,c,_A|_B],[a,b,c]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D]], dif(_A,_C)
;  L = [[a,b,c],[a,b,c|_A],[a,b,c|_B]]
;  L = [[a,b,c,_A|_B],[a,b,c],[a,b,c|_C]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D],[a,b,c]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D],[a,b,c,_E|_F]], dif(_A,_E) 
…

Last, here's the query showing improved determinism:

?- lists_lcp(["interview","integrate","intermediate"], Z).
Z = [i,n,t,e].                              % succeeds deterministically

Longest common prefix (LCP) of a list of strings

Examples

Tags:

List

Prolog

Logical Purity

Prolog Dif

Prolog Cut

Related

Recent Posts