Check if there is an isomorphic substring

Python 2, 217 bytes (218 bytes with 0 exit status)

Earlier scores: 338, 326, 323, 321, 310, 306, 297, 293, 290, 289, 280, 279, 266, 264, 259, 237, 230, 229, 226, 223, 222, 220, 219 (with 0 exit status: 260, 238, 231, 228, 225, 223, 221, 220)

exec'''s=raw_input()
S=[M-s.rfind(c,0,M)for M,c in enumerate(s)]
k=0
j=x=%s
while k<=M+x:
 if S[k]>j<W[j]or S[k]==W[j]:
    k+=1;j+=1;T+=[j]
    if j-L>x:print s[k-j:k];z
 else:j=T[j]
'''*2%('-1;T=[0];W=S;L=M',0)
print'No!'

The algorithm is a variation of KMP (Knuth-Morris-Pratt) that uses an index-based test for character matching. The basic idea is that if we get a mismatch at position i of X, we can fall back to the next possible place for a match, given by the longest suffix of X[:i] that is isomorphic to a prefix of X.
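The fallback rule can be made concrete with a deliberately naive Python 3 sketch of the two definitions it relies on. The function names are hypothetical, and this is a slow reference, not the linear algorithm in this answer. Note that the fallback differs from ordinary KMP: for X = "ABA", the longest proper suffix of X[:3] that is isomorphic to a prefix has length 2, because "BA" and "AB" are isomorphic.

```python
def is_isomorphic(a, b):
    """True if a one-to-one character mapping turns a into b."""
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for ca, cb in zip(a, b):
        # Each character must always pair with the same partner, in both directions.
        if forward.setdefault(ca, cb) != cb or backward.setdefault(cb, ca) != ca:
            return False
    return True


def naive_fallback(x, i):
    """Length of the longest proper suffix of x[:i] isomorphic to a prefix of x."""
    for length in range(i - 1, -1, -1):
        if is_isomorphic(x[i - length:i], x[:length]):
            return length
    return 0  # only reached for i == 0


print(is_isomorphic("EGG", "ADD"), is_isomorphic("FOO", "BAR"))  # True False
print(naive_fallback("ABA", 3))  # 2, because "BA" is isomorphic to "AB"
```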

Working from left to right, we assign each character an index. The index is the distance to the most recent previous occurrence of that character. If there is no previous occurrence, the index is the length of the prefix ending at that character (i.e. its 1-based position). For example:

MISSISSIPPI
12313213913
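The index assignment is a one-liner in Python 3 using str.rfind, the same call the golfed code uses. The name `distance_indices` is only for illustration. It reproduces the MISSISSIPPI example:

```python
def distance_indices(s):
    # rfind(c, 0, m) searches s[:m]; it returns -1 when c has not occurred yet,
    # which turns the distance into m + 1, the 1-based position.
    return [m - s.rfind(c, 0, m) for m, c in enumerate(s)]


print("".join(map(str, distance_indices("MISSISSIPPI"))))  # 12313213913
```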

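To test whether two characters match, we compare their indices, adjusting for indices that are greater than the length of the current (sub)string. If j characters have been matched so far, an index greater than j means the character does not occur earlier in the current window, so two such indices match each other. Otherwise, the indices must be equal.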
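The sketch below shows that test, assuming j is the number of characters already matched (so the current window has length j). `chars_match` and `window_matches` are hypothetical helper names. The window check simply applies the predicate position by position, without any KMP fallback.

```python
def distance_indices(s):
    # Distance to the previous occurrence of each character, or position + 1.
    return [m - s.rfind(c, 0, m) for m, c in enumerate(s)]


def chars_match(text_idx, pat_idx, j):
    # j characters are already matched. An index greater than j means
    # "no earlier occurrence inside the window", so two such characters match;
    # otherwise both must point back the same distance.
    return (text_idx > j and pat_idx > j) or text_idx == pat_idx


def window_matches(text, pattern, start):
    # Assumes start + len(pattern) <= len(text).
    S, W = distance_indices(text), distance_indices(pattern)
    return all(chars_match(S[start + j], W[j], j) for j in range(len(pattern)))


print(window_matches("XYXZ", "ABA", 0))  # True
print(window_matches("XYXZ", "ABA", 1))  # False
```

At start 1, the window "YXZ" fails on its last character. That character is new, but the pattern expects a repeat of its first character.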

The KMP algorithm becomes a little simpler because we can never get a mismatch on the first character: any single character is isomorphic to any other.
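The following Python 3 sketch combines the index assignment, the match test and the fallback table into a readable, non-golfed version of the same idea. It uses the usual prefix-function style of KMP rather than being a literal translation of the golfed code, and all names in it are hypothetical. With j == 0 the test always succeeds, which is why the first character never mismatches.

```python
def distance_indices(s):
    # Distance to the previous occurrence of each character, or position + 1.
    return [m - s.rfind(c, 0, m) for m, c in enumerate(s)]


def chars_match(text_idx, pat_idx, j):
    # j characters already matched; indices > j mean "new within the window".
    return (text_idx > j and pat_idx > j) or text_idx == pat_idx


def failure_table(W):
    # fail[k]: length of the longest proper suffix of pattern[:k + 1]
    # that is isomorphic to a prefix of the pattern.
    fail = [0] * len(W)
    j = 0
    for k in range(1, len(W)):
        while j > 0 and not chars_match(W[k], W[j], j):
            j = fail[j - 1]
        # With j == 0 this always succeeds: one character matches any other.
        if chars_match(W[k], W[j], j):
            j += 1
        fail[k] = j
    return fail


def find_isomorphic(pattern, text):
    """Return the first substring of text isomorphic to pattern, or None."""
    if not pattern:
        return ""
    W, S = distance_indices(pattern), distance_indices(text)
    fail = failure_table(W)
    j = 0
    for k in range(len(text)):
        while j > 0 and not chars_match(S[k], W[j], j):
            j = fail[j - 1]
        if chars_match(S[k], W[j], j):
            j += 1
        if j == len(W):
            return text[k - j + 1:k + 1]
    return None


print(find_isomorphic("ABBA", "MISSISSIPPI"))  # ISSI
print(find_isomorphic("ABC", "AAAA"))          # None
```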

This program outputs the first match if one exists. When a match is found, I exit via a runtime error (referencing the undefined name z raises a NameError). The code can easily be modified to exit cleanly at the cost of some bytes.

Note: For computing indices, we can use str.rfind (instead of my earlier approach using a dictionary) and still have linear complexity for a fixed alphabet. This assumes that str.rfind starts searching from the end, which seems the only sane implementation choice. For each character in the alphabet, we never have to traverse the same part of the string twice, so there is an upper bound of (size of alphabet) * (size of string) comparisons.
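Here is a small Python 3 sketch comparing the dictionary approach with the rfind approach, plus a rough cost model for the rfind scans. The cost model is an assumption: it counts the characters a backward scan would examine, not what CPython actually does internally. For each character, the scan costs telescope to that character's last position, which gives the (size of alphabet) * (size of string) bound.

```python
def distance_indices_dict(s):
    # Dictionary approach: one pass, remembering the last position of each character.
    last = {}
    out = []
    for m, c in enumerate(s):
        out.append(m - last.get(c, -1))
        last[c] = m
    return out


def distance_indices_rfind(s):
    # rfind approach: search backwards in s[:m] for the previous occurrence.
    return [m - s.rfind(c, 0, m) for m, c in enumerate(s)]


def rfind_scan_cost(s):
    # Assumed cost model (not measured): a backward scan starting at m - 1 looks at
    # m - prev characters if it stops at an earlier occurrence prev,
    # or at all m characters of s[:m] if there is none.
    total = 0
    for m, c in enumerate(s):
        prev = s.rfind(c, 0, m)
        total += m - prev if prev != -1 else m
    return total


s = "MISSISSIPPI"
print(distance_indices_dict(s) == distance_indices_rfind(s))  # True
print(rfind_scan_cost(s), len(set(s)) * len(s))  # 25 44
```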

Since the code got fairly obfuscated in the course of golfing, here is an earlier (293-byte) solution that's a bit easier to read:

e=lambda a:a>i<W[i]or a==W[i]
exec('s=raw_input();S=[];p={};M=i=0\nfor c in s:S+=[M-p.get(c,-1)];p[c]=M;M+=1\nW=S;L=M;'*2)[:-9]
T=[0]*L
k=1
while~k+L:
 if e(W[k]):i+=1;k+=1;T[k]=i
 else:i=T[i]
m=i=0
while m+i<M:
 if e(S[m+i]):
    if~-L==i:print s[m:m+L];z
    i+=1
 else:m+=i-T[i];i=T[i]
print'No!'

The e function tests equivalence of characters. The exec statement assigns indices and does some variable initialisations. The first loop computes the fallback table T for X, and the second loop does the string search.

Update: Here is a version that exits cleanly, at the cost of one byte:

r='No!'
exec'''s=raw_input()
S=[M-s.rfind(c,0,M)for M,c in enumerate(s)]
k=0
j=x=%s
while k<=M+x:
 if S[k]>j<W[j]or S[k]==W[j]:
    k+=1;j+=1;T+=[j]
    if j-L>x:r=k=s[k-j:k]
 else:j=T[j]
'''*2%('-1;T=[0];W=S;L=M',0)
print r

Python 3, 401 bytes

import string,itertools
X=input()
Y=input()
x=len(X)
t=[-1]+[0]*~-x
j=2
c=0
while j<x:
 if X[j-1]==X[c]:c+=1;t[j]=c;j+=1
 elif c>0:c=t[c]
 else:t[j]=0;j+=1
s=string.ascii_letters
*b,=map(s.find,X)
for p in itertools.permutations(s):
 m=i=0
 while m+i<len(Y):
  if p[b[i]]==Y[m+i]:
   if~-x==i:print(Y[m:m+x]);exit()
   else:i+=1
  else:
   if-1<t[i]:m+=i-t[i];i=t[i]
   else:i=0;m+=1
else:print("No!")

This is still mostly ungolfed, but I think it should work. The core algorithm is KMP, with an extra multiplicative factor that is factorial in the size of the alphabet. That is fine asymptotically, since the alphabet is constant. In other words, this is (or should be) one completely impractical linear algorithm.

Here are a few annotations to help with the analysis:

# KMP failure table for the substring, O(n)
t=[-1]+[0]*~-x
j=2
c=0
while j<x:
 if X[j-1]==X[c]:c+=1;t[j]=c;j+=1
 elif c>0:c=t[c]
 else:t[j]=0;j+=1

# Convert each char to its index in a-zA-Z, O(alphabet * n)
s=string.ascii_letters
*b,=map(s.find,X)

# For every permutation of letters..., O(alphabet!)
for p in itertools.permutations(s):
 # Run KMP, O(n)
 m=i=0
 while m+i<len(Y):
  if p[b[i]]==Y[m+i]:
   if~-x==i:print(Y[m:m+x]);exit()
   else:i+=1
  else:
   if-1<t[i]:m+=i-t[i];i=t[i]
   else:i=0;m+=1
else:print("No!")

For testing, you can replace s with a smaller alphabet than string.ascii_letters.
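One way to check the permutation idea on a tiny alphabet is with a function version of it. This Python 3 sketch (hypothetical name `find_by_permutations`) uses str.find in place of the hand-written KMP loop. It assumes the alphabet has distinct characters and contains every character of X. It returns the leftmost match over all permutations, whereas the answer's loop stops at the first permutation that yields any match.

```python
from itertools import permutations


def find_by_permutations(x, y, alphabet):
    # Assumes alphabet has distinct characters and contains every character of x.
    best = None
    for p in permutations(alphabet):
        # Relabel x through this permutation of the alphabet.
        relabel = dict(zip(alphabet, p))
        candidate = "".join(relabel[c] for c in x)
        pos = y.find(candidate)  # str.find stands in for the KMP loop
        if pos != -1 and (best is None or pos < best):
            best = pos
    return None if best is None else y[best:best + len(x)]


print(find_by_permutations("ABBA", "MISSISSIPPI", "ABIMPS"))  # ISSI
```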


APL (Dyalog), 32 bytes

This is an infix function, taking X as left argument and Y as right argument.

{(s,⊂'No!')⊃⍨(⍳⍨¨s←⍵,/⍨≢⍺)⍳⊂⍳⍨⍺}


{…} anonymous lambda where ⍺ and ⍵ represent the arguments (X and Y)

⍳⍨⍺ ɩndex selfie of X (ɩndices of the first occurrence of elements of X in X)

⊂ enclose so we can look for that entire pattern

(…)⍳ ɩndex of first occurrence of that in…

  ≢⍺ tally (length) of X

  ⍵,/⍨ all substrings of that size of Y (lit. concatenation reduction of those, but that is a no-op)

  s← store in s (for substrings)

  ⍳⍨¨ ɩndex selfie of each of those

 now we have the index of the first pattern, or 1 + the number of patterns if no match was found

(…)⊃⍨ use that index to pick from…

  ⊂'No!' the enclosed string (so that it functions as a single element)

  s, prepended with s
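The same approach translates directly into Python 3, since two strings are isomorphic exactly when their "index selfies" agree. This is a sketch with hypothetical names. Python's indices are 0-based, whereas Dyalog's default ⎕IO is 1.

```python
def index_selfie(s):
    # Analogue of APL's ⍳⍨: for each character, the position of its
    # first occurrence in s (0-based here).
    return [s.index(c) for c in s]


def first_isomorphic(x, y):
    n = len(x)
    windows = [y[i:i + n] for i in range(len(y) - n + 1)]  # like ⍵,/⍨≢⍺
    target = index_selfie(x)
    for w in windows:
        if index_selfie(w) == target:
            return w
    return "No!"


print(first_isomorphic("ABBA", "MISSISSIPPI"))  # ISSI
```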
