De-Snakify a String

JavaScript (ES6) + SnakeEx, 176 bytes

a=b=>"s{A}:<+>([^ ]<P>)+",b).reduce((c,d)=>(e=c.marks.length-d.marks.length)>0?c:e?d:c.x+c.y>d.x+d.y?d:c).marks.reduce((c,d,e,f)=>e%2?c+b.split`\n`[d][f[e-1]]:c,"")

Remember SnakeEx? Good, because neither do I! Golfing suggestions welcome.

MATL, 80 bytes

Thanks to @LevelRiverSt for a correction


Input is as a 2D array of char, with rows separated by ;. The test cases in this format are

['Hel         ';'  l      rin';'  o,IAmASt g';'           S';'       !ekan']
['P  ngPu  Code ';'r  i  z  d  G ';'o  m  z  n  o ';'gram  lesA  lf']
['   ~ zyx tsr XWVUTSR';'   }|{ wvu q Y     Q';'!          p Z `ab P';'"#$ 6789:; o [ _ c O';'  % 5    < n \]^ d N';'(''& 432  = m     e M';')     1  > lkjihgf L';'*+,-./0  ?         K';'         @ABCDEFGHIJ']
['  tSyrep    ';'  r    p    ';'  in Sli    ';'   g    Sile';'   Snakes  n';'Ser      ylt';'a eh   ilS  ';'fe w   t    ';'   emo h    ';'     Sre    ']

Try it online!


The coordinates of each nonspace character is represented by a complex number. For each current character, the next is obtained as that which is closest (minimum absolute difference of their complex coordinates).

To determine the initial char, the two endpoints need to be found. This is done as follows. An endpoint is a nonspace char that has exactly one nonspace neighbour. The number of neighbours is obtained by 2D convolution with a suitable mask. The initial point is the endpoint whose complex coordinate has the least sum of real and imaginary parts; i.e. is closest in Manhattan distance to the complex number 0, or equivalently to 1+1j, which is the complex coordinate of the upper left corner.

32>      % Take input as 2D char array. Compute 2D array with true for nonspace,
         % false for space
2#f      % Arrays of row and column indices of nonspaces
J*+      % Convert to complex array. Real part is row, imaginary part is column
X:       % Convert to column array
4Mt      % Push arrays of zeros and ones again. Duplicate
1Y6      % Push array [0 1 0; 1 0 1; 0 1 0]. Used as convolution mask to detect
         % neighbours that are nonspace
Z+       % 2D convolution with same size as the input
1=       % True for chars that have only one neighbour (endpoints)
*        % Multiply (logical and): we want nonspaces that are endpoints
2#fJ*+   % Find complex coordinates (as before)
ttXjwYj  % Duplicate. Take real and imaginary parts
+        % Add: gives Manhattan distance to (0,0)
K#X<     % Arg min. Entry with minimum absolute value has least Manhattan
         % distance to (0,0), and thus to (1,1) (top left corner)
)        % Apply as an index, to yield complex coordinate of initial endpoint
wt!      % Swap, duplicate, transpose.
         % The stack contains, bottom to top: complex coordinates of initial
         % endpoint, column array with all complex coordinates, row array of all
         % coordinates. The latter is used (and consumed) by the next "for"
         % statement to generate that many iterations
"        % For loop. Number of iterations is number of nonspaces
  tbb    %   Duplicate, bubble up twice (rearrange is stack)
  6#Yk   %   Find which of the remaining points is closest to current point. This
         %   is the next char in the string
  2#)    %   Remove the point with that index from the array of complex
         %   coordinates. Push that point and the rest of the array
  yw     %   Duplicate extracted point, swap
]        % End for
xx       % Delete top two elements of the stack
v        % Concatenate all stack contents into a column array. This array
         % contains the complex coordinates of chars sorted to form the string
tXjwYj   % Extract real part and imaginary part
GZy1)    % Number of rows of input. Needed to convert to linear index
*+       % Convert rows and columns to linear index
Gw       % Push input below that
)        % Index to get the original chars with the computed order
1e       % Force the result to be a row array (string). Implicitly display

C 198 190 179 180 181 bytes

Edit: Used the tip by user81655 and removed the parenthesis in the ternary operator, thanks! I also changed the cumbersome (S&1) test for evenness for the more appropriate (and shorter!) S%2.

Edit2: Using the *a addressing style heavily, made me blind to the obvious optimizations in the definition of S, i.e., replacing *(a+m) by a[m] etc. I then replaced S itself by T, which essentially does half of what S does. The code also now takes advantage of the return value of putchar.

Edit3: Fixed bug that has been around from the beginning, the Manhattan-search stopping criteria a < b+m is correct only if a has already been decremented. This adds 2 bytes, but one is regained by making the definition of m global.

Edit4: My golfing has passed the minimum and is going the wrong way now. Another bug fix related to the Manhattan-search. I originally had in-bound checks in place and without those the search continues for large input arrays (somewhere round 50x50) beyond the array b. Hence that array has to be expanded to at least twice the previous size, which adds one more byte.

#define T(x)+x*((a[x]>32)-(a[-x]>32))
m=103;main(){char b[m<<8],*e=b,*a=e+m;while(gets(e+=m));for(e=a;(T(1)T(m))%2**a<33;a=a<b+m?e+=m:a)a-=m-1;for(;*a;a+=T(1)T(m))*a-=putchar(*a);}

Ungolfed and explained:

/* T(1)T(m) (formerly S) is used in two ways (we implicitly assume that each cell has
   one or two neighbors to begin with):
   1. (in the first for-loop) T(1)T(m) returns an odd (even) number if cell a has one (two)
      neighbors with ASCII value > 32. For this to work m must be odd.
   2. (in the second for-loop) at this stage each cell a in the array at which T(1)T(m) is
      evaluated has at most one neighboring cell with ASCII value > 32. T(1)T(m) returns the
      offset in memory to reach this cell from a or 0 if there is no such cell.
      Putting the + in front of x here saves one byte (one + here replaces two
      + in the main part)

#define T(x)+x*((a[x]>32)-(a[-x]>32))

  /* A snake of length 100 together with the newlines (replaced by 0:s in memory) fits
     an array of size 100x101, but to avoid having to perform out-of-bounds checks we
     want to have an empty line above and the array width amount of empty lines below
     the input array. Hence an array of size 202x101 would suffice. However, we save
     a (few) bytes if we express the array size as m<<8, and since m must be odd
     (see 1. above), we put m = 103. Here b and e point to the beginning of the (now)
     256x103 array and a points to the beginning of the input array therein */


  char b[m<<8],*e=b,*a=e+m;

  /* This reads the input array into the 256x103 array */


  /* Here we traverse the cells in the input array one
     constant-Manhattan-distance-from-top-left diagonal at a time starting at the top-left
     singleton. Each diagonal is traversed from bottom-left to top-right since the starting point
     (memory location e) is easily obtained by moving one line downwards for each diagonal
     (+m) and the stopping point is found by comparing the present location a to the input array
     starting position (b+m). The traversal is continued as long as the cell has either
     an ASCII value < 33 or it has two neighbors with ASCII value > 32 each (T(1)T(m)
     is even so that (T(1)T(m))%2=0).
     Note that the pointer e will for wide input arrays stray way outside (below) the
     input array itself, so that for a 100 cell wide (the maximum width) input array
     with only two occupied cells in the bottom-right corner, the starting cell
     will be discovered 98 lines below the bottom line of the input array.
     Note also that in these cases the traversal of the diagonals will pass through the
     right-hand side of the 103-wide array and enter on the left-hand side. This, however,
     is not a problem since the cells that the traversal then passes have a lower
     Manhattan distance and have thereby already been analyzed and found not to contain
     the starting cell. */


  /* We traverse the snake and output each character as we find them, beginning at the
     previously found starting point. Here we utilize the function T(1)T(m), which
     gives the offset to the next cell in the snake (or 0 if none), provided that
     the current cell has at most one neighbor. This is automatically true for the
     first cell in the snake, and to ensure it for the rest of the cells we put the
     ASCII value of the current cell to 0 (*a-=putchar(*a)), where we use the fact
     that putchar returns its argument. The value 0 is convenient, since it makes the
     stopping condition (offset = 0, we stay in place) easy to test for (*a == 0). */