Constructing transition probability matrix

You can use the package CrossTabulate.m. (More detailed references are given in this MSE answer.)

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/CrossTabulate.m"]

cmat = CrossTabulate[Partition[x, 2, 1]];

MatrixForm[cmat]

enter image description here

cmat["SparseMatrix"] = cmat["SparseMatrix"]/Total[cmat["SparseMatrix"], {2}];
MatrixForm[cmat]

enter image description here

ArrayRules[cmat["SparseMatrix"]]

(* {{1, 1} -> 5/9, {1, 5} -> 1/9, {1, 4} -> 2/9, {1, 3} -> 1/
  9, {2, 5} -> 2/3, {2, 1} -> 1/3, {3, 2} -> 1/5, {3, 1} -> 1/
  5, {3, 3} -> 2/5, {3, 4} -> 1/5, {4, 4} -> 3/8, {4, 3} -> 1/
  8, {4, 2} -> 1/4, {4, 1} -> 1/8, {4, 5} -> 1/8, {5, 4} -> 2/
  5, {5, 5} -> 2/5, {5, 3} -> 1/5, {_, _} -> 0} *)

You can use SparseArray with additive assembly as follows:

x = RandomChoice[Alphabet["English", "IndexCharacters"], 1000000];
data = Flatten[ToCharacterCode[x]] - (ToCharacterCode["A"][[1]] - 1); // AbsoluteTiming // First
A = With[{
 spopt = SystemOptions["SparseArrayOptions"]},
  Internal`WithLocalSettings[
   (*switch to additive assembly*)
   SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> Total}],
   
   (*assemble matrix*)
   SparseArray[
    Partition[data, 2, 1] -> 1, 
    Max[data] {1, 1}
    ]
   
   , 
   (*reset "SparseArrayOptions" to previous value*)
   SetSystemOptions[spopt]]
  ]; // AbsoluteTiming // First

0.739454

0.114682

Remarks

  • As the timings suggest, it is worthwhile to avoid strings in the first place.

  • Formerly, I used LetterNumber, but ToCharacterCode is much, much faster.

  • It is "TreatRepeatedEntries" -> Total which enables summing of entries. Count is not needed anymore. Developer`ToPackedArray might speed up things a bit if x is very long. The other hokus-pokus is for making things bulletproof against aborts (i.e., options are reset even if computations are interrupted). See also (37566) and (136017).


You can also use EstimatedProcess and MarkovProcessProperties as follows:

states = DeleteDuplicates[x];
ordering = Ordering[states]; 
data = ArrayComponents @ x ;
estproc = EstimatedProcess[data, DiscreteMarkovProcess[Length@states]];
tm = MarkovProcessProperties[estproc, "TransitionMatrix"][[ordering, ordering]] 

TeXForm[TableForm[tm, TableHeadings -> {states[[ordering]], states[[ordering]]}]]

$\begin{array}{cccccc} & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} \\ \text{A} & \frac{5}{9} & 0 & \frac{1}{9} & \frac{2}{9} & \frac{1}{9} \\ \text{B} & \frac{1}{3} & 0 & 0 & 0 & \frac{2}{3} \\ \text{C} & \frac{1}{5} & \frac{1}{5} & \frac{2}{5} & \frac{1}{5} & 0 \\ \text{D} & \frac{1}{8} & \frac{1}{4} & \frac{1}{8} & \frac{3}{8} & \frac{1}{8} \\ \text{E} & 0 & 0 & \frac{1}{5} & \frac{2}{5} & \frac{2}{5} \\ \end{array}$