How do you go about formalizing a concept?

There is an extensive scientific literature on the use of formal systems to describe human behaviour. In most cases, these formalization techniques have been applied to interactive systems, e.g. to understand the causes of human errors and predict their occurrence in human reliability analyses, to explore human-automation interaction and identify mechanisms of task failure using temporal logic, to design complex, safety-critical systems that favour safe human work, and so on.

The idea of applying Hofstadter's system to formalize human behaviour is intriguing. His formal system is essentially typographical in nature, since it consists of strings of symbols that lack any inherent meaning. On the other hand, looking at the system from outside, we can establish one-to-one correspondences with real or abstract elements (Hofstadter calls them "interpretations"), including those necessary to model human interactions. Hofstadter worked in this field in the years following the publication of GEB, and carried out several studies applying his theories to the human mind and behaviour. For your specific question, I would strongly suggest that you read a book published in 1995 by him and other members of the Fluid Analogies Research Group, entitled "Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought". This is a very interesting collection of papers dedicated to the use of formal systems and computer modeling to describe the dynamics of the human mind. In particular, you could focus on chapter 5 (The Copycat Project: A Model of Mental Fluidity and Analogy-making), which describes the architecture of the so-called "Copycat" program. Although the final objective of this program (creating a sort of artificial intelligence) is considerably more ambitious and challenging than simple formalization, this chapter deals with several important concepts that characterize human cognitive processes and interactions and that must be taken into account to reproduce them in formal models, such as "analogy making" and the "parallel terraced scan" (see below).
The chapter also illustrates how any computer model that aims to describe the dynamics of the human mind should include three elements to capture the parallel and random nature of human cognition: a slipnet (a network of nodes and links representing permanent concepts), a workspace (an operative working area), and a coderack (a set of codelets that can modulate activations in the slipnet and build structures in the workspace).
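To make these three elements a little more concrete, here is a toy sketch in Python. It is not the Copycat implementation: the concept names, the activation numbers, the urgency values and the helper functions `make_codelet` and `run` are all assumptions made up for illustration. It only shows the general idea of codelets being picked at random, weighted by urgency, and then changing slipnet activations and adding structures to the workspace.

```python
import random

# Slipnet: permanent concepts, each with an activation level and links.
# Names and numbers are illustrative assumptions, not taken from Copycat.
slipnet = {
    "weather": {"activation": 0.2, "links": ["travel"]},
    "travel": {"activation": 0.1, "links": ["weather"]},
}

# Workspace: structures built so far (here simply a list of concept names).
workspace = []


def make_codelet(concept, urgency):
    """Return an (urgency, action) pair for a hypothetical codelet."""
    def action():
        node = slipnet[concept]
        node["activation"] = min(1.0, node["activation"] + 0.5)
        # Spread a little activation to linked concepts.
        for neighbour in node["links"]:
            other = slipnet[neighbour]
            other["activation"] = min(1.0, other["activation"] + 0.1)
        workspace.append(concept)
    return (urgency, action)


def run(coderack, steps, rng):
    """Run up to `steps` codelets, each chosen at random weighted by urgency."""
    for _ in range(steps):
        if not coderack:
            break
        weights = [urgency for urgency, _ in coderack]
        index = rng.choices(range(len(coderack)), weights=weights, k=1)[0]
        _, action = coderack.pop(index)
        action()


# Coderack: the pool of codelets waiting to run.
coderack = [make_codelet("weather", 5.0), make_codelet("travel", 1.0)]
run(coderack, steps=2, rng=random.Random(0))
```

The order in which the two codelets run is random, which is the point: more urgent codelets are merely more likely to run first, not guaranteed to.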

Taking all these considerations into account, a good way to apply Hofstadter's system to human interaction could be to follow these steps in sequence:

1 - Define your alphabet, i.e. the set of symbols that you consider appropriate for your purpose. For example, you could choose as possible symbols all letters from $A$ to $K$, all digits from $0$ to $7$, all capital Greek letters, and other symbols (e.g. taken from Hofstadter's propositional calculus, such as $\sim$ or $'$, or any other that you consider appropriate);

2 - Define your syntax, that is to say a set of "formation rules" that state what is and is not allowed in a string (e.g., rules may be "any letter must be followed by an even digit", "two consecutive equal symbols are not allowed", and so on);

3 - Stepping outside your formal system, assign a meaningful one-to-one correspondence (i.e. an interpretation) to each element of the system. In this specific case, we could interpret symbols as topics of conversation: for example, we could assign to the symbol $A$ the topic of weather, to $B$ that of politics, to $C$ that of taxes, to $\alpha$ that of jokes, and so on. In this phase, you could also group topics into categories on the basis of their similarities, and assign symbols accordingly to facilitate subsequent analyses (e.g., using $\alpha'$ for political jokes, $\alpha''$ for professional humour, $\alpha'''$ for toilet humour, and so on);

4 - Decide your "production rules": this is a set of rules that define/predict how strings are progressively built. These rules have to be defined by taking into account the interpretation of each symbol/string (note the difference from the formation rules above, which only state whether a string is allowed or not). Applying these rules should directly yield a formalization of human interaction. This is the last step and surely the most difficult one: the rules needed to describe human interactions are far more complex than those shown, for example, in the famous three-symbol "MIU" system used by Hofstadter to illustrate production rules. For this step, I would suggest that you take into account the above-mentioned concepts of "analogy making" and "parallel terraced scan". The first one can be defined (using Hofstadter's words) as "the perception of two or more non-identical objects or situations as being the 'same' at some abstract level". This is clearly a pivotal factor in determining the sequence of topics within a human interaction. The second one refers to another key aspect of human thought, interaction, and conversation: the different possible ways of asking a question, providing an answer or a comment, or changing the topic during an interaction are explored "in parallel" by the human mind in order to select the most appropriate one. In other words, the mind allocates resources to each possibility in real time according to feedback about its current promise (an estimate that is updated continually as new information is obtained): these analyses are carried out simultaneously by the human mind, which finally makes its choice and determines the path of the interaction.
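The four steps above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the alphabet is reduced to the symbols that have an interpretation, a primed symbol such as `α'` is treated as a single symbol, only the "no two consecutive equal symbols" formation rule is implemented, and the `RELATED` table (which topic may follow which) is a made-up stand-in for real production rules.

```python
import re

# Step 3: hypothetical interpretation, symbol -> topic of conversation.
# Step 1: the alphabet is taken to be exactly the symbols listed here.
INTERPRETATION = {
    "A": "weather",
    "B": "politics",
    "C": "taxes",
    "α": "jokes",
    "α'": "political jokes",
}

# A symbol is a base character followed by zero or more prime marks.
TOKEN = re.compile(r"[A-Kα]'*")

# Step 4: made-up production table; a string ending in the key symbol
# may be extended by any of the listed symbols.
RELATED = {
    "A": ["α", "B"],
    "B": ["C", "α'"],
    "C": ["B"],
    "α": ["A", "α'"],
    "α'": ["α", "B"],
}


def tokenize(string):
    """Split a string into symbols, or return None if some part cannot be read."""
    tokens = TOKEN.findall(string)
    return tokens if "".join(tokens) == string else None


def is_well_formed(string):
    """Step 2: formation rules (known symbols only, no immediate repetition)."""
    tokens = tokenize(string)
    if not tokens or any(t not in INTERPRETATION for t in tokens):
        return False
    return all(a != b for a, b in zip(tokens, tokens[1:]))


def successors(string):
    """Apply the production rule once to a well-formed string."""
    last = tokenize(string)[-1]
    return [string + nxt for nxt in RELATED[last]]


def theorems(axiom, max_symbols):
    """All strings derivable from the axiom with at most max_symbols symbols."""
    found, frontier = [axiom], [axiom]
    while frontier:
        current = frontier.pop(0)
        if len(tokenize(current)) >= max_symbols:
            continue
        for nxt in successors(current):
            found.append(nxt)
            frontier.append(nxt)
    return found


def interpret(string):
    """Read a string from outside the system as a sequence of topics."""
    return [INTERPRETATION[t] for t in tokenize(string)]
```

For instance, `interpret("ABα'")` reads the string as weather, then politics, then political jokes, while `is_well_formed("AAB")` is rejected by the formation rule. The hard part, as said above, is replacing the fixed `RELATED` table with rules that reflect how people actually move between topics.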

Extending these concepts to the formalization of a possible sequence of topics within a human interaction would therefore require defining a precise sequence of rules that takes into account a number of factors, including similarities between topics, appropriateness and "convenience" of topic changes, individual experience, randomness (another important element of human cognition), and so on. A good way to achieve the last step of our formalization could be to identify a set of functions that predict the probability of each possible event (starting a topic, changing topic, stopping the discussion) by taking those factors into account. Since probabilities clearly vary among individuals, these functions should be determined starting from the characteristics of the two subjects involved in the interaction. In this regard, I would suggest that you start by drawing a model similar to those typically used in structural equation modeling (SEM). This is a statistical technique often used in psychological studies, where the relationships between a set of measurable variables (called "manifest" variables) and another set of non-measurable variables (called "latent" variables) are explored by taking into account all possible interrelations between factors (it can be seen as a very general regression analysis). This multiple relationship structure is then graphed using boxes, arrows, double arrows, etc., and the strength of each association is calculated using specific parameters. In our case, the manifest variables are the characteristics of the two individuals (age, gender, cultural level, socioeconomic class, etc.), whereas the latent variables are the probabilities of starting, changing, or stopping any given topic during the interaction. If you have quantitative observational data (i.e., a sufficient number of observations for which the characteristics of the individuals and the topic path are known), you could directly fit an SEM that includes quantitative parameters.
Alternatively, if observational data are not available, you might draw a hypothesized model (as is typically done in the initial phases of confirmatory analysis) and then try to estimate the strength of the relationships on the basis of your experience and previous studies (in this second case, these estimates should subsequently be validated in an observational study). In both cases, the resulting model can be used to generate a set of functions that express the probability of starting, changing, and stopping each topic, given the characteristics of the two individuals involved. As stated above, this network of functions should also include some component of randomness (there are a number of studies on unpredictability and indeterminism in human behaviour and interactions). Once these functions are defined, applying them to our predefined symbols, syntax, and interpretations can directly provide a formalization of that specific human interaction in the Hofstadter style. Also note that, by increasing the complexity of these functions and appropriately choosing symbols, syntax, and interpretations, you could include additional features of the interaction in the formal description of the sequence of topics (for example, the duration of each topic, the individual who starts/changes/stops each topic, the sequence of verbal interventions, the length of the conversation, and so on).
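As a rough sketch of what such probability functions could look like once the model has been estimated, the Python snippet below turns one characteristic of the two individuals (their age gap) into a probability of changing topic and then generates a string of topic symbols at random. Everything specific here is an assumption for illustration: the logistic form, the single predictor, the placeholder weights, the fixed stopping probability `p_stop`, and the uniform choice of the next topic. A fitted model would replace all of them.

```python
import math
import random


def logistic(x):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def p_change(person_a, person_b, weights):
    """Probability of leaving the current topic at a given step.

    The weights are placeholders, not estimates from any data.
    """
    age_gap = abs(person_a["age"] - person_b["age"])
    score = weights["intercept"] + weights["age_gap"] * age_gap
    return logistic(score)


def simulate(start, topics, person_a, person_b, weights, p_stop, max_steps, rng):
    """Generate one string of single-character topic symbols."""
    string = [start]
    for _ in range(max_steps):
        if rng.random() < p_stop:  # the conversation ends
            break
        if rng.random() < p_change(person_a, person_b, weights):
            # Pick a different topic, uniformly at random (an assumption).
            options = [t for t in topics if t != string[-1]]
            string.append(rng.choice(options))
    return "".join(string)


weights = {"intercept": -1.0, "age_gap": 0.05}  # hypothetical values
alice = {"age": 30}
bob = {"age": 40}
conversation = simulate("A", ["A", "B", "C"], alice, bob, weights,
                        p_stop=0.1, max_steps=20, rng=random.Random(1))
```

Because a new topic is always different from the current one, every generated string automatically satisfies a formation rule such as "two consecutive equal symbols are not allowed".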