Who is this chatbot?

Java

New Edit: Still more tweaking in my spare time. I've started a new branch where I've been playing with the DFS algorithm. Officially the branch is meant to serve as the core of a new BFS algorithm I'm planning, but in the meantime I wanted to get a better handle on what the DFS is doing and how it makes its decisions. To that end, I've added a suppression function that begins to decay the value of each new word, whether or not it is on-topic, as sentences grow longer. Also, all words now contribute value to the sentence, but words that aren't in the topic or sentence-topic list contribute just 25% of their frequency value. An example conversation can be found here, and it's quite good: we talk about physics, the human nature of Chatbrains, and other fascinating topics. Check out the branch code here.

Edit: I've been tweaking the code a bit. Instead of posting the revisions here, I keep the latest version in my GitHub repository. I've also added a new conversation against the most recent version, where we discuss chatbots, depth first search, and how programming should be used to build living things!

I decided to embrace this challenge holistically. My chatbot knows very few things starting out -- no words, no syntax, no nothing. It knows how to parse standard English into words, and how to identify non-word characters as punctuation. That's it. Everything it knows it learns from interaction with the user. As you interact with it, it pays attention to the connections between words, and constructs sentences using that information. See the source for the details. I've greatly exceeded the recommended length-of-program expectations of this challenge, but to good purpose. Here are some highlights of the program:

  • Chatbot starts with no knowledge (follows "Rules":3)
  • Frequency of word occurrence is tracked
  • Word frequencies are "decayed" so that conversation can move from topic to topic (follows "Bonus":3 and 4)
  • The arrangement of words in observed sentences is recorded, so "phrases" are implicitly kept track of (e.g. if you use a lot of prepositional phrases when chatting with the bot, the bot will use a lot of them too!)
  • Sentences are built by preferring to follow most frequently observed connections between words, with random factors to inject variation
  • The sentence construction algorithm is a Depth First Search that attempts to maximize the occurrence of topic words in the output sentence, with a small preference for ending sentences (this follows "Bonus":1 -- I use a pretty damn cool learning algorithm that shifts over time and retains knowledge of harvested word connections)
    • edit: Topic words are now drawn from both global knowledge of reoccurring words, and from the most recent sentence
    • edit: Word weights are now computed using log base 4 of the word length, so longer words are weighted more strongly and shorter words more weakly -- this makes up for the lack of a true corpus, which would normally be used both to weight words and to eliminate high-frequency, low-value words.
    • edit: As the sentence length grows during construction, a suppression function begins to decrease the value of additional words.
    • edit: Sentence "ending" is less valuable now, as it was causing a preponderance of short silly sentences.
    • edit: All words now contribute value, although off-topic words only contribute at 25% of global frequency value.
    • There is a built-in maximum depth to limit looping and search time, both of which are risks because sentences are built by following word precedence
    • Loops are detected directly while building a sentence, and while they are technically allowed, there is a high chance that loops will be avoided
    • Tune-able timeout is used to encourage both branch pruning and statement finalization, and also to prevent going past the 5-10 second "acceptable delay" in the rules
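
As a rough illustration of the weighting described in the edit notes above, here is a minimal sketch. It assumes the log-base-4 length weight simply multiplies a word's frequency value and that the 25% off-topic factor is applied on top of that; the class, method, and constant names are hypothetical, and the branch code may combine these factors differently.

```java
// Hypothetical sketch; none of these names come from the actual branch.
class WordValueSketch {
    /** Off-topic words contribute 25% of their value, per the edit notes. */
    static final double OFF_TOPIC_FACTOR = 0.25;

    /**
     * Log base 4 of the word length: exactly 1.0 for a 4-letter word,
     * below 1.0 for shorter words, above 1.0 for longer ones.
     * Assumes a non-empty word (the log of 0 is negative infinity).
     */
    static double lengthWeight(String word) {
        return Math.log(word.length()) / Math.log(4.0);
    }

    /** Assumed combination: frequency times length weight, then the topic factor. */
    static double wordValue(String word, double frequency, boolean onTopic) {
        double value = frequency * lengthWeight(word);
        return onTopic ? value : value * OFF_TOPIC_FACTOR;
    }

    public static void main(String[] args) {
        // A frequent short filler word versus a rarer, longer topic word.
        System.out.println(wordValue("the", 10.0, false));
        System.out.println(wordValue("universe", 3.0, true));
    }
}
```

Under these assumptions a one-letter word gets a weight of zero, which is one way short filler words could end up suppressed without a corpus.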

To summarize my connection to the rules:

  • For "Rules":1, I chose Java, which is verbose, so be gentle.
  • For "Rules":2, user input alone is leveraged, although I have some stub code to add brain saving/loading for the future
  • For "Rules":3, there is absolutely no pre-set vocabulary. The ChatBot knows how to parse English, but that's it. Starting out, it knows absolutely nothing.
  • For "Mandatory Criteria":1, my program is longer, but packs a lot of awesome. I hope you'll overlook that.
  • For "Mandatory Criteria":2, I have a timeout on my sentence construction algorithm to explicitly prevent more than 5-6 seconds search time. The best sentence so far is returned on timeout.
  • For "Mandatory Criteria":3, Topics generally solidify in about 10 sentences, so the Bot will be on-topic by then, and by 20 sentences will be responding to statements with some fascinating random constructs that actually make a bit of sense.
  • For "Mandatory Criteria":4, I borrowed nothing from the reference code. This is an entirely unique construction.
  • For "Bonus":1, I like to think this bot is quite exceptional. It won't be as convincing as scripted bots, but it has absolutely no limitations on topics, and will move gracefully (with persistence) from conversation topic to topic.
  • For "Bonus":2, this is strictly round-robin, so no bonus here. Yet. There's no requirement within my algorithm for response, so I'm planning a Threaded version that will address this bonus.
  • For "Bonus":3, initially this bot will mimic, but as the conversation progresses beyond the first few sentences, mimicking will clearly end.
  • For "Bonus":4, "moods" aren't processed in any meaningful way, but because the bot prefers to follow topics, its mood will shift along with them.
  • For "Bonus":5, saving and loading brain is not currently in place.
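
The topic persistence mentioned above comes from the per-turn decay: the posted code sets the decay rate to 0.05 and calls decay() once for every line of user input. The sketch below works out what that means for a word that is never mentioned again. The class and method names are hypothetical; only the rate and the update rule are taken from the listing.

```java
// Hypothetical helper; only DECAY_RATE and the update rule mirror the posted code.
class DecaySketch {
    /** Same value the ChatbotBrain constructor assigns to decayRate. */
    static final double DECAY_RATE = 0.05;

    /** Value left after the given number of turns with no new sightings. */
    static double valueAfter(double start, int turns) {
        double value = start;
        for (int i = 0; i < turns; i++) {
            value -= value * DECAY_RATE; // same update as decayWord()
        }
        return value;
    }

    /** Smallest number of turns after which an unrefreshed word falls below half its value. */
    static int turnsToHalve() {
        int turns = 0;
        double value = 1.0;
        while (value >= 0.5) {
            value -= value * DECAY_RATE;
            turns++;
        }
        return turns;
    }

    public static void main(String[] args) {
        System.out.println("Turns to halve: " + turnsToHalve());
        System.out.println("Value of 10.0 after 20 turns: " + valueAfter(10.0, 20));
    }
}
```

With a 5% rate, a word needs 14 turns without a mention to lose half its value, so an old topic fades gradually rather than vanishing as soon as the conversation moves on.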

So, I've met all base rules, all mandatory rules, and provisionally bonus rules 1, 3, and 4.

As another bonus, I've commented the code throughout, so feel free to borrow from it or recommend improvements. Clearly, as I have no built-in dialogue and no "structural" knowledge, conversations will be weird for longer than with other bots, but I think I meet the rules pretty well.

Now, on to the code (Some comments redacted to fit body limit) or follow it on GitHub, as I continue to improve it:

import java.util.*;
import java.util.regex.*;

public class LearningChatbot {
    /**
     * Static definition of final word in a statement. It never has 
     * any descendents, and concludes all statements. This is the only
     * "starting knowledge" granted the bot.
     */
    public static final ChatWord ENDWORD = new ChatWord("\n");

    /**
     * The Brain of this operation.
     */
    private ChatbotBrain brain;

    /**
     * Starts LearningChatbot with a new brain
     */
    public LearningChatbot() {
        brain = new ChatbotBrain();
    }

    /**
     * Starts LearningChatbot with restored brain.
     */
    public LearningChatbot(String filename) {
        throw new UnsupportedOperationException("Not yet implemented");
    }

    /**
     * Invocation method.
     */
    public void beginConversation() {
        // Use the brain built by the constructor instead of a throwaway one
        ChatbotBrain cb = brain;

        Scanner dialog = new Scanner(System.in);

        boolean more = true;

        while (more) {
            System.out.print("    You? ");
            String input = dialog.nextLine();

            if (input.equals("++done")) {
                System.exit(0);
            } else if (input.equals("++save")) {
                System.out.println("Saving not yet implemented, sorry!");
                System.exit(0);
            } else if (input.equals("++help")) {
                getHelp();
            } else {
                cb.decay();
                cb.digestSentence(input);
            }

            System.out.print("Chatbot? ");
            System.out.println(cb.buildSentence());
        }
    }

    /**
     * Help display
     */
    public static void getHelp() {
        System.out.println("At any time during the conversation, type");
        System.out.println("   ++done");
        System.out.println("to exit without saving.");
        System.out.println("Or type");
        System.out.println("   ++save");
        System.out.println("to exit and save the brain.");
        System.out.println();
    }

    /**
     * Get things started.
     */
    public static void main(String[] args) {
        System.out.println("Welcome to the Learning Chatbot");
        System.out.println();
        getHelp();

        LearningChatbot lc = null;
        if (args.length > 0) {
            System.out.printf("Using %s as brain file, if possible.%n", args[0]);
            lc = new LearningChatbot(args[0]);
        } else {
            lc = new LearningChatbot();
        }
        lc.beginConversation();
    }

    /**
     * The ChatbotBrain holds references to all ChatWords and has various
     * methods to decompose and reconstruct sentences.
     */
    static class ChatbotBrain {
        /**
         * A tracking of all observed words. Keyed by the String version of
         * the ChatWord, to allow uniqueness across all ChatWords
         */
        private Map<String,ChatWord> observedWords;

        /**
         * This brain is going to be able to keep track of "topics" by way of
         * a word frequency map. That way, it can generate sentences based
         * on topic-appropriateness.
         */
        private Map<ChatWord, Double> wordFrequencyLookup;

        /**
         * This holds the actual word frequencies, for quick isolation of
         * highest frequency words.
         */
        private NavigableMap<Double, Collection<ChatWord>> wordFrequency;

        /**
         * This holds the count of words observed total.
         */
        private int wordCount;

        /**
         * This holds the current "values" of all words.
         */
        private double wordValues;

        /**
         * A "word" that is arbitrarily the start of every sentence
         */
        private ChatWord startWord;

        /**
         * Rate of decay of "topics".
         */
        private double decayRate;

        // These values configure various features of the recursive 
        // sentence construction algorithm.
        /** Nominal (target) length of sentences */
        public static final int NOMINAL_LENGTH = 10;
        /** Max length of sentences */
        public static final int MAX_LENGTH = 25;
        /** Sentence creation timeout */
        public static final long TIMEOUT = 5000;
        /** Topic words to match against */
        public static final int TOPICS = 3;
        /** Minimum branches to consider for each word */
        public static final int MIN_BRANCHES = 3;
        /** Maximum branches to consider for each word */
        public static final int MAX_BRANCHES = 5;
        /** % chance as integer out of 100 to skip a word */
        public static final int SKIP_CHANCE = 20;
        /** % chance as integer to skip a word that would cause a loop */
        public static final int LOOP_CHANCE = 5;
        /** % chance that punctuation will happen at all */
        public static final int PUNCTUATION_CHANCE = 25;
        /** % chance that a particular punctuation will be skipped */
        public static final int PUNCTUATION_SKIP_CHANCE = 40;

        /**
         * Convenience parameter to use a common random source 
         * throughout the brain.
         */
        private Random random;

        /**
         * Gets the Chatbot started, sets up data structures necessary
         */
        public ChatbotBrain() {
            observedWords = new HashMap<String,ChatWord>();
            observedWords.put("\n",ENDWORD);
            startWord = new ChatWord("");
            observedWords.put("",startWord);

            wordFrequencyLookup = new HashMap<ChatWord, Double>();
            wordFrequency = new TreeMap<Double, Collection<ChatWord>>();
            decayRate = 0.05;
            wordCount = 0;
            wordValues = 0.0;
            random = new Random();
        }

        /**
         * More complex digest method (second edition) that takes a sentence,
         * cuts it up, and links up the words based on ordering.
         */
        public void digestSentence(String sentence) {
            Scanner scan = new Scanner(sentence);

            ChatWord prior = null;
            ChatWord current = null;
            String currentStr = null;
            String currentPnc = null;
            while (scan.hasNext()) {
                currentStr = scan.next();
                Pattern wordAndPunctuation = 
                        Pattern.compile("([a-zA-Z\\-_'0-9]+)([^a-zA-Z\\-_'0-9]?)[^a-zA-Z\\-_'0-9]*?");
                Matcher findWords = wordAndPunctuation.matcher(currentStr);
                //  Basically this lets us find words-in-word typos like this:
                //  So,bob left his clothes with me again.
                //  where "So,bob" becomes "So," "bob"
                while (findWords.find()) {
                    currentStr = findWords.group(1);
                    currentPnc = findWords.group(2);
                    if (currentStr != null) {
                        if (observedWords.containsKey(currentStr)) {
                            current = observedWords.get(currentStr);
                        } else {
                            current = new ChatWord(currentStr);
                            observedWords.put(currentStr, current);
                        }

                        incrementWord(current);

                        if (currentPnc != null && !currentPnc.equals("")) {
                            current.addPunctuation(currentPnc.charAt(0));
                        }

                        if (prior != null) {
                            prior.addDescendent(current);
                        } else {
                            // First word of the sentence hangs off the start word
                            startWord.addDescendent(current);
                        }

                        prior = current;
                    }
                }
            }
            if (prior != null) { // finalize.
                prior.addDescendent(ENDWORD);
            }
        }

        /**
         * Increments the value of a word (catalogues a new sighting).
         */
        public void incrementWord(ChatWord word) {
            Double curValue;
            Double nextValue;
            Collection<ChatWord> freqMap;
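            if (wordFrequencyLookup.containsKey(word)) {
                curValue = wordFrequencyLookup.get(word);
                freqMap = wordFrequency.get(curValue);
                freqMap.remove(word);
                // Drop the bucket once it is empty so stale keys don't pile up
                if (freqMap.isEmpty()) {
                    wordFrequency.remove(curValue);
                }
            } else {
                curValue = 0.0;
            }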
            nextValue=curValue+1.0;
            wordFrequencyLookup.put(word, nextValue);

            freqMap = wordFrequency.get(nextValue);
            if (freqMap == null) {
                freqMap = new HashSet<ChatWord>();
                wordFrequency.put(nextValue, freqMap);
            }

            freqMap.add(word);
            wordCount++;
            wordValues++;
        }

        /**
         * Decays a particular word by decay rate.
         */
        public void decayWord(ChatWord word) {
            Double curValue;
            Double nextValue;
            Collection<ChatWord> freqMap;
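            if (wordFrequencyLookup.containsKey(word)) {
                curValue = wordFrequencyLookup.get(word);
                freqMap = wordFrequency.get(curValue);
                freqMap.remove(word);
                // Drop the bucket once it is empty; otherwise every decay pass
                // leaves a stale key behind and lastKey() can report an old maximum
                if (freqMap.isEmpty()) {
                    wordFrequency.remove(curValue);
                }
            } else {
                return;
            }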
            wordValues-=curValue; // remove old decay value
            nextValue=curValue-(curValue*decayRate);
            wordValues+=nextValue; // add new decay value
            wordFrequencyLookup.put(word, nextValue);

            freqMap = wordFrequency.get(nextValue);
            if (freqMap == null) {
                freqMap = new HashSet<ChatWord>();
                wordFrequency.put(nextValue, freqMap);
            }

            freqMap.add(word);
        }

        /**
         * Decay all word's frequency values. 
         */
        public void decay() {
            for (ChatWord cw : wordFrequencyLookup.keySet()) {
                decayWord(cw);
            }
        }

        /**
         * Gets a set of words that appear to be "top" of the frequency
         * list.
         */
        public Set<ChatWord> topicWords(int maxTopics) {
            Set<ChatWord> topics = new HashSet<ChatWord>();

            int nTopics = 0;
            for (Double weight: wordFrequency.descendingKeySet()) {
                for (ChatWord word: wordFrequency.get(weight)) {
                    topics.add(word);
                    nTopics++;
                    if (nTopics == maxTopics) {
                        return topics;
                    }
                }
            }
            return topics;
        }

        /**
         * Uses word frequency records to prefer to build on-topic
         * sentences.
         */
        public String buildSentence() {
            int maxDepth = NOMINAL_LENGTH+
                    random.nextInt(MAX_LENGTH - NOMINAL_LENGTH);
            ChatSentence cs = new ChatSentence(startWord);
            // We don't want to take too long to "think of an answer"
            long timeout = System.currentTimeMillis() + TIMEOUT;
            // The best sentence found is written back into cs; the score itself is not needed here
            buildSentence(cs, topicWords(TOPICS), 0.0, 0, maxDepth, timeout);
            return cs.toString();
        }

        public double buildSentence(ChatSentence sentence, 
                Set<ChatWord> topics, double curValue,
                int curDepth, int maxDepth, long timeout){
            if (curDepth==maxDepth || System.currentTimeMillis() > timeout) {
                return curValue;
            }
            // Determine how many branches to enter from this node
            int maxBranches = MIN_BRANCHES + random.nextInt(MAX_BRANCHES - MIN_BRANCHES);
            // try a few "best" words from ChatWord's descendent list.
            ChatWord word = sentence.getLastWord();
            NavigableMap<Integer, Collection<ChatWord>> roots =
                    word.getDescendents();
            // Going to keep track of current best encountered sentence
            double bestSentenceValue = curValue;
            ChatSentence bestSentence = null;
            int curBranches = 0;
            for (Integer freq : roots.descendingKeySet()) {
                for (ChatWord curWord : roots.get(freq)) {
                    if (curWord.equals(ENDWORD)) {
                        // let's weigh the endword cleverly
                        double endValue = random.nextDouble() * wordFrequency.lastKey();

                        if (curValue+endValue > bestSentenceValue) {
                            bestSentenceValue = curValue+endValue;
                            bestSentence = new ChatSentence(sentence);
                            bestSentence.addWord(curWord);
                        }
                        curBranches++;
                    } else {
                        int chance = random.nextInt(100);
                        boolean loop = sentence.hasWord(curWord);
                        /* Include a little bit of chance in the inclusion of
                         * any given word, whether a loop or not.*/
                        if ( (!loop&&chance>=SKIP_CHANCE) ||
                                (loop&&chance<LOOP_CHANCE)) {
                            double wordValue = topics.contains(curWord)?
                                    wordFrequencyLookup.get(curWord):0.0;
                            ChatSentence branchSentence = new ChatSentence(sentence);
                            branchSentence.addWord(curWord);
                            addPunctuation(branchSentence);
                            double branchValue = buildSentence(branchSentence,
                                    topics, curValue+wordValue, curDepth+1,
                                    maxDepth, timeout);
                            if (branchValue > bestSentenceValue) {
                                bestSentenceValue = branchValue;
                                bestSentence = branchSentence;
                            }
                            curBranches++;
                        }
                    }
                    if (curBranches == maxBranches) break;
                }
                if (curBranches == maxBranches) break;
            }
            if (bestSentence != null) {
                sentence.replaceSentence(bestSentence);
            }
            return bestSentenceValue;
        }

        /**
         * Adds punctuation to a sentence, potentially.
         */
        public void addPunctuation(ChatSentence sentence) {
            ChatWord word = sentence.getLastWord();
            NavigableMap<Integer, Collection<Character>> punc = word.getPunctuation();
            if (punc.size()>0 && random.nextInt(100)<PUNCTUATION_CHANCE){
                Character puncPick = null;
                // Walk punctuation from most to least frequently observed
                for (Integer freq : punc.descendingKeySet()) {
                    for (Character curPunc : punc.get(freq)) {
                        if (random.nextInt(100)>=PUNCTUATION_SKIP_CHANCE) {
                            puncPick = curPunc;
                            break;
                        }
                    }
                    if (puncPick != null) break;
                }
                if (puncPick != null) {
                    sentence.addCharacter(puncPick);
                }
            }
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            sb.append("ChatBrain[");
            sb.append(observedWords.size());
            sb.append("]:");
            for (Map.Entry<String,ChatWord> cw : observedWords.entrySet()) {
                sb.append("\n\t");
                sb.append(wordFrequencyLookup.get(cw.getValue()));
                sb.append("\t");
                sb.append(cw.getValue());
            }
            return sb.toString();
        }

    }

    /**
     * Useful helper class to construct sentences.
     */
    static class ChatSentence implements Cloneable {
        /**
         * List of words.
         */
        private List<Object> words;
        /**
         * Quick search construct to have O(1) expected lookup times.
         */
        private Set<Object> contains;

        /**
         * Starts to build a sentence with a single word as anchor
         */
        public ChatSentence(ChatWord anchor) {
            if (anchor == null) {
                throw new IllegalArgumentException("Anchor must not be null");
            }
            words = new ArrayList<Object>();
            contains = new HashSet<Object>();
            words.add(anchor);
            contains.add(anchor);
        }

        /** 
         * Starts a sentence using an existing ChatSentence. Also used for
         * cloning.
         */
        public ChatSentence(ChatSentence src) {
            words = new ArrayList<Object>();
            contains = new HashSet<Object>();
            appendSentence(src);
        }

        /**
         * Adds a word to a sentence
         */
        public ChatSentence addWord(ChatWord word) {
            if (word == null) {
                throw new IllegalArgumentException("Can't add null word");
            }
            words.add(word);
            contains.add(word);
            return this;
        }

        /**
         * Adds a character to a sentence.
         */
        public ChatSentence addCharacter(Character punc) {
            if (punc == null) {
                throw new IllegalArgumentException("Can't add null punctuation");
            }
            words.add(punc);
            contains.add(punc);
            return this;
        }

        /**
         * Replace a sentence with some other sentence.
         * Useful to preserve references.
         */
        public ChatSentence replaceSentence(ChatSentence src) {
            words.clear();
            contains.clear();
            appendSentence(src);
            return this;
        }

        public ChatSentence appendSentence(ChatSentence src) {
            words.addAll(src.getWords());
            contains.addAll(src.getWords());
            return this;
        }

        /**
         * Get last word of the sentence.
         */
        public ChatWord getLastWord() {
            for (int i=words.size()-1; i>=0; i--) {
                if (words.get(i) instanceof ChatWord) {
                    return (ChatWord) words.get(i);
                }
            }
            throw new IllegalStateException("No ChatWords found!");
        }

        /**
         * Checks if the sentence has a word
         */
        public boolean hasWord(ChatWord word) {
            return contains.contains(word);
        }

        /**
         * Counts the number of words in a sentence.
         */
        public int countWords() {
            int cnt = 0;
            for (Object o : words) {
                if (o instanceof ChatWord) {
                    cnt++;
                }
            }
            return cnt;
        }

        /**
         * Gets all the words of the sentence
         */
        private List<Object> getWords() {
            return words;
        }

        /**
         * Returns the sentence as a string.
         */
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (Object o : words) {
                if (o instanceof ChatWord) {
                    ChatWord cw = (ChatWord) o;
                    sb.append(" ");
                    sb.append( cw.getWord() );
                } else {
                    sb.append(o);
                }
            }
            return sb.toString().trim();
        }

        /**
         * Clones this sentence.
         */
        @Override
        public Object clone() {
            return new ChatSentence(this);
        }
    }

    /**
     * ChatWord allows the creation of words that track how they are
     * connected to other words in a forward fashion. 
     */
    static class ChatWord {
        /** The word. */
        private String word;
        /** Collection of punctuation observed after this word */
        private NavigableMap<Integer, Collection<Character>> punctuation;
        /** Lookup linking observed punctuation to where they are in ordering */
        private Map<Character, Integer> punctuationLookup;
        /** Punctuation observation count */
        private Integer punctuationCount;

        /** Collection of ChatWords observed after this word */
        private NavigableMap<Integer, Collection<ChatWord>> firstOrder;
        /** Lookup linking observed words to where they are in ordering */
        private Map<ChatWord, Integer> firstOrderLookup;
        /** First order antecedent word count */
        private Integer firstOrderCount;

        /**
         * Creates a new ChatWord that is aware of punctuation that
         * follows it, and also ChatWords that follow it.
         */
        public ChatWord(String word){
            this.word = word;

            this.firstOrder = new TreeMap<Integer, Collection<ChatWord>>();
            this.firstOrderLookup = new HashMap<ChatWord, Integer>();
            this.firstOrderCount = 0;

            this.punctuation = new TreeMap<Integer, Collection<Character>>();
            this.punctuationLookup = new HashMap<Character, Integer>();
            this.punctuationCount = 0;
        }

        protected NavigableMap<Integer, Collection<ChatWord>> getDescendents() {
            return firstOrder;
        }

        /**
         * Returns how many descendents this word has seen.
         */
        protected int getDescendentCount() {
            return firstOrderCount;
        }

        /**
         * Gets the lookup map for descendents
         */
        protected Map<ChatWord, Integer> getDescendentsLookup() {
            return firstOrderLookup;
        }

        /** As conversation progresses, word orderings will be encountered.
         * The descendent style of "learning" basically weights how often
         * words are encountered together, and is strongly biased towards
         * encountered ordering.
         */
        public void addDescendent(ChatWord next) {
            if(next != null){
                firstOrderCount++;
                int nextCount = 1;
                Collection<ChatWord> obs = null;
                // If we've already seen this word, clean up prior membership.
                if(firstOrderLookup.containsKey(next)){
                    nextCount = firstOrderLookup.remove(next);
                    obs = firstOrder.get(nextCount);
                    // Remove from prior obs count order
                    obs.remove(next);
                    nextCount++;
                }
                obs = firstOrder.get(nextCount);
                if (obs == null) { // we don't have this order yet
                    obs = new HashSet<ChatWord>();
                    firstOrder.put(nextCount, obs);
                }
                firstOrderLookup.put(next, nextCount);
                obs.add(next);
            }
        }

        /**
         * Some words have punctuation after them more often than not. 
         * This allows the ChatBrain to record occurrences of punctuation
         * after a word.
         */
        public void addPunctuation(Character punc) {
            if(punc != null){
                punctuationCount++;
                int puncCount = 1;
                Collection<Character> obs = null;
                // If we've already seen this punc, clean up prior membership.
                if(punctuationLookup.containsKey(punc)){
                    puncCount = punctuationLookup.remove(punc);
                    obs = punctuation.get(puncCount);
                    // Remove from prior obs count order
                    obs.remove(punc);
                    puncCount++;
                }
                obs = punctuation.get(puncCount);
                if (obs == null) { // we don't have this order yet
                    obs = new HashSet<Character>();
                    punctuation.put(puncCount, obs);
                }
                punctuationLookup.put(punc, puncCount);
                obs.add(punc);
            }
        }

        /**
         * Including this for now, but I don't like it -- it returns all
         * punctuation wholesale. I think what would be better is some
         * function that returns punctuation based on some characteristic.
         */
        protected NavigableMap<Integer, Collection<Character>> getPunctuation() {
            return punctuation;
        }

        /**
         * Gets count of punctuation encountered.
         */
        protected int getPunctuationCount() {
            return punctuationCount;
        }

        /**
         * Gets lookup of punctuations encountered.
         */
        protected Map<Character, Integer> getPunctuationLookup() {
            return punctuationLookup;
        }

        /**
         * Gets the String backing this ChatWord.
         */
        public String getWord() {
            return word;
        }

        /**
         * ChatWords are equivalent with the String they wrap.
         */
        @Override
        public int hashCode() {
            return word.hashCode();
        }

        /**
         * ChatWord equality is that ChatWords that wrap the same String
         * are equal, and a ChatWord is equal to the String that it contains.
         */
        @Override
        public boolean equals(Object o){
            if (o == this) {
                return true;
            }
            if (o instanceof ChatWord) {
                return ((ChatWord)o).getWord().equals(this.getWord());
            }
            if (o instanceof String) {
                return ((String)o).equals(this.getWord());
            }

            return false;
        }

        /**
         * Returns this ChatWord as a String.
         */
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            sb.append("ChatWord[");
            sb.append(word);
            sb.append("]desc{");
            for (Integer key : firstOrder.keySet() ) {
                Collection<ChatWord> value = firstOrder.get(key);
                sb.append(key);
                sb.append(":[");
                for (ChatWord cw : value) {
                    sb.append(cw.getWord());
                    sb.append(",");
                }
                sb.append("],");
            }
            sb.append("}punc{");
            for (Integer key : punctuation.keySet() ) {
                Collection<Character> value = punctuation.get(key);
                sb.append(key);
                sb.append(":[");
                for (Character c : value) {
                    sb.append("\"");
                    sb.append(c);
                    sb.append("\",");
                }
                sb.append("],");
            }
            sb.append("}");
            return sb.toString();
        }
    }
}

Sample conversation:

Linked b/c of post character limits

Conversation where the Bot tells me I should program living things

Latest conversation where the Bot talks about the true nature of Chatbrains, physics, the physical universe, and how I am most likely also a Chatbrain

And so on. I have a few things I'm going to add. For instance, because simple words are so common, they tend to dominate uncurated topic lists. I'm going to add a percentage-based skip for topic words so that common words are passed over.


C++

Now I just need to write the algorithm for carrying on a conversation. My first machine-learning exercise.

#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>  // rand, srand
#include <ctime>
#include <fstream>
using namespace std;

int randint(int n) {return rand()%n;}

bool old_response(string r, vector<string>v){
  for(int i = 0; i < v.size(); i++) if(r == v[i]) return true;
  return false;
}

void output(vector<string>v) {cout<< "CHATBOT:\t" << v[randint(v.size())] << endl;}

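// Wait for the user's average response time before the bot replies.
void delay(int sum_response, int sum_time) {
  if(sum_response != 0) {
    time_t end = time(0) + sum_time/sum_response;
    // Compare with < rather than ==, so a skipped second cannot hang the loop.
    while(time(0) < end) {}
  }
}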

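int main() {

  // Seed the generator so the bot's picks differ between runs.
  srand(static_cast<unsigned>(time(0)));

  string name = "";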
  cout<<"Please enter your name: ";
  getline(cin, name);

  vector<string> responses;

  // Load responses learned in earlier sessions, one per line.
  // Looping on getline also keeps a final line that lacks a trailing newline.
  ifstream ifs("responses.txt");
  string s;
  while(getline(ifs, s)) responses.push_back(s);

  string response = "";
  getline(cin, response);
  int time1 = time(0);
  int sum_time = 0;
  int sum_response = 0;

  do {

    if(!old_response(response, responses)) responses.push_back(response);

    delay(sum_response, sum_time);
    output(responses);
    
    cout << name << ":\t";
    getline(cin, response);
    sum_time += time(0)-time1;
    sum_response++;
    time1 = time(0);

  } while(response != "goodbye");

  cout<<"goodbye, " << name;

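  // Rewrite the file instead of appending: responses already holds every line
  // loaded at startup, so appending would duplicate them on each run.
  ofstream ofs("responses.txt");
  for(size_t i = 0; i < responses.size(); i++) ofs << responses[i] << endl;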

  return 0;
}
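
The delay() helper above makes the bot wait roughly as long as the user takes to reply on average. A minimal sketch of the same idea follows, assuming C++11 is available; the names average_reply_seconds and delay_sleeping are hypothetical and not part of the original program, and sleeping replaces the original busy-wait.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: average seconds per user reply so far.
// Returns 0 when no reply has been timed yet, so the bot answers at once.
int average_reply_seconds(int sum_response, int sum_time) {
    if (sum_response <= 0) return 0;
    return sum_time / sum_response;  // integer division, as in the original
}

// Hypothetical replacement for delay(): sleep instead of polling time(0),
// so the CPU stays idle while the bot "thinks".
void delay_sleeping(int sum_response, int sum_time) {
    std::this_thread::sleep_for(
        std::chrono::seconds(average_reply_seconds(sum_response, sum_time)));
}
```

With three timed replies totalling 12 seconds, average_reply_seconds(3, 12) gives 4, so delay_sleeping(3, 12) would pause for about four seconds before the bot answers.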

Edit:

All my attempts turned out ridiculous, so I think I will leave it this way. The others were just as ridiculous as this one anyway.


C++

I aimed for optional bonus 3: "Less mimic, bot behaviour is different from user behaviour, separating the perception of the bot attitude from the user attitude." The result was a really stubborn bot that can't switch subjects easily and drives you crazy.

It takes some time to get a discussion going, but after a while it can look like this:

 Mindless:   Why would I run? I am not scared of you.
      YOU:   Who are you and why do you keep asking stupid questions?

 Mindless:   I never said that you should run.
      YOU:   I never said I will run.

 Mindless:   I am the hidden, who are you?
      YOU:   I am god.

 Mindless:   I will
      YOU:   You will obey me.

 Mindless:   Why would I run? I am not scared of you.
      YOU:   You should be scared of me.

 Mindless:   I never said I will run.
      YOU:   Who said you will run?

 Mindless:   Who are you and why do you keep asking stupid questions?
      YOU:   I will not anymore.

 Mindless:   I never said that you should run.
      YOU:   I never said I will run.

 Mindless:   I am the hidden, who are you?
      YOU:   I can see you!

 Mindless:   Why would I run? I am not scared of you.
      YOU:   I am hunting you down

 Mindless:   You should be scared of me.
      YOU:   That is a lie.

 Mindless:   What can't you do?
      YOU:   Why can't I do what?

 Mindless:   You don't.
      YOU:   Well, you are talking nonsense.

 Mindless:   How do I take it seriously.
      YOU:   Stop talking to me

 Mindless:   Well, I can hide, but you can't.
      YOU:   This is not a funny conversation.

 Mindless:   Yes, I will. Stay on line.
      YOU:   Do you want to eat somewhere?

 Mindless:   I laugh
      YOU:   Are you laughing at me?

 Mindless:   Yes, you do.
      YOU:   Can I join you laghs?

 Mindless:   I can hide, but you can't.
      YOU:   Why would you hide? I asked if you want to eat?

 Mindless:   Well, I can see you!
      YOU:   Yes, and I see you. eat something maybe?

 Mindless:   I will not anymore.
      YOU:   Have you stopped eating?

 Mindless:   I never said I will run.
      YOU:   Well, I was talking about eating. Is dinner a good thing to do?

 Mindless:   I take it seriously.
      YOU:   Okay, take it seriously and follow me to the restaurant and get some food to eat.

 Mindless:   Have you stopped eating?
      YOU:   NO absolutely not! A am starving because of YOU who refuse to eat

 Mindless:   I never said I will run.
      YOU:   Well, we can walk to the restaurant instead.

 Mindless:   Well, I was talking about eating. Is dinner a good thing to do?
      YOU:   Yes, that is perfect. What are the best fried chickens around here?

 Mindless:   I laugh
      YOU:   You laugh at the chicken?

 Mindless:   Are you laughing at me?
      YOU:   No, I am eating. Chicken.

 Mindless:   You are a thinker?
      YOU:   No, I am a eater.

 Mindless:   Yes, I will. Stay on line.
      YOU:   Stay. On course. To the restaurant!

 Mindless:   What can't you do?
      YOU:   I can't wait for you anymore. You are really slow.

 Mindless:   You can try run, but you can't hide
      YOU:   I will run.

 Mindless:   I am hunting you down
      YOU:   Ha ha. Make sure you look at the restaurant first. I might be there eating.

 Mindless:   I never said I will run.
      YOU:   Okay, but when you get there, I may still be there eating. I feel like I can eat a ton.

 Mindless:   That is a lie.
      YOU:   No, it is true.

 Mindless:   Why can't I do what?
      YOU:   You can eat a ton too, if you ever get there. And if you have the money.

 Mindless:   I never said I will run.
      YOU:   Okej, suit yourself.

 Mindless:   That is a lie.
      YOU:   God

 Mindless:   You laugh at the chicken?
      YOU:   Yes, I laugh at everything. You are the most lazy stubborn person I have ever met in life.

 Mindless:   Stop talking to me
      YOU:   Okay, now you are really rude.

 Mindless:   This is not a funny conversation.
      YOU:   Okay, think about it. Try tell me something new.

 Mindless:   Stay. On course. To the restaurant!
      YOU:   Sure! I am already there staying on course eating the food.

 Mindless:   You can eat a ton too, if you ever get there. And if you have the money.
      YOU:   I have already paid for my food, and I will not share it with you.

 Mindless:   I can see you!
      YOU:   Oh god, what the hell is this?

 Mindless:   I am hunting you down
      YOU:   I will hide under the table.

 Mindless:   Yes, and I see you. eat something maybe?
      YOU:   Sure, have a chicken

 Mindless:   Have you stopped eating?
      YOU:   No, for the love of god, sit down and share my chicken

 Mindless:   I have already paid for my food, and I will not share it with you.
      YOU:   What? I paid for it, not you!

 Mindless:   I never said I will run.
      YOU:   Well, you better not run here in the restaurant.

 Mindless:   I am hunting you down
      YOU:   I am not going anywhere, chill out,

 Mindless:   I will hide under the table.
      YOU:   Why would you do something like that? I will not hurt you.

 Mindless:   Oh god, what the hell is this?
      YOU:   That is chicken!

 Mindless:   Okay, take it seriously and follow me to the restaurant and get some food to eat.
      YOU:   Wait a minute now! we ARE at the restaurant! Can't you see all people sitting at the tables eating?

 Mindless:   Well, we can walk to the
      YOU:   ... talk to the who?

 Mindless:   I never said I will not hurt you.
      YOU:   What the ?

 Mindless:   That is chicken!
      YOU:   Great! Have fun.

 Mindless:   Yes, you do.
      YOU:   You too.

 Mindless:   Stop talking to me
      YOU:   Okay.

 Mindless:   Why would I run? I am god.
      YOU:   Good night!

The approach is to store everything in groups of 3 connected words. Every group is weighted and re-weighted in a 1000-dimensional matrix of word groups. Source code:

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <cstdlib>  // abs

using namespace std;
vector<string> words;

class c_wordgroup {
public:
    int                 sorting[1000];
    string              words[5];

    void resetsort(int ns)
    {
        for(int i = 0; i<1000;i++)
            sorting[i] = ns;
    }
};

class c_wglist {
public: int                 c;
     vector<c_wordgroup>    wg;
     vector<int>            historywg;

     

     int wgexists(c_wordgroup nwg)
     {
         vector<c_wordgroup>::iterator it;
         int cc = 0;
         for (it = wg.begin(); it != wg.end(); ++it) {

             // Check if words is the same
             if(it->words[0] == nwg.words[0])
                 if(it->words[1] == nwg.words[1])
                     if(it->words[2] == nwg.words[2])
                        return cc;
                        
             cc++;
         }
         return -1;
     }

     // Members are accessed directly here: the global wgl is only declared
     // after this class body, so naming it inside the class may not compile.
     int getbestnext(int lastwg)
     {
         vector<c_wordgroup>::iterator  it;
         int cc = 0;
         
         int bv = -1;
         int bwg = 0;

         for (it = wg.begin(); it != wg.end(); ++it) {

             // Skip groups that were used recently
             bool cont = false;
             for (int iti = 0; iti<((int)historywg.size()/50+5);iti++)
                 if((int)historywg.size()-1-iti>=0)
                    if (cc==historywg[(int)historywg.size()-1-iti])
                        cont = true;
                
             if(cont==true) {cc++;continue;};

             int cv = 100000000;

             // Check if the first word continues the last group
             if(it->words[0] == wg[lastwg].words[1])
             {
                 for(int si=0;si<1000;si++)
                     if ((int)historywg.size()-1-si>=0)
                     {
                            int tmpwg = historywg[(int)historywg.size()-1-si];
                            cv -= abs(it->sorting[si]-wg[tmpwg].sorting[si])/(si+1);
                     }
             } else cv -= 1000 * c/2;

             // Check if the second word continues the last group
             if(it->words[1] == wg[lastwg].words[2])
             {
                 for(int si=0;si<1000;si++)
                    if ((int)historywg.size()-1-si>=0)
                     {
                            int tmpwg = historywg[(int)historywg.size()-1-si];
                            cv -= abs(it->sorting[si]-wg[tmpwg].sorting[si])/(si+1);
                     }
             } else cv -= 1000 * c/2;
        
            // Keep the highest-scoring candidate
            if(bv == -1 || cv > bv)
            {
                bwg=cc;
                bv = cv;
            }
            cc++;
         }
         return bwg;
     }
} wgl;

void answer2() 
{
    vector<string> lastwords;
    lastwords.insert(lastwords.end(), words[words.size()-3]);
    lastwords.insert(lastwords.end(), words[words.size()-2]);
    lastwords.insert(lastwords.end(), words[words.size()-1]);

    int bestnextwg;
    
    cout << "\n Mindless:   ";
    for(int ai=0;ai<20;ai++)
    {
        bestnextwg=wgl.getbestnext(wgl.historywg[(int)wgl.historywg.size()-1]);
        
        if(wgl.wg[bestnextwg].words[2]=="[NL]")
            ai=20;
        else
            cout << wgl.wg[bestnextwg].words[2] << " ";
        wgl.historywg.insert(wgl.historywg.end(), bestnextwg);
    }
    
}

int collect2(string const& i) 
{
    istringstream iss(i), iss2(i), iss3(i);
    vector<string> nwords;
    nwords.insert(nwords.end(), words[words.size()-2]);
    nwords.insert(nwords.end(), words[words.size()-1]);

    copy(istream_iterator<string>(iss),
             istream_iterator<string>(),
             back_inserter(words));

    copy(istream_iterator<string>(iss3),
             istream_iterator<string>(),
             back_inserter(nwords));

    int a = distance(istream_iterator<string>(iss2), istream_iterator<string>());
    
    c_wordgroup nwg;

    for (int c=0;c<a;c++)
    {
        nwg.resetsort(wgl.c+1);
        nwg.words[0] = nwords[0+c];
        nwg.words[1] = nwords[1+c];
        nwg.words[2] = nwords[2+c];

        int wge=wgl.wgexists(nwg);

        if(wge>=0) {
            for(int hi=0; hi<1000; hi++)
                if(((int)wgl.historywg.size()-hi-1)>=0)
                {   
                    int iwg = wgl.historywg[(int)wgl.historywg.size()-hi-1];
                    wgl.wg[wge].sorting[hi] = (wgl.wg[wge].sorting[hi] + wgl.wg[iwg].sorting[hi])/2;
                }

            wgl.historywg.insert(wgl.historywg.end(), wge);
            
        } else {
            wgl.c++;
            // adjust history wordgroup sortings.
            for(int hi=0; hi<1000; hi++)
                if(((int)wgl.historywg.size()-hi-1)>=0)
                {   
                    int iwg = wgl.historywg[(int)wgl.historywg.size()-hi-1];
                    wgl.wg[iwg].sorting[hi]+=10;
                    nwg.sorting[hi]=wgl.wg[iwg].sorting[hi];
                }

            wgl.wg.insert(wgl.wg.end(), nwg);
            wgl.historywg.insert(wgl.historywg.end(), wgl.c);

        }
    }
    return a;
}

int main() {
    string i;
    wgl.c = 0;
    c_wordgroup nwg;
    nwg.resetsort(0);
    for(int i =0;i<3;i++) 
        {
            words.insert(words.end(), "[NL]");
            wgl.historywg.insert(wgl.historywg.end(), 0);
        }
    
    wgl.wg.insert(wgl.wg.end(), nwg);

    do { 
        cout << "\n      YOU:   ";
        getline(cin, i);
        collect2(i + " [NL]");
        answer2();
    } while (i.compare("exit")!=0);

    return 0;
}
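
The "groups of 3 connected words" idea can be shown in isolation. The following is a minimal sketch, assuming whitespace-separated words and the same "[NL]" end-of-line marker that main() appends; the function name word_groups is hypothetical. Unlike collect2() above, it does not carry the last two words of the previous line into the next one.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a line on whitespace, append the "[NL]" marker,
// and return every run of three consecutive words as one group.
std::vector<std::array<std::string, 3>> word_groups(const std::string& line) {
    std::istringstream iss(line + " [NL]");
    std::vector<std::string> tokens;
    for (std::string w; iss >> w;) tokens.push_back(w);

    std::vector<std::array<std::string, 3>> groups;
    // A sliding window of width 3; lines with fewer than 3 tokens give no groups.
    for (std::size_t i = 0; i + 2 < tokens.size(); ++i) {
        std::array<std::string, 3> g = {{tokens[i], tokens[i + 1], tokens[i + 2]}};
        groups.push_back(g);
    }
    return groups;
}
```

For the input "I am god" this yields two groups, (I, am, god) and (am, god, [NL]). Overlapping groups like these are what let the bot chain from one group to the next by matching the trailing two words of one group against the leading two words of another.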