Einstein Sentiment - Not precise

TL;DR

The results from the supplied Einstein (Metamind) Sentiment model will only be as good as the input data that Salesforce used to create it. If your text varies significantly from the training data, you are unlikely to get the expected result. You can't refine this model with more examples.

You could create a new language dataset with the type "text-sentiment". Then you would be free to train it as required based on actual examples.
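If you go down that route, the calls look roughly like the sketch below. Treat it as a sketch only: the endpoint paths, parameter names, and CSV layout are from the Einstein Language API documentation as I remember it, and the file name, model name, and token handling are placeholders, so check the current docs before relying on any of it.

```python
# Rough sketch: upload a labelled dataset of type "text-sentiment" and train a
# custom model via the Einstein Language REST API. Endpoints and parameter
# names are from memory of the docs; the file and model names are made up.
import requests

ACCESS_TOKEN = "<your Einstein Platform access token>"  # obtained separately via OAuth/JWT
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# 1. Upload a CSV of labelled examples, one per line: "some example text","positive"
upload = requests.post(
    "https://api.einstein.ai/v2/language/datasets/upload",
    headers=HEADERS,
    data={"type": "text-sentiment"},
    files={"data": open("my_sentiment_examples.csv", "rb")},
)
dataset_id = upload.json()["id"]

# 2. Kick off training against that dataset
train = requests.post(
    "https://api.einstein.ai/v2/language/train",
    headers=HEADERS,
    data={"name": "My Sentiment Model", "datasetId": dataset_id},
)
print(train.json())  # the response should include the modelId to use for predictions
```

Once training finishes, you call the prediction endpoint with your own modelId rather than CommunitySentiment, so the probabilities come back relative to your examples rather than Salesforce's.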


Einstein Sentiment analysis is currently based on a model that:

"was trained on data aggregated from multiple sources. Mostly short snippets of text and comparable to what you might find in a public forum." Source.

What they mean here is that the CommunitySentiment model is something that has been pre-created for this task. They collected many examples of text (thousands, hundreds of thousands, millions...?) and manually categorized each one as positive, negative, or neutral. They then used this input data to build the model such that when text similar to the training data comes in, a similar response comes out. This is known as supervised learning.

Those probabilities are telling you how similar the input text is to the categories that were applied to the training text.
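For example, asking the prebuilt model about one of the phrases from your results looks roughly like this. Again, a sketch only: the endpoint, field names, and response shape are as I remember the Einstein Language API, and the access token is assumed to already exist.

```python
# Rough sketch: query the prebuilt CommunitySentiment model for one piece of text.
import requests

ACCESS_TOKEN = "<your Einstein Platform access token>"

response = requests.post(
    "https://api.einstein.ai/v2/language/sentiment",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    data={"modelId": "CommunitySentiment", "document": "Battery real good"},
)
# The response is typically a list of label/probability pairs, e.g.
# {"probabilities": [{"label": "negative", "probability": 0.75}, ...]}
for item in response.json().get("probabilities", []):
    print(item["label"], item["probability"])
```

The three probabilities sum to (roughly) one. The model isn't saying the text is 75% negative in some absolute sense; it is saying the text looks most like the examples that were labelled negative.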

It's probably easier to understand this process with something like the Einstein Vision API. Yes, it is image based rather than text based, but the underlying principle is the same. You train a model on some input data (the dataset), then use that model to make predictions. The closer the input is to the training data, the better the results you'll get. Conversely, when you put something into the model that it hasn't been trained on, you are likely to get a poor result.

The challenge with the Einstein Sentiment Analysis is that you don't get to modify the model with further training data. It's a fixed thing provided by Salesforce. This is great in that you don't need to build up a massive training data set, but it does limit your ability to refine it with actual examples from your specific data.

Here are some slides from my Mad Catter talk that hopefully clarify things:

  1. You start off with some labels. In the sentiment analysis case, that is positive, negative, and neutral. You then collect many example inputs and manually apply the labels to them.


  2. Now you feed those labels and examples into your supervised learning tool to produce a model.


  3. That model only knows about the labels you trained it on. Feed it anything else, and it won't know what to make of it.


  4. Often internally these models are made up of a number of layers that are then connected by weighted edges. The strength of those edges is altered by the training process so the model produces results as close as possible to the examples.


  5. When you give that model a new input, it's the process through those layers and edges that produces the probabilities. Give it something similar to the training data and you will get a high probability for the corresponding label (there's a toy sketch of this idea just after these slides).

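To make the "layers connected by weighted edges" idea concrete, here is a toy sketch. It is not the actual CommunitySentiment model: the vocabulary, layer sizes, and weights are all invented, with random numbers standing in for what training would learn, but the flow from input, through weighted layers, to a set of label probabilities is the same shape of process.

```python
# Toy illustration of "layers connected by weighted edges".
# The vocabulary, layer sizes, and weights are invented; training would
# adjust the weights so known examples map to their labels.
import numpy as np

VOCAB = ["good", "bad", "great", "worst", "battery", "phone"]
LABELS = ["positive", "negative", "neutral"]

rng = np.random.default_rng(0)
W1 = rng.normal(size=(len(VOCAB), 8))   # input layer -> hidden layer edges
W2 = rng.normal(size=(8, len(LABELS)))  # hidden layer -> output layer edges

def predict(text):
    words = text.lower().split()
    x = np.array([words.count(w) for w in VOCAB], dtype=float)  # bag of words
    hidden = np.maximum(x @ W1, 0.0)     # weighted sum through the first layer + ReLU
    logits = hidden @ W2                 # weighted sum into one score per label
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax -> probabilities that sum to 1
    return {label: round(float(p), 2) for label, p in zip(LABELS, probs)}

print(predict("Battery real good"))  # with untrained weights this is effectively random
```

Training is essentially the search for values of W1 and W2 that make predict() agree with the manually labelled examples as often as possible.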


As an aside, it appears that the Sentiment Model in Social Studio can be "tuned".


Let's take a look at your results and try to make sense of them. This takes a bit of understanding of statistics. The sentiment analysis has broken down different parts of what it's read and evaluated each part, returning the probability that each part is positive, negative, or neutral.

I'm going to round your numbers to two decimal places for simplicity.

phrase           negative  positive  neutral
------           --------  --------  -------
too bad            .72       .24       .04
Battery real good  .75       .22       .03
Phone hangs        .57       .33       .10
Overall Great      .04       .95       .01
Worst              .89       .11       .00
not too good       .48       .46       .06
Good               .32       .59       .09

Total              3.77      3.77       .33
Avg                 .54       .54       .05

What this really tells you is that, across the text that was originally entered for this analysis, the sentiment comes out as essentially neutral. I think it's obvious the sentiment analysis on the 2nd phrase ("Battery real good") was incorrect, but those things are going to happen.

I don't know the context of that phrase, nor do I know what was or was not "too bad", so either of those could easily have been evaluated incorrectly. That's also why this isn't an exact science and why each phrase has a probability of having been positive or neutral instead of negative.

I didn't compute a weighted mean or do any other statistical analysis on this; I just kept it simple so it would be easy to understand. What I will say is that there is a lot of spread in the data. The negatives range from .04 to .89 and the positives from .11 to .95 (about the same spread).

That wide spread is significant: it tells you the data was inconsistent and didn't trend consistently in one direction or another, which tends to support the conclusion that the sentiment is neutral.
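If you want to reproduce that spread from the rounded figures above, here is a quick sketch (Python purely for illustration):

```python
# Quick sketch: reproduce the per-label spread from the rounded figures above.
# A wide spread within a label is a hint that the individual phrase scores
# disagree with each other rather than trending one way.
rows = {
    "too bad":           (0.72, 0.24, 0.04),
    "Battery real good": (0.75, 0.22, 0.03),
    "Phone hangs":       (0.57, 0.33, 0.10),
    "Overall Great":     (0.04, 0.95, 0.01),
    "Worst":             (0.89, 0.11, 0.00),
    "not too good":      (0.48, 0.46, 0.06),
    "Good":              (0.32, 0.59, 0.09),
}

for i, label in enumerate(["negative", "positive", "neutral"]):
    scores = [probs[i] for probs in rows.values()]
    print(f"{label:8s} min={min(scores):.2f} max={max(scores):.2f} "
          f"spread={max(scores) - min(scores):.2f}")
```

The neutral column barely moves, while the negative and positive columns swing across almost the whole range, which is exactly the inconsistency described above.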

I hope this helps explain your model and the results you obtained from it.