I applaud your use of the log relative odds, but why fuck with the math gratuitously and redefine terms incorrectly? If p is the probability, o = p/(1-p) is the odds ratio, since it is the odds expressed as a ratio rather than as p : 1-p in whole number terms. If you have two probabilities p1 and p2, o1/o2 is the relative odds, and log(o1/o2) is the log relative odds. If, as with word frequencies in texts, the probabilities are small, you can approximate (1-p) by 1, so the log relative odds simplifies to log(p1/p2).

There is no natural metric for comparing probabilities, but since log odds is just the full real line, you can use the Euclidean metric to define a distance in log odds space, which is what you’ve done.

Since you are calculating log relative odds, you needn’t have bothered with removing stop words. In fact, differences in their frequencies might be worth looking into, I wouldn’t be surprised if Trump overuses stop words, particularly “I”.

Finally, if the probabilities are equal, the log relative odds is zero, not null.

I stop to miau to cats.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store