Introduction
I often feel when talking to coworkers, practitioners and people in general that the Bayesian philosophy has to prove its worth, as if it were a contender to the established way of doing things. In reality, of course, the more prevalent way of working with probabilities is a specialization of the Bayesian formalism, not the other way around. As such, I often find myself wondering how this came to be. Truth be told, I think the majority of the scientific community has simply forgotten the reason for working with pure maximum likelihood. It was a good reason, and at the time it was necessary to make any kind of progress on interesting problems. But the mathematical tooling as well as the computational power have increased dramatically since the introduction of pure maximum likelihood, and that reason basically no longer exists. So, in this post I hope to get you more familiar with the Bayesian way of thinking, and hopefully also to show you that it's not really that scary at all. Specifically, this post is dedicated to showing you, for a couple of models, how the results are affected by your prior assumptions as well as the size of your data.
Let’s start off with the basics. Throughout this post I will refer to the prior p(θ), the likelihood p(y|θ) and the posterior p(θ|y). We will be looking into a convenient Bayesian model called the Beta-binomial model, which is useful for modeling a discrete number of positive outcomes out of a set of binary trials. It features a Beta prior distribution. We’ll work with the assumption that we’re modeling the number of startups that made it to unicorn status, i.e., became worth a billion dollars.
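To make the notation concrete before we dig in, here is a minimal sketch of the Beta-binomial update. Because the Beta prior is conjugate to the binomial likelihood, a Beta(α, β) prior combined with k successes out of n trials yields a Beta(α + k, β + n − k) posterior in closed form. The numbers below (50 startups, 3 unicorns) are made up purely for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical data: out of n = 50 startups we tracked,
# k = 3 became unicorns (numbers invented for illustration).
n, k = 50, 3

# Beta prior on the unicorn probability theta.
# Beta(1, 1) is the uniform prior, a common weakly informative choice.
alpha_prior, beta_prior = 1.0, 1.0

# Conjugacy: Beta prior + binomial likelihood gives a Beta posterior
# with parameters alpha + k and beta + (n - k).
alpha_post = alpha_prior + k
beta_post = beta_prior + (n - k)
posterior = stats.beta(alpha_post, beta_post)

print(f"Posterior mean:        {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```

Note how the prior parameters act like pseudo-counts: α and β are simply added to the observed successes and failures, which is exactly the mechanism we’ll use later to see how the prior and the data size trade off against each other.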