A quick look at the table reveals a number of pathologies. The intercept is 198 percent off. For the x1 and x2 variables we’re -94 and -87 percent off respectively. The interaction effect ends up only 9 percent off target, which is not much. All in all, though, we’re far off the target. This is not surprising; in fact, I would have been surprised had we succeeded. So what’s the problem? Well, our basic assumption of independence between variables quite frankly does not hold. It doesn’t hold because the generated data is indeed correlated. Remember our covariance matrix in the two-dimensional multivariate Gaussian.
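To make the correlation concrete, here is a minimal Python sketch (standard library only) that draws pairs from a two-dimensional Gaussian and measures how strongly they move together. The correlation of 0.8, the sample size and the helper names `correlated_pairs` and `pearson` are assumptions chosen for illustration; they are not the values or code used to generate the data in this post.

```python
import math
import random


def correlated_pairs(n, rho, seed=0):
    """Draw n pairs (x1, x2) from a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mixing two independent draws gives x2 unit variance
        # and correlation rho with x1.
        x2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        pairs.append((z1, x2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x1, x2) pairs."""
    n = len(pairs)
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs)
    var1 = sum((a - mean1) ** 2 for a, _ in pairs)
    var2 = sum((b - mean2) ** 2 for _, b in pairs)
    return cov / math.sqrt(var1 * var2)


# rho = 0.8 is an assumed value, not the one from the post's covariance matrix.
sample = correlated_pairs(5000, rho=0.8)
print(round(pearson(sample), 2))
```

With a non-zero off-diagonal entry in the covariance matrix, the measured correlation stays close to that entry however many samples we draw, so the two predictors cannot be treated as independent.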
Let’s try to fix our analysis. In this setting we need to introduce context, and the easiest, most natural way to do that is with priors. To do this we cannot use our old trusted friend “lm” in R but must resort to a Bayesian framework.
Stan makes that very simple. This implementation of our model is not very elegant, but it neatly shows how easily you can define models in this language. We simply specify our data, parameters and model, and set the priors in the model part. Notice that we don’t put priors on everything. For instance, I might know that a value around 1 is reasonable for our main and interaction effects, but I have no idea where the intercept should be. In that case I will simply remain completely ignorant and not inject any knowledge about the intercept into the model, because I fundamentally believe I don’t have any. That’s why β0 does not appear in the model section.
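What leaving β0 out means for the log density can be sketched in a few lines of Python. This is an illustration of the idea, not the Stan model itself: the prior standard deviation of 1 and the function names `normal_logpdf` and `log_prior` are assumptions. Only the "around 1" location comes from the text.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - (x - mean) ** 2 / (2.0 * sd ** 2)


def log_prior(beta0, beta1, beta2, beta3, prior_sd=1.0):
    """Log prior with Normal(1, prior_sd) on the effects and nothing on beta0.

    prior_sd = 1.0 is an assumed scale, chosen only for illustration.
    """
    # beta0 is accepted but deliberately unused: contributing no term to the
    # log density is the same as a flat (improper) prior on the intercept.
    return sum(normal_logpdf(b, 1.0, prior_sd) for b in (beta1, beta2, beta3))


# Moving the intercept does not change the prior at all ...
print(log_prior(0.0, 1.0, 1.0, 1.0) == log_prior(100.0, 1.0, 1.0, 1.0))
# ... while moving an effect away from 1 lowers it.
print(log_prior(0.0, 1.0, 1.0, 1.0) > log_prior(0.0, 2.0, 1.0, 1.0))
```

A parameter with no sampling statement in a Stan model block is treated the same way: it gets an implicit flat prior over its support, so only the likelihood informs the intercept.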