One (sort of Bayesian) approach to noise and smoothing is as follows. At the beginning, suppose all we know is the aggregate figure that 1/3 of all houses are evil. Then, in the absence of any further data, if we were asked for the probability that any given house is evil, we would say 1/3. Now suppose we get one piece of data: person A, who belongs to House Blue, is evil. Does that mean we instantly predict that the probability of House Blue being evil is 1 (and correspondingly reduce the probabilities for the other houses)? No: that would let the noise in “small” data override everything, and it would ignore the a priori probability.
A better approach is to decide that we want, for example, 3 incidents before we start becoming confident in our predictions. Suppose we then get data for House Blue: it has P = 4 people, of whom E = 3 are evil. The noisy, unsmoothed approach would be to say p(Blue = evil) = E/P = 3/4. Note that we have jumped from 0.33 to 0.75 on very little data. Instead, use p_corr(Blue = evil) = (E+1)/(P+3) = 4/7 ≈ 0.57, which is a smooth interpolation between the prior and the observed rate. The 1 and the 3 come from treating the prior as 3 pseudo-incidents, 1/3 of which (that is, 1) are evil. Note that this is exactly how Beta distributions update: a Beta(1, 2) prior combined with 3 evil and 1 not-evil observations gives Beta(4, 3), whose mean is 4/7. The “3” incidents is a tunable hyperparameter, which you can choose based on how much risk you want to take.
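If it helps to see the arithmetic, here is a minimal Python sketch of the smoothing. The function name `smoothed_rate` and the parameter names `prior_rate` and `prior_strength` are just illustrative choices of mine; the defaults (1/3 and 3) are the numbers used above, so that the formula reduces to (E+1)/(P+3).

```python
def smoothed_rate(evil, people, prior_rate=1/3, prior_strength=3):
    """Blend the observed rate evil/people with the prior rate.

    The prior is treated as `prior_strength` pseudo-incidents, of which
    a fraction `prior_rate` are evil. With the defaults this is
    (evil + 1) / (people + 3).
    """
    pseudo_evil = prior_rate * prior_strength
    return (evil + pseudo_evil) / (people + prior_strength)


if __name__ == "__main__":
    # No data at all: we fall back to the prior of 1/3.
    print(smoothed_rate(0, 0))
    # One evil person seen: we move toward 1, but do not jump to it.
    print(smoothed_rate(1, 1))
    # House Blue, 3 evil out of 4: the raw rate is 3/4, the smoothed one is 4/7.
    print(3 / 4, smoothed_rate(3, 4))
```

A larger `prior_strength` makes the estimate stick to 1/3 for longer; a smaller one lets it follow the data sooner. With a lot of data the prior hardly matters and the result approaches E/P.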
Hope that helps.