# Some Thoughts About Statistical Learning in AI Systems Part II: Challenges

How many Bayesians do we need to change a light bulb? We are not sure; come to think of it, we are not even sure about the probability that the light bulb is burned out :) I know, I know, AI people should stay away from comedy :) but hopefully you got the point. Today, I would like to continue our analysis of statistical learning by highlighting some of its limitations in real-world artificial intelligence (AI) systems.

The previous part of this essay explained some of the fundamentals and principal techniques used in statistical learning. From Siri to DeepMind’s recent breakthroughs, statistical learning has become one of the most prevalent models in AI agents. However, there are many well-known challenges when we rely solely on statistics to acquire knowledge.

**The AI Bureaucrat**

Sometimes I like to compare statistical learning models with polite bureaucrats who always have some form of an answer to every question but can never give you a straight one. “The only question to never ask a bureaucrat is ‘Why?’” says an old piece of wisdom. Like bureaucrats, statistical learning systems run into trouble when trying to reach deterministic conclusions. To some extent, being a Bayesian means that you can never be completely sure of anything. As it turns out, that level of uncertainty can be troublesome in many real-world AI scenarios.

**The Inference Dilemma**

Statistical learning models such as Bayesian networks or Markov chains rely on connecting states in an environment via probabilities and computing future probabilities associated with potential actions. However, the fact that we are able to represent probability distributions doesn’t always mean that we can reason effectively through them. This is known as the Inference Dilemma.

To explain the Inference Dilemma, let’s use a basketball example. Suppose that, in an NBA team, we are trying to predict the likelihood that our All-Star power-forward will score over 35 points against a rival team. Our power-forward typically scores over 35 points in games when he plays over 40 minutes and our starting point-guard has over 10 assists. Obviously, we need to compute those probabilities. The point-guard has recently struggled against zone defense so, in order to estimate his chance of getting 10 assists, we need to factor in the time that the opposing team will play zone defense. Our power-forward also seems to be 30% more effective shooting from the right side of the ring, so we need to compute the probability that he can play toward the right side of the ring for at least 50% of his possessions. However, there is a backup power-forward on the opposing team who is notorious for forcing players to drive to the left side of the ring, so we also need to factor in the minutes of this player. I could keep going for another hour, but hopefully you see how this model can easily get out of control.
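To make the blow-up a bit more concrete, here is a minimal Python sketch of the basketball scenario as a tiny network of yes/no factors, answered by brute-force enumeration. Every probability in it is made up for illustration, and the names (`P_ZONE`, `p_score`, `prob_35_points`, `joint_states`) are hypothetical, not part of any library. The point is only that the loop has to visit every combination of factors, and that number doubles each time we add one.

```python
from itertools import product

# All numbers below are invented for illustration; none come from real NBA data.
P_ZONE = 0.4      # P(opponent plays mostly zone defense)
P_BACKUP = 0.5    # P(opposing backup power-forward plays heavy minutes)
P_MINUTES = 0.7   # P(our power-forward plays over 40 minutes)
P_ASSISTS = {True: 0.3, False: 0.6}   # P(point-guard gets 10+ assists | zone?)
P_RIGHT = {True: 0.35, False: 0.65}   # P(50%+ possessions on the right | backup?)


def p_score(assists, right, minutes):
    """Hypothetical P(35+ points | parents): each favorable factor adds 0.2."""
    return 0.1 + 0.2 * (assists + right + minutes)


def bernoulli(p, value):
    """Probability of a yes/no outcome given P(yes) = p."""
    return p if value else 1.0 - p


def prob_35_points():
    """Sum over every combination of the five yes/no factors."""
    total = 0.0
    for zone, backup, minutes, assists, right in product([True, False], repeat=5):
        weight = (bernoulli(P_ZONE, zone)
                  * bernoulli(P_BACKUP, backup)
                  * bernoulli(P_MINUTES, minutes)
                  * bernoulli(P_ASSISTS[zone], assists)
                  * bernoulli(P_RIGHT[backup], right))
        total += weight * p_score(assists, right, minutes)
    return total


def joint_states(n_binary_factors):
    """Number of combinations a brute-force enumeration has to visit."""
    return 2 ** n_binary_factors


if __name__ == "__main__":
    print(prob_35_points())
    print(joint_states(5), joint_states(30))
```

Five factors mean only 32 combinations, which is trivial. Thirty yes/no factors already mean more than a billion, and real factors are rarely just yes/no.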

The complexity of statistical learning can get even worse if we start including the so-called “invisible connections” between states. For instance, in our basketball scenario, the number of assists of our starting point-guard and that of our shooting-guard seem independent. However, every time the shooting-guard assists on a play, it indirectly means that the point-guard didn’t get that assist, so there is an invisible, hard-to-quantify relationship between those states.

The key to solving the Inference Dilemma is to try to model a statistical network as a tree without the trunk getting too thick. Easier said than done though :) One of the most common techniques that address the Inference Dilemma is what is known as Markov Chain Monte Carlo (MCMC). Algorithmically, MCMC is a fairly complex process that simulates random walks through a statistical network in such a way that the number of times each state is visited is proportional to its probability. A well-behaved MCMC should converge to a stable distribution that produces approximately the same answers as the original network but is much easier to navigate.
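As a rough illustration of the "visits proportional to probability" idea, here is a minimal sketch of the Metropolis algorithm, one of the simplest members of the MCMC family, over a handful of discrete states. It is a toy under stated assumptions, not a production sampler: the function name `metropolis_sample`, the uniform proposal, and the example weights are all my own choices for illustration.

```python
import random


def metropolis_sample(weights, n_steps, seed=0):
    """Random walk over states 0..n-1 whose visit frequencies approach
    weights / sum(weights). Weights must be positive; they do not need
    to be normalized, which is the main appeal of the method."""
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    visits = [0] * n
    for _ in range(n_steps):
        # Symmetric proposal: pick any state uniformly at random.
        proposal = rng.randrange(n)
        # Always accept moves to more probable states; accept moves to
        # less probable ones only with probability equal to the ratio.
        accept = min(1.0, weights[proposal] / weights[state])
        if rng.random() < accept:
            state = proposal
        visits[state] += 1
    return [v / n_steps for v in visits]


if __name__ == "__main__":
    # Unnormalized weights 1:2:3:4, i.e. target probabilities 0.1 to 0.4.
    print(metropolis_sample([1, 2, 3, 4], n_steps=100_000))
```

Notice that the walk only ever compares two states at a time, so it never needs the full distribution laid out in front of it. That is what makes this family of techniques attractive when the network is too large to enumerate.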

**The Absence of Logic**

Maybe the biggest drawback of statistical learning models is that they rely on large probability distributions to build knowledge that, in many cases, can be expressed simply by a series of logic functions. To express the knowledge contained in a few If…Then statements, we might need a massive statistical network. Of course, statistical learning models tend to be used in scenarios that operate in uncertain, incomplete environments in which logic can prove useless. As a result, many AI scenarios try to combine statistics and logic to achieve superior forms of knowledge building.
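A small sketch of that size gap, assuming the simplest possible case: a single rule of the form "if every condition holds, then the conclusion holds". The helper names `rule` and `as_probability_table` are hypothetical. The rule is one line of logic, while the equivalent probability table needs one row per combination of conditions.

```python
from itertools import product


def rule(conditions):
    """If every condition holds, then the conclusion holds."""
    return all(conditions)


def as_probability_table(n_conditions):
    """The same knowledge written as P(conclusion | conditions),
    with one entry for every combination of the conditions."""
    return {
        combo: (1.0 if rule(combo) else 0.0)
        for combo in product([False, True], repeat=n_conditions)
    }


if __name__ == "__main__":
    table = as_probability_table(10)
    print(len(table))  # rows needed to restate a one-line rule
```

With ten conditions the table already has 1,024 rows, only one of which is non-zero. The table earns its size only once the entries stop being 0 or 1, which is exactly the uncertain setting where pure logic struggles.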