# Deep Diving Into Natural Language Processing Part I

Interpreting and understanding written and spoken natural language is one of the best applications of modern deep learning models. From messaging bots to advanced digital assistants, natural language is a fundamental element of our daily interactions with artificial intelligence (AI).

In deep learning theory, the group of techniques used to interact with natural language is known as natural language processing (NLP) or natural language understanding (NLU). In some domains, AI experts draw a distinction between NLP and NLU, with the latter focusing more on the contextual and semantic analysis of language expressions. However, in most scenarios, the terms NLP and NLU can be used interchangeably.

These days, there are plenty of frameworks and platforms that simplify the implementation of NLP models. From cloud AI platforms such as Watson Developer Cloud Conversation Service, Microsoft’s LUIS or Google’s API.ai to messaging runtimes like Facebook’s Wit.ai to innovative startups like MonkeyLearn, there is no shortage of options for building NLP applications. In addition, deep learning frameworks such as TensorFlow, Caffe2 or Theano provide libraries that enable the implementation of sophisticated NLP algorithms. Those frameworks are typically used in domain-specific scenarios that require more advanced conversational capabilities or even custom NLP algorithms. For that reason, there is tremendous value in understanding at least the fundamental deep learning techniques behind the current generation of NLP platforms.

NLP algorithms typically focus on processing sequential data that represents a sentence in a natural language. From that perspective, many techniques specialized in processing sequential vectors, such as recurrent or recursive neural networks (RNNs) (see my previous series on RNNs), are relevant in the NLP universe. Neural approaches to NLP go back to the late 1980s and early 1990s, when computer scientists like Dyer and Schmidhuber started applying neural networks to understand the syntax of natural language sentences. However, it was not until a decade later, with the work of deep learning pioneers like Yoshua Bengio, that NLP started to gain momentum in real-world applications. Most modern NLP models draw inspiration from a simple technique known as n-grams.

**n-grams**

Conceptually, n-grams focus on determining the probability distribution of tokens, such as words, in a natural language sentence. Specifically, an n-gram model defines the conditional probability of the token in the nth position of a sequence given the previous n-1 tokens. n-grams rely on the definition of conditional probability, the same identity that underlies Bayes' theorem, to determine the probability of a sequence of words. The expression looks something like this:

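$$P(x_t \mid x_{t-n+1}, \dots, x_{t-1}) = \frac{P_n(x_{t-n+1}, \dots, x_t)}{P_{n-1}(x_{t-n+1}, \dots, x_{t-1})}$$

**Where $P_n$ is the probability of a sequence of n tokens, $P_{n-1}$ is the probability of a sequence of n-1 tokens, and $(x_1, \dots, x_t)$ is the input sequence.**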
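In practice, the ratio above is usually estimated from counts in a training text. The following is a minimal sketch of that idea, not a production implementation: the helper names `ngram_counts` and `conditional_probability` and the toy corpus are invented here for illustration, and the estimate is the plain ratio of raw counts with no smoothing.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def conditional_probability(tokens, context, word):
    """Estimate P(word | context) as count(context + word) / count(context)."""
    n = len(context) + 1
    numerator = ngram_counts(tokens, n)[tuple(context) + (word,)]
    denominator = ngram_counts(tokens, n - 1)[tuple(context)]
    # An unseen context gives no evidence, so this sketch simply returns 0.
    return numerator / denominator if denominator else 0.0

# Hypothetical toy corpus, made up purely for this example.
corpus = "ai is the future and ai is the present".split()

# "is the" occurs twice and is followed by "future" once.
p = conditional_probability(corpus, ("is", "the"), "future")
print(p)
```

Here the context is the previous n-1 = 2 words, so this is a trigram estimate. A word that never follows the context in the corpus gets probability zero, which is one reason real systems add smoothing.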

Let’s illustrate n-grams in practice by taking the sentence “AI IS THE FUTURE”. Running that sentence through a trigram (n = 3) model looks something like this:

$$P(\text{AI IS THE FUTURE}) = \frac{P_3(\text{AI IS THE}) \cdot P_3(\text{IS THE FUTURE})}{P_2(\text{IS THE})}$$

**Where $P_n(x)$ is the probability of the n-token sequence $x$.**
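The same calculation can be sketched in code. This is an illustrative example under stated assumptions: the function name `trigram_sentence_probability` and the toy corpus are hypothetical, each P3 and P2 is taken to be the relative frequency of that trigram or bigram in the corpus, and the sentence is assumed to have at least three words.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def trigram_sentence_probability(corpus, sentence):
    """Score a sentence as P3(first three words) times, for each later word,
    P3(last three words) / P2(the two words before the new one)."""
    trigrams = ngram_counts(corpus, 3)
    bigrams = ngram_counts(corpus, 2)
    total_trigrams = sum(trigrams.values())
    total_bigrams = sum(bigrams.values())

    # Start with the probability of the opening trigram.
    prob = trigrams[tuple(sentence[:3])] / total_trigrams
    for t in range(3, len(sentence)):
        p2 = bigrams[tuple(sentence[t - 2:t])] / total_bigrams
        if p2 == 0:
            return 0.0  # unseen context: no estimate without smoothing
        p3 = trigrams[tuple(sentence[t - 2:t + 1])] / total_trigrams
        prob *= p3 / p2
    return prob

# Hypothetical toy corpus, made up purely for this example.
corpus = "ai is the future and ai is the present".split()
prob = trigram_sentence_probability(corpus, "ai is the future".split())
print(prob)
```

A sentence that contains a trigram missing from the corpus, such as "ai is the past" here, scores exactly zero, which previews the sparsity problem that larger texts and larger values of n make worse.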

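n-grams have been an important step in the evolution of NLP, but they also have severe limitations when applied to large natural language texts. Historically, n-gram models have been very vulnerable to the curse of dimensionality: with a vocabulary of V words there are V^n possible n-grams, so most of them never appear in the training text and their probabilities cannot be estimated reliably.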
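To get a feel for the scale of the problem, the short sketch below counts how many distinct n-grams are possible for a given vocabulary. The function name `possible_ngrams` and the vocabulary size of 10,000 words are assumptions chosen for illustration, not figures from any particular dataset.

```python
def possible_ngrams(vocabulary_size, n):
    """Number of distinct n-token sequences over a vocabulary: V ** n."""
    return vocabulary_size ** n

# Hypothetical vocabulary size, chosen only to show the growth rate.
VOCABULARY_SIZE = 10_000

for n in (1, 2, 3, 5):
    print(f"n={n}: {possible_ngrams(VOCABULARY_SIZE, n):,} possible n-grams")
```

Each extra word of context multiplies the number of sequences by the vocabulary size, so a training text of any realistic length covers only a small fraction of them.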

Tomorrow we will cover new NLP techniques that have been created as an alternative to n-grams.