Deep NLP — Using recurrent neural networks to classify wine reviews

Gabriel Jarosi
5 min read · Feb 2, 2021
Image taken from: https://thumbs.dreamstime.com/z/nlp-natural-language-processing-cognitive-computing-technology-concept-virtual-screen-natural-language-scince-concept-nlp-154696606.jpg

Deep natural language processing (Deep NLP) is a technique used to extract meaning and context from a body of text. This is made possible by Word Embeddings, a vectorization strategy that computes a vector for each word in a corpus and uses those vectors to capture the words' semantic meaning.

In this post, I will create a model that classifies low-quality and high-quality wines based on their expert reviews, show how to implement Word Embeddings using Word2Vec and GloVe, and introduce the LSTM, double LSTM, GRU, and Bidirectional layers used in modeling.

IMPORTS:
Below are all the packages imported for this task.
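A minimal sketch of the imports, assuming gensim is used for Word2Vec and tf.keras for the neural networks (exact package versions may differ from the notebook):

```python
import numpy as np
import pandas as pd

# gensim for preprocessing and Word2Vec
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# scikit-learn for splitting and class weights
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# tf.keras for tokenization, padding, and the recurrent models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Bidirectional, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import to_categorical
```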

DATA:

The dataset being analyzed in this project is called “Wine Reviews” and was taken from https://www.kaggle.com/zynicide/wine-reviews

This dataset consists of 150,930 observations and 10 unique features.

Since this is an NLP project, the features that will be used are the wine reviews located in the description column and the review scores located in the points column. The reviews are the text data that will be analyzed, and the scores will be the targets.
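A loading sketch, assuming the Kaggle CSV is saved locally as winemag-data_first150k.csv (the file's first column is an index):

```python
# Load the reviews and keep only the two columns used in this project.
df = pd.read_csv('winemag-data_first150k.csv', index_col=0)
df = df[['description', 'points']].dropna()
print(df.shape)  # ~150,930 reviews in the raw file
```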

In order to make this classification problem simpler, the points will be divided into categories. If the points were broken down into the categories of the 100-point system, this project would be a 6-class classification problem and the model would have a difficult time predicting the classes.

A wine expert was consulted and stated that wines start to stand out from the rest at a score of 88 or higher. After reading some reviews from the data frame, I also noticed that the words used to describe such wines change, becoming more positive and inviting. I therefore decided to make this project a binary classification problem, with wines scoring lower than 88 labeled low quality and wines scoring 88 or higher labeled high quality.
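In code, the binary target is a simple threshold on the points column (quality is a column name introduced here for illustration):

```python
# 1 = high quality (88 points or more), 0 = low quality (below 88).
df['quality'] = (df['points'] >= 88).astype(int)
df['quality'].value_counts()
```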

Before starting to model the neural networks, the data needs to be pre-processed and Word2Vec needs to be trained.

For this project, pre-processing was kept to a minimum. gensim's simple_preprocess was used to lowercase and tokenize the text. No stop words were removed.
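That step is a single pass over the description column:

```python
# Each review becomes a list of lowercase tokens; stop words are kept.
processed_reviews = df['description'].map(simple_preprocess)
```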

Now let’s train Word2Vec using the processed data.
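A training sketch; the hyperparameters below are illustrative rather than the exact notebook values, and gensim versions before 4.0 call the vector_size parameter size:

```python
w2v_model = Word2Vec(
    sentences=processed_reviews,
    vector_size=100,  # dimensionality of the word vectors ('size' in gensim < 4.0)
    window=5,         # context words considered on each side
    min_count=2,      # ignore words appearing fewer than twice
    workers=4,
)
```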

Again, Word Embeddings are used to capture the semantic meaning of words, so let's check whether the Word2Vec model learned the meaning of the word fruity by looking at its most similar words.
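Querying the trained model is a one-liner (most_similar returns the top ten matches by default):

```python
# The ten words closest to 'fruity' in the learned embedding space.
w2v_model.wv.most_similar('fruity')
```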

The most_similar words are indeed similar to fruity, so it seems that the Word2Vec model was trained properly.

Now let's move on to the neural network models. First, the dataset needs to be separated into the target (y) and the data being analyzed (X). Then a train-test split will be performed, class weights will be calculated, the target column will be one-hot encoded, and the text will be tokenized.
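A condensed sketch of those preparation steps (the vocabulary cap and sequence length below are illustrative assumptions):

```python
X = df['description']
y = df['quality']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Class weights offset the imbalance between low- and high-quality wines.
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_train)
class_weights = dict(enumerate(weights))

# One-hot encode the binary target for a two-unit softmax output layer.
y_train_ohe = to_categorical(y_train)
y_test_ohe = to_categorical(y_test)

# Fit the tokenizer on the training text only, then pad to a fixed length.
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(X_train)
X_train_seq = pad_sequences(tokenizer.texts_to_sequences(X_train), maxlen=100)
X_test_seq = pad_sequences(tokenizer.texts_to_sequences(X_test), maxlen=100)
```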

Now, everything is ready for modeling. All models will be evaluated using a pre-made evaluation function. None of them were optimized; better results can be achieved by changing the number of nodes, the hidden layers, and the optimizer. All models used an EarlyStopping callback to prevent overfitting and keep the best epoch.
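I won't reproduce the exact evaluation function here, but it behaves roughly like the helper below, and the early stopping is a standard EarlyStopping callback:

```python
early_stop = EarlyStopping(monitor='val_loss', patience=2,
                           restore_best_weights=True)

def evaluate_model(model):
    """Report loss and accuracy on the held-out test set."""
    loss, acc = model.evaluate(X_test_seq, y_test_ohe, verbose=0)
    print(f'Test loss: {loss:.4f} | Test accuracy: {acc:.4f}')
```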

Model with Regular Embeddings using an LSTM layer
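A sketch of this baseline, with an Embedding layer learned from scratch (the node counts are illustrative, not the exact notebook values):

```python
vocab_size = len(tokenizer.word_index) + 1  # +1 for the padding index 0

baseline = Sequential([
    Embedding(input_dim=vocab_size, output_dim=100, input_length=100),
    LSTM(64),
    Dense(2, activation='softmax'),
])
baseline.compile(loss='categorical_crossentropy', optimizer='adam',
                 metrics=['accuracy'])
baseline.fit(X_train_seq, y_train_ohe, validation_split=0.1, epochs=10,
             class_weight=class_weights, callbacks=[early_stop])
evaluate_model(baseline)
```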

Results:

Model with Word2Vec Embeddings using an LSTM layer
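Roughly, the trained Word2Vec vectors are copied into an embedding matrix that seeds a frozen Embedding layer; a sketch, with the rest of the model compiled and trained like the baseline above:

```python
# Build a matrix whose i-th row is the Word2Vec vector for token i.
embedding_matrix = np.zeros((vocab_size, 100))
for word, i in tokenizer.word_index.items():
    if word in w2v_model.wv:
        embedding_matrix[i] = w2v_model.wv[word]

w2v_lstm = Sequential([
    Embedding(vocab_size, 100, weights=[embedding_matrix],
              input_length=100, trainable=False),
    LSTM(64),
    Dense(2, activation='softmax'),
])
```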

Results:

Model with GloVe Embeddings using an LSTM layer
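Here the embedding matrix comes from pre-trained GloVe vectors instead; a sketch, assuming the glove.6B.100d.txt file from https://nlp.stanford.edu/projects/glove/ has been downloaded:

```python
# Parse the GloVe file into a word -> vector lookup.
glove_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.split()
        glove_index[parts[0]] = np.asarray(parts[1:], dtype='float32')

glove_matrix = np.zeros((vocab_size, 100))
for word, i in tokenizer.word_index.items():
    if word in glove_index:
        glove_matrix[i] = glove_index[word]

glove_lstm = Sequential([
    Embedding(vocab_size, 100, weights=[glove_matrix],
              input_length=100, trainable=False),
    LSTM(64),
    Dense(2, activation='softmax'),
])
```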

Results:

Model using a GRU layer
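A GRU is a lighter-weight recurrent cell; the sketch below simply swaps the LSTM for a GRU in the same architecture, compiled and trained like the baseline:

```python
gru_model = Sequential([
    Embedding(vocab_size, 100, input_length=100),
    GRU(64),  # fewer gates than an LSTM, so fewer parameters to train
    Dense(2, activation='softmax'),
])
```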

Results:

Model using Double LSTM layers
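When stacking recurrent layers, the first must return its full output sequence so the second has a sequence to consume (a sketch):

```python
double_lstm = Sequential([
    Embedding(vocab_size, 100, input_length=100),
    LSTM(64, return_sequences=True),  # pass the whole sequence downstream
    LSTM(64),                         # second layer returns only the final state
    Dense(2, activation='softmax'),
])
```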

Results:

Model using a Bidirectional LSTM layer
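The Bidirectional wrapper runs the wrapped LSTM over each review both left-to-right and right-to-left and concatenates the two readings (a sketch):

```python
bi_lstm = Sequential([
    Embedding(vocab_size, 100, input_length=100),
    Bidirectional(LSTM(64)),  # output size doubles: forward + backward states
    Dense(2, activation='softmax'),
])
```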

Results:

Conclusion

This post introduced the concept of Word Embeddings using Word2Vec and GloVe, and also introduced the different layers that can be used to build a strong Deep NLP model.

In this project, the best model was the one that used double LSTM layers, with an accuracy of 86.24%, but it was also one of the most computationally expensive. Just because this setup worked well for this project does not mean it is optimal for every single case. Every dataset is different, and models might perform better using different strategies.

The full Notebook for this project can be found at https://github.com/gabrieljarosi/dsc-mod-4-project-v2-1-onl01-dtsc-pt-052620.
