Yaron Vazana


How to Create a Simple WhatsApp Chatbot in Python using Doc2vec

December 25, 2018 by Yaron

Almost all of us use WhatsApp on a daily basis. Those conversations are unstructured text that we can use to learn and experiment. In this tutorial, I will show how to create a very simple chatbot that you can chat with, simply by training a Doc2Vec model on all the messages you already have on your phone.


Disclaimer: This post and implementation are based on the following great post, which appeared in Towards Data Science

If you’re only interested in the full Python notebook, it’s right here (I changed the original names)

At a high level, the steps are:

  • Loading your WhatsApp conversation into a Python DataFrame
  • Preparing a training set of (text, response) tuples, so the chatbot will be able to respond to your input
  • Training a Doc2Vec model
  • Implementing the chatbot conversation in Python

Let’s start…

Loading your WhatsApp conversation into a Python DataFrame

Start by exporting the WhatsApp conversation you want to use to your computer.

To do that, open the conversation in the mobile app. In its settings menu you’ll find an “Export chat” option; save the file to your Google Drive and copy it to your local computer.

Each row in the file looks like this:

3/5/17, 12:58 - ${full-name}: ${message}

To parse each line and extract its information, we define a function called “isMessage” that takes a single line and returns a list with all the parsed fields: [date, time, name, message].

Inside the function, we define a regex that matches a message line, extract the content of its groups, and return them as a list. If the input line is not a message, the function returns None.
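Since the original notebook code is not reproduced here, below is a minimal sketch of how isMessage could look, plus a hypothetical load_chat helper that turns an exported file into a DataFrame. The regex is an assumption built from the sample line above (M/D/YY date, 24-hour time); exports from other phones or locales may use a different layout and would need a different pattern.

```python
import re

import pandas as pd

# Assumed pattern for lines like "3/5/17, 12:58 - John Doe: Hello"
# Groups: date, time, sender name, message text
MESSAGE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")


def isMessage(line):
    """Return [date, time, name, message] for a message line, otherwise None."""
    match = MESSAGE_RE.match(line.strip())
    if match is None:
        # Lines that don't follow the pattern, such as continuation
        # lines of multi-line messages, are not treated as messages
        return None
    return list(match.groups())


def load_chat(path):
    """Hypothetical helper: parse an exported chat file into a DataFrame."""
    with open(path, encoding="utf-8") as chat_file:
        rows = [parsed for parsed in map(isMessage, chat_file) if parsed is not None]
    return pd.DataFrame(rows, columns=["date", "time", "name", "message"])
```

Because the name group stops at the first colon, a message that itself contains a colon (for example "Note: call me") still keeps its full text in the message field.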

Once we have a DataFrame with all the messages, we construct a new DataFrame holding our training data. In our example, we take every pair of consecutive messages and treat it as an [input, output] pair.

The training DataFrame has the following columns:

  • id: this will be an incremental index of the message
  • text: this is the input message (for the machine learning algorithm this will be the input X)
  • response: this is the output text (for the machine learning algorithm this will be the output Y)
  • name: the name of the person who wrote this message (just for visualization if we want)
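A minimal sketch of the pairing step, assuming the messages DataFrame has name and message columns with one row per message in chronological order. The function name build_training_pairs is hypothetical, and storing the author of the input text (rather than of the response) in the name column is an assumption.

```python
import pandas as pd


def build_training_pairs(messages: pd.DataFrame) -> pd.DataFrame:
    """Pair every message (input X) with the message that follows it (output Y)."""
    # All messages except the last one act as inputs
    texts = messages["message"].iloc[:-1].reset_index(drop=True)
    # All messages except the first one act as responses, shifted by one
    responses = messages["message"].iloc[1:].reset_index(drop=True)
    # Assumption: "name" is the author of the input message
    names = messages["name"].iloc[:-1].reset_index(drop=True)
    return pd.DataFrame({
        "id": list(range(len(texts))),
        "text": texts,
        "response": responses,
        "name": names,
    })
```

Every message except the first and last appears twice: once as a response and once as an input, so a chat with n messages yields n - 1 training pairs.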

Training the Model with Gensim

We initialize a Doc2Vec model with the training data and train it for 20 epochs. Doc2Vec learns a vector representation for each token in the vocabulary as well as a vector for each message in the training set.

Implementing the ChatBot

Lastly, we write the chatbot loop, which receives input from the user, finds the most similar message in the training set, and prints that message’s response back to the screen.

Cheers


Filed Under: Algorithms, Data Science, Python Tagged With: Data Science, doc2vec, python, word2vec

I am a data science team lead at Darrow and an NLP enthusiast. My interests range from machine learning modeling to solving challenging data-related problems. I believe sharing ideas is how we all get better at what we do. If you’d like to get in touch, feel free to say hello through any of the social platforms.
