Yaron Vazana

NLP, Algorithms, Machine Learning, Data Science, tutorials, tips and more

Training an AutoEncoder to Generate Text Embeddings

September 28, 2019 by Yaron

Sentence and paragraph vectors can be calculated in many ways. A simple method is to average all the word vectors, which gives a single vector for the entire piece of text. Of course, this requires pre-calculated word embeddings (I already wrote about it here).

In this post I will show a different approach that uses an AutoEncoder.

The aim of an AutoEncoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise”.
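As a rough illustration of the idea (not the model from the full post, which this excerpt does not show), here is a minimal sketch of a linear autoencoder in plain Python. The toy data, the layer sizes, the learning rate and all function names are assumptions made for the example; a real model would normally use a deep learning library and non-linear layers.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def train_autoencoder(data, hidden_dim, epochs=500, lr=0.05, seed=0):
    """Train a linear autoencoder (no biases) with per-sample gradient descent."""
    rng = random.Random(seed)
    n = len(data[0])
    # Encoder maps n -> hidden_dim, decoder maps hidden_dim -> n
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden_dim)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(n)]
    losses = []
    scale = 2.0 / n  # derivative factor of the mean squared error
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = matvec(w_enc, x)       # encoding (the "embedding")
            x_hat = matvec(w_dec, h)   # reconstruction
            err = [p - t for p, t in zip(x_hat, x)]
            total += sum(e * e for e in err) / n
            # Gradient w.r.t. the hidden code, computed before updating the decoder
            grad_h = [scale * sum(err[i] * w_dec[i][j] for i in range(n))
                      for j in range(hidden_dim)]
            for i in range(n):
                for j in range(hidden_dim):
                    w_dec[i][j] -= lr * scale * err[i] * h[j]
            for j in range(hidden_dim):
                for k in range(n):
                    w_enc[j][k] -= lr * grad_h[j] * x[k]
        losses.append(total / len(data))
    return w_enc, w_dec, losses


# Hypothetical toy inputs: 4-dimensional vectors that lie in a 2-dimensional subspace
TOY_DATA = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]

w_enc, w_dec, losses = train_autoencoder(TOY_DATA, hidden_dim=2)
embedding = matvec(w_enc, TOY_DATA[0])  # 2-dimensional code for the first input
print("first / last epoch loss:", losses[0], losses[-1])
```

The encoder output (`embedding` here) is the part you would keep as the compressed representation; the decoder is only needed during training to measure how much information the encoding preserves.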

autoencoder model diagram
[Read more…]

Filed Under: Data Science, Python Tagged With: autoencoder, Data Science, machine learning, python

Using Dockers for your Data Science Dev Environment

May 15, 2019 by Yaron

This post is a bit different from the posts I usually publish. In it I talk about my data science development workflow and how I use Docker in my daily work. I truly believe in great tools that make us more productive and help us focus on the main problem at hand.

I will start with why we would even want to use Docker, and what is wrong with the usual way data scientists work with notebooks.

[Read more…]

Filed Under: Data Science, Python Tagged With: Data Science, docker, python

Identifying Real Estate Opportunities using Machine Learning

January 26, 2019 by Yaron

Real estate investment has always been something I was really interested in. Geographical factors, together with human behavior patterns, have the power to determine whether one place is more “wanted” than another.

My quest began when I decided to apply Machine Learning techniques to the real estate domain, in order to help me find my best “next investment”.

In addition, I was also very curious about regression analysis, since I spend most of my time on classification tasks. So I thought, what could be better than doing an EDA on the topic?

real estate using machine learning
[Read more…]

Filed Under: Algorithms, Data Science, Python Tagged With: Data Science, python, real estate, regression

How to Create a Simple WhatsApp Chatbot in Python using Doc2vec

December 25, 2018 by Yaron

Almost all of us use WhatsApp on a daily basis. Those conversations are essentially unstructured text that we can learn from and experiment with. In this tutorial I will show how to create a very simple chatbot that you can chat with, simply by training a Doc2Vec model on all the messages you already have on your phone.

whatsapp chatbot with python

Disclaimer: This post and its implementation are based on the following great post, which appeared in Towards Data Science.

If you’re just interested in the full Python notebook, it’s right here (I changed the original names).

At a high level, the steps are:

  • Loading your WhatsApp conversation into a Python DataFrame
  • Preparing a training set of (text, response) tuples, so the chatbot will be able to respond to your input
  • Training a Doc2Vec model
  • Implementing the chatbot conversation in Python
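To make the (text, response) step more concrete, here is a minimal sketch of how such pairs could be built and then used to pick a reply. It is not the code from the notebook: the chat log, the sender names and the function names are hypothetical, and a simple bag-of-words cosine similarity stands in for the Doc2Vec model so that the example runs with the standard library only.

```python
import math
from collections import Counter

# Hypothetical chat log as (sender, text) tuples; a real export needs parsing first
MESSAGES = [
    ("Dana", "hey are you coming tonight"),
    ("Yoni", "yes around eight"),
    ("Dana", "can you bring some pizza"),
    ("Yoni", "sure which toppings"),
    ("Dana", "mushrooms please"),
    ("Yoni", "no problem"),
]


def build_pairs(messages):
    """Pair each message with the next one when the sender changes."""
    pairs = []
    for (sender_a, text_a), (sender_b, text_b) in zip(messages, messages[1:]):
        if sender_a != sender_b:
            pairs.append((text_a, text_b))
    return pairs


def bag_of_words(text):
    """Word counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def respond(user_text, pairs):
    """Return the response paired with the most similar known text."""
    query = bag_of_words(user_text)
    _, best_response = max(pairs, key=lambda p: cosine(query, bag_of_words(p[0])))
    return best_response


pairs = build_pairs(MESSAGES)
print(respond("bring pizza", pairs))
```

With a trained Doc2Vec model, the `cosine` over word counts would be replaced by a similarity between inferred document vectors, while the lookup of the paired response stays the same.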

Let’s start…

[Read more…]

Filed Under: Algorithms, Data Science, Python Tagged With: Data Science, doc2vec, python, word2vec

Average Word Vectors – Generate Document / Paragraph / Sentence Embeddings

September 20, 2018 by Yaron

Using the strength of word vectors and applying it to larger text formats, such as documents, paragraphs or sentences, is a very common technique in many NLP use cases.

Let’s look at the basic scenario where you have multiple sentences (or paragraphs) and you want to compare them with each other. In that case, representing the sentences as fixed-length vectors lets you measure the similarity between them, even though the sentences can be of different lengths.

In this post, I will show a very common technique for generating embeddings for sentences / paragraphs / documents from existing pre-trained word embeddings: averaging the word vectors to create a single fixed-size embedding vector.
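A minimal sketch of the averaging idea, assuming a tiny hand-made embedding table in place of real pre-trained vectors. The words, the 3-dimensional values and the function names are made up for illustration; in practice the vectors would be loaded from a pre-trained model.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for pre-trained embeddings
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.0],
    "sits": [0.1, 0.9, 0.1],
    "runs": [0.2, 0.8, 0.1],
    "stocks": [0.0, 0.1, 0.9],
    "fall": [0.1, 0.2, 0.8],
}


def average_word_vectors(text, word_vectors, dim):
    """Average the vectors of all known words; unknown words are skipped."""
    tokens = text.lower().split()
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Sentences of different lengths all map to vectors of the same size
s1 = average_word_vectors("the cat sits", WORD_VECTORS, 3)
s2 = average_word_vectors("a dog runs quickly today", WORD_VECTORS, 3)
s3 = average_word_vectors("stocks fall", WORD_VECTORS, 3)

print(cosine_similarity(s1, s2))  # the two animal sentences
print(cosine_similarity(s1, s3))  # an unrelated sentence
```

Skipping out-of-vocabulary words is one possible design choice; another is to map them to a shared "unknown" vector, which keeps the sentence length reflected in the average.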

average word vectors

[Read more…]

Filed Under: Algorithms, Data Science, Python
