Yaron Vazana

NLP, Algorithms, Machine Learning, Data Science, tutorials, tips and more


Contact Me

yaronv99 [at] gmail.com


Archives for Data Science

Training an AutoEncoder to Generate Text Embeddings

September 28, 2019 by Yaron

Sentence and paragraph vectors can be calculated in many ways. For example, a simple method is to average all the word vectors in a piece of text to get a single vector for the entire text. Of course, this requires pre-calculated word embeddings (I already wrote about it here).
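A minimal sketch of the averaging approach, assuming the pre-calculated embeddings are available as a plain dict mapping each word to a NumPy vector (for example, exported from a trained word2vec model). The 3-dimensional `toy_embeddings` below are made up for illustration, and the helper name `average_word_vectors` is hypothetical:

```python
import numpy as np


def average_word_vectors(tokens, embeddings, dim):
    """Return the mean vector of all in-vocabulary tokens.

    `embeddings` is assumed to map word -> 1-D numpy array of length `dim`.
    Out-of-vocabulary tokens are skipped; if no token is known,
    a zero vector is returned.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# Toy 3-dimensional embeddings, purely illustrative (not real word2vec output).
toy_embeddings = {
    "cats": np.array([1.0, 0.0, 2.0]),
    "chase": np.array([0.0, 1.0, 0.0]),
    "mice": np.array([2.0, 2.0, 1.0]),
}

sentence = "cats chase mice quickly".split()  # "quickly" is out of vocabulary
sentence_vector = average_word_vectors(sentence, toy_embeddings, dim=3)
print(sentence_vector)  # → [1. 1. 1.]
```

Note that averaging discards word order, so "cats chase mice" and "mice chase cats" get the same vector.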

In this post I will show a different approach that uses an AutoEncoder.

The aim of an AutoEncoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise”.
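The full post's code is not part of this excerpt, so the following is only a minimal sketch of the idea in plain NumPy, not the author's implementation. It assumes each text has already been turned into a fixed-size input vector (random data stands in for that here), uses one tanh hidden layer as the bottleneck and a linear decoder, and trains with full-batch gradient descent on the mean-squared reconstruction error. The sizes, learning rate, and epoch count are arbitrary assumptions. After training, the hidden-layer activations serve as the compact text embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input: 200 texts, each already encoded as a 20-dim vector
# (e.g. TF-IDF or averaged word vectors). Random data, for illustration only.
X = rng.normal(size=(200, 20))

n_samples, n_in = X.shape
n_hidden = 5           # bottleneck size = embedding dimension
lr, epochs = 0.5, 300  # assumed hyperparameters

W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))
b_dec = np.zeros(n_in)


def encode(x):
    # Non-linear bottleneck: these activations are the learned embeddings.
    return np.tanh(x @ W_enc + b_enc)


def decode(h):
    # Linear reconstruction of the original input.
    return h @ W_dec + b_dec


losses = []
for _ in range(epochs):
    H = encode(X)
    err = decode(H) - X
    losses.append(np.mean(err ** 2))

    # Gradients of L = mean(err**2) with respect to each parameter.
    d_out = 2.0 * err / (n_samples * n_in)
    grad_W_dec = H.T @ d_out
    grad_b_dec = d_out.sum(axis=0)
    d_hidden = (d_out @ W_dec.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)**2
    grad_W_enc = X.T @ d_hidden
    grad_b_enc = d_hidden.sum(axis=0)

    # Plain gradient-descent update.
    W_dec -= lr * grad_W_dec
    b_dec -= lr * grad_b_dec
    W_enc -= lr * grad_W_enc
    b_enc -= lr * grad_b_enc

embeddings = encode(X)  # shape (200, 5): one compact vector per text
print(f"reconstruction loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because the network has to squeeze 20 numbers through 5 hidden units and rebuild them, it is pushed to keep the dominant structure of the input and drop the rest, which is the "ignore the noise" behavior described above.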

[Figure: AutoEncoder model diagram]

Filed Under: Data Science, Python
Tagged With: autoencoder, Data Science, machine learning, python

Using Dockers for your Data Science Dev Environment

May 15, 2019 by Yaron

This post is a bit different from the posts I usually publish. In it, I talk about my data science development workflow and how I use Docker in my daily work. I truly believe in great tools that make us more productive and help us focus on the main problem at hand.

I will start with why we would even want to use Docker, and what’s wrong with the usual way data scientists work with notebooks.


Filed Under: Data Science, Python
Tagged With: Data Science, docker, python

Identifying Real Estate Opportunities using Machine Learning

January 26, 2019 by Yaron

Real estate investment has always been something I’m really interested in. Geographical factors, together with human behavior patterns, can determine whether one place is more “wanted” than another.

My quest began when I decided to apply machine learning techniques to the real estate domain to help me find my best “next investment”.

I was also curious about regression analysis, since I spend most of my time on classification tasks. So I thought: what could be better than doing an EDA (exploratory data analysis) on the topic?
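For readers who, like me, mostly work on classification: regression predicts a continuous target (here, a price) instead of a class label. The sketch below fits an ordinary least-squares model with NumPy on *synthetic* listings. The features (size and distance to the city center), the pricing rule used to generate targets, and the idea of flagging listings priced well below the model's prediction are all illustrative assumptions, not data or methods from the full post:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic listings, NOT real estate data: size in m^2, distance to center in km.
n = 500
size_m2 = rng.uniform(40, 150, size=n)
distance_km = rng.uniform(0, 20, size=n)

# Hypothetical pricing rule plus noise, used only to generate the targets.
price = 50_000 + 3_000 * size_m2 - 8_000 * distance_km + rng.normal(0, 10_000, size=n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), size_m2, distance_km])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, per_m2, per_km = coef

predicted = X @ coef
residuals = price - predicted
r2 = 1 - np.sum(residuals ** 2) / np.sum((price - price.mean()) ** 2)

# One possible "opportunity" heuristic (an assumption): the listings whose
# actual price is furthest below what the model predicts.
most_underpriced = np.argsort(residuals)[:5]

print("coefficients:", coef.round(1), "R^2:", round(r2, 3))
print("most underpriced listing indices:", most_underpriced)
```

With real listings the interesting work is in the EDA and feature engineering; the least-squares fit itself is the easy part.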

[Figure: real estate together with machine learning]

Filed Under: Algorithms, Data Science, Python
Tagged With: Data Science, python, real estate, regression

How to Create a Simple WhatsApp Chatbot in Python using Doc2vec

December 25, 2018 by Yaron

Almost all of us use WhatsApp on a daily basis. Those conversations are essentially unstructured text that we can learn from and experiment with. In this tutorial I will show how to create a very simple chatbot you can chat with, simply by training a Doc2Vec model on all the messages you already have on your phone.

[Figure: WhatsApp chatbot with Python]

Disclaimer: This post and implementation are based on the following great post, which appeared in Towards Data Science.

If you’re only interested in the full Python notebook, it’s right here (I changed the original names).

At a high level, the steps are:

  • Loading your WhatsApp conversations into a pandas DataFrame
  • Preparing a training set of (text, response) tuples, so the chatbot can respond to your input
  • Training a Doc2Vec model
  • Implementing the chatbot conversation in python
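The four steps can be sketched end to end with the standard library only. This is an illustrative sketch, not the notebook's code, and it makes several labeled assumptions: the export line format (Android-style `date, time - sender: text`, which varies by phone locale and platform), the hypothetical helper names, and one plain substitution: step 3 uses bag-of-words cosine similarity *instead of* a trained Doc2Vec model so the example runs without extra dependencies. With gensim 4.x you would instead train `Doc2Vec` on `TaggedDocument` objects built from the texts and, in `respond`, use `model.infer_vector(tokens)` with `model.dv.most_similar([vector], topn=1)` to find the closest training text. For step 1, `pandas.DataFrame(messages)` turns the parsed records into a DataFrame.

```python
import math
import re
from collections import Counter

# Assumed export format (Android-style; differs by locale and platform):
# "12/25/18, 21:15 - Alice: message text"
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")


def parse_chat(lines):
    """Step 1: turn exported lines into a list of message records."""
    messages = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            date, time, sender, text = match.groups()
            messages.append({"date": date, "time": time, "sender": sender, "text": text})
        elif messages:
            # Lines without a header continue the previous (multi-line) message.
            messages[-1]["text"] += " " + line.strip()
    return messages


def build_pairs(messages, me):
    """Step 2: (text, response) pairs where someone else writes and `me` replies."""
    return [
        (prev["text"], curr["text"])
        for prev, curr in zip(messages, messages[1:])
        if prev["sender"] != me and curr["sender"] == me
    ]


def bow(text):
    # Step 3 stand-in: bag-of-words counts instead of a Doc2Vec vector.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def respond(user_input, pairs):
    """Step 4: reply with the response attached to the most similar known text."""
    query = bow(user_input)
    _, best_response = max(pairs, key=lambda pair: cosine(query, bow(pair[0])))
    return best_response


chat_export = [
    "12/25/18, 21:15 - Alice: Are you coming to dinner tonight?",
    "12/25/18, 21:16 - Me: Yes, I'll be there at eight",
    "12/25/18, 21:20 - Alice: Did you watch the game?",
    "12/25/18, 21:21 - Me: Not yet,",
    "no spoilers please",
]

messages = parse_chat(chat_export)
pairs = build_pairs(messages, me="Me")
print(respond("coming to dinner?", pairs))  # prints "Yes, I'll be there at eight"
```

Retrieval-style bots like this can only answer with replies you have already sent; swapping the similarity function for Doc2Vec vectors makes matching less dependent on exact word overlap.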

Let’s start…


Filed Under: Algorithms, Data Science, Python
Tagged With: Data Science, doc2vec, python, word2vec

Visualizing Vectors using TensorBoard

August 11, 2018 by Yaron

Most machine learning algorithms require your data to be represented as vectors, which are usually high-dimensional.

Visualizing those vectors to get insights, even before you run them through a machine learning process, can often tell you whether you’re heading toward the right solution, or at least warn you if you’re not.

This Python notebook contains a small script that takes a set of n-dimensional vectors and “projects” them onto a 2D/3D plane using TensorBoard.
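The notebook's script is not reproduced here. As a dependency-free alternative (a sketch, not the notebook's approach), the same vectors can be written to two tab-separated files: `vectors.tsv` with one vector per row and `metadata.tsv` with one label per row. The standalone Embedding Projector (the web version of TensorBoard's projector) can load such a pair of files through its "Load" dialog; a single-column metadata file has no header row. The helper name `write_projector_files` and the sample vectors are hypothetical:

```python
import os
import tempfile


def write_projector_files(vectors, labels, out_dir):
    """Write vectors.tsv (one vector per row) and metadata.tsv (one label per row)."""
    if len(vectors) != len(labels):
        raise ValueError("need exactly one label per vector")
    vec_path = os.path.join(out_dir, "vectors.tsv")
    meta_path = os.path.join(out_dir, "metadata.tsv")
    with open(vec_path, "w", encoding="utf-8") as f:
        for vec in vectors:
            f.write("\t".join(str(x) for x in vec) + "\n")
    with open(meta_path, "w", encoding="utf-8") as f:
        # Single-column metadata: no header row. Labels must not contain tabs/newlines.
        for label in labels:
            f.write(f"{label}\n")
    return vec_path, meta_path


# Illustrative 3-dimensional vectors with one label each.
vectors = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]
labels = ["doc_a", "doc_b"]
out_dir = tempfile.mkdtemp()
vec_path, meta_path = write_projector_files(vectors, labels, out_dir)
print(vec_path, meta_path)
```

Keeping the row order identical in both files is what links each label to its vector.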

After visualizing your vectors, you can explore and cluster them using PCA or t-SNE.
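TensorBoard's projector computes PCA for you, but it can help to see what that projection does. A minimal NumPy sketch, assuming the vectors fit in memory as a 2-D array: center the data, take its SVD, and project onto the top principal directions. The two Gaussian groups are toy data chosen to show clusters separating along the first component, and `pca_project` is a hypothetical helper name:

```python
import numpy as np


def pca_project(vectors, n_components=2):
    """Project n-dimensional vectors onto their top principal components.

    Returns the projected points and the fraction of variance each kept
    component explains.
    """
    X = np.asarray(vectors, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return X_centered @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(1)
# Toy 50-dimensional vectors from two well-separated groups.
group_a = rng.normal(loc=0.0, size=(100, 50))
group_b = rng.normal(loc=3.0, size=(100, 50))

points_2d, explained = pca_project(np.vstack([group_a, group_b]))
print(points_2d.shape, explained.round(3))
```

PCA is linear and preserves global distances, so it is a good first look; t-SNE is non-linear and tends to reveal local cluster structure that PCA can blur.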

[Figure: clustering using TensorBoard]


Filed Under: Data Science, Python
Tagged With: Data Science, doc2vec, word2vec



