Real estate investing has always interested me. Geographical factors, together with human behavior patterns, can determine whether one place is more “wanted” than another.
My quest began when I decided to apply Machine Learning techniques to the Real Estate domain, to help me find my best “next investment”.
I was also very curious about regression analysis, since I spend most of my time on classification tasks. So I thought: what could be better than doing an exploratory data analysis (EDA) on the topic?
TL;DR
As always, if you’re just interested in the Python notebook, here’s the link.
Before you continue reading, note that all the cool and interesting stuff is in the notebook itself.
The Intuition
To identify real estate opportunities, I used the algorithm described in this great arXiv paper.
In general, the idea is to train a model that predicts house prices in a certain district. Then, predict the prices of all the listings that are “for sale”, and identify those whose asking price is low relative to the market value the model predicts.
Getting The Data
In this analysis I used data taken from Madlan (the Israeli equivalent of the well-known Zillow real estate website).
The data consists of 288 “sold” records, which serve as the training set, and another 286 “for sale” listings, which form the set where we look for opportunities.
The Workflow
Since this blog post is just a companion summary to the Python notebook, I’m only going to outline the main steps I covered in the code:
- Data Cleaning & Feature Engineering
  - Remove unneeded columns
  - Handle missing values
  - Handle date features
  - Generate more useful features
  - Feature scaling and normalization
  - Univariate analysis (feature histograms)
  - Bivariate analysis (scatter plots of features against the target variable)
  - One-hot encode categorical features
- Modeling
  - Linear Regression
  - Ridge Regression
  - Lasso Regression
  - Random Forest
  - SVM Regressor (SVR)
  - Deep Neural Network
- Predicting the best real estate opportunities
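The data preparation and modeling steps above could be sketched with scikit-learn roughly as follows. This is an illustration under assumptions, not the notebook's code: the data is synthetic, the column names (`area_sqm`, `rooms`, `neighborhood`, `price`) are hypothetical, the hyperparameters are arbitrary, and the deep neural network is left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the "sold" records; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 120
sold = pd.DataFrame({
    "area_sqm": rng.uniform(40, 150, n),
    "rooms": rng.integers(2, 6, n).astype(float),
    "neighborhood": rng.choice(["north", "center", "south"], n),
})
sold.loc[::15, "rooms"] = np.nan  # simulate missing values
sold["price"] = 20_000 * sold["area_sqm"] + rng.normal(0, 50_000, n)

numeric = ["area_sqm", "rooms"]
categorical = ["neighborhood"]

# Impute and scale numeric columns; one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0, max_iter=10_000),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # SVR is sensitive to the target's scale, so standardize prices for it.
    "svr": TransformedTargetRegressor(regressor=SVR(), transformer=StandardScaler()),
}

X, y = sold[numeric + categorical], sold["price"]
scores = {}
for name, model in models.items():
    pipeline = Pipeline([("preprocess", preprocess), ("model", model)])
    # Mean R^2 across 5 cross-validation folds.
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2").mean()

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean R^2 = {score:.3f}")
```

Putting the preprocessing inside the pipeline means the imputer and scaler are fit only on each training fold, which keeps the cross-validation scores honest. With only a few hundred records, as in this dataset, that matters more than usual.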
Link to the notebook on GitHub
Cheers