Data Science Archives

Training a Doc2Vec Model with Gensim

January 20, 2018 by Yaron

Unstructured documents can be represented as vectors in many ways. One very common approach is to take the well-known word2vec algorithm and generalize it to the document level; the result is known as doc2vec.

Gensim is a great Python library for training such doc2vec models, and this tutorial shows how to use it.


Scala Website Crawler

April 13, 2017 by Yaron

Machine learning models typically require a large amount of data, both for training and for testing.

Getting that data, before you even touch the ML itself, can be hard and tedious, depending on the source you have and how accessible the data is.

Crawling a website or a blog is a convenient way to get the data you need. With relatively little effort, you can generate a CSV file containing all the data and analyze it more easily using state-of-the-art tools.
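The page-to-CSV step can be sketched independently of the crawler itself. The example below uses Python's standard library rather than Scala, purely for illustration, and it skips the fetching part: the pages are hard-coded strings. The `PostParser` class, the `pages_to_csv` helper, and the assumption that each post has an `<h2>` title followed by `<p>` body text are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects <h2> text as a post title and <p> text as its body."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per post found
        self._field = None  # field currently being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            # A new title starts a new row.
            self.rows.append({"title": "", "body": ""})
            self._field = "title"
        elif tag == "p" and self.rows:
            self._field = "body"

    def handle_endtag(self, tag):
        if tag in ("h2", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            row = self.rows[-1]
            row[self._field] = (row[self._field] + " " + data.strip()).strip()


def pages_to_csv(pages):
    """Parse each HTML page and return all extracted posts as CSV text."""
    rows = []
    for html in pages:
        parser = PostParser()
        parser.feed(html)
        rows.extend(parser.rows)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "body"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hard-coded stand-ins for crawled pages (no network access in this sketch).
SAMPLE_PAGES = [
    "<html><body><h2>First post</h2><p>Hello, world.</p></body></html>",
    "<html><body><h2>Second post</h2><p>More text.</p></body></html>",
]

csv_text = pages_to_csv(SAMPLE_PAGES)
```

The `csv` module takes care of quoting, so a body such as "Hello, world." survives the comma intact when the file is read back.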

