Machine learning algorithms require your data to be represented as vectors, usually high-dimensional ones.
Visualizing those vectors before you run them through a machine learning process often gives you useful insights: it can tell you whether you're heading toward the right solution, or at least warn you when you're not.
This Python notebook contains a small script that takes any set of n-dimensional vectors and "projects" them onto a 2D/3D plane using TensorBoard.
Once your vectors are visualized, you can explore and cluster them using PCA or t-SNE.
Explaining the Parameters
The TF_visualizer object takes 4 parameters:
- dimension (int): this is the dimension of the vectors
- vecs_file (str): this is the path to the file that contains all the vectors; each line is a single vector, and the vector components are separated by a comma "," (sample-vecs-file)
- metadata_file (str): this is the path to a metadata file that can contain useful information associated with the vectors, for example an id or a class_label for each vector (sample-metadata-file)
- output_path (str): this is the path to the location where all the TensorBoard outputs will be created; later you will start TensorBoard with a parameter pointing to this path
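To make the vecs_file format concrete, here is a minimal sketch of how such a file could be read and checked against the dimension parameter. It is not the notebook's own code: the helper name load_vectors, the file name vecs.csv, and the decision to skip blank lines are all assumptions made for illustration.

```python
import os
import tempfile


def load_vectors(vecs_file, dimension):
    """Read comma-separated vectors, one per line, and check their length.

    Hypothetical helper, not part of TF_visualizer.
    """
    vectors = []
    with open(vecs_file, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            vec = [float(x) for x in line.split(",")]
            if len(vec) != dimension:
                raise ValueError(
                    f"line {line_no}: expected {dimension} components, "
                    f"got {len(vec)}"
                )
            vectors.append(vec)
    return vectors


# Tiny made-up file holding two 3-dimensional vectors.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vecs.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("0.1,0.2,0.3\n1.0,2.0,3.0\n")
    vectors = load_vectors(path, dimension=3)

print(len(vectors), vectors[0])
```

Running a check like this before starting TensorBoard catches a wrong dimension value or a malformed line early, instead of leaving you with an empty or broken projection.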
You can download those sample files to see what vecs_file and metadata_file look like.
The notebook is also available on my GitHub.
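If your vectors live in memory rather than in a file, a short script along these lines could produce both inputs. Treat it as a sketch under assumptions: the helper write_projector_inputs and the file names are hypothetical, and the metadata layout (one label per line, in the same order as the vectors) is a guess at a simple format, so compare it with the sample metadata file before relying on it.

```python
import os
import tempfile


def write_projector_inputs(vectors, labels, vecs_file, metadata_file):
    """Write one comma-separated vector per line, plus one label per line.

    Hypothetical helper; the metadata layout is an assumption.
    """
    if len(vectors) != len(labels):
        raise ValueError("need exactly one label per vector")
    with open(vecs_file, "w", encoding="utf-8") as f:
        for vec in vectors:
            f.write(",".join(repr(float(x)) for x in vec) + "\n")
    with open(metadata_file, "w", encoding="utf-8") as f:
        for label in labels:
            f.write(f"{label}\n")


# Made-up data: two 3-dimensional vectors with class labels.
vectors = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]
labels = ["cat", "dog"]

out_dir = tempfile.mkdtemp()
vecs_file = os.path.join(out_dir, "vecs.csv")
metadata_file = os.path.join(out_dir, "metadata.tsv")
write_projector_inputs(vectors, labels, vecs_file, metadata_file)

with open(vecs_file, encoding="utf-8") as f:
    vecs_text = f.read()
with open(metadata_file, encoding="utf-8") as f:
    metadata_text = f.read()
print(vecs_text)
```

The two resulting paths are what you would pass as vecs_file and metadata_file, with dimension set to 3 for this toy data.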
Starting TensorBoard
To start the TensorBoard service, enter the command below in your command line (make sure to replace the ${output-path} placeholder with your own output path):
tensorboard --logdir=${output-path}
Cheers