Things I learnt from teaching deep learning
I completed teaching “Deep learning and Artificial Intelligence with Tensorflow” for the first term at UCSC Extension. In the course, students were expected to solve 3 homework problems and a final project in addition to 2 quizzes. In this blog post, I will summarize my findings and some of the difficulties students faced while solving the homework and the project.
  1. The learning rate appears in the weight update equation and therefore has the biggest impact of all the hyperparameters. It should be the first hyperparameter you tune.
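To make the role of the learning rate concrete, here is a minimal sketch of the gradient descent update rule, w ← w − lr · ∇L(w), applied to a toy linear regression (my own illustration, not code from the course):

```python
import numpy as np

def sgd_step(w, X, y, lr):
    # Gradient of the mean squared error 0.5 * mean((Xw - y)^2) w.r.t. w.
    grad = X.T @ (X @ w - y) / len(y)
    # The learning rate `lr` scales every update: too small and training
    # crawls, too large and the iterates oscillate or diverge.
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
for _ in range(200):
    w = sgd_step(w, X, y, lr=0.1)
# With a sensible learning rate, w converges toward true_w.
```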
  2. If the batch size is too small, the loss curve is noisy. The same behavior is exhibited when the learning rate is a tad too high, and it can be tricky to distinguish the two cases. The batch size must also be selected based on the amount of memory in the system: a smaller batch size needs less memory, a larger one needs more.
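A toy illustration of why small batches give noisy loss curves (a sketch of my own, with made-up per-example gradients): the variance of a mini-batch gradient estimate shrinks roughly as 1/batch_size.

```python
import numpy as np

rng = np.random.default_rng(42)
# Pretend these are per-example gradients with true mean 2.0.
per_example_grads = rng.normal(loc=2.0, scale=1.0, size=10_000)

def batch_grad_std(batch_size, n_batches=2_000):
    # Spread of the mini-batch gradient estimate across many batches.
    estimates = [
        rng.choice(per_example_grads, size=batch_size).mean()
        for _ in range(n_batches)
    ]
    return np.std(estimates)

small = batch_grad_std(4)    # noisy estimate
large = batch_grad_std(256)  # much smoother estimate
```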
  3. Pick an architecture based on the data. For a simple dataset such as MNIST, a 3-layer multi-layer perceptron can work well. However, the same architecture will not yield good results on the CIFAR-10 dataset, which is relatively complex.
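For reference, a 3-layer MLP for MNIST-sized inputs can be sketched as a plain forward pass (the 128 and 64 hidden widths are illustrative choices, not prescribed by the course):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Three layers: 784 (flattened 28x28 image) -> 128 -> 64 -> 10 classes.
W1, b1 = rng.normal(scale=0.01, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.01, size=(128, 64)), np.zeros(64)
W3, b3 = rng.normal(scale=0.01, size=(64, 10)), np.zeros(10)

def mlp_forward(x):
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)

batch = rng.normal(size=(32, 784))  # stand-in for flattened MNIST images
probs = mlp_forward(batch)          # (32, 10) class probabilities
```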
  4. Tensorflow can throw very long error messages. If it feels overwhelming (especially to newcomers), I recommend reading the information at the bottom of the error message before panicking; the most useful part is usually there.
  5. Not all time-series data has to be processed with a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU). Some students worked with audio data, which is a time series; they converted it into a spectrogram and then used a CNN for classification.
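The audio-to-spectrogram step can be sketched like this (a synthetic tone stands in for the students' real audio; the sample rate is an illustrative choice):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8_000                              # sample rate in Hz (illustrative)
t = np.arange(fs) / fs                  # one second of audio
signal = np.sin(2 * np.pi * 440 * t)    # a 440 Hz tone as a stand-in

# Turn the 1-D signal into a 2-D time-frequency image.
f, times, Sxx = spectrogram(signal, fs=fs)
log_spec = np.log1p(Sxx)                # log scale is common for audio

# Add a channel dimension -> (height, width, 1), i.e. a grayscale
# "image" a CNN can consume directly.
cnn_input = log_spec[..., np.newaxis]
```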
  6. Make sure that the training and test datasets have the same distribution. Consider classifying cat images, where the training set contains only white cats and no black cats, while the test set contains only black cats and no white cats. Even if all hyperparameters are perfect, the training accuracy will be high and the test accuracy will be low. Does this mean the model overfit? Perhaps not. In this case, the problem is the mismatched distributions of the training and test datasets.
  7. For any classification problem, it is helpful to check the balance of the dataset across all labels, for both the training and the test set, before training.
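Such a balance check takes only a few lines; here is a sketch with made-up labels:

```python
from collections import Counter

def label_fractions(labels):
    # Fraction of examples per class, sorted by label.
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

train_labels = [0] * 900 + [1] * 100   # 90/10 split: imbalanced
test_labels  = [0] * 500 + [1] * 500   # balanced

train_frac = label_fractions(train_labels)
test_frac = label_fractions(test_labels)
# Comparing the two dicts reveals both the class imbalance and the
# train/test mismatch before any training time is spent.
```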
  8. In any CNN architecture, the creators will have specified the input image size. For example, VGG16 specifies an image size of 224x224x3, and the fully-connected layer for that image size is a vector of length 4096. If the images you are using are not 224x224x3, I highly recommend recalculating the sizes of all the intermediate matrices and vectors with pencil and paper before making changes to the code. I have seen students spend countless hours trying to debug Tensorflow errors that could have been avoided had they recalculated the sizes before coding. Similarly, when defining a Python generator to produce a batch of data, it is advisable to calculate all tensor dimensions by hand before programming.
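The pencil-and-paper bookkeeping is just the standard output-size formula applied layer by layer; a small sketch, using VGG16's conventions (3x3 convs with padding 1, 2x2 max-pools with stride 2) as the example:

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard conv/pool output-size formula for one spatial dimension.
    return (size + 2 * pad - kernel) // stride + 1

# VGG16-style block on a 224x224 input: a 3x3 conv with pad 1 keeps the
# spatial size; each 2x2 max-pool halves it.
size = 224
size = conv_out(size, kernel=3, pad=1)        # conv: stays 224
size = conv_out(size, kernel=2, stride=2)     # pool: 224 -> 112

# After VGG16's five pooling stages, 224 -> 112 -> 56 -> 28 -> 14 -> 7,
# giving 7 * 7 * 512 = 25088 features feeding the 4096-unit FC layer.
# Change the input size and these numbers all change with it.
```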
  9. Pick the correct cost function and accuracy metric for the problem.
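One reason the metric matters, shown with a made-up example: on imbalanced data, plain accuracy can look excellent for a model that is useless.

```python
import numpy as np

# 95 negatives, 5 positives: a heavily imbalanced toy dataset.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)      # a "model" that always predicts 0

accuracy = (y_pred == y_true).mean()              # looks great: 0.95
recall_rare = (y_pred[y_true == 1] == 1).mean()   # reality: 0.0
# A per-class metric (recall, F1, etc.) exposes what accuracy hides.
```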
  10. After reading or augmenting the data, visualize it and ensure that it is ready for your deep learning model to consume.
  11. Ensure that you are using consistent data types, such as float32 or float64, throughout your Tensorflow code.
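A quick sketch of the pitfall, shown with NumPy for brevity: mixing float32 and float64 silently promotes everything to float64, whereas TensorFlow typically refuses to mix dtypes and raises an error instead, so an explicit cast (e.g. `tf.cast(x, tf.float32)`) is needed.

```python
import numpy as np

a = np.ones(3, dtype=np.float32)
b = np.ones(3, dtype=np.float64)

# NumPy silently promotes the result to the wider type, float64.
c = a + b

# The TensorFlow equivalent, tf.ones(3, tf.float32) + tf.ones(3,
# tf.float64), raises a dtype-mismatch error rather than promoting;
# cast one operand explicitly so both sides share a dtype.
```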