Running a transfer learning IPython notebook with GPU on a High Performance Computing server

It is common to run IPython notebooks locally or on Google Colab, but if you have access to a High Performance Computing (HPC) server with GPUs available, you can drastically reduce the notebook running time with little effort. In this example we will use a transfer learning IPython notebook to perform a local test, and then run this notebook on an HPC server using a shell script.

Performing the local test

We will use a transfer learning example, but any other deep learning code saved as an IPython notebook will work as well. First, we need to install a conda package manager on Linux, such as Miniconda. After the installation, you may see “(base)” next to your username the next time you open the terminal. This means that the conda package manager was successfully installed and you are in the default environment. From this point, create a new environment, activate it, and install JupyterLab and tensorflow-gpu. The tensorflow-gpu package includes TensorFlow, cuDNN, and the CUDA toolkit. Note that you will need an NVIDIA GPU. Type the following commands in the Linux terminal.

conda create -n jupyter
conda env list
conda activate jupyter
conda install -c conda-forge jupyterlab
conda install -c anaconda tensorflow-gpu

You need to activate an environment to use it. The trick here is to work inside the created environment, open JupyterLab, and install the Python packages needed to run the notebook. The packages will then be recorded in the environment recipe, which can later be exported to install everything we need on another computer.

To open JupyterLab, type the following in the terminal.

jupyter-lab &

From here until the end of this section, run the code in a JupyterLab notebook. We will need runipy to run the whole notebook from the command line later (next section), and the other Python libraries are required by the notebook code. Install these libraries and check whether TensorFlow can find the GPU. You should get the message “Found GPU at: /device:GPU:0”.

!pip install runipy
!pip install tensorflow_datasets
!pip install matplotlib
import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
from tensorflow.keras import layers
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

We will use code from the Keras transfer learning tutorial. This tutorial uses the cats vs dogs dataset, which originally contains 25,000 images (TFDS drops 1,738 corrupted images, leaving 23,262). We will split the dataset as follows: 70% for training, 15% for validation, and 15% for testing.

train_ds, validation_ds, test_ds = tfds.load(
    "cats_vs_dogs",
    split=["train[:70%]", "train[70%:85%]", "train[85%:100%]"],
    as_supervised=True,  # Include labels
)
print("Number of training samples: %d" % tf.data.experimental.cardinality(train_ds))
print("Number of validation samples: %d" % tf.data.experimental.cardinality(validation_ds))
print("Number of test samples: %d" % tf.data.experimental.cardinality(test_ds))

The images are labeled with 1 for dogs and 0 for cats. As this is a classification problem, these labels will be predicted for the other sets of images. Let’s check the first 9 images of the training dataset.

plt.figure(figsize=(10, 10))
for i, (image, label) in enumerate(train_ds.take(9)):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image)
    plt.title(int(label))
    plt.axis("off")

As the images have different dimensions, we need to resize them. Let’s resize the images to the dimension 150 x 150.

size = (150, 150)
train_ds = x, y: (tf.image.resize(x, size), y))
validation_ds = x, y: (tf.image.resize(x, size), y))
test_ds = x, y: (tf.image.resize(x, size), y))

We have 16,283 images in the training dataset. We will load the data in batches of 32 images, so the model will backpropagate 509 times per epoch. A low batch size usually increases the convergence speed, reducing the number of epochs needed to achieve a high accuracy.

batch_size = 32
train_ds = train_ds.cache().batch(batch_size).prefetch(buffer_size=10)
validation_ds = validation_ds.cache().batch(batch_size).prefetch(buffer_size=10)
test_ds = test_ds.cache().batch(batch_size).prefetch(buffer_size=10)
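As a quick sanity check of the 509 figure mentioned above, the number of weight updates per epoch is the training image count divided by the batch size, rounded up:

```python
import math

train_images = 16283  # 70% of the 23,262 images kept by TFDS
batch_size = 32
print(math.ceil(train_images / batch_size))  # 509 batches (weight updates) per epoch
```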

To slow down overfitting we will use data augmentation, which applies random transformations to training images to expose the model to different aspects of the data. This code will flip and rotate images, but other operations are available.

data_augmentation = keras.Sequential([
    layers.experimental.preprocessing.RandomFlip("horizontal"),
    layers.experimental.preprocessing.RandomRotation(0.1),
])

for images, labels in train_ds.take(1):
    plt.figure(figsize=(10, 10))
    first_image = images[0]
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        augmented_image = data_augmentation(tf.expand_dims(first_image, 0), training=True)
        plt.imshow(augmented_image[0].numpy().astype("int32"))
        plt.axis("off")

The transfer learning will be performed with the Keras Xception model, using weights pre-trained on ImageNet. Keras makes several pre-trained models available, and Xception is able to achieve a good accuracy. ImageNet is a database containing a large number of classes, including animals such as cats and dogs.

base_model = keras.applications.Xception(
    weights="imagenet",  # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False,
)  # Do not include the ImageNet classifier at the top.

# Freeze the base_model
base_model.trainable = False

# Create new model on top
inputs = keras.Input(shape=(150, 150, 3))
x = data_augmentation(inputs) # Apply random data augmentation

# Pre-trained Xception weights requires that input be normalized
# from (0, 255) to a range (-1., +1.), the normalization layer
# does the following, outputs = (inputs - mean) / sqrt(var)
norm_layer = keras.layers.experimental.preprocessing.Normalization()
mean = np.array([127.5] * 3)
var = mean ** 2
# Scale inputs to [-1, +1]
x = norm_layer(x)
norm_layer.set_weights([mean, var])

# The base model contains batchnorm layers. We want to keep them in inference mode
# when we unfreeze the base model for fine-tuning, so we make sure that the
# base_model is running in inference mode here.
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x) # Regularize with dropout
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)


The np.array([127.5] * 3) defines the center of the RGB pixel range and is used to normalize the values to the range -1 to 1. Now we need to train the unfrozen layers, i.e. the new top of the model, since the base is frozen.
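Before moving on, the effect of this normalization can be sanity-checked with plain NumPy (a standalone snippet, independent of the notebook):

```python
import numpy as np

mean = np.array([127.5] * 3)
var = mean ** 2                         # sqrt(var) == 127.5
pixels = np.array([0.0, 127.5, 255.0])  # darkest, middle, brightest values
print((pixels - mean) / np.sqrt(var))   # [-1.  0.  1.]
```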


model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy()])
epochs = 20
history =, epochs=epochs, validation_data=validation_ds)

Saving the output of in a variable allows plotting the binary accuracy and the loss.

plt.plot(history.history['binary_accuracy'])
plt.plot(history.history['val_binary_accuracy'])
plt.title('model accuracy')
plt.ylabel('binary accuracy')
plt.legend(['train', 'val'], loc='upper left')
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.legend(['train', 'val'], loc='upper left')

We can check the final binary accuracy and the maximum achieved.


Now let’s train the entire model.

base_model.trainable = True

model.compile(optimizer=keras.optimizers.Adam(1e-5),  # Low learning rate
              loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy()])

epochs = 10
history =, epochs=epochs, validation_data=validation_ds)

And save the model."model.h5")

Run the notebook on the HPC server

First, we need to export the recipe of our environment. Type the following commands in the Linux terminal.

conda activate jupyter
conda env export -f jupyter.yaml
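The exported jupyter.yaml is a plain-text recipe. A trimmed sketch of what it may contain is shown below; the real file pins exact versions and builds for every package:

```yaml
name: jupyter
channels:
  - conda-forge
  - anaconda
dependencies:
  - jupyterlab
  - tensorflow-gpu
  - pip
  - pip:
      - runipy
      - tensorflow-datasets
      - matplotlib
```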

Copy the recipe to the HPC server, install the conda package manager, and import the environment.

conda env create -f jupyter.yaml

We need to edit the notebook, removing or commenting out the library installation lines.

!pip install runipy
!pip install tensorflow_datasets
!pip install matplotlib

This example using Xception and ImageNet will not work if the computing nodes of the HPC server cannot access the internet. As an alternative, you may copy the dataset and the model weights to the HPC storage. The cats_vs_dogs dataset was downloaded locally to “~/tensorflow_datasets” by tfds.load(), so you can copy this folder to the HPC server, or download it to the storage while you are on the login node. The model weights may be downloaded using wget, and the file path passed to the weights argument of Xception, replacing “imagenet”.
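One convenient pattern, sketched below with a hypothetical file path, is to fall back to the “imagenet” shortcut whenever the local weights file is absent, so the same notebook runs both locally and on offline computing nodes:

```python
import os

def resolve_weights(path):
    # Use the locally stored weights file when present; otherwise fall back
    # to the "imagenet" shortcut, which requires internet access.
    return path if os.path.exists(path) else "imagenet"

# Hypothetical location on the HPC storage; adjust to where you copied the file.
weights = resolve_weights("/scratch/username/xception_notop.h5")
# weights can then be passed to keras.applications.Xception(weights=weights, ...)
```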

HPC servers usually use the Slurm Workload Manager to manage jobs. We need to create a shell script that requests a GPU. Create the script (named here, for example; the name is up to you) and give it execution permission.

chmod u+x

Edit the content of this script to allocate one node and one GPU for 5 hours. The “time” Unix command will be used to check the running time, and runipy to run the whole notebook.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --job-name=tlgpu
#SBATCH --time=00-05:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
/usr/bin/time -f "%E time_elapsed\n%S time_sys\n%U time_user\n" runipy hpc.ipynb

Submit the job via command line, passing the script name to sbatch (here assumed to be


Now you are able to run your deep learning notebooks on an HPC server. The recipe and the notebooks used in this article are available at


Authors’ contributions

Themístocles Negreiros led the coding on Colab, and Diego Morais led the writing on Medium and the tests on the HPC server. All authors contributed to the drafts and approved the final article and Python notebook versions.

Authors’ GitHub


