Quickly load large amounts of data into Google Colab

Good day, Habr. I decided to share my knowledge of how to quickly load a large number of files from Google Drive into Google Colab.

Everyone knows that Google Colab is a great free platform for learning and experimenting with neural networks.

On the Google Colab platform, you get a powerful GPU for free, on which you can experiment with training your neural network for about 12 hours.
Then the session is interrupted, but the next day you can get a GPU from Google again and continue your experiments.

Neural networks require a lot of data for training, especially when it comes to neural networks working with images.

To train such networks, you need to load hundreds and thousands of images into the training and validation sets. Unfortunately, if you read these images directly from your Google Drive, it takes an unreasonably long time: tens of minutes or even hours. After all, each request for a file on Google Drive and the response with the file's contents happen sequentially, and each round trip is slow.

It is a shame, and simply unreasonable, to spend your limited time with a free GPU on downloading data.

And we are reasonable people, so we access Google Drive only once: we fetch our data packed in advance into a single zip archive, unpack that archive into Google Colab's memory, and then read our data hundreds of times faster than reading it from Google Drive file by file.

For an experiment with the speed of loading data into Colab, I took my Airplanes database for a segmentation neural network.

The database contains a folder with images, “airplanes”, and a folder “segmentation”, which stores the masks for the airplane images from the first folder.
Each folder holds 1,005 images of 1920 × 1080 pixels.
In total, we have to load 2,010 files.
I uploaded both the database with the images and its zip archive to my Google Drive in advance.

Training Base Structure:

    airplanes/       1,005 images, 1920 × 1080
    segmentation/    1,005 masks, 1920 × 1080

So, let's get down to loading data quickly from Google Drive:

  1. We launch Google Colab and import the libraries and modules we need (see the sketch after this list).

  2. We run the command to connect Google Drive.

  3. We follow the link to select a Google account.

  4. We choose our Google account.

  5. We allow Colab to access Google Drive.

  6. We copy the authorization code issued by Google Drive.

  7. We paste the code into the input field, and Google Drive is connected.

  8. We unpack the zip archive into Colab's memory.

  9. We read all the files into our program, timing the process.

  10. As we can see, reading 2,010 files of 1920 × 1080 took only 0.96 seconds.

    That is because the data is now read from Colab's memory, not directly from Google Drive.

  11. For comparison, let's read the same database from Google Drive without the zip archive.
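
To make the walkthrough concrete, here is a minimal sketch of steps 1, 8 and 9: mounting Google Drive, unpacking the archive into Colab's local storage, and reading all the images with a timer. The archive name airplanes_db.zip, the paths, and the use of PIL for reading are my assumptions for illustration; the exact code from the article is on GitHub.

    import time
    import zipfile
    from glob import glob

    import numpy as np
    from PIL import Image
    from google.colab import drive

    # Connect Google Drive (Colab will ask you to follow a link
    # and enter an authorization code).
    drive.mount('/content/drive')

    # Unpack the zip archive from Google Drive into Colab's local storage.
    # The archive name and path are assumptions -- use your own.
    zip_path = '/content/drive/My Drive/airplanes_db.zip'
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall('/content/airplanes_db')

    # Read every image and mask from local storage, timing the process.
    start = time.time()
    images = [np.array(Image.open(f))
              for f in sorted(glob('/content/airplanes_db/airplanes/*'))]
    masks = [np.array(Image.open(f))
             for f in sorted(glob('/content/airplanes_db/segmentation/*'))]
    print(f'Read {len(images) + len(masks)} files in {time.time() - start:.2f} s')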


As we can see, reading the 2,010 files directly from the folders stored on Google Drive took us 1,500 seconds, which is 25 minutes.

That is 25 minutes of downtime in your experiments with a neural network.
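
For completeness, the slow path from step 11 can be reproduced by pointing the same reading code at the folders on the mounted Drive. The folder layout under My Drive below is an assumption, not the article's exact code.

    import time
    from glob import glob

    import numpy as np
    from PIL import Image

    # Read the same 2,010 files straight from the mounted Google Drive.
    # The folder layout under 'My Drive' is an assumption -- adjust to yours.
    drive_root = '/content/drive/My Drive/airplanes_db'

    start = time.time()
    files = (sorted(glob(drive_root + '/airplanes/*'))
             + sorted(glob(drive_root + '/segmentation/*')))
    data = [np.array(Image.open(f)) for f in files]
    print(f'Read {len(data)} files in {time.time() - start:.0f} s')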

I hope the article was useful to you, and that loading a large number of files from Google Drive into Colab is no longer a problem.

Load your training data hundreds of times faster than you did before.

Just four easy steps.

  1. Pack the training base into a zip archive (see the sketch after this list).
  2. Upload the zip archive with the training base to your Google Drive.
  3. Unzip the archive into Colab's memory.
  4. Read all the files from Colab's memory into your program.
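
Step 1 can be done on your local machine before anything touches Colab. A minimal sketch, assuming the training base sits in a local folder named airplanes_db (the name is mine for illustration):

    import shutil

    # Pack the training base folder (with its 'airplanes' and 'segmentation'
    # subfolders) into airplanes_db.zip next to it.
    shutil.make_archive('airplanes_db', 'zip', root_dir='.', base_dir='airplanes_db')

The resulting airplanes_db.zip is then uploaded to Google Drive through the web interface (step 2), and steps 3 and 4 are what the earlier sketch does with zipfile and PIL.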

For any questions, write me an email:

alexeyk500@yandex.ru

For those who need the code described in the article, you are welcome to visit my GitHub.
