Machine Learning (ML) on Google Cloud

A subfield of Artificial Intelligence, Machine Learning (ML) is everywhere in today’s world. Shopping and social apps that suggest items to buy based on your searches are one familiar example of data being put to work in a new way. ML allows programs or applications to learn from experience, that is, from the data they collect. It makes predictions using an algorithm that learns from the data in an iterative process. The program is not responsive on its own; rather, it is trained to be responsive.

Benefits of ML on Cloud

Three major players dominate cloud computing: Amazon (AWS), Microsoft (Azure) and Google (GCP). Google unveiled its machine learning platform in 2016, making TensorFlow available to data scientists and developers for cloud-based machine learning. All three continue to invest in ML innovation.

Out of the many benefits of using the cloud platform over a traditional system, the following can be highlighted:

  1. Training on large amounts of data with plenty of compute power
  1. Experimenting with machine learning capabilities, with the ability to scale up as demand increases
  1. Access to many machine learning options from the major players that don’t require deep knowledge of AI or machine learning theory, or a team of data scientists

This article focuses primarily on supervised machine learning, in which you give your model labelled training inputs during training. Once trained, the model outputs a ‘prediction’ for each new input. Google has long used machine learning in its own applications, such as Google Keyboard, Google Photos, Google Maps, Google Chrome, Google Mail, Google Play Music and YouTube, and has now made that technology available for the world to explore.
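To make the train-then-predict cycle concrete, here is a minimal sketch of supervised learning in plain Python. It assumes a single input feature and a straight-line relationship, and the training values are made up for illustration; real models on GCP would be built with a toolkit such as TensorFlow or scikit-learn rather than by hand.

```python
# Minimal supervised learning sketch: learn y = slope * x + intercept
# from labelled examples, then predict the label of an unseen input.

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the trained model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Training data: each input is paired with its known label (made-up values).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)   # training step
print(predict(model, 5.0))           # prediction step on an unseen input
```

The same two-step shape, fit on labelled data and then predict on new data, carries over to the managed services described in the rest of this article.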

Machine learning on the cloud no longer requires the extensive study or compute resources that it did in its early days. Nowadays, any developer or data scientist with a computer and a machine learning problem can get strong results with cloud computing. The end-to-end cycle of a machine learning model in GCP is shown in the figure below.

Source: Google docs-AI & Machine Learning products

How is ML carried out today?

Machine learning today is mainly carried out with scikit-learn, XGBoost, Keras or TensorFlow, often with code written in Jupyter notebooks. Enterprises considering ML might be experimenting, building proofs of concept (PoCs), or scaling for production after training. Whatever their existing ML toolkit is, GCP offers a service to match. Moreover, GCP offers resources for app developers as well as data scientists and ML practitioners, whether they want pre-trained models or custom models.

Source: Google I/O ‘18

The Machine Learning API services offered by Google are aimed mainly at developers, as they require little knowledge of machine learning and are easily embedded in apps. The ML APIs give access to pre-trained models with a single REST API request.

In January 2018, Google announced a new machine learning resource called Cloud AutoML, which enables developers with limited machine learning expertise to train high-quality models specific to their business needs. Cloud AutoML sits somewhere between app developers and data scientists. To create custom models with fine-grained control over machine learning, Cloud ML Engine is a better option, as it allows you to train and serve your model at scale.

Another service, Kubeflow, is designed not only for data scientists but for software and data engineers as well. A team that wants to share models and ML workflows within an organisation can use it to integrate models into different parts of the business.

Finally, the most do-it-yourself (DIY) solution is GCE and GKE (Google Compute Engine and Google Kubernetes Engine). If you use an ML framework other than TensorFlow and still want to use GCP services, GCE and GKE are a good fit.

The sections below look at how these services differ and where each could be used in real-world applications that need predictions.

Machine Learning as an API:

GCP has five APIs that give access to pre-trained models for common machine learning tasks.

  1. Cloud Vision:

The Cloud Vision API lets you detect text in images (OCR, optical character recognition), handwriting in images (OCR), text in files (OCR for PDF/TIFF), crop hints, faces, image properties, labels, landmarks, logos, multiple objects, explicit content (SafeSearch), and web entities and pages. It also supports offline batch image annotation and small-batch file annotation, both online and offline. The documentation additionally covers using Cloud Vision with the Spring framework and base64-encoding a local image to send it in a request.

  1. Cloud Speech:

It accurately converts speech into text. You can use it to transcribe content with accurate captions, or to improve service by gaining insights from customer interactions.

  1. Cloud Natural Language:

It provides natural language understanding technologies, such as sentiment analysis, entity sentiment analysis, entity recognition and other text annotations.

  1. Cloud Video Intelligence:

This API helps developers annotate videos using features such as shot analysis, labelling, explicit content detection, object tracking, logo recognition, text recognition, speech transcription, face detection, person detection, and streaming annotation of either a video file or a live stream.

  1. Cloud Translation:

This API translates text into 100+ languages for websites and apps.

Companies using the ML APIs in production include GIPHY, Hearst Newspapers, Descript, Seenit, Maslo and many more.
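As a rough illustration of the "single REST API request" idea, the sketch below builds the JSON bodies for two of the APIs listed above, Cloud Vision (label and text detection) and Cloud Natural Language (sentiment analysis), without sending them. The helper names and sample inputs are made up for illustration. Actually sending a request would also need an API key or OAuth token, which is left out here.

```python
import base64
import json

# REST endpoints the bodies below would be POSTed to.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"
LANGUAGE_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def vision_request(image_bytes, feature_types, max_results=5):
    """Build an images:annotate body with a base64-encoded local image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }


def sentiment_request(text):
    """Build an analyzeSentiment body for a plain-text document."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


# Placeholder bytes standing in for the contents of a real image file.
fake_image = b"\x89PNG placeholder bytes"

vision_body = vision_request(fake_image, ["LABEL_DETECTION", "TEXT_DETECTION"])
language_body = sentiment_request("The support team answered quickly.")

print(VISION_URL)
print(json.dumps(vision_body, indent=2))
print(LANGUAGE_URL)
print(json.dumps(language_body, indent=2))
```

Building the body separately from the HTTP call keeps the example runnable offline and makes the request shape easy to inspect before wiring it into an app.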


AutoML comes into play when you want to train the models behind the APIs on your own custom data. For example, it lets you train a vision model on your own image data. Its features include help with generating high-quality training data: Google’s human labelling service can annotate and clean the labels for you, so that models are trained on high-quality data. AutoML products include AutoML Vision, AutoML Video Intelligence (Beta), AutoML Natural Language, AutoML Translation and AutoML Tables (Beta). Companies integrating AutoML Vision include Disney, Urban Outfitters, ZSL and many more.

Cloud ML Engine:

This engine helps you accomplish a custom prediction task specific to your dataset or use case. It offers training and prediction services, which can be used together or individually. The tools for building, training and serving your own model are TensorFlow and ML Engine: TensorFlow is used to build the model, and ML Engine to train and serve it at scale. Cloud ML Engine has been adopted by companies identifying clouds in satellite images, companies ensuring food safety, and companies that care about customer satisfaction and now respond to customer emails four times faster.

Complex models can be trained by leveraging the power of GPUs and TPUs (Tensor Processing Units, AI accelerator application-specific integrated circuits developed by Google specifically for neural networks). The outcome is a fully trained machine learning model that is ready to be hosted in other environments, including on-premises and public cloud. The service can also be used to deploy a model trained in external environments.

Apart from training and hosting on other platforms, the service can also tune hyperparameters to improve the accuracy and other metrics of the model. Without this hyperparameter tuning option, data scientists would have to experiment with multiple values manually, evaluating the accuracy of the results each time.
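To show what that manual experimentation looks like, here is a small sketch of a grid search over one hyperparameter, the learning rate, in plain Python. It is not how Cloud ML Engine tunes hyperparameters internally; the tiny linear model, the data and the candidate values are all assumptions chosen to keep the example short.

```python
# Manual hyperparameter search: train once per candidate value and
# keep the value with the lowest error on held-out validation data.

def train(xs, ys, learning_rate, epochs=200):
    """Fit y = w * x + b by batch gradient descent; returns (w, b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(model, xs, ys):
    """Mean squared error of the model on a dataset."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data following y = 2x + 1, split into training and validation.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
val_x, val_y = [4.0, 5.0], [9.0, 11.0]

LEARNING_RATES = [0.001, 0.01, 0.1]  # hypothetical candidate values

results = {}
for lr in LEARNING_RATES:
    model = train(train_x, train_y, lr)
    results[lr] = mse(model, val_x, val_y)

best_lr = min(results, key=results.get)
print("validation error per learning rate:", results)
print("best learning rate:", best_lr)
```

Every extra hyperparameter multiplies the number of training runs, which is why a managed tuning service that runs and compares the trials for you saves so much effort.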

There are basically two types of custom models: those built with transfer learning and those trained from scratch. Transfer learning is useful when you do not have enough data points to train a custom model on your own. It takes the weights of a pre-trained model, one already trained on a task similar to yours using millions of images or data points, and adds a new layer on top that is trained on your custom data. When building a model from zero, you train it from scratch, and you can do so in any environment.
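The transfer learning idea can be sketched in a few lines of plain Python: a frozen feature extractor stands in for the pre-trained model, and only a small new output layer is trained on the custom data. The "pre-trained" weights, the samples and the training settings below are invented for illustration; in practice the frozen part would be a large network loaded from a framework such as TensorFlow.

```python
import math

# Stand-in for a pre-trained model: fixed weights that are never updated.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]  # hypothetical values


def extract_features(x):
    """Frozen layer: a linear map followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)))
        for row in PRETRAINED_WEIGHTS
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, learning_rate=0.5, epochs=500):
    """Train only the new output layer (logistic regression) on frozen features."""
    features = [extract_features(x) for x in samples]
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)
            error = p - y
            weights = [w - learning_rate * error * fi for w, fi in zip(weights, f)]
            bias -= learning_rate * error
    return weights, bias


def predict(head, x):
    """Probability of class 1 for a new input."""
    weights, bias = head
    f = extract_features(x)
    return sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)


# A deliberately tiny custom dataset (made-up values).
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]

head = train_head(samples, labels)
print(predict(head, [2.5, 0.5]))
print(predict(head, [0.5, 2.5]))
```

Because the frozen layer already produces useful features, the new layer has only three numbers to learn, which is why transfer learning can work with very little custom data.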

The figure below shows the steps involved in training and predicting from a machine learning model deployed in Google Cloud ML Engine.

Source: Google documentation Cloud ML Engine

  1. Data Preparation:

This step involves ingesting data and preparing it for machine learning experiments, and it takes place outside ML Engine. Machine learning practitioners or data scientists analyse and explore the data, which makes it easier to turn into a model. Typical steps include identifying missing values, splitting existing columns, removing duplicates, removing anomalies and so on. All of these steps can be carried out with GCP services such as BigQuery, Cloud Dataproc, Cloud Dataflow, Cloud Dataprep and others. The prepared data is then sent to a Cloud Storage bucket, which makes it available to the distributed training job initiated by ML Engine. Datasets prepared outside GCP can also be added to Cloud Storage.

  1. Model Creation:

This is the most important phase, in which machine learning practitioners or data scientists code the model in their local environment. ML Engine uses Python-based toolkits for creating machine learning models; the supported toolkits include scikit-learn, XGBoost and TensorFlow. After the code has been tested, it is submitted to ML Engine.

  1. Model Training:

This phase in the lifecycle of an ML model is a critical one, as it involves evaluating the model and tuning its parameters to increase accuracy. Cloud ML Engine also provides hyperparameter tuning for complex models such as artificial neural networks. The final model is ready to be deployed when the predicted labels match the actual labels for most of the data points. In most cases, training is done on cloud resources, using clusters of GPUs and CPUs.

  1. Model Deployment:

After training, the model is uploaded to a Cloud Storage bucket. It is serialized into a supported format; for example, TensorFlow models are serialized into checkpoint or ProtoBuf files. Over time, newly trained models appear in the Cloud Storage bucket as new versions, as the model evolves with new data points.

  1. Monitor:

After the model is deployed, it can serve predictions in two ways: online prediction and batch prediction. Online prediction handles individual data points as requests arrive, whereas batch prediction works on an entire dataset. Predictions can be monitored with Stackdriver, the monitoring tool of GCP, which is integrated with Cloud ML Engine.
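Two of the steps above lend themselves to a short sketch: cleaning the data before training, and formatting inputs for online versus batch prediction. The example below is plain Python with made-up rows and a hypothetical anomaly threshold. On GCP the cleaning would more likely run in a service such as BigQuery or Cloud Dataprep, and the formatted inputs would be sent to a deployed model or written to Cloud Storage.

```python
import json


def clean_rows(rows, value_key, max_value):
    """Drop rows with missing fields, exact duplicates and out-of-range values."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # missing value
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate row
        if row[value_key] > max_value:
            continue  # crude anomaly rule; the threshold is an assumption
        seen.add(key)
        cleaned.append(row)
    return cleaned


def online_request_body(instances):
    """Online prediction: instances travel in the body of one request."""
    return json.dumps({"instances": instances})


def batch_input_file(instances):
    """Batch prediction: one JSON instance per line in an input file."""
    return "\n".join(json.dumps(instance) for instance in instances)


# Made-up raw data with one duplicate, one missing value and one anomaly.
raw = [
    {"sqft": 70, "rooms": 3},
    {"sqft": 70, "rooms": 3},
    {"sqft": None, "rooms": 2},
    {"sqft": 9000, "rooms": 4},
    {"sqft": 55, "rooms": 2},
]

cleaned = clean_rows(raw, value_key="sqft", max_value=1000)
print(online_request_body(cleaned[:1]))  # a single data point, answered immediately
print(batch_input_file(cleaned))         # the whole dataset, processed as a job
```

Online prediction suits interactive apps that need an answer per request, while the line-per-instance batch format suits scoring a large dataset in one go.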


If your team is working on ML workflows based on Kubernetes, then Kubeflow can make deployment simple, portable and scalable. Kubeflow Pipelines enables you to port data into an accessible format, run data cleaning, analyse components and trained models, scale the trained models and much more. Kubeflow Pipelines consists of:

  • A UI (user interface) for managing jobs and tracking experiments and runs.
  • An engine for scheduling multi-step ML workflows.
  • An SDK (Software Development Kit) for defining and manipulating pipelines and components.
  • Notebooks for interacting with the system using the SDK.

Kubeflow Pipelines makes it easy to try numerous ideas and techniques and to manage various experiments or trials. The pipelines can easily be reused for end-to-end solutions, so you do not have to rebuild them each time. It helps when running across multiple or hybrid environments (swapping between on-premises and cloud) and supports reuse across different workflows. It also provides support for visualisation and collaboration in ML workflows.

ML on Google Cloud: Conclusion

Since Google created and open-sourced TensorFlow, it has become the most widely used framework among ML enthusiasts, and both Amazon and Microsoft support it in their ML services as well. Google provides good general-purpose and specialised ML services, from which you can choose according to the platform you are already working on. At the time of writing, the framework with the broadest support is TensorFlow, but this is a rapidly changing field, and even Google has now introduced support for scikit-learn and XGBoost.

The Migration Company has an extensive team covering all the resources and services you need for AI (Artificial Intelligence) and Deep Learning on all three platforms, whether GCP, AWS or Microsoft Azure. We regularly update our resources with new technology based on our customers’ needs. If you want to know about integrating any of the above paths with your technology, please let us know. All the best for your machine learning efforts!


  • Unruh, A. (2018). Getting started with Kubeflow Pipelines. [online] Google Cloud Blog. Available at: [Accessed 17 Jul. 2020].
  • Kubeflow. (n.d.). Overview of Kubeflow Pipelines. [online] Available at: [Accessed 18 Jul. 2020].
  • Academy, C. (2018). What are the Benefits of Machine Learning in the Cloud? [online] Medium. Available at: [Accessed 19 Jul. 2020].
  • Hormozi, E., Hormozi, H., Akbari, M.K. and Javan, M.S. (2012). Using of Machine Learning into Cloud Environment (A Survey): Managing and Scheduling of Resources in Cloud Systems. [online] IEEE Xplore. Available at: [Accessed 19 Jul. 2020].
  • Google Cloud Platform (2018). Intro to machine learning on Google Cloud Platform (Google I/O ’18). YouTube. Available at: [Accessed 19 Jul. 2020].
  • Ibarra, F. (2016). Google takes Cloud Machine Learning service mainstream. [online] Google Cloud Blog. Available at: [Accessed 19 Jul. 2020].
  • MSV, J. (2018). Google Cloud ML Engine: Train, and Deploy Machine Learning Models. [online] The New Stack. Available at: [Accessed 19 Jul. 2020].
  • AI Hub. (n.d.). Cloud Machine Learning Engine. [online] Available at: [Accessed 19 Jul. 2020].
