Machine Learning Pipelines with Kubeflow and Kubernetes

October 11, 2021

Kubernetes is increasing in popularity as a machine learning platform. This article walks through why you’d use Kubeflow for machine learning and introduces various platforms for hosting.

Taurai Mutimutema
Systems Analyst & Engineer

Managing the discovery, learning, simulation, and implementation of machine learning (ML) applications could be a lot simpler. I wouldn’t lay the blame on you for not doing an absolutely perfect job overseeing more than one ML project from start to finish. It’s complicated and messy, and there needs to be a mainstream solution.

Coming from a systems analysis and development background, I sometimes feel that continuous integration and deployment (CI/CD) with Kubernetes (K8s) is wasted on overly simple procedures.

K8s in all its glory

If only there were a way to apply it to ML projects.

Enter Kubeflow, an ML pipeline orchestration platform with end-to-end solutions for each stage of the typical data science project value chain. With Kubeflow, you’ll no longer be scrambling to get a “good enough” solution for your ML project, but will instead be able to attain that “perfect job” benchmark.

This article is the perfect launchpad for future Kubeflow power users. You’ll discover how to use Kubeflow through machine learning use cases. In the process, you can expect a brief comparison of the various environments and hosts that support Kubeflow at the time of writing.

K8s for Machine Learning: the Concept

If you’ve used K8s before, you know how much load it carries for a typical project. The official term is orchestration. Here’s why K8s is worth considering for ML:

  1. Multi-platform compatibility
  2. Sandboxed Operations
  3. Automation with K8s
  4. Standardization of Processes
  5. Continuous Deployment

Multi-platform Compatibility

All major cloud platforms now support K8s. This opens the entire vendor spectrum and its features for use in your projects.

Sandboxed Operations

K8s processes execute on schedule and can be decoupled from their parent pipelines. This comes in handy when you need to run iterations of a single stage of an ML pipeline until you attain the desired confidence levels. It also allows you to allocate enough GPUs and compute resources to the stages that need them most.
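As a rough illustration of the resource allocation point, the sketch below builds a Kubernetes Pod manifest as a plain Python dictionary. The helper name `training_pod_manifest`, the image name, and the resource amounts are made up for this example. The `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed on the cluster.

```python
import json


def training_pod_manifest(name: str, image: str, gpus: int, memory: str) -> dict:
    """Build a minimal Pod manifest that requests GPUs for one training stage.

    Hypothetical helper for illustration; not part of Kubeflow or Kubernetes.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # A one-off training run should not restart once it finishes.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    "resources": {
                        # GPUs are requested through limits; this assumes the
                        # NVIDIA device plugin is running on the cluster.
                        "limits": {"nvidia.com/gpu": gpus, "memory": memory},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Image name and sizes are placeholders.
    manifest = training_pod_manifest("train-step", "example.com/trainer:latest", 2, "16Gi")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied to a cluster, so only the training stage pays for GPU time while lighter stages run on ordinary nodes.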

Automation with K8s

Imagine not having to manually start each node in your ML process. While management celebrates cost savings, you enjoy spending less time babysitting processes. Now think how much easier the training and testing processes get when parallelism kicks in. You can scale your learning process across a previously unthinkable number of clients at the same time. You can even scale as needed with a few mouse clicks.

Standardization of Processes

If you were using K8s for machine learning, most processes would be documented as job variables. This makes it easier for other developers to join and contribute to the project in the future. The same cannot be said for manually handled ML projects. Often the devs present when the project kicks off are the only ones who can make sense of its progress and current state at any given time.

Continuous Deployment

Perhaps one of the most attractive attributes of K8s-managed pipelines is how new versions are deployed. Your projects would stay fresh with pushed changes joining the main branch each time a test runs successfully.

Honestly, we could go on writing about the benefits of covering ML projects with the K8s blanket. Of interest is how the Kubeflow project materializes our daydream into actual features and benefits for ML projects.

Kubeflow Features At a Glance

Without rewriting the Kubeflow documentation space, it should suffice to picture the project as a set of tools that transform ordinary ML projects into MLOps workflows.

The gateway to all Kubeflow components

Kubeflow consists of seven distinct components:

  1. A dashboard for central reports.
  2. Notebooks for transparent containerization and authentication with Jupyter.
  3. The actual pipelines, manageable through an interactive GUI.
  4. KFServing for the serverless deployment of projects.
  5. Katib for automated hyperparameter tuning.
  6. A set of training operators, including TensorFlow, PyTorch, and MXNet.
  7. Resource sharing and profile isolation with multi-tenancy.

Ideally, you can lay them end-to-end and create an ML pipeline. However, you can also cherry-pick components and lay out custom workflows and pipelines that fit your requirements.

To cut the learning process short, Kubeflow provides sample pipelines and use cases for you to review. You can follow through with the quickstart and examples provided in this Kubeflow documentation section for a hands-on experience of the platform.

Kubeflow Use Cases

Machine learning project instances differ from orthodox software projects mostly in how they start. The experimental and tuning phase is crucial to how the project owners benefit in the long run. This sensitivity and easy-to-complicate nature of ML projects are why Kubeflow exists.

Let’s discuss some application areas you can later match to your own use cases.

  1. Complex Machine Learning Systems
  2. Extensive Experimentation Instances
  3. Decoupled ML Systems
  4. Time Sensitive ML Projects
  5. Automated ML Project Versioning

Complex Machine Learning Systems

The easiest response to complexity has always been to throw resources at it. Kubeflow throws this approach out of the window, simplifying ML projects by automating action points previously left to the data scientist.

As ML projects grow, so does the pressure to scale, and with it the risk that such ambitions turn out tough and destructive. Kubeflow takes charge of the scaling process and oversees any additional containers and experimental instances you may elect to add to your project.

Extensive Experimentation Instances

Thanks to the multiple options available as training operators, you can run parallel processes on the same inputs. This allows you to observe wide arrays of results from as many operators as you deem necessary. What’s impressive with Kubeflow is how all this happens in less time than previously possible.
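The idea of running several operators on the same inputs can be sketched in plain Python. This is not Kubeflow code: the two toy "trainers" and the `run_experiments` helper are hypothetical stand-ins, and threads in a single process stand in for the separate containers that a real cluster would schedule.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_model(data):
    """Toy stand-in for a training operator: predicts the mean."""
    return sum(data) / len(data)


def median_model(data):
    """Toy stand-in for a second operator: predicts the median."""
    ordered = sorted(data)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def run_experiments(data, trainers):
    """Run every trainer on the same input concurrently; collect results by name."""
    with ThreadPoolExecutor(max_workers=len(trainers)) as pool:
        futures = {name: pool.submit(fn, data) for name, fn in trainers.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    results = run_experiments(
        [1, 2, 3, 10],
        {"mean": mean_model, "median": median_model},
    )
    for name, value in results.items():
        print(name, value)
```

Because every trainer receives the same input, the results are directly comparable, which is the point of running the experiments side by side.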

Decoupled ML Systems

Adding to the two cases above, you may want specific processes along a machine learning project’s workflow to run in isolation. For example, you could start a project on your local machine (127.0.0.1). However, your laptop’s GPUs are far from sufficient to run experiments effectively, so for that phase you’d be wise to use on-premises hardware. The deployment stage is best hosted in the cloud for better availability.

Such a hybrid pipeline makes the best use of separate environments, upholding policy without compromising on performance.

Time Sensitive ML Projects

Without Kubeflow, something as simple as changing hyperparameters to match your ideal conditions can bring processes to a halt. Added together, these abrupt disruptions can cost you days of work, and those hours are almost always billable.

The Katib component in Kubeflow replaces manual changes to training parameters. Automating these alterations removes the inertia, fatigue, and errors that come with human intervention.
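To make the idea of automated tuning concrete, here is a minimal grid search in plain Python. It is not Katib’s API: Katib runs trials as Kubernetes jobs and offers several search algorithms, whereas this sketch loops in a single process. The objective function, parameter names, and search space are invented for the example.

```python
import itertools


def validation_loss(learning_rate: float, batch_size: int) -> float:
    """Toy objective with its minimum at learning_rate=0.01, batch_size=64."""
    return (learning_rate - 0.01) ** 2 * 1e4 + ((batch_size - 64) / 64) ** 2


def grid_search(objective, search_space: dict):
    """Try every combination in the search space and keep the lowest score."""
    names = list(search_space)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    space = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}
    params, score = grid_search(validation_loss, space)
    print("best parameters:", params, "loss:", score)
```

The loop never pauses for someone to edit a config file and restart a job, which is the time saving described above.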

Automated ML Project Versioning

To thoroughly understand how Kubeflow users continuously deploy fresh versions of their systems, let’s take a deep dive into the platform’s pipeline component.

What Is a Pipeline in Kubeflow?

Represented by a clean graphical user interface, a pipeline is the set of components that make up a typical ML project’s progression. The interface renders the relationships between the connected stops along the way. Each stop is a Kubeflow component or a contained operator, with inputs and expected outputs clearly specified. Kubeflow allows you to view your pipelines from a central reporting interface.

The main objectives common to every Kubeflow pipeline are:

  1. Orchestrate ML project steps from start to finish.
  2. Streamline complex experimentation phases, making them easy to scale.
  3. Document and standardize ML project workload variables for reusability.

Every pipeline specifies end steps, namely its deployment directives. These actualize only when allowable results are attained from experiments. You can also initiate rollbacks by tracing logs and reversing deployment jobs.
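The structure described here, steps with declared inputs followed by a deployment that only fires on acceptable results, can be sketched with the standard library alone. None of this is the Kubeflow Pipelines SDK: the step functions, the `run_pipeline` helper, and the 0.9 threshold are assumptions made for illustration. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def load_data(outputs):
    # Toy dataset of (x, y) pairs following y = 2x.
    return [(x, 2 * x) for x in range(10)]


def train(outputs):
    # The "model" is just a slope estimated from the data.
    data = outputs["load_data"]
    return sum(y for _, y in data) / sum(x for x, _ in data)


def evaluate(outputs):
    # Fraction of points the fitted slope reproduces.
    data, slope = outputs["load_data"], outputs["train"]
    correct = sum(1 for x, y in data if abs(slope * x - y) < 1e-9)
    return correct / len(data)


def deploy(outputs, threshold=0.9):
    # Gate: deploy only when evaluation meets the (hypothetical) threshold.
    return "deployed" if outputs["evaluate"] >= threshold else "skipped"


STEPS = {"load_data": load_data, "train": train, "evaluate": evaluate, "deploy": deploy}

# Each step maps to the set of steps whose outputs it needs.
DEPENDENCIES = {
    "load_data": set(),
    "train": {"load_data"},
    "evaluate": {"load_data", "train"},
    "deploy": {"evaluate"},
}


def run_pipeline(steps, dependencies):
    """Run steps in dependency order; each step sees all earlier outputs."""
    outputs = {}
    for name in TopologicalSorter(dependencies).static_order():
        outputs[name] = steps[name](outputs)
    return outputs


if __name__ == "__main__":
    result = run_pipeline(STEPS, DEPENDENCIES)
    print("evaluation:", result["evaluate"], "deployment:", result["deploy"])
```

Declaring dependencies as data rather than calling steps in a fixed order is what lets an orchestrator decide where and when each step runs.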

It should be easy to imagine how every stage in a pipeline can specify where its processes take place. We described a hybrid use case earlier. Now imagine a multi-vendor network that uses AWS, GCP, Azure, or IBM for the various stops as a project matures.

Cloud Options That Run Kubeflow

Every cloud platform boasts specialized machine learning tools exclusively available to its users. For this reason, you may find an MLOps project overlapping two or three vendors. Kubeflow is stable and executes smoothly across all available cloud service providers.

Let’s investigate why you might need a blended provider loadout.

  • AWS ML and Kubeflow: AWS brings SageMaker to the table. Along with an end-to-end workflow orchestration tool, SageMaker offers a library of use cases that you can clone and implement in your projects. Learn more about AWS here.
  • Google Cloud Platform and Kubeflow: GCP is taking ML mainstream with no-code model training. Vertex AI leverages Google’s resources to pre-run experiments over as wide a space as you need, shaving chunks of time off your budget. Learn more about GCP here.
  • IBM and Kubeflow: Import Watson’s features into your Kubeflow pipelines and make your experiments more perceptive to real-life data such as voice and video.
  • Azure and Kubeflow: Inherit comprehensive compliance and security features from Azure. Learn more about Azure here.

A Case For Kubeflow and MLOps


A case for machine learning and Kubeflow

So whether your enterprise choice for your next machine learning project is AWS, Azure, GCP, or IBM, you gain more from having Kubeflow in the mix than not. Just having an ML-specific version of K8s should be enough to get you started with quick installs and dissecting examples to infuse AI into your projects.

Unless you want to keep checking on your projects and fine-tuning for performance manually, Kubeflow makes a lot of sense. The fact that you can install and deploy ML projects on local machines, coupled with a growing community of data scientists contributing to its growth, makes today an exciting time to join the fray.

With Kubeflow you can pull the best APIs and features from preferred cloud platforms and their AI solutions. And when things get complex, use ContainIQ to monitor metrics and events from all directions.

Taurai is a systems analyst with a knack for writing, which was probably sparked by the need to document technical processes during code and implementation sessions. He enjoys learning new technology and talks about tech even more than he writes.
