The time to responsibly scale AI is now. Here’s how enterprises can empower data scientists to deploy ML models at scale while avoiding common pitfalls and positioning the business for stronger ROI.
Customers tell us, and analysts agree, that artificial intelligence (AI) and machine learning (ML) tools are critical to surfacing business insights, making data-driven decisions, and informing enterprise transformation strategies.
Yet despite enterprises’ efforts to build skilled data analytics and data science teams, a majority of AI projects fail to make it into production. In Google’s conversations with enterprise leaders, we’ve found that this difficulty stems from many culprits, such as disparate teams operating in silos, with mismatched skill sets and disconnected hand-off points in the workflow. The bottom-line consequences can be costly: poor talent retention, missed opportunities, wasted resources, and worst of all, low ROI. For example, a recent study from Accenture shows only 32% of survey respondents are generating tangible value from their data investments.
Both within Google and working with our customers, we’ve found that to avoid these pitfalls, organizations need to give data scientists a top-quality experience that lets them focus on their work, not on toggling between different interfaces, chasing down data, or trying to work across organizational silos. As this article will elaborate, creating these experiences may include things like centralizing creation and management of ML models (as a unified strategy is generally a prerequisite to data efforts and ML efforts working hand in hand), as well as embracing MLOps strategies and frameworks for responsible, sustainable AI programs.
According to Matt Ferrari, Head of Ad Tech, Customer Intelligence, and Machine Learning at eCommerce provider Wayfair, one of the main principles of ML implementation at their organization is ensuring that systems empower data scientists to be productive.
“We’re doing ML at a massive scale, and we want to make that easy. That means accelerating time-to-value for new models, increasing reliability and speed of very large regular re-training jobs, and reducing the friction to build and deploy models at scale,” he said, explaining that Wayfair chose Google Cloud services such as Vertex AI to give his teams a unified, extensible platform for ML development, so they could increase experimentation, reduce coding, and ultimately move more models into production. “Certain large model training jobs are 5-10x faster with Vertex AI, and it offers our data scientists hyperparameter tuning. This enables us to weave ML into the fabric of how we make decisions.”
Similarly, Jaime Espinosa, Head of ML at Twitter, said adopting a flexible and open platform was critical to enabling their engineers to be as productive as possible. “We have found that flexibility is key to empowering our talent. The particular tools and techniques we’re most familiar with make us most productive. To be able to include this in our ML platform is the best way to enable everybody to be as productive as possible. One of the biggest predictors for success of our projects is how well the talent understands the ML and the infrastructure, and the base product designs.”
The business impact that ML models deliver is inextricably linked to the tools and environments available to ML practitioners. So, how do organizations take steps to empower their teams, and thus to increase the usefulness of the ML models they build and deploy? Here are three steps.
3 steps to increase the viability of your ML models
1. Invest in a centralized AI platform
We recommend that enterprises invest in a centralized platform to:
- Create and manage ML models within one unified environment. Why? This helps data scientists to build and train models without needing to context switch, which can hamper productivity.
- Streamline and scale collaboration across all levels of technical expertise. Why? Providing the flexibility within one platform for individuals of diverse skill sets to perform their best work leads to stronger business outcomes.
- Transition to managed cloud services. Why? This helps data science teams focus on work, rather than operational IT challenges, and enables more diverse teams to participate in building and delivering ML models.
- Ensure data efforts and AI efforts are symbiotic. Why? This helps to break down silos between data and AI. Forming a unified strategy means that enterprises can get the best out of their data, and the best out of their ML systems for higher performing models.
These recommendations have shaped how we’ve developed products such as Vertex AI Workbench, a platform that provides data scientists with a single environment for the entire data-to-ML workflow. According to IDC’s AI StrategiesView 2021 research, model development duration, scalable deployment, and model management are three of the top five challenges in scaling AI initiatives, said Ritu Jyoti, group vice president of IDC’s AI and Automation Research Practice.
Jyoti notes that, “Vertex AI Workbench provides a collaborative development environment for the entire ML workflow—connecting data services such as BigQuery and Spark on Google Cloud to Vertex AI and MLOps services. As such, data scientists and engineers will be able to deploy and manage more models, more easily and quickly, from within one interface.”
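To make this concrete, here is a minimal sketch of the kind of data-to-ML flow a data scientist might run from a single Workbench notebook, assuming the google-cloud-bigquery and google-cloud-aiplatform Python SDKs. The project, region, table, and column names are placeholders for illustration, not details from this article.

```python
# A minimal sketch, assuming the google-cloud-bigquery and
# google-cloud-aiplatform SDKs; all resource names below are placeholders.
from google.cloud import aiplatform, bigquery

PROJECT = "my-project"   # placeholder project ID
REGION = "us-central1"   # placeholder region

aiplatform.init(project=PROJECT, location=REGION)

# 1. Explore the source data with BigQuery from the same notebook.
bq = bigquery.Client(project=PROJECT)
df = bq.query("SELECT * FROM `my-project.sales.orders` LIMIT 1000").to_dataframe()
print(df.describe())

# 2. Register the BigQuery table as a Vertex AI managed tabular dataset.
dataset = aiplatform.TabularDataset.create(
    display_name="orders",
    bq_source="bq://my-project.sales.orders",
)

# 3. Launch an AutoML training job against the dataset.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="orders-churn-model",
    optimization_prediction_type="classification",
)
model = job.run(dataset=dataset, target_column="churned")

# 4. Deploy the trained model to an endpoint for online prediction.
endpoint = model.deploy(machine_type="n1-standard-4")
```

The point is less the specific calls than the shape of the workflow: data exploration, training, and deployment happen in one environment, without hand-offs between disconnected tools.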
2. Embrace MLOps
We also recommend that organizations invest in centralized MLOps: not just building models, but also deploying tools for governance, security, and auditability. In addition to being crucial for compliance in regulated industries, these tools are essential for protecting data, understanding why given models fail, and determining how models can be improved.
The balance can be delicate. Without robust processes and resources for MLOps, increased model production can easily come at the cost of model management and governance. Similarly, if MLOps is not centralized, silos and heavy-handed governance can hamper the production of new models before they ever get off the ground.
This is why Vertex AI features a full set of MLOps capabilities. In the case of auditability, for example, Vertex ML Metadata tracks the inputs and outputs of an ML pipeline as well as the lineage of artifacts. Once models are in production, Vertex Model Monitoring ensures models are behaving as expected, alerting data scientists to data drift or other problems. These capabilities speed up debugging and create the visibility required for regulatory compliance and good data hygiene in general. Likewise, the platform is built with security at the core, with features such as Private Endpoints to minimize the risk of breaches and provide peace of mind that models, and their underlying data, are protected.
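As an illustration of what metadata capture can look like in practice, the following is a hedged sketch using the Vertex AI SDK’s experiment-tracking calls, which log parameters and metrics into Vertex ML Metadata. The project, experiment, and run names are invented for the example.

```python
# A hedged sketch of recording run metadata with the Vertex AI SDK;
# project, experiment, and run names are illustrative placeholders.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",           # placeholder
    location="us-central1",         # placeholder
    experiment="churn-experiments", # experiment groups related runs
)

# Record each training run's inputs (parameters) and outputs (metrics) so
# Vertex ML Metadata keeps a queryable record of how every model was produced.
aiplatform.start_run("baseline-run-01")
aiplatform.log_params({"learning_rate": 0.01, "max_depth": 6})
# ... model training would happen here ...
aiplatform.log_metrics({"auc": 0.91, "accuracy": 0.87})
aiplatform.end_run()

# Later, pull every run in the experiment into a DataFrame for auditing or
# side-by-side comparison of candidate models.
runs = aiplatform.get_experiment_df()
print(runs.head())
```

With runs captured this way, a model that misbehaves in production can be traced back to the parameters, metrics, and artifacts of the run that produced it, which is exactly the visibility that debugging and regulatory compliance require.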
3. Invest in responsible AI
Alongside the remarkable opportunity for innovation, AI technology also raises important questions about the fairness, interpretability, privacy, and security of these systems. Developing responsible AI requires an understanding of the possible issues, limitations, and unintended consequences, and organizations that want to truly scale should invest in an approach to building and using AI responsibly.
At Google Cloud, we’ve learned firsthand that establishing a responsible AI governance process is a critical step in assessing the multifaceted ethical issues that arise in AI projects, products, and deals, and we recommend that organizations implement a governance process that meets the needs of their business.
Defining responsible AI commitments, along with an approach to operationalize those commitments, can provide a common foundation to evaluate AI models in a systematic, repeatable way across product areas and geographies. Google’s AI Principles serve as our company-wide commitments.
Once an organization has established a responsible AI governance process, Vertex offers tools to inspect and understand AI models at key stages along the development lifecycle. Vertex’s built-in tools, such as Explainable AI, the What-If Tool, Fairness Indicators, and Model Monitoring, can help organizations scope a project effectively, explore and understand their data, train, test, and improve models for responsible AI goals, monitor a model in production, and share information transparently with stakeholders. These tools have gone through Google’s own responsible AI review process, so organizations can trust they are built with ethical governance.
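For example, here is a minimal sketch of requesting feature attributions from a deployed model using the Vertex AI SDK’s Endpoint.explain call; it assumes the model was deployed with an explanation spec, and the endpoint ID and feature names are hypothetical.

```python
# A minimal sketch of Vertex Explainable AI at prediction time; the endpoint
# ID and feature names are hypothetical placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

# Hypothetical endpoint ID for a model deployed with an explanation spec
# (required for Explainable AI attributions).
endpoint = aiplatform.Endpoint("1234567890")

# A single illustrative instance; feature names are invented for the example.
instance = {"tenure_months": 18, "monthly_spend": 42.5, "support_tickets": 3}

# explain() returns the prediction plus per-feature attributions, which can
# be reviewed for unexpected or potentially unfair feature influence.
response = endpoint.explain(instances=[instance])
print("Prediction:", response.predictions[0])
for explanation in response.explanations:
    for attribution in explanation.attributions:
        print("Feature attributions:", attribution.feature_attributions)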
Ensuring AI is developed and used responsibly matters. At Google, we believe it is not only the right thing to do, it is a critical component of creating successful AI at scale and earning customer trust, which is why we encourage organizations to invest in a responsible AI approach and process.
The time to scale with AI is now
Establishing well-tuned and appropriately managed ML systems has historically been challenging, even for highly skilled data scientists with sophisticated systems. This has led to some ambivalence about the value ML can deliver. However, with three key areas of investment (centralized ML platforms, MLOps, and responsible AI practices), enterprises can better position themselves for strong returns on investment, all while avoiding many of the common pitfalls that have plagued ML development.
Moreover, every data scientist should be able to leverage the tools that enable them to do their best work, and deliver tangible business impact. The time to scale with AI is now.
Unlock the value of your data with Google Cloud.