Take, for example, a platform that bids for online advertisement placements. This is a platform that would require some human input in the form of what advertisements they want to display, some optional hard bidding limits, and a configurable aggressiveness factor. The system itself could be driven by a machine-learned agent that makes bids, monitors the click-rates of an ad, and potentially makes online adjustments to itself to optimize bidding patterns.
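As a rough illustration of how those human inputs might constrain the agent, here is a minimal Python sketch. The names `BiddingConfig`, `final_bid`, and `model_bid`, and the choice to scale the bid by multiplying it with the aggressiveness factor, are all assumptions made for this example and not a description of any real bidding platform.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BiddingConfig:
    """Human-supplied settings (hypothetical field names)."""
    ad_ids: Tuple[str, ...]           # advertisements the user wants displayed
    aggressiveness: float = 1.0       # assumed multiplier on the model's bid
    max_bid: Optional[float] = None   # optional hard bidding limit


def final_bid(model_bid: float, config: BiddingConfig) -> float:
    """Scale the model's proposed bid, then enforce the hard limit if one is set."""
    bid = model_bid * config.aggressiveness
    if config.max_bid is not None:
        bid = min(bid, config.max_bid)
    return max(bid, 0.0)  # never place a negative bid


capped = final_bid(2.0, BiddingConfig(("ad-1",), aggressiveness=1.5, max_bid=2.5))
uncapped = final_bid(2.0, BiddingConfig(("ad-1",), aggressiveness=1.5))
```

Keeping the limit outside the learned agent means the hard rule holds no matter what the model proposes.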
Product-focused data science and machine learning come with a set of challenges that typical data science projects do not face. In product-focused projects, data scientists work with multidisciplinary teams of designers, software engineers, and product owners to make sure their models are aligned with business objectives, created within the constraints of the system, and delivered in an agile timeframe.
Over the years we have seen some common scenarios across multiple projects and have accumulated techniques for mitigating risk, delivering value quickly, and building a robust plan for the future.
Here are some points of advice for those looking to build a data-driven product.
Acquire and analyze data early
Successful machine learning and data science products live and die by data. In Kaggle competitions and some domains of academic research, data is clean, accessible, trustworthy, and abundant enough to train a model from. Data in industry, however, is typically unformatted, noisy, and strictly governed.
One of the biggest challenges we’ve faced has been cutting the red tape just to get our hands on the right dataset. Enterprises have a treasure trove of data sitting idly in their warehouses, but between your team and that data sit multiple departments (legal, IT, governance) that need to approve the transfer, a potential negotiation process to buy the access rights, and a team of data engineers who must settle on the data contract, all before the data is transferred to your team.
Without access to this data, a data scientist can only conjecture about what they can do with it. There is no way to know in advance that the data is ready for modeling or even has the signal needed to hit a target KPI. Getting the data early allows the team to return quick feedback before going too deep down a potentially infeasible modeling path.
We recommend starting each data-driven product with a short “proof of value” phase. This is where a small team works through acquiring the data needed, establishes a baseline with an initial naive model, and sets attainable model KPIs based on that model. This is a low-risk way to verify, with a small pool of resources, that the problem you are solving is solvable.
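To make the baseline step concrete, here is a minimal Python sketch of a naive model and the metric it sets as a starting KPI. The mean predictor, the mean-absolute-error metric, and the numbers are all assumptions chosen for illustration; a real proof of value would use whatever naive model and KPI fit the problem.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_mean_baseline(train_targets):
    """Naive model: always predict the mean of the training targets."""
    baseline_value = mean(train_targets)
    return lambda n: [baseline_value] * n


# Hypothetical target values, for illustration only.
train_targets = [100.0, 120.0, 80.0, 100.0]
test_targets = [90.0, 110.0]

predict = fit_mean_baseline(train_targets)
baseline_mae = mean_absolute_error(test_targets, predict(len(test_targets)))
```

Any later model then has a clear bar to beat: it is only worth its added complexity if it improves meaningfully on `baseline_mae`.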
Empathize with your end-users
When you’re building a product, you’re really building a tool that end-users leverage to solve a problem. Users work in and interact with a product in a multitude of ways. Data-driven products add a focus on the process of receiving suggestions from your models and giving models feedback to learn from. To build a successful data-driven product, it’s crucial to first understand how your user plans to interact with your product, what they expect to see from the model, what control they have over outputs, and how they can provide feedback to the system.
The web application space has refined its design process for successful products by heavily incorporating a research phase. This phase typically includes building personas, gaining an understanding of both users and the machine through empathy mapping, and conducting user interviews. The output of this phase is the design of an interface that a product owner can be confident about and a team of engineers (and data scientists!) can execute on.
At Dialexa we’ve successfully injected data-focused prompts and questions into these tools to get insight into what a user actually wants from a model. These new data points give the data scientists metrics to hit, requirements on model architectures, and often new features that they may have never considered!
One great example of an intelligent feature in a product is AirBnB’s listing price suggestion. Some great takeaways from this feature that could be discovered in the research phase are:
- It’s just a suggestion; the user keeps control of the final price
- They give top factors on why a price was selected
- They allow users to give direct feedback on their pricing models
This feature isn’t perfect and has been criticized for pricing listings too low, among other complaints. These could be addressed by again empathizing with your users’ concerns. I believe the feedback points to multiple areas for improvement on this feature. One way to gain their users’ trust would be to invest in a model that outputs a confidence interval alongside the suggested price. They may have to sacrifice some accuracy but, as long as it’s still acceptably accurate, the end-users would likely be happier with the feature as a whole.
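As a minimal sketch of what returning a range with a suggestion could look like, the Python below derives a low, suggested, and high price from comparable listings. This is an assumption-laden stand-in, not AirBnB’s method: it uses the empirical 10th and 90th percentiles of hypothetical comparable prices, which is a simple spread estimate and not a statistically calibrated confidence interval. The function name and the data are invented for the example.

```python
from statistics import median, quantiles


def suggest_price_with_range(comparable_prices):
    """Return (low, suggested, high) from the prices of comparable listings."""
    if len(comparable_prices) < 2:
        raise ValueError("need at least two comparable prices")
    # Nine cut points; the first and last are the 10th and 90th percentiles.
    deciles = quantiles(comparable_prices, n=10, method="inclusive")
    return deciles[0], median(comparable_prices), deciles[-1]


# Hypothetical nightly prices of nearby, similar listings.
nearby = [80.0, 90.0, 100.0, 110.0, 120.0]
low, suggested, high = suggest_price_with_range(nearby)
```

Showing the user the range as well as the point estimate lets them judge how much weight to give the suggestion.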
Empathize with your models
Just like the end-users, models need love too. These models aren’t standalone: the engineers need to write supporting software, the designers need to wireframe UIs, and the stakeholders need to set their delivery expectations. It’s crucial to get the whole team on the same page by gathering requirements before going gung-ho on building a model or a model-based feature.
One of the approaches that we’ve picked up comes from our research and design team. We’ve adapted the user empathy map to empathize with a model-based feature. Here’s a great article describing the process in depth. The gist of the exercise is to get the team thinking about the feature and take notes on the following:
- Senses — What data and variables does the model need?
- Does — What does the model output and what actions are taken?
- Says — How does the user know why the model made a decision?
- Thinks — What hard rules does the feature have to follow?
- Feels — How do we know the feature is doing what we expect?
These are our interpretations of the categories that have worked well for our team. Some categories like “says” and “feels” can be particularly hard to wrap your head around. We prime the team to start thinking in the right direction by providing examples of a similar feature. For example, some sticky notes for the AirBnB price suggestion tool could be:
- Location data of the rented unit
- Day of the week of the listing
- A suggested price
- A range of good prices
- Similar listings in the area
- Breakdown of pricing factors
- Can’t go below the minimum break-even price
- Are there legal considerations?
- Direct user feedback from the tool
- Are users staying within the range?
The output of this session is a shared understanding and a clear set of requirements for all players on this feature. At a high level, data scientists can start designing a model architecture, engineers can plan work for the new data feeds and API endpoints, designers can wireframe components, and the product owner knows exactly what’s going to be delivered. What an incredible exercise!
Start with simplicity, expand with complexity
On a product team, a data scientist’s work is often a dependency of other team members’ work. A backend engineer can’t effectively develop and test software to support a model until they have access to it. On top of this, a product owner might want to push out the feature for beta testing sooner than the team can optimize the model.
The first and most important action to take in this situation is to communicate with your team. Document the expected inputs and outputs so others can work around them, and set expectations on when a model will be ready. This should be fleshed out at a high level after a model empathy map! The next option to consider is to skip the machine learning model altogether or drastically simplify the approach.
One of the hardest realities for a tried-and-true machine learning engineer to cope with on a product team is that machine learning is a means to an end, not the end itself. As a machine learning engineer who loves to read and learn about the bleeding-edge advances in the field, it pains me to write that. But in reality, most features are supported successfully by a naive model or even a heuristic; you don’t need deep learning to solve every problem.
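One lightweight way to document expected inputs and outputs is to write them down as types the whole team can code against. The Python sketch below is only an illustration: the field names, the `PriceModel` protocol, and the `PlaceholderModel` stub with its hard-coded values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PriceRequest:
    """Inputs the model expects (hypothetical fields)."""
    listing_id: str
    latitude: float
    longitude: float
    weekday: int  # 0 = Monday ... 6 = Sunday


@dataclass(frozen=True)
class PriceSuggestion:
    """Outputs the rest of the system can rely on."""
    suggested_price: float
    low: float
    high: float
    model_version: str


class PriceModel(Protocol):
    def suggest(self, request: PriceRequest) -> PriceSuggestion: ...


class PlaceholderModel:
    """Stub that satisfies the contract so dependent work can start."""

    def suggest(self, request: PriceRequest) -> PriceSuggestion:
        # Fixed values, purely a placeholder until a real model exists.
        return PriceSuggestion(100.0, 80.0, 120.0, "placeholder-0")


model: PriceModel = PlaceholderModel()
result = model.suggest(PriceRequest("listing-1", 32.78, -96.80, weekday=4))
```

Because later models only need to provide the same `suggest` method, they can replace the stub without changes to the code that calls it.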
Quickly deploying a simple model, or at least the interface for a model, unblocks the rest of the team to get their gears turning. Engineers can quickly start developing off of that model with confidence and stakeholders can monitor the KPIs of the model in the product and turn on the full feature when it’s acceptable.
Take, for example, the AirBnB price suggestion model again. After defining the full-fledged feature, the team can build and deploy a quick heuristic engine that uses the average listing price in the surrounding area. The engineers can develop off of that heuristic-based model and hide it behind a feature flag, waiting to be turned on in production. Meanwhile, the data science team, SMEs, and product owners can work together to iterate on the model until it’s ready and then release it to the end-user.
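A heuristic like this can be only a few lines long. The following Python sketch assumes a plain dictionary as the feature-flag store and invented function names; a real product would likely use its own flagging system and a proper lookup of nearby listings.

```python
from statistics import mean


def average_area_price(nearby_prices):
    """Heuristic 'model': the mean price of listings in the surrounding area."""
    return mean(nearby_prices) if nearby_prices else None


def price_suggestion_for(nearby_prices, flags):
    """Return a suggestion only when the feature flag is switched on."""
    if not flags.get("price_suggestion", False):
        return None  # feature hidden: callers show nothing to the user
    return average_area_price(nearby_prices)


# Hypothetical nearby listing prices and flag states.
nearby = [90.0, 100.0, 110.0]
hidden = price_suggestion_for(nearby, {"price_suggestion": False})
shown = price_suggestion_for(nearby, {"price_suggestion": True})
```

When a better model is ready, only the body of `average_area_price` (or the function that the flag guards) needs to change; the flag and the calling code stay the same.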
This is a process that has worked wonders for us. We’ve been able to quickly iterate through more and more advanced models, test the models in a production-like environment, and release model-driven features with complete safety.
These are just a few of the many techniques and processes our teams have adopted for delivering data-driven products. There are many more lessons learned along the way that we’re eager to share and help other product teams implement.