A Brief Introduction

  • A/B tests are also called split tests; in much of science, they are referred to as randomized controlled trials.
  • They are a data-driven way to prove to business stakeholders that your solution (say, a new prediction model) gives a tangible lift to an appropriate business metric (e.g., sales, conversion, engagement); a minimal readout sketch follows this list.
  • The greatest advantage and disadvantage of A/B tests is that they are as easy to misunderstand as they are to understand.
  • Using them during ML model deployments is a great way to hedge risk (of rolling out something that does not work) as well as to demonstrate real value to stakeholders (on a subset of users).
  • There are better tools, such as multi-armed bandits and their variants, for minimizing regret (the loss incurred, in hindsight, from using sub-optimal treatments/decisions).
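
To make "tangible lift" concrete, here is a minimal sketch of one common readout: a two-proportion z-test on conversion rates between control and treatment. The traffic numbers and the helper name `two_proportion_z_test` are ours, purely for illustration.

```python
# A minimal sketch of reading out an A/B test: a two-proportion z-test
# on conversion rates. All numbers below are made up for illustration.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, p_value) for variant B vs. variant A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided test
    return p_b - p_a, p_value

# Hypothetical traffic split: control (A) vs. new model (B).
lift, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"lift = {lift:.4f}, p-value = {p:.3f}")
```

In practice you would also fix the sample size and significance level before the test starts; repeatedly peeking at interim p-values inflates false positives.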

Bandit Feedback

  • This is a term borrowed from the ML research community, which itself borrowed it from casino/gambling vocabulary.
  • The idea is that we only receive feedback (say, from a user) on the decision we actually took; we have no idea what would have happened had we taken a different decision. This is called bandit feedback.
  • In the full information feedback setting (the alternative), one can not only score the decision that was taken, but also retroactively assess how any other decision would have scored. A toy sketch contrasting the two settings follows.
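
Here is a toy sketch (our own construction, not from any particular library) contrasting what is observable in each setting: the environment draws a reward for every arm, but under bandit feedback we see only the chosen arm's reward.

```python
# A toy sketch contrasting the two feedback settings.
# On each round there is a hidden reward for every possible decision (arm).
import random

ARMS = ["arm_a", "arm_b", "arm_c"]   # hypothetical decisions

def hidden_rewards():
    """The environment's reward for every arm this round (never fully seen)."""
    return {arm: random.random() for arm in ARMS}

rewards = hidden_rewards()
chosen = random.choice(ARMS)

# Bandit feedback: we learn the reward of the chosen arm only.
bandit_feedback = {chosen: rewards[chosen]}

# Full-information feedback: we learn the reward of every arm,
# so any alternative decision can be scored retroactively.
full_info_feedback = dict(rewards)

print("chose:", chosen)
print("bandit sees:", bandit_feedback)
print("full info sees:", full_info_feedback)
```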
Example 1
  • First, let's look at an example of full information feedback. Consider an image tagging service that can also collect feedback.
  • In this service, a user uploads an image, and the ML model predicts its class and outputs that as the tag (e.g., a cat, a dog, or a fox).
  • The user can accept that the tag is correct or tell us what the right tag is.
  • This is considered full information feedback.
  • If the model predicted the tag cat and the user's feedback says the tag is actually dog, then we know what our prediction performance would have been had we predicted a cat, a dog, or a fox.
  • What if the user only gives feedback that the tag is correct or wrong? That would be an instance of partial, or bandit, feedback: if the tag turns out to be wrong, we may never know what the right tag is (unless someone tells us explicitly). The sketch below contrasts the two cases.
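
A minimal sketch of Example 1, assuming a 0/1 accuracy score; the label set and variable names are hypothetical.

```python
# Example 1 with 0/1 accuracy as the score.
LABELS = ["cat", "dog", "fox"]

def score(prediction, true_tag):
    return 1 if prediction == true_tag else 0

predicted = "cat"

# Full information: the user tells us the true tag ("dog"), so we can
# retroactively score *every* possible prediction.
true_tag = "dog"
print({label: score(label, true_tag) for label in LABELS})
# -> {'cat': 0, 'dog': 1, 'fox': 0}

# Bandit feedback: the user only says whether our tag was correct.
is_correct = False   # feedback on "cat" alone
# We know score("cat", ...) == 0, but the scores for "dog" and "fox"
# remain unknown unless someone explicitly tells us the true tag.
```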
Example 2
  • The following example illustrates how the feedback obtained in a recommendation system can be treated as either full information or bandit feedback, depending on assumptions.
  • Let's say we deployed a model that recommends a list of movies.
  • The recommended list (sometimes after post-processing based on business rules) is shown to the user.
  • The user may click on or otherwise act on some of the recommendations.
  • We can collect this information and figure out how effective our current recommendation list was (i.e., we can obtain a score).
  • This score is a function of the recommendation list we provided and the items that the user actually liked.
  • If we have this score function, we can also score how well any other recommendation list would have done. In this case, the feedback is called full information feedback.
  • But what if we did not have such a scoring function? Then we only know how the current recommended list fared, and cannot score how any other recommendation list would have worked; that is bandit feedback. The sketch below makes this concrete.
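
A minimal sketch of Example 2, assuming precision@k as the score function; the movie names and click data are made up.

```python
# Example 2 with precision@k as a hypothetical score function.
def precision_at_k(recommended, clicked, k=3):
    """Fraction of the top-k recommendations the user actually clicked."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in clicked) / k

shown = ["movie_a", "movie_b", "movie_c"]   # list we actually served
clicked = {"movie_b"}                       # user's observed actions

# With an explicit score function, the feedback acts like full information:
# we can score the served list and any counterfactual list.
print(precision_at_k(shown, clicked))                              # 0.33...
print(precision_at_k(["movie_b", "movie_d", "movie_e"], clicked))  # 0.33...

# Without such a function, we only have the logged outcome of `shown`;
# alternative lists cannot be scored. That is bandit feedback.
```

Note the assumption doing the work here: treating clicks as the user's true preferences, independent of which list was shown. If clicks depend on what the user actually saw (e.g., position bias, exposure effects), scoring counterfactual lists this way is no longer valid, and we are back in the bandit feedback setting.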