A practical example of how to run Thompson Sampling in R
Thompson Sampling is an algorithm for decision problems where actions are taken in sequence balancing between exploitation which maximizes immediate performance and exploration which accumulates new information that may improve future performance. There is always a trade-off between exploration and exploitation in all Multi-armed bandit problems.
Currently, Thompson Sampling has increased its popularity since is widely used in on-line campaigns like Facebook, Youtube, and Web Campaigns where many variants are served at the beginning, and as time passes, the algorithm gives more weight to the strong variants.
In Bayesian probability theory, if the posterior distributions p(θ | x) are in the same probability distribution family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function p(x | θ)
Let us focus on the binomial case since in digital marketing we care mostly about CTR and Conversion Rates which follow binomial distribution. According to Bayesian Theory, the conjugate prior distribution of a Binomial Distribution is Beta Distribution with Posterior hyperparameters α and β which are the successes and the failures respectively.
Notice that The exact interpretation of the parameters of a beta distribution in terms of the number of successes and failures depends on what function is used to extract a point estimate from the distribution. The mean of a beta distribution is α/(α+β), which corresponds to α successes and β failures Bayesians generally prefer to use the posterior mean rather than the posterior mode as a point estimate, justified by a quadratic loss function, and the use of α and β is more convenient mathematically, while the use of α -1 and β -1 has the advantage that a uniform Beta(1,1) prior corresponds to 0 successes and 0 failures. The same issues apply to the Dirichlet distribution.
In practice, we want to maximize the expected number of rewards (i.e. clicks, conversions) given the actions (i.e. variants) and the current context. Conceptually, this means that at the beginning we serve the variants randomly and then we re-adjust our strategy (i.e. the distribution of the variants) based on the results.
The question is how do we get the weights for every state and the answer is with Monte Carlo Simulation. We assume the variants follow the Binomial distribution and according to Bayesian theory, their Conjugate prior distribution follows the Beta distribution with hyperparameters α = responses+1 and β = no responses + 1.
Let’s give an example of three variants:
- Variant 1: N=1000, responses=100 (i.e RR=10%). The conjugate prior will be Beta(a=101, b=901)
- Variant 2: N=1000, responses=110 (i.e RR=11%). The conjugate prior will be Beta(a=111, b=891)
- Variant 3: N=100, responses=10 (i.e RR=10%). The conjugate prior will be Beta(a=11, b=91)
Every time we should serve the variant with the highest RR. This is done by Monte Carlo Simulation. In the example above, we sample 5000 observations of each beta distribution by picking each time the variant with the highest value. The weights we got were:
Once we adjust the weights and we get new results, we follow the same approach again.
1. Microsoft Azure Machine Learning x Udacity — Lesson 4 Notes
2. Fundamentals of AI, ML and Deep Learning for Product Managers
3. Roadmap to Data Science
4. Work on Artificial Intelligence Projects
It is quite easy to apply the Thompson Sampling in R. We will work with the three variants of our example above. Notice that the variant 3 has fewer impressions than the other two that is why is less steep as a distribution.
# vector of impressions per variant
b_Sent<-c(1000, 1000, 100)
# vector of responses per variant
b_Reward<-c(100, 110, 10) msgs<-length(b_Sent)
# number of simulations
N<-5000
# simulation of Beta distributions (success+1, failures+1)
B<-matrix(rbeta(N*msgs, b_Reward+1, (b_Sent-b_Reward)+1),N, byrow = TRUE)
# Take the percentage where each variant
# was observed with the highest rate rate
P<-table(factor(max.col(B), levels=1:ncol(B)))/dim(B)[1]
P
and we get:
> P 1 2 3
0.1368 0.4496 0.4136
Notice that since we are dealing with a Monte Carlo Simulation, if you do not set a random seed you will get slightly different results each time.
Above we showed what would be the weights the suggested weights based on the observed results of the three variants. Let’s provide an example where we are dealing with 4 Variants where we know their ground truth probability and let’s see how the Thompson Sampling will adjust the weights in every step.
Let’s assume that the ground truth conversion rates of the 4 variants are:
Our strategy is to update the weights in every 1000 impressions and let’s assume that the total sample size is 10000 (i.e. we will update the weights ten times).
output<-{}
b_Probs<-c(0.10,0.11,0.12, 0.13)
b_Sent<-rep(0, length(b_Probs))
b_Reward<-rep(0, length(b_Probs))
batch_size<-1000
N<-10000
steps<-floor(N/batch_size)
msgs<-length(b_Probs) for (i in 1:steps) {
B<-matrix(rbeta(1000*msgs, b_Reward+1, (b_Sent-b_Reward)+1),1000, byrow = TRUE)
P<-table(factor(max.col(B), levels=1:ncol(B)))/dim(B)[1]
# tmp are the weights for each time step
tmp<-round(P*batch_size,0)
# Update the Rewards
b_Reward<-b_Reward+rbinom(rep(1,msgs), size=tmp, prob = b_Probs) #Update the Sent
b_Sent<-b_Sent+tmp #print(P) output<-rbind(output, t(matrix(P))) }
# the weights of every step
output
And we get:
> output
[,1] [,2] [,3] [,4]
[1,] 0.239 0.259 0.245 0.257
[2,] 0.002 0.349 0.609 0.040
[3,] 0.003 0.329 0.533 0.135
[4,] 0.001 0.078 0.386 0.535
[5,] 0.001 0.016 0.065 0.918
[6,] 0.002 0.016 0.054 0.928
[7,] 0.002 0.095 0.154 0.749
[8,] 0.003 0.087 0.067 0.843
[9,] 0.001 0.059 0.069 0.871
[10,] 0.006 0.058 0.116 0.820
As we can see, even from the 5th step the algorithm started to assign more weight to the variant 4 and almost nothing to the variant 1 and variant 2.
Let’s see graphically how the Weights evolve across time.
library(tidyverse)df<-as.data.frame(output)%>%mutate(Round=as.factor(row_number()))my_plot<-df%>%gather(Variants, Weights, -Round)%>%ggplot(aes(x=Round, y=Weights, fill=Variants))+geom_bar(stat="identity")my_plot
There exist other Multi-Armed Bandit algorithms like the ε-greedy, the greedy the UCB etc. There are also contextual multi-armed bandits. In practice, there are some issues with the multi-armed bandits. Let’s mention some:
- The CTR/CR can change across days as well as the preference of the variants(seasonality effect, day of the week effect, the fatigue of the campaign etc)
- By changing the weights of the variants it affects the CTR/CR mainly because:
a) Due to the cookies, it affects the distribution of the new and the repeated users
b) The results are skewed due to the late conversions
<!DOCTYPE html>
<html>
<head>
<meta charset=”utf-8″ />
<style>body{background-color:white;}</style>
<script>(function() {
// If window.HTMLWidgets is already defined, then use it; otherwise create a
// new object. This allows preceding code to set options that affect the
// initialization process (though none currently exist).
window.HTMLWidgets = window.HTMLWidgets || {};
// See if we’re running in a viewer pane. If not, we’re in a web browser.
var viewerMode = window.HTMLWidgets.viewerMode =
/bviewer_pane=1b/.test(window.location);
// See if we’re running in Shiny mode. If not, it’s a static document.
// Note that static widgets can appear in both Shiny and static modes, but
// obviously, Shiny widgets can only appear in Shiny apps/documents.
var shinyMode = window.HTMLWidgets.shinyMode =
typeof(window.Shiny) !== “undefined” && !!window.Shiny.outputBindings;
// We can’t count on jQuery being available, so we implement our own
// version if necessary.
function querySelectorAll(scope, selector) {
if (typeof(jQuery) !== “undefined” && scope instanceof jQuery) {
return scope.find(selector);
}
if (scope.querySelectorAll) {
return scope.querySelectorAll(selector);
}
}
function asArray(value) {
if (value === null)
return [];
if ($.isArray(value))
return value;
return [value];
}
// Implement jQuery’s extend
function extend(target /*, … */) {
if (arguments.length == 1) {
return target;
}
for (var i = 1; i < arguments.length; i++) {
var source = arguments[i];
for (var prop in source) {
if (source.hasOwnProperty(prop)) {
target[prop] = source[prop];
}
}
}
return target;
}
// IE8 doesn’t support Array.forEach.
function forEach(values, callback, thisArg) {
if (values.forEach) {
values.forEach(callback, thisArg);
} else {
for (var i = 0; i < values.length; i++) {
callback.call(thisArg, values[i], i, values);
}
}
}
// Replaces the specified method with the return value of funcSource.
//
// Note that funcSource should not BE the new method, it should be a function
// that RETURNS the new method. funcSource receives a single argument that is
// the overridden method, it can be called from the new method. The overridden
// method can be called like a regular function, it has the target permanently
// bound to it so “this” will work correctly.
function overrideMethod(target, methodName, funcSource) {
var superFunc = target[methodName] || function() {};
var superFuncBound = function() {
return superFunc.apply(target, arguments);
};
target[methodName] = funcSource(superFuncBound);
}
// Add a method to delegator that, when invoked, calls
// delegatee.methodName. If there is no such method on
// the delegatee, but there was one on delegator before
// delegateMethod was called, then the original version
// is invoked instead.
// For example:
//
// var a = {
// method1: function() { console.log(‘a1’); }
// method2: function() { console.log(‘a2’); }
// };
// var b = {
// method1: function() { console.log(‘b1’); }
// };
// delegateMethod(a, b, “method1”);
// delegateMethod(a, b, “method2”);
// a.method1();
// a.method2();
//
// The output would be “b1”, “a2”.
function delegateMethod(delegator, delegatee, methodName) {
var inherited = delegator[methodName];
delegator[methodName] = function() {
var target = delegatee;
var method = delegatee[methodName];
// The method doesn’t exist on the delegatee. Instead,
// call the method on the delegator, if it exists.
if (!method) {
target = delegator;
method = inherited;
}
if (method) {
return method.apply(target, arguments);
}
};
}
// Implement a vague facsimilie of jQuery’s data method
function elementData(el, name, value) {
if (arguments.length == 2) {
return el[“htmlwidget_data_” + name];
} else if (arguments.length == 3) {
el[“htmlwidget_data_” + name] = value;
return el;
} else {
throw new Error(“Wrong number of arguments for elementData: ” +
arguments.length);
}
}
// http://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
function escapeRegExp(str) {
return str.replace(/[-[]/{}()*+?.\^$|]/g, “\$&”);
}
function hasClass(el, className) {
var re = new RegExp(“\b” + escapeRegExp(className) + “\b”);
return re.test(el.className);
}
// elements – array (or array-like object) of HTML elements
// className – class name to test for
// include – if true, only return elements with given className;
// if false, only return elements *without* given className
function filterByClass(elements, className, include) {
var results = [];
for (var i = 0; i < elements.length; i++) {
if (hasClass(elements[i], className) == include)
results.push(elements[i]);
}
return results;
}
function on(obj, eventName, func) {
if (obj.addEventListener) {
obj.addEventListener(eventName, func, false);
} else if (obj.attachEvent) {
obj.attachEvent(eventName, func);
}
}
function off(obj, eventName, func) {
if (obj.removeEventListener)
obj.removeEventListener(eventName, func, false);
else if (obj.detachEvent) {
obj.detachEvent(eventName, func);
}
}
// Translate array of values to top/right/bottom/left, as usual with
// the “padding” CSS property
// https://developer.mozilla.org/en-US/docs/Web/CSS/padding
function unpackPadding(value) {
if (typeof(value) === “number”)
value = [value];
if (value.length === 1) {
return {top: value[0], right: value[0], bottom: value[0], left: value[0]};
}
if (value.length === 2) {
return {top: value[0], right: value[1], bottom: value[0], left: value[1]};
}
if (value.length === 3) {
return {top: value[0], right: value[1], bottom: value[2], left: value[1]};
}
if (value.length === 4) {
return {top: value[0], right: value[1], bottom: value[2], left: value[3]};
}
}
// Convert an unpacked padding object to a CSS value
function paddingToCss(paddingObj) {
return paddingObj.top + “px ” + paddingObj.right + “px ” + paddingObj.bottom + “px ” + paddingObj.left + “px”;
}
// Makes a number suitable for CSS
function px(x) {
if (typeof(x) === “number”)
return x + “px”;
else
return x;
}
// Retrieves runtime widget sizing information for an element.
// The return value is either null, or an object with fill, padding,
// defaultWidth, defaultHeight fields.
function sizingPolicy(el) {
var sizingEl = document.querySelector(“script[data-for='” + el.id + “‘][type=’application/htmlwidget-sizing’]”);
if (!sizingEl)
return null;
var sp = JSON.parse(sizingEl.textContent || sizingEl.text || “{}”);
if (viewerMode) {
return sp.viewer;
} else {
return sp.browser;
}
}
// @param tasks Array of strings (or falsy value, in which case no-op).
// Each element must be a valid JavaScript expression that yields a
// function. Or, can be an array of objects with “code” and “data”
// properties; in this case, the “code” property should be a string
// of JS that’s an expr that yields a function, and “data” should be
// an object that will be added as an additional argument when that
// function is called.
// @param target The object that will be “this” for each function
// execution.
// @param args Array of arguments to be passed to the functions. (The
// same arguments will be passed to all functions.)
function evalAndRun(tasks, target, args) {
if (tasks) {
forEach(tasks, function(task) {
var theseArgs = args;
if (typeof(task) === “object”) {
theseArgs = theseArgs.concat([task.data]);
task = task.code;
}
var taskFunc = tryEval(task);
if (typeof(taskFunc) !== “function”) {
throw new Error(“Task must be a function! Source:n” + task);
}
taskFunc.apply(target, theseArgs);
});
}
}
// Attempt eval() both with and without enclosing in parentheses.
// Note that enclosing coerces a function declaration into
// an expression that eval() can parse
// (otherwise, a SyntaxError is thrown)
function tryEval(code) {
var result = null;
try {
result = eval(code);
} catch(error) {
if (!error instanceof SyntaxError) {
throw error;
}
try {
result = eval(“(” + code + “)”);
} catch(e) {
if (e instanceof SyntaxError) {
throw error;
} else {
throw e;
}
}
}
return result;
}
function initSizing(el) {
var sizing = sizingPolicy(el);
if (!sizing)
return;
var cel = document.getElementById(“htmlwidget_container”);
if (!cel)
return;
if (typeof(sizing.padding) !== “undefined”) {
document.body.style.margin = “0”;
document.body.style.padding = paddingToCss(unpackPadding(sizing.padding));
}
if (sizing.fill) {
document.body.style.overflow = “hidden”;
document.body.style.width = “100%”;
document.body.style.height = “100%”;
document.documentElement.style.width = “100%”;
document.documentElement.style.height = “100%”;
if (cel) {
cel.style.position = “absolute”;
var pad = unpackPadding(sizing.padding);
cel.style.top = pad.top + “px”;
cel.style.right = pad.right + “px”;
cel.style.bottom = pad.bottom + “px”;
cel.style.left = pad.left + “px”;
el.style.width = “100%”;
el.style.height = “100%”;
}
Credit: BecomingHuman By: George Pipis