WWW 2008 / Poster Paper April 21-25, 2008 · Beijing, China Algorithm for Stochastic Multiple-Choice Knapsack Problem and Application to Keywords Bidding Yunhong Zhou HP Labs 1501 Page Mill Rd Palo Alto, CA 94304 Victor Naroditskiy yunhong.zhou@hp.com Depar tment of Computer Science Brown University Providence, RI 02912 victor@brown.edu ABSTRACT We model budget-constrained keyword bidding in sp onsored search auctions as a stochastic multiple-choice knapsack problem (S-MCKP) and design an algorithm to solve S-MCKP and the corresp onding bidding optimization problem. Our algorithm selects items online based on a threshold function which can b e built/up dated using historical data. Our algorithm achieved ab out 99% p erformance compared to the offline optimum when applied to a real bidding dataset. With synthetic dataset and iid item-sets, its p erformance ratio against the offline optimum converges to one empirically with increasing numb er of p eriods. Categories and Sub ject Descriptors: G.1.6 [Mathematics of Computing]: Numerical Analysis - Optimization (stochastic programming) General Terms: Algorithms, Economics Keywords: Sp onsored search auction, keyword bidding, multiple-choice knapsack problem, stochastic optimization k k s. Formally wts and vts are defined as follows: k kk wts = pts (s)X k (t), k vts = (V k - pks )k (s)X k (t), s, t, k. t (1) 1. INTRODUCTION Here V k denotes the exp ected value-p er-click for keyword k, X k (t) denotes the numb er of user queries for keyword k at time p eriod t, k (s) denotes the click-through rate (CTR) of p osition s and keyword k (the ratio b etween total user clicks on the ad at s-th slot and the total numb er of impressions), and pks denotes the cost-p er-click for ads in p osition s for t keyword k at time t. The bidding optimization problem has a strong stochastic flavor as bidding prices and click traffic change all the time. The online and stochastic variant of MCKP (S-MCKP) where item-sets are received one at a time, allows us to explicitly incorp orate this uncertainty in the distribution of future itemsets. Although, decisions are made assuming that the itemsets are indep endent and identically distributed (iid), the algorithm p erforms well even when items in different time p eriods do not follow the same distribution as evidenced by exp erimental results for the "auto insurance" dataset. Sp onsored search auction is an effective way of monetizing search activities where advertisers pay to place their ads on search results pages for sp ecific user keyword queries. In this work we focus on the bidding optimization problem for an advertiser with budget constraints. Formally, we address the following problem: for each keyword and each time p eriod, how much should the advertiser bid (i.e., which p osition to obtain), so as to maximize ROI of the ads given a fixed budget and a fixed time horizon? We can view each ad p osition as an item with associated weight (cost) and value (either revenue or profit). The advertiser has a budget constraint, and it naturally corresp onds to the knapsack capacity. Furthermore each advertiser can have at most one ad app ear on each keyword results page. This corresp onds to the constraint that at most one item from each item-set can b e taken in the multiplechoice knapsack problem (MCKP), a well-known variation of the knapsack problem (KP). The following model of the bidding problem is prop osed: given multiple keywords k K , multiple time p eriods when the advertiser places bids t {1, . . . , T }, and multiple p ositions s {1, . . . , S }, the k k k item-set Nt consists of items (wts , vts ) for all ad p ositions Work was done while the author was an intern at HP Labs. Copyright is held by the author/owner(s). WWW 2008, April 21­25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04. 1.1 Related Work In the last few years, a numb er of pap ers addressed the problem of revenue maximization or bidding optimization in sp onsored search auctions [1, 4, 2, 3, 7]. None of the previous work prop osed a solution that employs distributional information ab out prices and solves the bidding problem with multiple ad p osition, keywords, and time p eriods. Zhou et al. [8] attack a very similar problem and model it as Online-MCKP; however their work focuses on comp etitive algorithms with worst-case p erformance guarantees while this work focuses on average-case p erformance with stochastic input. The threshold function we develop for SMCKP is based on the threshold function for the stochastic knapsack problem by Lueker [6]. 2. APPROXIMATING S-MCKP Our algorithm for the S-MCKP is based on Lueker's Algorithm for Online-KP and an approximation for MCKP [5]. At a high level, we use a technique from the approximation for MCKP to convert each MCKP item-set into KP items and apply Lueker's threshold function to selected KP items. We now describ e the algorithm in detail. We first apply a technique for converting items from a MCKP item-set into incremental items with the following prop erty: taking multiple incremental items with decreasing efficiency (value/weight) is equivalent to taking exactly one original item. The technique consists of two steps: (i) removing dominated and LP-dominated items from the item- 1175 WWW 2008 / Poster Paper set and (ii) creating incremental items from undominated items. For details, see Kellerer et al. [5], p.320. After obtaining incremental items, we use a threshold function to decide which incremental items to take. Lueker [6] prop osed solving Online-KP using an adaptive threshold: only items that meet the threshold efficiency are put in the knapsack. The threshold function (which we call g ) maps the average remaining capacity p er time p eriod to the threshold efficiency e . The threshold efficiency is such that the exp ected weight of the remaining items (all items are iid) with efficiency at least e is equal to the remaining capacity: n = f w i v C = E w ,v 1{ w e } wi 1{ vi e } n Ew,v w ( C v while f (e) Ew,v 1{ w e} 2) n We need the distribution of incremental items to calculate the threshold function (Eq. 2), however such information may not b e known or have a closed-form representation. Alternatively, we can approximate the threshold function of incremental items using item-sets received in the past. Formally, given a sample set of m incremental items, we can ~ use f to approximate f where m 1i ~ f (e ) wi 1{ vi e} (3) wi m =1 (e ) = ~ f is a piecewise constant function with at most m pieces, and it can b e computed in time O(m log m) based on sorting of item efficiency. The algorithm for S-MCKP is in Figure 1. It consists of two phases: the first phase (optional) is to generate the threshold function if training item-sets are available. In the second phase, at each time p eriod the algorithm converts the current item-set to incremental items, checks which incremental items meet the threshold, and takes the corresp onding item from the item-set. The threshold function can b e up dated in step 2 by incorp orating information from the current item-set. In the case when no training itemsets are available in step 1, the first real item-set is used to generate the initial threshold function. Input: item-set Nt at time t, for t = 1, . . . , n; knapsack capacity C ; (optional) training item-sets; Output: items to take at each time 1. create incremental items from training item-sets r is the average numb er of incremental items p er set ~ generate f using incremental items 2. for t from 1 to n create incremental items from item-set Nt ~ up date f and r based on incremental items C ~ e = f -1 ( r(n-t+1) ) /** r (n - t + 1) is the exp ected numb er of remaining incremental items **/ select incremental items with efficiency at least e (w, v ) is the corresp onding item of these selected incremental items if w C take item (w, v ), up date C := C - w Figure 1: Algorithm for S-MCKP. =1 wi April 21-25, 2008 · Beijing, China indep endently from one of the following distributions: Uniform with supp ort b etween 1 and 10, Normal with mean 10, and Exp onential with mean 10. The numb er of items p er set received each time p eriod is 5. We express the budget (capacity) as a fraction of the mean weight of an item times the total numb er of time p eriods, i.e., C = × n × mean(w), where n is the numb er of time p eriods, mean(w) is the mean weight of a random item. The value of indicates whether the budget is large compared to the exp ected overall sp ending if you pick random items each time without any optimization. We tested the algorithms on problem instances with the budget level {0.05, 0.2, 0.5, 0.9, 1.1}. We evaluate the p erformance of the algorithm based on the ratio of the value obtained by the algorithm and an upp er b ound on the optimal solution to offline MCKP. Figure 2 shows exp erimenal results with no training item-sets for generating the threshold function in step 1 of Alg 1. The p erformance of the algorithm is almost always within 10% of the optimal when n 20 and approaches the optimal as n goes to . A graph for the p erformance of Alg 1 with the Uniform distribution is shown b elow. Graphs with the other distributions look very similar, and are omitted due to space constraints. 1 0.95 ratio 0.9 0.05 0.20 0.50 0.90 1.10 20 40 80 160 320 number of periods 640 1280 0.85 0.8 10 Figure 2: Performance with U(1,10). The second set of exp eriments uses a real dataset for the "auto insurance" keyword describ ed in [8]. The p erformance of our algorithm is around 99% while the algorithm in [8] achieves around 90%-95%. 3. EXPERIMENTAL RESULTS We run two sets of exp eriments. In the first set of exp eriments, we generate items with weights and values drawn [1] Z. Abrams. Revenue maximization when bidders have budgets. In SODA, pages 1074­1082. ACM Press, 2006. [2] Z. Abrams, O. Mendelevitch, and J. A. Tomlin. Optimal delivery of sponsored search advertisements sub ject to budget constraints. In ACM EC, pages 272­278, 2007. [3] C. Borgs, J. Chayes, O. Etesami, N. Immorlica, K. Jain, and M. Mahdian. Dynamics of bid optimization in online advertisement auctions. In Proc. WWW, pp 531­540, 2007. [4] N. Buchbinder, K. Jain, and J. S. Naor. Online primal-dual algorithms for maximizing ad-auctions revenue. In Proc. ESA, pages 253­264, 2007. [5] H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack Problems. Springer, 2004. [6] G. S. Lueker. Average-case analysis of off-line and on-line knapsack problems. J. of Algorithms, 29(2):277­305, 1998. [7] S. Muthukrishnan, M. P´l, and Z. Svitkina. Stochastic a models for budget optimization in search-based advertising. In Proc. WINE, LNCS 4858, pages 131­142, 2007. [8] Y. Zhou, D. Chakrabarty, and R. Lukose. Budget constrained bidding in keyword auctions and online knapsack problems. In Proc. WWW (poster), 2008. 4. REFERENCES 1176