Title: UEF: A Unified Framework for CTR Estimation in Real-Time Bidding Advertising (UEF: 在即時競標系統中統一的點擊率預測架構)
Authors: Wang, Chun-Hao (王俊儫)
Peng, Wen-Chih (彭文志)
Institute of Computer Science and Engineering
Keywords: real-time bidding; demand-side platform; CTR estimation
Issue Date: 2016
Abstract: In Real-Time Bidding (RTB), the estimated click-through rate (CTR) of a bid request and advertisement pair directly affects the bidding strategy of a Demand-Side Platform (DSP). In real environments, bid requests and advertisements are unstructured, which makes them hard to model for CTR estimation. Beyond this variety issue, RTB also poses velocity and volume issues: more than 10 million bid requests arrive per day. In this paper, we propose a novel framework, the Unified Estimation Framework (UEF), for online CTR estimation under the harsh conditions of RTB. We first use feature hashing to handle unstructured data. Although feature hashing converts unstructured data into structured data, it yields a very large number of features for each bid request and advertisement. To improve computational efficiency and reduce memory usage, we propose the Softmax Ensemble Model (SEM), which uses only a small number of highly discriminative features after feature hashing to estimate CTR. We also propose a simplified variant of SEM, SSEM, for further computational efficiency. Experimental results on two real datasets show that the proposed approach performs well under the harsh conditions of RTB and outperforms the state-of-the-art approaches while using fewer than 50 features.
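The feature hashing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each bid request or advertisement arrives as a set of field-value pairs; the function name `hash_features`, the bucket count, the choice of SHA-256 as the hash, and the example fields are all hypothetical and are not taken from the thesis.

```python
import hashlib


def hash_features(record, num_buckets=2**20):
    """Map a dict of field -> value to a sparse vector of fixed dimension.

    Hypothetical illustration of the hashing trick, not the thesis's code.
    """
    vector = {}
    for field, value in record.items():
        # Combine field name and value so equal values in different
        # fields hash to different tokens.
        token = f"{field}={value}".encode("utf-8")
        digest = hashlib.sha256(token).digest()
        # Use the first 8 bytes of the digest as an integer bucket index.
        index = int.from_bytes(digest[:8], "big") % num_buckets
        # Colliding tokens simply accumulate in the same bucket.
        vector[index] = vector.get(index, 0.0) + 1.0
    return vector


# Assumed example fields for a bid request.
bid_request = {
    "domain": "example.com",
    "ad_slot": "banner_300x250",
    "user_agent": "Mozilla/5.0",
}
sparse = hash_features(bid_request, num_buckets=2**18)
```

The output dimension is fixed by `num_buckets` regardless of how many distinct field values appear in the data, which is why the resulting feature space is structured but still large, and why a model that selects only a few features from it can save computation and memory.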
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356051
http://hdl.handle.net/11536/139224
Appears in Collections: Thesis