Inside Ads, Besides Design: Ad Tech

2021｜Type: Edited｜Tag: Theory

The main players

Advertiser: The buyer of digital advertising space, often assembling a marketing growth team to improve customer acquisition efficiency.
Publisher: The owner or supplier of digital advertising space, with a commercialisation team that increases revenue through traffic monetisation.
Ad Networks and Exchanges: The platforms that bundle ad space from publishers and sell it to advertisers. Their goal is to balance the interests of multiple parties while improving ad-matching efficiency.

The concept

eCPM = r(a, u, c)

This function is a highly general expression of revenue per ad impression from the ad network's perspective: the expected revenue the ad network earns by displaying ad a each time user u visits or searches in context c. It defines the core value of advertising.

Billing method for advertising

CPT, Cost per Time: The ad space is delivered to the advertiser exclusively and charged for an exclusive period.
CPM, Cost per 1000 Impressions: Shows the amount spent on an ad campaign, divided by impressions, multiplied by 1,000.
CPC, Cost per click: Shows how much, on average, each link click costs you.
CPS, Cost per Sale / CPA, Cost per Action / ROI: Billing by the amount spent per sale or per action, or by return on investment. These methods move the billing metric closer to the advertiser's own goals.
oCPM, Optimized Cost per 1000 Impressions: A newer approach in which the advertiser sets a marketing goal (such as conversions) and the system delivers ads toward that goal as effectively as possible, while still billing per 1,000 impressions.
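The unit costs behind CPM, CPC and CPA billing can be illustrated with one set of campaign totals. This is a minimal sketch with hypothetical numbers; the function name `billing_metrics` is illustrative only.

```python
def billing_metrics(spend: float, impressions: int, clicks: int, actions: int) -> dict:
    """Average unit costs behind CPM, CPC and CPA billing for one campaign."""
    return {
        "CPM": spend * 1000 / impressions,  # cost per 1,000 impressions
        "CPC": spend / clicks,              # cost per click
        "CPA": spend / actions,             # cost per action (e.g. a purchase)
    }


# Hypothetical campaign totals.
metrics = billing_metrics(spend=500.0, impressions=200_000, clicks=1_000, actions=50)
print(metrics)  # {'CPM': 2.5, 'CPC': 0.5, 'CPA': 10.0}
```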

The conflict between the advertiser and the publisher

Publisher ad revenue equals impressions multiplied by CPM, divided by 1,000.
Advertiser gross margin equals the difference between average transaction value and CPS, multiplied by sales volume.

Publishers and advertisers each focus on the billing metric closest to them and have opposite expectations: publishers want a higher CPM, advertisers a lower CPS. Every conversion step in between (impression to click to sale) is uncertain. This contradiction is the logical foundation of the ad network.
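A small worked example makes the conflict concrete. Assuming hypothetical numbers and ignoring the ad network's own take (so publisher revenue equals advertiser spend), raising CPM while sales stay flat raises publisher revenue and CPS at the same time, so the advertiser's gross margin falls. The function names are illustrative.

```python
def publisher_revenue(impressions: int, cpm: float) -> float:
    # CPM is a price per 1,000 impressions.
    return impressions * cpm / 1000


def advertiser_gross_margin(avg_order_value: float, cps: float, sales: int) -> float:
    # Each sale earns the order value minus the ad cost attributed to that sale.
    return (avg_order_value - cps) * sales


impressions, sales, avg_order_value = 200_000, 50, 30.0  # hypothetical
for cpm in (2.5, 3.0):
    spend = publisher_revenue(impressions, cpm)  # publisher revenue = advertiser spend here
    cps = spend / sales
    margin = advertiser_gross_margin(avg_order_value, cps, sales)
    print(f"CPM {cpm}: publisher revenue {spend:.0f}, CPS {cps:.0f}, advertiser margin {margin:.0f}")
# CPM 2.5: publisher revenue 500, CPS 10, advertiser margin 1000
# CPM 3.0: publisher revenue 600, CPS 12, advertiser margin 900
```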

The commonality between the publisher and the ad network

The shared goal is to improve the efficiency of ad traffic monetisation. Overall, this can be summarised as automation of ad delivery, which usually covers five elements: budget, targeting, bids, creatives and landing pages.

Automatic budget allocation. Set budgets for campaigns and accounts, and decide whether delivery volume or cost is guaranteed.

Automatic audience targeting. This has gone through two main stages. The first stage mines second-party data and builds the target audience through label combination and label extraction for delivery. The second stage uses the advertiser's first-party data for audience expansion and splits the expanded audience into tiers for delivery, or goes further and bids separately on sub-targeted audiences, which directly affects retrieval, scoring and reranking.

Automatic bidding. oCPX has emerged to integrate budget control, conversion-rate prediction and bidding on the ad network side. Bids directly affect the real-time ranking of ads by eCPM.
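One way an oCPX-style system might close the loop between cost control and bidding is a feedback controller on a bid multiplier. The sketch below assumes a simple proportional controller; `update_bid_factor`, the gain and the clamp bounds are hypothetical and not taken from any specific ad platform.

```python
def update_bid_factor(factor: float, actual_cpa: float, target_cpa: float,
                      gain: float = 0.5, lo: float = 0.5, hi: float = 1.5) -> float:
    """Hypothetical proportional controller for an oCPX bid multiplier.

    If the observed CPA is above target the factor shrinks (bid lower);
    if it is below target the factor grows (bid higher). The result is
    clamped to [lo, hi] so one noisy window cannot swing bids too far.
    """
    error = (target_cpa - actual_cpa) / target_cpa
    return min(hi, max(lo, factor * (1 + gain * error)))


def adjusted_ecpm(base_ecpm: float, factor: float) -> float:
    # The multiplier scales the eCPM used for real-time ranking.
    return base_ecpm * factor


factor = 1.0
for observed_cpa in (60.0, 55.0, 48.0):  # hypothetical CPA per time window; target is 50
    factor = update_bid_factor(factor, observed_cpa, target_cpa=50.0)
    print(f"observed CPA {observed_cpa:.0f} -> bid factor {factor:.3f}, "
          f"eCPM {adjusted_ecpm(50.0, factor):.1f}")
```

A production system would typically add integral terms, pacing against the remaining budget and smoothing over sparse conversions; the proportional form is kept here only to show the feedback idea.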

Dynamic Creative. Advertisers upload all their advertising materials at once, and the system automatically combines and tests them.
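The combine-and-test step can be sketched as enumerating every material combination and keeping the one with the best smoothed CTR. This is a minimal illustration with hypothetical materials and test results; the smoothing prior is an arbitrary assumption, and real systems usually test combinations online (for example with bandit-style allocation).

```python
from itertools import product

# Hypothetical materials uploaded in one go.
titles = ["Title A", "Title B"]
images = ["img1.jpg", "img2.jpg", "img3.jpg"]
calls_to_action = ["Buy now", "Learn more"]

# The system assembles every combination: 2 * 3 * 2 = 12 candidate creatives.
creatives = list(product(titles, images, calls_to_action))


def smoothed_ctr(clicks: int, impressions: int,
                 prior_clicks: float = 1.0, prior_impressions: float = 100.0) -> float:
    # Additive smoothing so that combinations with little traffic are not
    # judged on a handful of impressions.
    return (clicks + prior_clicks) / (impressions + prior_impressions)


# Hypothetical test results: creative -> (clicks, impressions); untested ones default to (0, 0).
test_stats = {
    ("Title A", "img2.jpg", "Buy now"): (30, 1000),
    ("Title B", "img1.jpg", "Learn more"): (12, 1000),
}

best = max(creatives, key=lambda c: smoothed_ctr(*test_stats.get(c, (0, 0))))
print(len(creatives), best)  # 12 ('Title A', 'img2.jpg', 'Buy now')
```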

Landing pages. Landing-page tooling currently centres on industry-specific tools and templates, and its automation is still at an early stage.

The predictive modelling

Earlier ads were mainly billed per click or per action. Under oCPA bidding, the advertiser's target cost per action is converted into eCPM:

eCPM = oCPA * CVR * CTR * 1000

Because the system cannot know the true CTR and CVR before the ad is displayed, both are generally predicted with machine learning models (pCTR and pCVR):

eCPM = oCPA * pCVR * pCTR * 1000
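A minimal sketch of the formula above, ranking a few candidate ads by predicted eCPM. The `Candidate` fields and all values are hypothetical; in a real system pCTR and pCVR come from trained models.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ad_id: str
    ocpa: float  # advertiser's target cost per action
    pctr: float  # predicted click-through rate
    pcvr: float  # predicted conversion rate after a click


def ecpm(c: Candidate) -> float:
    # eCPM = oCPA * pCVR * pCTR * 1000
    return c.ocpa * c.pcvr * c.pctr * 1000


ads = [
    Candidate("a1", ocpa=50.0, pctr=0.02, pcvr=0.05),
    Candidate("a2", ocpa=120.0, pctr=0.01, pcvr=0.03),
    Candidate("a3", ocpa=20.0, pctr=0.04, pcvr=0.10),
]
for c in sorted(ads, key=ecpm, reverse=True):
    print(c.ad_id, f"{ecpm(c):.1f}")
# a3 80.0
# a1 50.0
# a2 36.0
```

Note that the highest bidder (a2) ranks last: a high oCPA cannot compensate for low predicted CTR and CVR.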

The bidding

Based on the ROI of ad delivery, advertisers' common bidding strategies fall into three categories: conservative, conventional and aggressive. (Oceanengine divides bidding scenarios into five categories: cost-capped delivery, balanced delivery, volume-prioritised delivery, volume delivery with conversion-cost optimisation, and maximum-volume delivery.)

The constraints

Online ad allocation is subject to the following constraints:
Budget constraint. The campaign or account budget set by the advertiser caps advertising spend over a given period.
Traffic constraint. Publishers have a limited number of ad slots.
User experience. Advertising involves not only publishers, advertisers and the ad network but also users. An ad network that optimises only for revenue will flood users with overpriced, low-quality ads, so the user experience of ads, especially ad formats, must be protected. The usual approach is to quantify user experience or to introduce a price squeeze factor.
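One common way to express a price squeeze factor, assumed here rather than taken from the source, is to rank by bid × quality^α, where quality is a score such as pCTR and α is the squeeze factor. A larger α gives quality more weight relative to the bid, pushing overpriced, low-quality ads down the queue. The names and numbers below are hypothetical.

```python
def squeezed_score(bid: float, quality: float, alpha: float) -> float:
    # alpha is the squeeze factor: alpha = 1 ranks by bid * quality (plain eCPM
    # for CPC ads when quality is pCTR); alpha > 1 rewards quality more strongly.
    return bid * quality ** alpha


def pick_winner(ads: dict, alpha: float) -> str:
    # ads maps ad id -> (bid, quality)
    return max(ads, key=lambda ad_id: squeezed_score(*ads[ad_id], alpha))


# Hypothetical: A bids high with low quality, B bids lower with higher quality.
ads = {"A": (10.0, 0.01), "B": (4.0, 0.02)}
print(pick_winner(ads, alpha=1.0))  # A
print(pick_winner(ads, alpha=2.0))  # B
```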

The Retrieval Phase

The goal

The retrieval phase searches the entire ad library, much like a large search engine. Ads that interest a user may be recalled from many different angles. In the next-gen ad network, retrieval must return fewer but higher-quality candidates. Before the next-gen ad network, retrieval models had low precision because simple, efficient models were considered sufficient.

The measurement

Relevance of retrieval based on user and context
Diversity of retrieval

The present

Before the next-gen ad network (expected by 2022), retrieval relied mainly on targeting, supplemented by recall. Recall is multi-channel and can be roughly divided into a TAG branch, an ANN branch, and new branches still being explored.

TAG branch. Recall by label, combined with dynamic adjustment of labels and their weights. It mainly includes:

Audience targeting by behaviour and interest
Audience targeting by app installation
Audience targeting by scene
Audience packs
LevelTag
BackendTag
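Label-based recall is typically served from an inverted index from tag to ads. The sketch below assumes hypothetical tags and weights and scores each ad by the sum of user-tag weight times ad-tag weight; it does not model the dynamic weight adjustment, which would update these weights over time.

```python
from collections import defaultdict

# Hypothetical ad -> {tag: weight}; tags mirror the targeting types listed above.
ad_tags = {
    "ad1": {"interest:games": 1.0, "app:installed_x": 0.5},
    "ad2": {"interest:travel": 1.0},
    "ad3": {"interest:games": 0.6, "scene:commute": 0.8},
}

# Inverted index: tag -> [(ad, weight)], so recall only touches ads sharing a tag with the user.
tag_index = defaultdict(list)
for ad, tags in ad_tags.items():
    for tag, weight in tags.items():
        tag_index[tag].append((ad, weight))


def tag_recall(user_tags: dict, k: int) -> list:
    """Score each ad by the sum of user-tag weight * ad-tag weight and keep the top k."""
    scores = defaultdict(float)
    for tag, user_weight in user_tags.items():
        for ad, ad_weight in tag_index.get(tag, []):
            scores[ad] += user_weight * ad_weight
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(tag_recall({"interest:games": 1.0, "scene:commute": 1.0}, k=2))  # ['ad3', 'ad1']
```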

ANN branch. Vectorised recall, currently using mainly two-tower deep structured semantic models (DSSM) of users and items. This has two shortcomings. First, the expressiveness of the embeddings is limited. Second, the model is trained on business objectives and does not use item-embedding similarity as an optimisation constraint, whereas ANN search presupposes meaningful similarity between item embeddings. Retrieval can be user-to-item (u2i) or user-to-item-to-item (u2i2i). At present, the ANN branch mainly includes:

Lookalike
Predictive modelling such as ad exposure rate, click-through rate, conversion rate, deep conversion rate, eCPM, GMV, win rate in scoring, and win rate in reranking.
FM, DSSM, multi-interest extensions of DSSM, sequential recommendation models, and graph neural network (GNN) models.
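The difference between u2i and u2i2i retrieval can be sketched with cosine similarity over embeddings. This illustration uses exact brute-force search as a stand-in for an approximate nearest-neighbour index (a production system would use an ANN index), and the 2-d vectors are hypothetical stand-ins for two-tower model outputs.

```python
import heapq
import math


def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def u2i(query_vec, item_vecs: dict, k: int) -> list:
    """User-to-item: the k items whose embeddings are most similar to the query."""
    return heapq.nlargest(k, item_vecs, key=lambda item: cosine(query_vec, item_vecs[item]))


def u2i2i(user_vec, item_vecs: dict, k_seed: int, k_per_seed: int) -> list:
    """User-to-item-to-item: expand each seed item with its nearest neighbour items."""
    seeds = u2i(user_vec, item_vecs, k_seed)
    results = list(seeds)
    for seed in seeds:
        others = {i: v for i, v in item_vecs.items() if i != seed}
        for item in u2i(item_vecs[seed], others, k_per_seed):
            if item not in results:
                results.append(item)
    return results


# Hypothetical 2-d embeddings.
items = {"i1": [1.0, 0.0], "i2": [0.9, 0.1], "i3": [0.0, 1.0], "i4": [0.1, 0.9]}
user = [1.0, 0.05]
print(u2i(user, items, k=1))                       # ['i1']
print(u2i2i(user, items, k_seed=1, k_per_seed=1))  # ['i1', 'i2']
```

The u2i2i step only works well if item-to-item similarity in the embedding space is meaningful, which is exactly the constraint the paragraph above notes that business-objective training does not enforce.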

The new branches being explored mainly include:

DR/CAR: Deep Retrieval, used by Oceanengine
TDM: Tree-based Deep Model, used by Alimama

In the future, intelligent targeting is expected to run multi-channel recall, take the Top K from each channel and merge the results. The channels include:

Retrieval of the original targeting
Retrieval of the content
Retrieval of the industry strategy
Retrieval of the traffic strategy
Retrieval of the general strategy
Retrieval of the first-party data
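The Top K and merge step can be sketched as taking the best K ads from each channel, de-duplicating, and re-sorting. This sketch assumes channel scores are already normalised to a comparable range (real systems must ensure this), and it also reports the share of duplicate picks, which quantifies the channel overlap discussed below. The channel data are hypothetical.

```python
def merge_channels(channels: dict, k_per_channel: int, limit: int):
    """Take the top k from each recall channel, de-duplicate, and merge.

    channels maps channel name -> list of (ad_id, score).
    Returns the merged ad ids and the share of picks that were duplicates.
    """
    best = {}   # ad_id -> highest score seen in any channel
    picked = 0
    for results in channels.values():
        for ad_id, score in sorted(results, key=lambda r: r[1], reverse=True)[:k_per_channel]:
            picked += 1
            best[ad_id] = max(score, best.get(ad_id, float("-inf")))
    merged = sorted(best, key=best.get, reverse=True)[:limit]
    overlap = 1 - len(best) / picked if picked else 0.0
    return merged, overlap


channels = {  # hypothetical channel outputs
    "targeting": [("a", 0.9), ("b", 0.8), ("c", 0.1)],
    "content": [("b", 0.95), ("d", 0.7)],
    "industry": [("a", 0.6), ("e", 0.5)],
}
merged, overlap = merge_channels(channels, k_per_channel=2, limit=3)
print(merged, round(overlap, 2))  # ['b', 'a', 'd'] 0.33
```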

After retrieval, filtering rules (for example by targeting and budget) are applied, and 10,000 to 20,000 ads enter the next stage.

The drawback of this approach is that the channels overlap heavily, wasting resources. In the next-gen ad network, the retrieval model is a learning-to-rank model: the model becomes the primary mechanism, with rule-based retrieval as a supplement. This is explained further in the next stage.

How it works

The following describes how recall takes effect in the ad network after the audience targeting is selected by the advertiser.

The Scoring Phase

The goal

Retrieve high-performance ads: high-eCPM ads, and discover better new ads.
Balance precision and speed: the scoring model must process many ads under tight computing budgets, so a relatively simple and fast algorithm is needed.
Align with the goal of the reranking model: from the ad network's perspective, select the ads that the reranking model would consider high-performance.

The measurement

The consistency of the goal between the scoring model and the reranking model.
The consistency of the strategies.
The consistency of the eCPM.
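Consistency between the scoring (coarse) and reranking (fine) eCPM can be quantified in several ways. Two simple options, assumed here rather than prescribed by the source, are top-K overlap and pairwise order agreement (the concordant-pair fraction underlying Kendall's tau). The eCPM values are hypothetical.

```python
from itertools import combinations


def topk_overlap(coarse: dict, fine: dict, k: int) -> float:
    """Share of the fine-ranking top k that the coarse ranking also puts in its top k."""
    top_coarse = set(sorted(coarse, key=coarse.get, reverse=True)[:k])
    top_fine = set(sorted(fine, key=fine.get, reverse=True)[:k])
    return len(top_coarse & top_fine) / k


def pairwise_agreement(coarse: dict, fine: dict) -> float:
    """Fraction of ad pairs that the coarse and fine eCPMs order the same way."""
    pairs = [(a, b) for a, b in combinations(coarse, 2) if fine[a] != fine[b]]
    agree = sum((coarse[a] - coarse[b]) * (fine[a] - fine[b]) > 0 for a, b in pairs)
    return agree / len(pairs)


coarse = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 1.0}  # hypothetical coarse eCPM
fine = {"a": 9.0, "b": 2.0, "c": 6.0, "d": 1.0}    # hypothetical fine eCPM
print(topk_overlap(coarse, fine, k=2))             # 0.5
print(round(pairwise_agreement(coarse, fine), 3))  # 0.833
```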

How it works

To ensure system performance and prevent excessive processing time, each stage of the ad selection process includes a pre-sorting and truncation mechanism that caps the number of ads proceeding to the next stage at a fixed upper bound. In the first two pre-sorting stages, because the LiteCxR model is not yet applied, offline CTR, CVR and eCPM values computed at the ad, account and overall levels are used. The LiteCxR model is used for eCPM calculation in the third truncation stage.

First Timeout Truncation (TimeOutProtectionCut): A growing ad pool, expanded retrieval based on intelligent targeting, and improved label coverage for original targeting can sharply increase the number of ads entering the coarse ranking stage. To maintain performance, truncation is applied before each stage.
Aggregate Optimisation: Aggregate optimization addresses the issue of ad clustering by selecting high-quality ads based on ad-level optimization scores within the same account or product category. This helps break the vicious cycle of "creating a large number of similar ads to compete for traffic, resulting in scattered ad traffic, slow traffic growth, and high waste rates." The goal is to improve the system ecology and enhance the traffic growth experience.
Second Timeout Truncation: This stage selects the highest-performing ads for pre-scoring.
LiteBindAds: This module binds the fields required for ad scoring to prepare data for scoring. These fall into the following categories: bid-related fields for eCPM calculation, bid factors for adjusting ad bids, ad attributes, CxR offline scores and calibration factors, eCPM support fields, and material-related fields.
Pre-scoring (Prescoring): This module uses the Lite model to score the retrieved ads and calculates the coarse-ranking eCPM, which is used in the third timeout truncation and in the filtering, sorting and washing stages. This stage is a strategy framework that links LiteCxR, price adjustment and other modules to score the retrieved ads. LiteCxR can be evaluated with offline and online AUC, and its eCPM by consistency with the fine-ranking eCPM.
Third Timeout Truncation (CutIndexProcessor): Within performance limits, this module selects the highest-performing ads for downstream sorting.
FullBindAds: This module binds the fields required for subsequent filtering, bucketing, sorting, and washing to the ads.
Filtering Strategy (FilterProcessor): This module applies a comprehensive set of business strategies to filter the candidate ads entering the Filter stage.
Bucketing Strategy (BucketingProcessor): This module sets mask information for a bucket based on ad attributes to enable different scoring and sorting logic for different buckets. There is no performance objective.
Washing, Sorting, and Selecting Top N Ads (BaselineRankingProcessor): This module performs scoring and strategy filtering on the ads remaining (800~1000) after timeout truncation, aggregate optimisation and filtering. The ads are then bucketed and truncated according to the allocated quotas. Finally, they are washed and sorted, and the top N (100~300) ads are selected. The sorting and washing logic is currently being optimised through experiments on the experimental system.
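The truncation, aggregate optimisation, bucketing and top-N steps above can be sketched as a chain of list operations. This is a simplified illustration: the `Ad` fields, caps and quotas are hypothetical, pre-scoring is replaced by a precomputed score, and filtering and washing are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    account: str
    bucket: str   # hypothetical bucket label, e.g. "new", "old", "special"
    score: float  # stand-in for an offline or LiteCxR-based coarse eCPM


def truncate(ads, cap):
    """Timeout-protection truncation: keep at most `cap` ads, highest score first."""
    return sorted(ads, key=lambda a: a.score, reverse=True)[:cap]


def aggregate(ads, per_account):
    """Aggregate optimisation: keep only the best `per_account` ads of each account."""
    kept, count = [], defaultdict(int)
    for ad in sorted(ads, key=lambda a: a.score, reverse=True):
        if count[ad.account] < per_account:
            kept.append(ad)
            count[ad.account] += 1
    return kept


def bucket_quota(ads, quotas):
    """Bucketing: group ads by bucket and truncate each group to its quota."""
    groups = defaultdict(list)
    for ad in ads:
        groups[ad.bucket].append(ad)
    return [ad for bucket, members in groups.items()
            for ad in truncate(members, quotas.get(bucket, 0))]


def coarse_rank(ads, cap1, per_account, cap2, quotas, top_n):
    ads = truncate(ads, cap1)          # first timeout truncation
    ads = aggregate(ads, per_account)  # aggregate optimisation
    ads = truncate(ads, cap2)          # later timeout truncation (pre-scoring omitted)
    ads = bucket_quota(ads, quotas)    # bucketing with per-bucket quotas
    return truncate(ads, top_n)        # sort and select the top N


pool = [
    Ad("x1", "A", "old", 0.9), Ad("x2", "A", "old", 0.8), Ad("x3", "A", "new", 0.7),
    Ad("x4", "B", "new", 0.6), Ad("x5", "B", "old", 0.5), Ad("x6", "C", "special", 0.4),
]
result = coarse_rank(pool, cap1=6, per_account=2, cap2=5,
                     quotas={"old": 2, "new": 1, "special": 1}, top_n=3)
print([ad.ad_id for ad in result])  # ['x1', 'x2', 'x4']
```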

Before the end of 2020, the coarse ranking system used a prediction algorithm based on post-impression value, namely LiteCxR. It estimated eCPM by separately predicting ad CTR and CVR after impression and then applying price adjustment strategies. However, it faced several persistent issues:

Low consistency between coarse and fine ranking eCPM: Ads deemed high-quality by coarse ranking were often not recognized as such by fine ranking. This inconsistency stemmed primarily from two factors:
Inaccurate CTR and CVR prediction in coarse ranking: Inaccuracy arose from data latency and sparsity issues. From a modelling perspective, computational constraints limited the model's expressive ability.
Numerous business strategy rules and other influencing factors applied in both coarse and fine ranking, particularly in fine ranking.
Sample selection bias: CTR and CVR models trained on ad impression, click and conversion logs see only a biased subset of ads. The training data and the data to be predicted come from different distributions, which challenges the model's generalisation, because the coarse-ranking candidate queue covers a far larger sample space than the impressed ads.
High computational complexity: As illustrated in the figure, each request required three predictions for each retrieved ad (CTR, shallow CVR and deep CVR). This led to significant computational overhead, and iterating on multiple models added further complexity.

The next-gen ad system proposes a learning-to-rank algorithm that fits the fine-ranking results. It uses the fine-ranking eCPM order as the optimisation objective and employs a learning-to-rank (LTR) model to select the ads with the highest fine-ranking estimated eCPM. This approach addresses the following challenges:

Rapid propagation of fine-ranking optimization results in coarse ranking: High consistency between coarse and fine ranking, leading to improved linkage efficiency.
Richer samples based on fine-ranking queue TrackLog logs: Mitigates sample selection bias and data sparsity issues.
Larger scale and faster feedback based on fine-ranking queue TrackLog logs: Enables faster model updates.

The primary goal of coarse ranking modelling remains to align the estimated eCPM of ads in coarse ranking with that of fine ranking. Several approaches can be explored, with a focus on:

Fitting coarse ranking model to fine ranking eCPM/bid values (regression problem, value estimation): This approach directly estimates the eCPM or bid values predicted by fine ranking.
Directly fitting fine ranking eCPM order relationships (ranking problem, order estimation): This approach focuses on replicating the order of ads as determined by fine ranking.

The advantage of "value" over "order" is that:

Values imply an order, so they carry more information than the order alone.
Value learning requires fewer samples and is easier to learn.
Values have clear physical meanings and are highly interpretable.
The magnitude of eCPM matters more than the order, because the business goal is to pick ads with a higher total eCPM, not merely a better order.

As of 2022, the preferred approach is to fit the reranking model's eCPM values indirectly, by directly learning the reranking model's click-through conversion rate, that is, its eCPM divided by the bid.

Use the pointwise algorithm to train value models: the training and test sets are split randomly by request. Each request can yield one 9-tuple sample <user, ad, pCTR, pCVR1, pCVR2, bid1, bid2, eCPM1, eCPM2> or two 6-tuple samples <user, ad, pCTR, pCVR1, bid1, eCPM1> and <user, ad, pCTR, pCVR2, bid2, eCPM2>.
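The value-model samples can be built by splitting each 9-tuple into two 6-tuples and using eCPM / bid as the regression target, which equals pCTR × pCVR × 1000 under the eCPM formula used earlier. This sketch uses hypothetical log values and helper names, and splits train and test at request level so that no request appears in both sets.

```python
import random


def split_nine_tuple(row):
    """Split <user, ad, pCTR, pCVR1, pCVR2, bid1, bid2, eCPM1, eCPM2> into two 6-tuples."""
    user, ad, pctr, pcvr1, pcvr2, bid1, bid2, ecpm1, ecpm2 = row
    return [(user, ad, pctr, pcvr1, bid1, ecpm1), (user, ad, pctr, pcvr2, bid2, ecpm2)]


def value_sample(six):
    """Regression target = fine-ranking eCPM / bid, i.e. pCTR * pCVR * 1000."""
    user, ad, _pctr, _pcvr, bid, ecpm = six
    return (user, ad, ecpm / bid)


def split_by_request(samples_by_request: dict, test_ratio: float, seed: int = 0):
    """Random train/test split at request level, so one request never lands in both sets."""
    request_ids = sorted(samples_by_request)
    random.Random(seed).shuffle(request_ids)
    test_ids = set(request_ids[: int(len(request_ids) * test_ratio)])
    train = [s for rid in request_ids if rid not in test_ids for s in samples_by_request[rid]]
    test = [s for rid in request_ids if rid in test_ids for s in samples_by_request[rid]]
    return train, test


# Hypothetical log row: eCPM1 = 40 * 0.05 * 0.02 * 1000 = 40, eCPM2 = 100 * 0.01 * 0.02 * 1000 = 20.
row = ("u1", "ad9", 0.02, 0.05, 0.01, 40.0, 100.0, 40.0, 20.0)
samples = [value_sample(t) for t in split_nine_tuple(row)]
print(samples)  # [('u1', 'ad9', 1.0), ('u1', 'ad9', 0.2)]
```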

Use the pairwise algorithm to train order models: the key questions are how to construct pairs, how to sample them and how to weight them. The reranking queue may contain many ads, so training samples must be subsampled. After experimentation, two schemes were used: positive and negative examples sampled from different rank intervals, and positive and negative examples sampled from the same interval.
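The two pair-sampling schemes can be sketched against a queue ordered by fine-ranking eCPM, paired with a RankNet-style pairwise logistic loss. The rank intervals, helper names and loss choice are assumptions for illustration; sample weighting is not shown.

```python
import math
import random


def cross_interval_pairs(queue, pos_range, neg_range, n, rng):
    """Positives from one rank interval, negatives from a lower one.

    queue is ordered by fine-ranking eCPM, best first; ranges are (start, stop) indices.
    """
    return [(queue[rng.randrange(*pos_range)], queue[rng.randrange(*neg_range)])
            for _ in range(n)]


def same_interval_pairs(queue, interval, n, rng):
    """Both ads from the same interval; the higher-ranked one is the positive."""
    pairs = []
    for _ in range(n):
        i, j = sorted(rng.sample(range(*interval), 2))
        pairs.append((queue[i], queue[j]))
    return pairs


def pairwise_logistic_loss(score_pos: float, score_neg: float) -> float:
    # RankNet-style loss: near 0 when the positive is scored well above the negative.
    return math.log1p(math.exp(-(score_pos - score_neg)))


rng = random.Random(7)
queue = [f"ad{r}" for r in range(100)]  # hypothetical fine-ranking queue
print(cross_interval_pairs(queue, (0, 10), (50, 100), n=3, rng=rng))
print(same_interval_pairs(queue, (10, 30), n=3, rng=rng))
```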

The pointwise and pairwise algorithms can also be used to train binary classification models:

Pointwise algorithm: Take the top 10 ads of the reranking model as positive samples, and sample 20 ads ranked below the top 10 as negative samples. The sample format is <user, ad, label = 1 or 0>.
Pairwise algorithm: 15 ads are collected from the ranking model's queue as follows: the Top 1 ad, 5 ads sampled from Top 2 to Top 50, and 9 ads sampled from Top 20 onward.
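The pointwise binary-classification samples described above (top 10 labelled positive, 20 sampled from the rest labelled negative) can be built as follows. The queue contents and the function name are hypothetical.

```python
import random


def binary_samples(user, queue, n_pos=10, n_neg=20, seed=0):
    """Pointwise binary samples from one reranking queue (best first).

    The top n_pos ads are labelled 1; n_neg ads sampled from the rest are labelled 0.
    """
    rng = random.Random(seed)
    positives = [(user, ad, 1) for ad in queue[:n_pos]]
    tail = queue[n_pos:]
    negatives = [(user, ad, 0) for ad in rng.sample(tail, min(n_neg, len(tail)))]
    return positives + negatives


queue = [f"ad{r}" for r in range(200)]  # hypothetical queue
samples = binary_samples("u1", queue)
print(len(samples), sum(label for _, _, label in samples))  # 30 10
```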

The Ranking Phase

The goal

Prioritise ads with high-precision predictions, balancing the value to the advertising system, advertisers and users. As the final step of ad selection, this phase focuses on selection-control strategies.

How it works

Multi-way Merge (DocWash): An intermediate step between scoring and ranking, consisting of a multi-way merge, quota retrieval buckets, and washing of ads from the same advertiser and product. The multi-way merge currently combines two regular-ad channels and two dynamic-creative channels, all scored with the same formula. The quota retrieval buckets follow the same logic as the scoring buckets (new, old and special-business buckets), and ads are sorted by score within each bucket. Some special ads, such as contract, CPD, direct-delivery and ad-slot-whitelist ads, skip the DocWash washing logic and are retrieved directly.
Fetch Sorting Information: Before calibration and scoring, user data, ad slot data and ad data are parsed separately to prepare the inputs for subsequent scoring. Parsed content includes freshness, user negative feedback, exposure time, etc.
Calibration: Ads with multiple materials (dynamic creative ads, multi-material ads) need material optimisation, and the traffic side also calibrates pCTR. This step selects the best material and corrects pCTR bias, paving the way for accurate eCPM calculation.
Scoring & Ranking (Auction): The core module of ranking. It calculates eCPM and sorts ads by it, producing the penultimate ranking of ads (various adjustment and freshness-filtering strategies follow later).
Filtering: According to ad type, traffic characteristics and business characteristics, comprehensively consider user experience, ad and platform revenue, and remove low-quality ads from the ad queue.
Adjusting (TunePos): Highly business-specific; adjusts the ad queue according to specific traffic, business context and other information.
Decision (Page): Includes diversity strategies, traffic provider-specific strategies, CPD ad insertion, billing logic, etc., and finally selects the winning ad.
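A toy version of the calibration, auction, filtering and decision steps can be chained together as below. Everything here is a simplifying assumption: a single global calibration factor instead of per-slot or per-traffic calibration, a quality threshold as the filter, and a per-advertiser cap as the diversity strategy. The candidate fields are hypothetical.

```python
from collections import defaultdict


def rank_and_decide(candidates, calib_factor, min_quality, max_per_advertiser):
    """Toy ranking phase: material choice, calibration, eCPM auction, filtering, diversity.

    Each candidate is a dict with hypothetical keys: ad, advertiser, bid (target CPA),
    pcvr, materials (material id -> pCTR) and quality (user-experience score in [0, 1]).
    """
    scored = []
    for c in candidates:
        # Calibration: keep the material with the highest pCTR, then correct its bias.
        material, pctr = max(c["materials"].items(), key=lambda kv: kv[1])
        pctr *= calib_factor
        # Scoring & ranking: eCPM = bid * pCVR * pCTR * 1000.
        scored.append((c["bid"] * c["pcvr"] * pctr * 1000, material, c))
    scored.sort(key=lambda t: t[0], reverse=True)

    queue, shown = [], defaultdict(int)
    for ecpm, material, c in scored:
        if c["quality"] < min_quality:
            continue  # filtering: drop low-quality ads
        if shown[c["advertiser"]] >= max_per_advertiser:
            continue  # decision: simple diversity cap per advertiser
        shown[c["advertiser"]] += 1
        queue.append((c["ad"], material, ecpm))
    return queue  # queue[0] is the winning ad, if any


candidates = [  # hypothetical
    {"ad": "a1", "advertiser": "X", "bid": 50.0, "pcvr": 0.05,
     "materials": {"m1": 0.02, "m2": 0.03}, "quality": 0.9},
    {"ad": "a2", "advertiser": "X", "bid": 40.0, "pcvr": 0.05,
     "materials": {"m1": 0.02}, "quality": 0.8},
    {"ad": "a3", "advertiser": "Y", "bid": 100.0, "pcvr": 0.02,
     "materials": {"m1": 0.05}, "quality": 0.2},
    {"ad": "a4", "advertiser": "Z", "bid": 30.0, "pcvr": 0.10,
     "materials": {"m1": 0.01, "m2": 0.015}, "quality": 0.7},
]
for ad, material, ecpm in rank_and_decide(candidates, calib_factor=0.8,
                                          min_quality=0.5, max_per_advertiser=1):
    print(ad, material, f"{ecpm:.1f}")
# a1 m2 60.0
# a4 m2 36.0
```

The highest-eCPM ad (a3) is removed by the quality filter, and a2 loses to the diversity cap because its advertiser already has a1 in the queue.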
