Data Ingestion of Advertiser for Tencent Ads & Retail
2023|Type: Platform|Tag: Design for Efficiency|Role: Product, UX Design
Background
Tencent Marketing has multiple data ingestion channels across two major areas, Tencent Ads and Tencent Smart Retail. This has cost a lot of work and time in communicating requirements, governing data, and guiding advertisers to re-ingest data.
View the project on Slides. (These slides are from an impromptu talk I gave when problems were discovered during the delivery phase. It is a little cringeworthy, but we cannot throw confetti for ourselves every time. Face it, and record it.)
What is data ingestion?
Over 90% of ad cost goes to oCPA ads, which depend strongly on the data posted back by advertisers. In the last 30 days as of 17 March 2023, Zhishu (previously known as DMP) ingested nearly 200 billion pageviews. So what exactly is first-party data ingestion?
Advertisers, based on their own needs, set up event tracking on the front end and/or data acquisition on the back end. The data is reported to the server under an agreed protocol, and this transmission process is called data ingestion. Common ingestion methods include JS and SDK data acquisition, API endpoints, and files uploaded to the server for parsing.
A complete data ingestion can be as short as two to three months or as long as six months.
What kind of data is ingested?
The ingested data mainly consists of user-generated data from various scenarios, such as apps and WeChat Mini-Programs. Across ads and retail, we uniformly call the objects described by the data entities. Users, orders, goods, and so on are all entities. Once collected, entities make up the data assets.
Ads
Retail

What is the First-Party Data Ingestion
Discover Problems
We received feedback from both ads and retail: the Vice President of 37GAMES said that data ingestion was ineffective, and our retail clients were overwhelmed by the communication involved. What was wrong with data ingestion? The designers guided the product managers and operators to sort out the problems in ads and retail before, during, and after ingestion.
Data Ingestion in Ads
In ads, during the cost-per-click era, the platform did not rely heavily on data ingestion because it already held the click data. As oCPX evolved, optimisation goals became deeper and deeper, and came to rely more and more on the action data posted back by advertisers. After processing, the conversion data is used in downstream applications, including audience targeting, scoring, ranking, oCPX, attribution, ad delivery reports, and so on.
For advertisers, there are four touchpoints for data ingestion, namely, Audience Files and Data Sources in Zhishu, Attribution in the Ad Delivery Platform, and API endpoints.
Audience Files: These are uploaded to Zhishu in CSV format and are used by advertisers to bid on specified audiences, for price adjustment or targeting in ad delivery. They include single-column audience files, which contain only an ID column, and multi-column audience files, which carry additional columns besides the ID column for downstream reading and use.
Data Sources: These use JS, SDK, or API endpoints to post conversion data back to the ads side. The statistics cover 27 September 2022 to 26 October 2022.
Ingestion Steps:
The data within a data source has many uses, one of which is attribution. The attributed data is then used as a tool in various aspects of machine learning. We can take attribution as an example to analyse the data flow.
The operational path for the old version of attribution was:
As the advertising platform moved toward more refined operations, demands came to vary by industry. The data gradually met the requirements the platform suggested, and cooperation with KA (key account) customers gradually developed into posting back all-channel, all-path data. Against this background, the old version of attribution has the following problems:
This requires more information, such as optimisation goals and deep optimisation goals. It also pushes the oCPX ad delivery platform from "campaign + optimisation goal + whether to turn on deep optimisation goal" to "campaign + industry path + optimisation goal".
This is how the new version of attribution was born. It is a flexible one-stop solution in which advertisers add a click detection link when they create attribution rules (selecting a combination of optimisation goals under an industry path) in the ad delivery platform. It works by agreeing on a string in a specified format that marks the position of a field to be replaced later. Such placeholders are collectively known as "macros" (click to see the introduction).
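As a rough illustration of how such placeholder strings can be filled in, here is a minimal Python sketch. The macro syntax (`__NAME__`), the macro names, the example URL, and the `fill_macros` helper are all assumptions made for this illustration, not the platform's documented format.

```python
import re
from urllib.parse import quote

# Assumed macro syntax for this sketch: an upper-case name wrapped in double underscores.
MACRO_PATTERN = re.compile(r"__([A-Z][A-Z0-9_]*?)__")


def fill_macros(template: str, values: dict) -> str:
    """Replace each known macro in the link template with its URL-encoded value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in values:
            return quote(str(values[name]), safe="")
        # Unknown macros are left untouched so the gap stays visible.
        return match.group(0)

    return MACRO_PATTERN.sub(substitute, template)


# Hypothetical click detection link with three macros; only two have values.
template = "https://advertiser.example/track?click_id=__CLICK_ID__&goal=__GOAL__&os=__OS__"
print(fill_macros(template, {"CLICK_ID": "abc 123", "GOAL": "purchase"}))
```

Leaving unknown macros in place, rather than replacing them with an empty string, is one possible design choice: it makes an unfilled field easy to spot on the receiving side.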
At the same time, as of 3 November 2022, 67.31% of customers were still using the old version of attribution, which still accounted for 77.52% of cost, so the old version still has to be supported. Attribution is complex in itself, and designing the new version alongside the old one is a further challenge.
Advertisers in various industries post conversion data back to the platform as oCPX ads require, so data ingestion in ads is mainly concerned with the accuracy of the data.

Data ingestion in Ads
Data Ingestion in Retail
Retail here means smart retail within the WeChat ecosystem. We found that, because retail is positioned around business analysis and offers many applications on the cloud, clients need to ingest data themselves in order to see their live business in real time. Clients open accounts or purchase cloud applications according to their own needs. This gives retail data ingestion its biggest advantage: one-time ingestion for multiple uses. To serve all applications at once, almost all retail applications, except for a few scenarios such as Cloud Alliance and Preferred Alliance, follow the requirements of Youshu. However, Youshu has several very strict requirements while other applications are not so complex, so ingestion is labour-intensive and the data may still be unusable. In addition, retail data ingestion offers some helpful features, such as a log query that lets clients see exactly what each pageview looks like, which is very friendly to them.
Ingestion Steps:

Data ingestion in Retail
Define Problems
For the External Clients
Advertisers and retailers have the following issues with the ingestion process:
Data ingestion is not uniform.
Differences between the two domains in the ingestion process and its standards have led to web-like communication, repeated postback, and non-standard postback by clients and agencies when working with Tencent. This has directly resulted in heavy communication overhead, poor efficiency, and poor data quality in data ingestion.
For example, when a customer builds an application, there will be a demand for data ingestion at Tencent. According to incomplete statistics, we have close to 200 documents. When a customer uses capability A and wants to use capability B, they may have to jump from A's documents to B's documents. These documents may partially overlap, as may their fields, which can confuse the customer's development team. If the customer has multiple teams, this creates web-like communication and a lot of repeated, inefficient data ingestion.
Data management is inadequate.
Data ingestion is indirect, which makes it hard to work efficiently across multiple channels, leaves data management scattered, and produces differences in field standards. Clients and agencies have a poor experience managing data.
For example, for the ROAS lift-based strategy before 2021, WeChat traffic and non-WeChat traffic differed in parent-child ad IDs, statistical standards, the meaning of the time field in the intermediate table, and statistical objects, and some data was missing. As a result, there was a gap of nearly 70% between the two statistical results.
For the Internal Data Governance Teams
In data governance, both ads and retail struggled to meet current business demands in terms of validation:
Validation in ads is weak.
There is only simple engineering validation, with no record of the reason for failure and no traceability. There is no clear logic for repeated postback, and the logic is not reflected in the documents. The validation is also incomplete and unclear.
For example, at the beginning the platform ingested three main types of data: attribution data provided for optimisation goals, audience file data collected for targeting, and action data loaded for data insights. Historically, these data were governed and managed by different teams, which built up many silos. To reach agreement on results, industry operations negotiated with advertisers: at first everyone posted back action-type data, and when that was not enough, extra information was added inside the attributes. This resulted in poor-quality data being posted back by advertisers.
Validation in retail is strong.
Because products, platforms, and services in retail each have their own data, the branches differ widely in how far they have evolved and been automated. To improve the efficiency of data governance, the most stringent standard was taken as the baseline, but in practice this resulted in high-input, low-output customer service.
For example, apart from a few scenarios such as Cloud Alliance and Preferred Alliance, nearly all retail applications post back data according to the requirements of Youshu.
For the Internal Data Application Teams
Data in ads is used deeply, while data in retail is used widely.
There are a lot of barriers to conversion paths.
Each conversion path processes data differently and passes through many processing stages, which gives rise to four major problems: less, more, wrong, and slower.
For example, problems in some stages lead to less data, retries lead to more data, format conversions that tamper with the original data lead to wrong data, and more stages lead to slower distribution.
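The less, more, and wrong cases can be made concrete with a small reconciliation check between what was sent and what arrived. This is only a sketch: the `(event_id, payload)` record shape and the `reconcile` helper are hypothetical, and the "slower" case is left out because it needs timestamps.

```python
from collections import Counter


def reconcile(sent, received):
    """Compare sent and received (event_id, payload) pairs.

    Returns the event ids that went missing ("less"), arrived more than
    once ("more"), or arrived with an altered payload ("wrong").
    """
    sent_by_id = dict(sent)
    received_counts = Counter(event_id for event_id, _ in received)
    received_by_id = dict(received)  # for duplicates, the last copy wins

    less = sorted(set(sent_by_id) - set(received_counts))
    more = sorted(eid for eid, count in received_counts.items() if count > 1)
    wrong = sorted(
        eid
        for eid, payload in received_by_id.items()
        if eid in sent_by_id and payload != sent_by_id[eid]
    )
    return {"less": less, "more": more, "wrong": wrong}


# Hypothetical events: e2 is dropped, e1 is retried, e3 has its amount turned into a string.
sent = [("e1", {"amount": 10}), ("e2", {"amount": 20}), ("e3", {"amount": 30})]
received = [("e1", {"amount": 10}), ("e1", {"amount": 10}), ("e3", {"amount": "30"})]
print(reconcile(sent, received))
```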
There is a lot of waste of data resources.
Consumption is inefficient: each data consumer takes the data and makes its own copy and cold backup, which wastes resources.
For example, in the past, federated modelling, attribution, and deep cooperation were three separate data flows, so three copies had to be stored. These were later merged into two, and are now moving toward one.
It is difficult for downstream applications to fully understand the data.
Downstream application teams have difficulty understanding the ingested data, and each application has different data requirements. In machine learning in particular, much of the data for training and prediction is lost, which limits the value of applying conversion data.
For example, understanding is inconsistent. In the advertising system, attribution, models, strategies, and so on each have their own understanding and usage of the data as consumers. With inconsistent goals, it is easy to end up in something like Simpson's paradox: "good individually, bad together."
These types of problems can be further abstracted into:
Objectives
Based on these three types of users, we set out to explore and build a new all-domain experience for data ingestion and application distribution. It meets the ads requirements of deep application and more accurate, real-time data, and also meets the retail requirements of rich, multi-purpose applications. Taking ads and retail data as a bridge, it creates an all-domain marketing data assistant that connects public and private traffic for clients.
Client Objectives
Complete high-quality data ingestion efficiently on a single Tencent platform, following the data standard of a specific industry, and serve multiple applications with a one-time data ingestion. A single ingestion can serve up to 22 applications.
Business Objectives
Provide customers with a one-stop data ingestion and data management platform by unifying ads and retail, integrating the requirements and standards of multi-channel, multi-application, and multi-industry data ingestion, and unifying the ETL services and application distribution. Truly close the positive cycle from data ingestion to data marketing.
Design Objectives
Solution
First Objective: Unified Data Ingestion and Asset Management
First, in the early days of the team merger, when everyone was unfamiliar with each other, we ran a design sprint to guide the ads side and the retail side to reach a consensus quickly and come up with an information architecture.

Information Architecture
Second, we worked out a plan to promote ingestion by using applications as hooks. After collecting the applications of the two domains, we categorised them from the public domain to the private domain, by marketing purpose and by the brands and touchpoints that users perceive.

How to Organise Information
We then designed the application selection step. Building it into the process means that ingestion tasks are based on whichever applications you are going to use.
Application Selection
There is also the concept of "one ingestion, multiple applications". For example, if data has been ingested and distributed for two applications and it happens to satisfy a third as well, there is no need to re-ingest for that one.
One-Time Ingestion for Multiple Uses
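The idea of one ingestion serving several applications can be sketched as a subset check: an application needs no re-ingestion when every field it requires has already been ingested. The application names, field names, and the `satisfied_applications` helper below are hypothetical and only illustrate the principle.

```python
def satisfied_applications(ingested_fields, requirements):
    """Return the applications whose required fields are all already ingested."""
    ingested = set(ingested_fields)
    return sorted(
        app for app, needed in requirements.items() if set(needed) <= ingested
    )


# Hypothetical field requirements per application.
requirements = {
    "app_a": {"user_id", "action_type"},
    "app_b": {"user_id", "order_id"},
    "app_c": {"user_id", "action_type", "order_id"},
    "app_d": {"user_id", "sku_id"},
}

# Fields ingested for app_a and app_b happen to cover app_c as well.
ingested = requirements["app_a"] | requirements["app_b"]
print(satisfied_applications(ingested, requirements))
```

In this sketch `app_c` is covered without any new ingestion, while `app_d` still needs `sku_id`.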
Next, in data management, we first have to help customers work out where to view their data assets, which comes down to the relationships among all of the platforms. Then we solve the management problem. Previously, assets were managed without any notion of applications or distribution, because there was one and only one. Now there is a one-to-one relationship between assets and applications, which enables precise authorisation and distribution.
We give users first-time guidance for jumping between platforms, convey the value proposition of each platform, switch all platforms over during the dark launch, clarify the differences and value of each platform, and support future jumps involving Youshu.

Jumps between Platforms
Establish one-to-one relationships between data sources or audience files and applications, and improve efficiency through filtering and batch operations. The object of distribution and authorisation is no longer the data source or audience file on its own, but the pair: <one data source, one application> and <one file, one application>.
Authorisation and Distribution
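One way to picture authorisation at the level of a pair is to store each <asset, application> combination as its own grant. The `Grant` and `GrantRegistry` names and the identifiers below are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One authorisation unit: a single asset paired with a single application."""
    asset_id: str      # a data source or an audience file (hypothetical ids)
    application: str


class GrantRegistry:
    def __init__(self):
        self._grants = set()

    def authorise(self, asset_id: str, application: str) -> None:
        self._grants.add(Grant(asset_id, application))

    def revoke(self, asset_id: str, application: str) -> None:
        self._grants.discard(Grant(asset_id, application))

    def is_authorised(self, asset_id: str, application: str) -> bool:
        return Grant(asset_id, application) in self._grants

    def applications_for(self, asset_id: str) -> list:
        """Filter grants by asset, e.g. to drive a batch operation."""
        return sorted(g.application for g in self._grants if g.asset_id == asset_id)


registry = GrantRegistry()
registry.authorise("source_1", "attribution")
registry.authorise("source_1", "insights")
registry.revoke("source_1", "insights")  # the other pair is unaffected
print(registry.applications_for("source_1"))
```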
Second Objective: Ingestion Efficiency
Fewer steps, fewer actions, shorter duration, and fewer repeated fields are achieved by quantifying ingestion fields and ingestion steps.
First of all, we take the intersection of the ingestion methods and the union of the fields. In this way, the fewest ingestion methods can satisfy the most required fields.
Intersection of Ingestion Methods, Union of Fields
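Read in set terms, this means intersecting the ingestion ways that the selected applications support and combining their fields as a set union. The sketch below shows that computation. The application names, method names, field names, and the `plan_ingestion` helper are assumptions for illustration only.

```python
def plan_ingestion(selected):
    """Return (methods shared by every selected application, all fields they need)."""
    if not selected:
        return set(), set()
    methods = set.intersection(*(set(app["methods"]) for app in selected.values()))
    fields = set.union(*(set(app["fields"]) for app in selected.values()))
    return methods, fields


# Hypothetical applications with the ingestion methods they accept and the fields they need.
selected = {
    "attribution": {"methods": {"API", "SDK", "JS"}, "fields": {"user_id", "action_type"}},
    "insights": {"methods": {"API", "SDK"}, "fields": {"user_id", "page_url"}},
    "targeting": {"methods": {"API", "FILE"}, "fields": {"user_id"}},
}

methods, fields = plan_ingestion(selected)
print(sorted(methods))  # the methods every selected application accepts
print(sorted(fields))   # every field needed, each listed once
```

With these made-up inputs only `API` is accepted by all three applications, and the shared `user_id` field appears once in the union rather than three times.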
Secondly, to solve the problems of having to carry one big package through testing and of spending 1,000 CNY on validation, we split the test into matrix-like steps. No test data enters the production environment, which avoids data contamination.
Matrix-like Steps
Third Objective: Ingested Data Quality
The difficulty lies in optimising the core components, namely the data model and the validation rules.
Through data governance practice on the DataCube Platform, the data team built and validated a four-part data model (UserInfo, ItemInfo, ActionInfo, and QualityInfo) based on the shuttle mechanism of internal experiments. QualityInfo is measured along the following dimensions:
Through industry research, we have constructed brand-new sets of validation rules, including entity base rules, application rules, and industry rules. So far 184 sets have been sorted out, and the work is still advancing in order of priority.
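To show how the three rule layers named above might be combined while keeping a record of why a record failed, here is a minimal Python sketch. The layer contents, field names, and the `require` and `validate` helpers are hypothetical and are not the actual rule sets.

```python
from typing import Callable, Optional

# A rule returns a failure reason, or None when the record passes.
Rule = Callable[[dict], Optional[str]]


def require(field_name: str) -> Rule:
    """Build a rule that fails when the field is missing or empty."""

    def rule(record: dict) -> Optional[str]:
        if record.get(field_name) in (None, ""):
            return f"missing required field: {field_name}"
        return None

    return rule


def validate(record: dict, layers) -> list:
    """Run every layer in order and collect (layer name, reason) for each failure."""
    failures = []
    for layer_name, rules in layers:
        for rule in rules:
            reason = rule(record)
            if reason is not None:
                failures.append((layer_name, reason))
    return failures


# Hypothetical layers: entity base rules, then application rules, then industry rules.
layers = [
    ("entity", [require("user_id")]),
    ("application", [require("action_type")]),
    ("industry", [require("order_amount")]),
]

record = {"user_id": "u1", "action_type": "purchase"}
print(validate(record, layers))
```

Returning the layer name with each reason keeps failures traceable to the rule set that produced them.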
What the designer can do is to keep optimising and improving the documents where the field information lives. There are two main touchpoints:
Documents by hand and Customised Fields by Industries
Effect
By the end of 2022, DataNexus covered 8,000+ clients, and the time needed for data ingestion dropped from days to hours in both ads and retail. On ingested data quality, a significant increase in GMV was also achieved. In addition, bringing the team together at the beginning earned me trust as a designer. The downside is that the full release, and the research that should follow it, have not yet been carried out.

Overall Effect
Last but not least
Whether it's data governance internally or data ingestion externally, whether it's human-centred or machine-centred, what we need to do is to create order out of chaos. I want to be a designer who can guard traditional design values and also explore their boundaries.
© 2024 Xiang PENG. All Rights Reserved.