Generative AI for Ad Performance Query

2024｜Type: Platform, WeChat Mini-Program｜Tag: Design for Effect｜Role: Product, UX Design

The Rapid Pivot Design series covers rapid redesigns undertaken after a team restructuring. Despite the initial challenges of inconsistent design visions and unfocused tasks, I proactively guided three newly formed product teams that were split off from the ad data product team. Leveraging my deep understanding of ad data, I provided swift design solutions to ensure a smooth transition and meaningful MVP launches.

Background

AData Platform integrates the core capabilities of multiple internal AMS platform products, such as Galaxy Platform, Data Warehouse, Adhome, Greenspan, and GreenBI, into an all-in-one business data platform. Operators from the industry side, traffic side, and advertising platform side can all use this platform to look up data tables and data dictionaries, apply for data permissions (viewing, using), and conduct data modeling and data analysis.

Now, based on Tencent Hunyuan, AData provides Chat capabilities within it, focusing on core performance metrics to create a smarter data analysis product: ChatBI. At the same time, Greenwich, a WeChat mini-program under AData designed for managers to view core performance metrics, also supports ChatBI.

So, what can ChatBI achieve?

Users' Needs

We've gathered common questions about core performance metrics from our business analysis and business management teams. These questions can be grouped into three categories:

Data retrieval using natural language.
Attribution.
Trend prediction, along with action recommendations.

Data Retrieval Using Natural Language

"How's the recent advertising going for that particular client?"
"What audience segments did the latest ads from that particular client hit, and how competitive is this client with their rivals?"

These types of questions involve extracting the needed data directly through natural language. They're relatively straightforward and can be tackled by combining large language models with advertising know-how.

Attribution

"Why was the revenue from video contract-based advertising particularly poor in the first quarter of 2024? What issues were there on the media side and the sales side?"
"There was a dramatic increase in Tencent Video's impressions in August. Was there any trigger?"

These questions require a bit more digging to figure out the reasons behind the changes. They're a bit trickier, but we can try to solve them by leveraging the know-how of business management.

Trend Prediction, Along with Action Recommendations

"What's the future trend and estimated decline for the current peak of the mobile game Yuan Meng Star?"
"Yesterday's performance was only 2 billion CNY. What are the main issues? How long will this situation last, and when can we expect to hit 3 billion CNY?"

These questions are all about predicting future trends and suggesting actions. They're really tough, even for business experts, who might not always have a clear answer. Right now, large language models are pretty much stumped by these kinds of issues.

What We Can Do Right Now

Tencent Hunyuan has some hiccups. It can't quite get the hang of our industry jargon and sometimes comes up with some wonky results. So, it's not quite ready to be the go-to tool for big management decisions.

But we can still put its powers to good use! For our business analysis group, it can help make data analysis a breeze. Stuff like turning regular talk into SQL code, or pulling data from SQL to whip up some handy charts – those are the kinds of cool things we're trying out.

Based on the questions we've gathered and what Tencent Hunyuan can do at this point, we're mainly focusing on tackling those Data Retrieval Using Natural Language questions. When users ask something, we've got two services lined up:

Data Retrieval: It's all about turning text into SQL magic. We use ETL to auto-generate SQL, so users can grab the data they need, stat.
Visualization: AData's got the goods to take that data and show it off as charts.
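
To make the two services concrete, here's a minimal sketch of the flow in Python. It is not the actual AData implementation. The `generate_sql` function is a hypothetical stand-in for the model call and simply returns a canned query, the `ads_daily` table and its columns are made up, and SQLite stands in for the real query engine. The division by 1e6 is assumed to convert stored values into CNY.

```python
import sqlite3


def generate_sql(question: str) -> str:
    """Hypothetical stand-in for the Text2SQL model call: returns a canned query."""
    return (
        "SELECT process_time, SUM(real_cost) / 1e6 AS real_cost "
        "FROM ads_daily GROUP BY process_time ORDER BY process_time"
    )


def retrieve(conn: sqlite3.Connection, question: str):
    """Data Retrieval: run the generated SQL and return column names plus rows."""
    cursor = conn.execute(generate_sql(question))
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


def to_chart_spec(columns, rows) -> dict:
    """Visualization: turn a two-column result into a simple line-chart spec."""
    return {
        "type": "line",
        "x": {"field": columns[0], "values": [r[0] for r in rows]},
        "y": {"field": columns[1], "values": [r[1] for r in rows]},
    }


# Tiny in-memory table with made-up numbers, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads_daily (process_time TEXT, real_cost INTEGER)")
conn.executemany(
    "INSERT INTO ads_daily VALUES (?, ?)",
    [("2024-05-23", 2_000_000), ("2024-05-23", 1_000_000), ("2024-05-24", 4_000_000)],
)

columns, rows = retrieve(conn, "Overall cost for the past week, by day")
spec = to_chart_spec(columns, rows)
```

Keeping retrieval and visualization as separate steps means a chart is only ever drawn from rows the query actually returned, never from numbers the model wrote itself.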

Let's check two major cases

Data Retrieval Case 1

The user's question could be, "Hey, can you pull up the overall performance data for the past week? And show it to me in CNY, grouped by each day."

Intent recognition: What the user wants is the cost - also known as the field real_cost_v2 - from May 23rd to May 30th, 2024. The user wants it summed up by date, and the results should be in CNY.

Text2SQL as follows:

SELECT day, SUM(real_cost) / 1e6 AS real_cost FROM domain_db.ads_business_manage_middle_df WHERE day BETWEEN '2024-05-23' AND '2024-05-30' AND process_time < NOW() GROUP BY day;

During processing, we hit a snag: "Column 'day' cannot be resolved." It turned out that 'day' was a hallucinated column name. After some tweaking and optimizing of the intent recognition process, we managed to sort it out. This case really highlights the issue of SQL accuracy.
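
One way to catch this kind of error before the user sees it is to compile the generated SQL against the table schema first. The sketch below is an illustration under assumptions, not the fix we shipped: the schema is guessed (only `process_time` and `real_cost` appear in the query above), the queries are simplified, and SQLite stands in for the production engine, so its error text differs from the message quoted above.

```python
import sqlite3

# Assumed schema for illustration; the real table has more columns.
SCHEMA = (
    "CREATE TABLE ads_business_manage_middle_df "
    "(process_time TEXT, real_cost INTEGER)"
)


def check_sql(sql: str):
    """Return None if the query compiles against the schema, else the error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        # EXPLAIN compiles the statement; the table is empty, so no data is read.
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


hallucinated = (
    "SELECT day, SUM(real_cost) FROM ads_business_manage_middle_df GROUP BY day"
)
corrected = (
    "SELECT process_time, SUM(real_cost) "
    "FROM ads_business_manage_middle_df GROUP BY process_time"
)

error = check_sql(hallucinated)  # error text names the unknown column
ok = check_sql(corrected)        # None: every column resolves
```

The returned error text can be fed back into the next generation attempt, so the model gets a chance to correct the column name itself.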

Data Retrieval Case 2

The user's question could be, "What were the average daily costs of the last year and the year before the last year during the 618 shopping festival?"

Intent recognition: The user wants to know the average daily cost - also known as real cost - during the 618 shopping festival from June 1st to June 20th for both 2022 and 2023. The user is looking for the total 'real cost' divided by the number of days to get the average daily cost.

Text2SQL as follows:

SELECT SUM(CASE WHEN day_year = 2023 THEN real_cost ELSE 0 END) / COUNT(DISTINCT process_time) AS ytd_cost, SUM(CASE WHEN day_year = 2022 THEN real_cost ELSE 0 END) / COUNT(DISTINCT process_time) AS lastyear_cost FROM domain_ads_business_channel_middle_df WHERE process_time BETWEEN '2022-06-01' AND '2023-06-20';

This case highlights issues with the know-how. The concept of the 618 shopping festival wasn't grasped, and that needs to be added. Plus, the date range for the 618 sale was off. It should be from May 20th to June 20th.
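
Feeding in know-how could be as simple as a glossary that maps business terms to concrete values before the SQL is generated. Here's a rough sketch under assumptions: the `GLOSSARY` structure and the helper names are hypothetical, and only the May 20th to June 20th range comes from the case above.

```python
from datetime import date

# Hypothetical glossary entry: term -> ((start month, day), (end month, day)).
GLOSSARY = {"618 shopping festival": ((5, 20), (6, 20))}


def resolve_period(term: str, year: int):
    """Look up a business term and return its start and end date for a year."""
    (start_month, start_day), (end_month, end_day) = GLOSSARY[term]
    return date(year, start_month, start_day), date(year, end_month, end_day)


def period_filter(column: str, start: date, end: date) -> str:
    """Build the date condition that the generated SQL should contain."""
    return f"{column} BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"


def average_daily_cost(total_cost: float, start: date, end: date) -> float:
    """Total cost divided by the number of days, counting both end dates."""
    days = (end - start).days + 1
    return total_cost / days


start, end = resolve_period("618 shopping festival", 2023)
condition = period_filter("process_time", start, end)
average = average_daily_cost(3200.0, start, end)  # made-up total, for illustration
```

Resolving the period once per year keeps 2022 and 2023 as two separate windows, each with its own day count, instead of one range that spans both years.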

Design Challenges

We're on a mission to create a slick Chat tool for both our web platform, AData, and our mobile WeChat Mini-Program, Greenwich. This tool will let users get data analysis results for core performance metrics just by asking questions. It's like having a data genie at your fingertips, but only for the data you're allowed to see.

Problems

First up, we've got the know-how issue. We can gradually gather common concepts and feed them in. But accuracy, that's the big bad wolf standing in our users' way. Our goal is to get users' questions as close as possible to what the system can actually answer. We want to make sure when they ask, they get a solid, reliable response.

Now, AData is like a Swiss Army knife of web platforms, packed with features. The WeChat mini-program, on the other hand, is a whole different ball game with its own set of development rules. We need to use HTML5 to make the interface look and feel the same across both. And here's the kicker: the WeChat mini-program has a voice input feature. That means we've got to make sure our tool plays nice on both the web platform and the WeChat mini-program, giving users a seamless experience whether they're typing or talking.

Objectives & Strategies

Technically speaking, aside from NPS and other User Experience Metrics, we can use a high engagement rate as a proxy for positive user experience. This includes the frequency of user inquiries, the number of questions asked in a single session, and the overall length of interactions.

However, the current model's comprehension of questions still needs improvement. Its accuracy can be enhanced by continuously collecting and analyzing 'bad cases' to refine its understanding. Ideally, if the model could accurately grasp the meaning behind a query in one go, it could complete the task in a single interaction.

At this relatively nascent stage, our user objective is to guide users to ask appropriate questions, ensuring that they can get accurate answers with fewer attempts. By continually optimizing based on 'bad cases,' we aim to improve accuracy. Only when accuracy is enhanced will users continue to engage with the model. Thus, our key user experience metrics at this stage are Average Conversation Length and Interaction Frequency.
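
As a rough illustration of how these two metrics could be computed from a question log, here's a small Python sketch. The definitions are our assumptions for this example: Average Conversation Length is taken as user questions per session, and Interaction Frequency as sessions per active user over the logged period. The log format and IDs are made up.

```python
from collections import defaultdict

# Hypothetical log: one (user_id, session_id) pair per question asked.
events = [
    ("u1", "s1"),
    ("u1", "s1"),
    ("u1", "s2"),
    ("u2", "s3"),
]


def average_conversation_length(events) -> float:
    """Mean number of user questions per session."""
    turns = defaultdict(int)
    for _, session_id in events:
        turns[session_id] += 1
    return sum(turns.values()) / len(turns)


def interaction_frequency(events) -> float:
    """Mean number of sessions per active user over the logged period."""
    sessions = defaultdict(set)
    for user_id, session_id in events:
        sessions[user_id].add(session_id)
    return sum(len(s) for s in sessions.values()) / len(sessions)


length = average_conversation_length(events)
frequency = interaction_frequency(events)
```

Read together, a falling conversation length alongside a steady or rising frequency would suggest users are getting answers in fewer attempts and still coming back.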

Solutions

First-Time Use

On first entry, disclose the product's usage limitation: it only supports queries on performance data.

First-Time Use on Mobile

First-Time Use on Web

Rest Status

On the home page, provide common questions; once on the conversation page, reveal the thought process to help users understand the system.

Rest Status on Mobile

Rest Status on Web

Activated Status

After activating the input box, offer some question options to spark inspiration.

Activated Status on Mobile

Typing Status

After entering text in the input box, suggest associated questions.
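
A very simple version of this suggestion logic can be sketched as keyword matching against a list of supported questions. This is only an illustration: the question list and the `suggest` helper are hypothetical, and the real product may rank suggestions quite differently.

```python
# Hypothetical list of questions the system is known to answer well.
COMMON_QUESTIONS = [
    "Overall cost for the past week, by day",
    "Average daily cost during the 618 shopping festival",
    "Cost by industry for the past month",
]


def suggest(typed: str, questions=COMMON_QUESTIONS, limit: int = 3):
    """Return questions that contain every word typed so far, ignoring case."""
    words = typed.lower().split()
    if not words:
        return []
    matches = [q for q in questions if all(w in q.lower() for w in words)]
    return matches[:limit]


week_suggestions = suggest("cost week")
daily_suggestions = suggest("daily")
```

Drawing suggestions only from questions the system can already answer nudges users toward queries that are likely to come back accurate.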

Typing Status

WeChat Supports Voice2Text on Mobile Only

The WeChat Mini-Program Open API also offers voice-to-text functionality, allowing users to input their questions verbally.

WeChat Supports Voice2Text on Mobile Only

History

After the conversation ends, provide a history of the dialogue, showing the turns of the conversation to help users keep track of their inquiry process.

History on Mobile

History on Web

Delete

For unsatisfactory conversations, users can promptly delete them.

Delete on Mobile

Delete on Web

Effect

To be continued...

Last but not least

Compared to other large language model applications, this product is particularly cautious in its project advancement due to its involvement with ad performance. This means that many fancy features simply cannot be implemented. Given the timing of the product launch, it is not hard to imagine that our users have already developed certain habits and expectations. This has been the biggest challenge we face in advancing the product.