You're spending thousands on ads, but how much of that revenue would have happened anyway? This is the fundamental question incrementality testing answers. Unlike attribution models that assign credit after the fact, incrementality testing reveals the true causal impact of your marketing—showing you which campaigns actually drive new conversions versus those riding the coattails of organic demand.
Think of it this way: if you turned off your Facebook ads tomorrow, would your sales drop? And if so, by how much? Attribution might tell you Facebook "gets credit" for 30% of conversions, but incrementality testing shows you the actual lift those ads create above what would have happened organically.
This distinction matters because it changes everything about how you allocate budget. Channels that look profitable through attribution might be capturing demand that would have converted anyway. Meanwhile, channels creating genuine new demand might be undervalued in your current measurement framework.
This guide walks you through a proven incrementality testing methodology for designing, executing, and analyzing tests that give you actionable answers. By the end, you'll know exactly how to measure the true incremental lift of any channel, campaign, or tactic in your marketing mix—and make budget decisions based on causal impact rather than correlation.
Every incrementality test starts with a clear, testable hypothesis. This isn't a vague question like "Does Facebook work?" Instead, you need a specific prediction about what will happen when you change something.
A strong hypothesis follows this format: "If we [specific action], then [specific metric] will change by [expected direction and rough magnitude]." For example: "If we pause Facebook prospecting campaigns for two weeks, new customer acquisitions will decrease by 15-25%." This gives you a concrete prediction to test against reality.
The key is specificity. You're not testing whether Facebook "contributes" to conversions—attribution already tells you that. You're testing whether Facebook creates incremental conversions that wouldn't happen without it. This subtle shift in thinking separates correlation from causation.
Next, select your primary success metrics. For most incrementality tests, you'll focus on three core measurements: incremental conversions, incremental revenue, and cost per incremental conversion. These metrics tell you not just whether your marketing works, but how efficiently it drives new business growth.
Incremental conversions represent the additional conversions created by your marketing above the baseline. Incremental revenue shows the actual dollar impact. Cost per incremental conversion reveals your true efficiency—and this number often looks very different from your standard CPA once you account for conversions that would have happened anyway.
Before launching your test, set your minimum detectable effect size. This is the smallest lift that would actually matter to your business. If a 5% improvement wouldn't change your budget decisions, don't design a test to detect 5% changes. Focus on effect sizes that cross decision thresholds.
Document your baseline performance thoroughly. Record current conversion rates, revenue per user, and any relevant behavioral metrics for both your eventual test and control groups. This baseline becomes your comparison point for measuring lift. Without it, you're flying blind.
Finally, establish your confidence requirements upfront. Most marketing tests use a 95% confidence level (a significance threshold of p < 0.05), meaning you're willing to accept a 5% chance of a false positive. Higher confidence requires larger samples and longer tests, so align this decision with your risk tolerance and business constraints.
Your test design determines how you create control and test groups. The right approach depends on what you're testing and which platforms you're using. Three primary methodologies dominate incrementality testing, each with distinct advantages.
Geographic holdout tests split audiences by region or designated market area (DMA). You continue running ads in some markets while pausing them in others, then compare conversion behavior between the two groups. This approach works exceptionally well for channel-level measurement and brand awareness testing.
Major brands frequently use geo-holdouts because they're clean and platform-agnostic. You're not relying on cookie-based tracking or platform features—you're simply comparing matched markets where one sees your ads and one doesn't. The challenge lies in finding truly comparable markets and accounting for regional differences in customer behavior.
When selecting geographic holdouts, match markets on key characteristics: population size, demographics, historical conversion rates, competitive intensity, and seasonality patterns. You want markets that behave similarly under normal conditions, so any differences during the test can be attributed to your marketing intervention.
Ghost ads or PSA tests take a different approach. Instead of showing no ads to the control group, you show them placebo ads—typically public service announcements or generic brand messages. The test group sees your actual campaign creative. This method works well for creative-level testing and situations where you want to measure specific ad impact while maintaining consistent reach.
The advantage of ghost ads is that both groups receive the same ad frequency and placement, isolating the impact of your specific creative or offer. The control group still sees an ad in their feed, so you're measuring the incremental value of your message, not just the value of occupying ad space.
User-level randomization offers the most precise measurement when platform capabilities allow it. You randomly assign individual users to test or control groups, ensuring statistical comparability at the individual level. Facebook's conversion lift studies and similar platform tools use this methodology.
The key advantage is statistical power. With random assignment, you can be confident that differences between groups result from your marketing, not pre-existing behavioral differences. However, this approach requires platform support and careful implementation to prevent contamination between groups.
Regardless of which design you choose, ensure your control and test groups are statistically comparable before the test begins. Match on demographics, past purchase behavior, engagement history, and any other factors that might influence conversion likelihood. The more similar your groups at baseline, the more confident you can be in your results.
For geo-holdouts, consider using matched pairs. Instead of randomly selecting holdout markets, pair each test market with a similar control market based on historical performance. This pairing increases statistical power and helps control for regional variation that might otherwise obscure your results.
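As a rough sketch, matched pairing can be done greedily by closeness of historical performance. The market names and conversion rates below are hypothetical, and a real pairing would match on several normalized features (population, seasonality, competitive intensity), not a single metric:

```python
# Hypothetical markets mapped to historical conversion rates
markets = {
    "Denver": 0.031, "Portland": 0.030, "Austin": 0.024,
    "Nashville": 0.025, "Tampa": 0.019, "Columbus": 0.020,
}

pairs = []
remaining = sorted(markets, key=markets.get)  # sort by conversion rate
while len(remaining) >= 2:
    a = remaining.pop(0)
    # Pair each market with its closest remaining neighbor
    b = min(remaining, key=lambda m: abs(markets[m] - markets[a]))
    remaining.remove(b)
    pairs.append((a, b))  # then randomly assign one of each pair to holdout

print(pairs)
```

Within each pair, a coin flip decides which market goes dark, so regional quirks affect test and control symmetrically.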
Running a test without proper sample size is like trying to measure temperature with a broken thermometer. You'll get a number, but it won't tell you anything reliable. Power analysis determines the minimum sample size needed to detect your target effect with statistical confidence.
Start with your baseline conversion rate, expected lift, and desired confidence level. These three inputs drive your sample size calculation. Higher baseline conversion rates require smaller samples. Larger expected lifts are easier to detect. Higher confidence requirements demand more data.
Here's the challenge: detecting small effects requires large samples. If your baseline conversion rate is 2% and you want to detect a 10% relative lift (an absolute increase to 2.2%), you'll need significantly more traffic than if you're looking for a 50% lift. This is why setting realistic minimum detectable effects matters.
Many marketers underestimate required sample sizes and end up with inconclusive tests. A test that's underpowered can't distinguish between "this didn't work" and "we didn't collect enough data to tell." Build your sample size calculations before you commit to a test design.
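A minimal power calculation can be sketched with the standard two-proportion normal approximation. This is a simplified version of what dedicated sample-size calculators do, so treat the outputs as ballpark figures rather than exact requirements:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-proportion z-test
    (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_power * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# A 10% relative lift on a 2% baseline needs roughly 20x more users
# per group than a 50% lift on the same baseline
print(sample_size_per_group(0.02, 0.10))  # roughly 80,000 per group
print(sample_size_per_group(0.02, 0.50))  # roughly 4,000 per group
```

Run the numbers before committing: if the required sample exceeds the traffic you can realistically accumulate in your test window, raise your minimum detectable effect or extend the duration.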
Test duration connects directly to sample size. You need enough time to accumulate sufficient conversions in both test and control groups. But duration isn't just about hitting a number—it's about capturing the full customer journey.
Consider your typical purchase cycle. If customers often research for two weeks before converting, a three-day test will miss delayed conversions. Many practitioners recommend running tests for at least two full purchase cycles to capture both immediate and delayed impact.
Seasonality adds another layer of complexity. Running a test during Black Friday week will give you very different results than running it in January. Your test period should represent typical business conditions, or at minimum, you should account for seasonal factors in your analysis.
Build buffer time into your test duration for external factors. Competitor campaigns, market events, platform algorithm changes, and unexpected news cycles can all influence results. A longer test period helps these factors average out, giving you more stable measurements.
For most channel-level incrementality tests, plan for at least two to four weeks of measurement. Creative-level tests might run shorter if you have high traffic volume. Brand awareness tests often require longer periods to capture the full impact of brand building on conversion behavior.
Don't stop your test early just because you see interesting results. Pre-commit to your test duration and sample size requirements. Early stopping based on promising interim results is a common mistake that leads to false positives and overestimated effects.
A well-designed test means nothing if execution introduces contamination or measurement errors. Clean execution requires vigilance across multiple dimensions: audience isolation, spend consistency, tracking accuracy, and environmental controls.
First, implement a truly clean holdout. Your control group must have zero exposure to the marketing you're testing. This sounds obvious, but it's harder than it appears. Users switch devices, share households, and browse in different contexts. A user in your control group who sees your ad on a different device or through a different channel creates contamination.
For geo-holdouts, ensure your advertising platforms respect geographic boundaries. Some programmatic advertising can bleed across market borders. Verify that users in holdout markets aren't seeing your ads through any channel—paid search, display, social, or otherwise.
Maintain consistent spend and creative in your test group throughout the measurement period. If you're testing Facebook's incrementality, keep your Facebook campaigns running exactly as they normally would. Don't make optimization changes, don't launch new creative, don't adjust budgets. Any changes during the test period confound your results.
This consistency requirement extends to other marketing channels. If you're testing Facebook but simultaneously launch a major Google Ads push, you've introduced a confounding variable. Try to keep all other marketing activities stable during your test period, or at minimum, ensure changes affect both test and control groups equally.
Track both groups using server-side tracking for accurate measurement across devices and contexts. Cookie-based tracking misses cross-device behavior and gets blocked by privacy tools. Server-side tracking captures conversions more completely, reducing measurement error that could obscure your true incremental lift.
With platforms like Cometly, you can implement server-side tracking that captures every touchpoint across the customer journey, ensuring your incrementality measurements reflect actual behavior rather than tracking limitations. This complete view becomes critical when users interact with your brand across multiple devices before converting.
Monitor for data leakage between groups throughout your test. Set up alerts for unusual patterns: control group users showing up in platform pixel data, unexpected geographic overlap, or demographic shifts that suggest contamination. Catching these issues early lets you adjust or restart the test before investing weeks in flawed data.
Document everything about your test execution. Record exact start and end times, any platform changes or outages, competitor activity you observe, and external events that might influence results. This documentation becomes invaluable during analysis when you need to explain unexpected patterns or validate your findings.
Your test has run its course. Now comes the moment of truth: calculating the actual incremental impact of your marketing. This analysis reveals not just whether your marketing works, but precisely how much value it creates above baseline organic conversions.
Start by comparing conversion rates between your test and control groups. The formula is straightforward: take the test group conversion rate, subtract the control group conversion rate, then divide by the control group conversion rate. This gives you the percentage lift created by your marketing.
For example, if your test group converted at 3.5% and your control group at 3.0%, your lift is (3.5% - 3.0%) / 3.0% = 16.7%. This means your marketing increased conversions by 16.7% above what would have happened organically. Understanding what incrementality in marketing truly means helps contextualize these calculations.
Next, calculate incremental conversions—the actual number of additional conversions your marketing created. This requires a slightly different calculation than you might expect. Take your total test group conversions and multiply by (Lift / (1 + Lift)). This formula accounts for the fact that some test group conversions would have happened anyway.
Using the example above: if your test group had 1,000 conversions, your incremental conversions would be 1,000 × (0.167 / 1.167) = 143 incremental conversions. The other 857 conversions would have happened even without your marketing. This is the sobering reality that incrementality testing reveals.
Now calculate your cost per incremental conversion—your true efficiency metric. Divide your total marketing spend during the test by your incremental conversions only. This number often looks very different from your standard CPA, because standard CPA includes conversions that would have happened anyway.
If you spent $10,000 on the marketing being tested and generated 143 incremental conversions, your cost per incremental conversion is $69.93. Even if your standard CPA was $10, your incremental CPA tells you what you're actually paying for new business growth. This distinction changes budget allocation decisions.
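The three calculations above fit in one small helper. The numbers reproduce the worked example; incremental conversions are rounded to a whole number before computing cost, matching the figures above:

```python
def incrementality_summary(test_cr, control_cr, test_conversions, spend):
    """Lift, incremental conversions, and cost per incremental conversion."""
    lift = (test_cr - control_cr) / control_cr
    # Some test-group conversions would have happened anyway;
    # only the lift / (1 + lift) share of them is truly incremental
    incremental = round(test_conversions * (lift / (1 + lift)))
    cost_per_incremental = spend / incremental
    return lift, incremental, cost_per_incremental

# Worked example: 3.5% vs 3.0% conversion, 1,000 test conversions, $10,000 spend
lift, incr, cpic = incrementality_summary(0.035, 0.030, 1000, 10_000)
print(f"{lift:.1%} lift, {incr} incremental conversions, ${cpic:.2f} each")
# 16.7% lift, 143 incremental conversions, $69.93 each
```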
Before making any decisions based on your results, run statistical significance tests. Calculate p-values and confidence intervals to determine whether your observed lift is real or could have occurred by chance. A p-value below 0.05 means there is less than a 5% probability of seeing a lift this large if your marketing actually had no effect, which is the conventional bar for calling a result statistically significant.
Confidence intervals matter just as much as point estimates. A lift of 15% with a confidence interval of [12%, 18%] tells you something very different from a lift of 15% with a confidence interval of [-5%, 35%]. The first result is actionable. The second suggests you need more data.
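For user-level tests, a two-proportion z-test yields both the p-value and a confidence interval on the lift. This sketch uses the normal approximation with illustrative numbers (30,000 users per group); platform lift studies run equivalent tests for you:

```python
from math import sqrt
from statistics import NormalDist

def lift_significance(conv_t, n_t, conv_c, n_c, alpha=0.05):
    """Two-sided z-test on the conversion-rate difference, plus a
    confidence interval on the relative lift (normal approximation)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    # Pooled standard error under the null hypothesis of no difference
    p_pool = (conv_t + conv_c) / (n_t + n_c)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the interval on the observed difference
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_t - p_c
    ci_lift = ((diff - z_crit * se) / p_c, (diff + z_crit * se) / p_c)
    return p_value, ci_lift

# 3.5% vs 3.0% conversion with 30,000 users per group: the 16.7% observed
# lift is significant, but the interval spans roughly 7% to 26%
p_value, (lo, hi) = lift_significance(1050, 30000, 900, 30000)
print(f"p = {p_value:.4f}, lift CI: [{lo:.1%}, {hi:.1%}]")
```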
Look beyond simple conversion metrics to understand the full picture. Analyze incremental revenue, not just incremental conversions. Check whether your marketing drove higher-value customers or simply more customers. Examine time-to-conversion patterns to understand whether your marketing accelerated purchases that would have happened eventually.
Compare your incrementality results to what attribution suggested. If attribution credited your channel with 40% of conversions but incrementality shows only 15% lift, you've uncovered significant over-attribution. This gap reveals how much credit your channel was getting for conversions it didn't actually cause.
Incrementality data transforms from interesting analysis to business impact when you use it to reallocate budget. This step separates marketers who measure from marketers who optimize based on measurement.
Compare incremental CPA across all channels where you've run tests. This comparison reveals your true efficiency hierarchy. A channel with a $30 standard CPA but $60 incremental CPA is less efficient than a channel with $40 standard CPA but $45 incremental CPA. The second channel creates more new demand per dollar spent.
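Once the test results are assembled, ranking channels by incremental rather than standard CPA is trivial. The channel names and figures below are hypothetical, mirroring the example above:

```python
# Hypothetical results from channel-level incrementality tests
channels = {
    "Channel A": {"standard_cpa": 30, "incremental_cpa": 60},
    "Channel B": {"standard_cpa": 40, "incremental_cpa": 45},
}

# Rank by the true cost of new demand, not the attributed cost
ranked = sorted(channels, key=lambda c: channels[c]["incremental_cpa"])
print(ranked)  # ['Channel B', 'Channel A'] — B wins despite its higher standard CPA
```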
Reallocate budget from low-incrementality to high-incrementality campaigns. This sounds simple, but it requires courage. You're often moving money away from channels that look good in attribution reports toward channels that create more genuine new demand. Your overall reported conversions might initially appear to drop, even as your actual business growth accelerates.
Start with modest reallocation—shift 10-20% of budget based on your incrementality findings. Monitor the impact carefully. If your hypothesis was correct, you should see improved overall efficiency even as attribution numbers shift. This validation builds confidence for larger reallocation decisions.
Feed your incrementality learnings back into your attribution models. If you know Facebook creates 15% incremental lift, you can adjust how much credit your attribution model assigns to Facebook touches. This creates a feedback loop where incrementality testing improves ongoing measurement, which informs better daily optimization decisions.
Platforms like Cometly let you compare attribution models against incrementality findings, helping you identify which model best reflects true causal impact. This comparison helps you trust your day-to-day optimization decisions while planning your next round of incrementality tests.
Plan follow-up tests to validate findings and measure at different spend levels. Incrementality isn't static—it changes with budget scale, competitive intensity, and market saturation. A channel might show strong incrementality at $5,000 per month but diminishing returns at $20,000 per month. Regular testing reveals these inflection points.
Test your highest-spend channels first, then work down your budget allocation. The channels where you spend the most have the biggest potential impact from optimization. Even small efficiency improvements in major channels drive meaningful business results. Implementing effective ad spend optimization strategies becomes much easier when you understand true incrementality.
Document your testing roadmap for the next 12 months. Plan to test each major channel at least annually, with more frequent testing for channels where you're actively scaling spend. This systematic approach to incrementality testing builds a competitive advantage through superior measurement.
Incrementality testing transforms marketing from guesswork into science. By following this methodology—defining clear hypotheses, designing rigorous tests, calculating proper sample sizes, executing with clean controls, and analyzing for true lift—you'll finally know which marketing dollars drive real business growth versus which are simply claiming credit for conversions that would have happened anyway.
The marketers who master incrementality testing don't just optimize campaigns; they build sustainable competitive advantages through superior measurement. While competitors chase vanity metrics and over-attributed conversions, you'll be allocating budget based on causal impact and genuine incremental value creation.
Start with your highest-spend or most-questioned channel. Run a clean geo-holdout test following the methodology outlined here. Let the data guide your next budget decision. The insights you gain will pay dividends far beyond the immediate test results—they'll reshape how you think about marketing effectiveness.
Remember that incrementality testing isn't a one-time project. It's an ongoing discipline that evolves with your marketing mix. As you scale channels, launch new campaigns, and enter new markets, your incrementality dynamics change. Regular testing keeps your understanding current and your budget allocation optimal.
The path forward is clear: hypothesis, design, execution, analysis, action. Each test builds your knowledge base and refines your measurement capabilities. Over time, you'll develop intuition about which channels create genuine demand and which ride the coattails of organic growth.
Ready to elevate your marketing game with precision and confidence? Discover how Cometly's AI-driven recommendations can transform your ad strategy—Get your free demo today and start capturing every touchpoint to maximize your conversions. With server-side tracking, comprehensive attribution models, and AI-powered insights, you'll have everything you need to run incrementality tests that reveal the true impact of your marketing investments.