An experimentation design document is crucial: it helps an experiment owner think through every case, frames the problem, provides clarity, and democratizes knowledge cross-functionally, improving future strategies and optimizing resources. It also enhances collaboration and decision-making within the team.
Scroll down for the template, which you can use immediately; it should take about 10 minutes to fill out. Enjoy!
1. Problem Statement 🚧
What is the business outcome?
Start with the outcome we want to improve. What metric or area shows there's an opportunity to drive meaningful impact?
What is the customer problem?
What problem are we solving, and for whom? Why does this problem exist? Focus on the user's experience and pain point.
What observation, data, or insight led us to identify the problem or opportunity?
What research, insight, or data surfaced this as something worth exploring?
How does this connect to the current growth model and strategy?
Which part of the growth model (e.g., acquisition, activation, retention) does this address? How does solving this problem tie back to the company’s broader strategy and mission?
You can also provide links to related artifacts, e.g. user studies, prior research, and previous experiments in this area.
2. Hypothesis 🧪
We believe that by changing the [independent variable] we expect [dependent variable] (not) to [increase/decrease] because [some reason].
Example:
We believe that by changing product suggestions at checkout from static to personalized we expect average revenue per user to increase because users are more likely to buy products that are relevant to them.
A hypothesis is something that we believe to be true based on what we know about users. A hypothesis should be testable and falsifiable, meaning there is something you can practically observe that would lead you to reconsider it.
Start with the words “we believe”
Use the word “because”
Don’t use “if” or “then” (that’s a prediction)
Resources:
3. Experiment Design 📊
How are we designing this experiment to test our hypothesis? Which group of people is in the experiment? What do the Control and Variant experiences look like?
How long will we run the experiment? Have the effort and time to build the test been identified? Ensure the test does not collide with other running experiments.
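Where assignment isn't handled by an experimentation platform, a deterministic bucketing function is one common way to split eligible users between Control and Variant. A minimal sketch (the experiment name and 50/50 split are illustrative assumptions):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "personalized_checkout", split: float = 0.5) -> str:
    """Deterministically bucket a user into control or variant.

    Hashing experiment name + user_id gives a stable, roughly uniform value
    in [0, 1], so the same user always sees the same experience.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the first 32 bits to [0, 1]
    return "variant" if bucket < split else "control"

print(assign_variant("user_123"))  # stable across calls
```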
Assets (add links)
Jira issue
Design mockups
Report that shows the current state of things
Figma or Miro comparing control vs treatment experience.
3.1. Experiment Cohort/Target Audience 🎯
Describe the user base we will target for this experiment (an eligibility-check sketch follows this list), e.g.:
Markets: e.g. "US only", "All English", etc
Customer type: New, Existing, Paid, Free, etc
Signup pathways: All, Front of Site, Email, etc
Tiers: Basic, etc.
Other: (for example, users who did x action or users who have specific characteristics - whatever it may be)
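If targeting is expressed in code rather than in an experimentation tool, the audience definition above can be captured as a single eligibility check. A minimal sketch assuming hypothetical user fields (market, customer_type, signup_pathway, tier) and example values:

```python
from dataclasses import dataclass

@dataclass
class User:
    market: str          # e.g. "US"
    customer_type: str   # e.g. "New", "Existing", "Paid", "Free"
    signup_pathway: str  # e.g. "Front of Site", "Email"
    tier: str            # e.g. "Basic"

def is_eligible(user: User) -> bool:
    """Return True if the user matches the experiment's target audience."""
    return (
        user.market == "US"
        and user.customer_type == "New"
        and user.signup_pathway in {"Front of Site", "Email"}
        and user.tier == "Basic"
    )
```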
3.2. Test Length ⏳
Given how we are measuring the test, and the size of the audience, how long will it take us to get to significance?
Use a test duration estimator to state how long it will take for the experiment to reach statistical significance.
Example: The test should take 2 weeks to reach significance.
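If a duration estimator isn't at hand, the sample size per group can be approximated from the baseline rate and the minimum detectable effect, then converted into days using eligible daily traffic. A minimal Python sketch using the normal approximation for a two-proportion test (the baseline, lift, power, and traffic figures are placeholders):

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline: float, mde_rel: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sample size per group for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)          # expected variant rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * pooled * (1 - pooled))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return math.ceil(n)

n = sample_size_per_group(baseline=0.04, mde_rel=0.10)  # 4% baseline, +10% relative lift
daily_eligible_users = 5_000                            # placeholder traffic estimate
print(n, "users per group ≈", math.ceil(2 * n / daily_eligible_users), "days")
```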
4. Metrics 📈
Describe exactly which metrics we are measuring to determine results for the test. List primary and secondary metrics if applicable. Please include baseline metrics (e.g. the control). Describe how we will calculate the metric if it is not a standard metric or if we are inferring it from behavior. For example, if the goal is conversion improvement but we are using clicks on the plan page options as a leading indicator, spell that out (see the sketch at the end of this section). Any other metrics that could be cannibalized should also be considered and documented.
Example:
Primary test metric: Conversion rate for New Users. Current baseline is x%
Secondary metrics: Plan mix, Term mix, etc. Current baselines are...
Other metrics to consider or be aware of for this test that might be impacted and should be factored into the analysis.
You can also include links to relevant reports, or other artifacts that may help whoever is performing or interpreting the analysis.
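When the primary metric is inferred from behavior (as in the leading-indicator example above), it can help to write the derivation down explicitly. A minimal sketch with hypothetical event names and a made-up event log:

```python
from collections import defaultdict

# Hypothetical event log: one row per tracked event.
events = [
    {"user_id": "u1", "group": "control", "event": "plan_page_option_click"},
    {"user_id": "u1", "group": "control", "event": "purchase_complete"},
    {"user_id": "u2", "group": "variant", "event": "plan_page_option_click"},
]

def conversion_rate(events, converting_event="purchase_complete"):
    """Conversion rate per group = distinct converters / distinct users seen in the group."""
    exposed, converted = defaultdict(set), defaultdict(set)
    for e in events:
        exposed[e["group"]].add(e["user_id"])
        if e["event"] == converting_event:
            converted[e["group"]].add(e["user_id"])
    return {g: len(converted[g]) / len(exposed[g]) for g in exposed}

print(conversion_rate(events))  # {'control': 1.0, 'variant': 0.0}
```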
4.1. Instrumentation 📡
Do we have all the event logging in place to gather these metrics? What events and properties are missing?
Provide a list of the tracking events that will be used for the experiment so they can be verified both during development and post-release by examining the reported data. This can take the form of naming guidelines that let developers format the names, or a list of explicit required names.
Please specify when each event should fire.
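One lightweight way to make the tracking list verifiable is to capture it as a structured spec that developers implement against and analysts check the reported data against. A minimal sketch with hypothetical event names, properties, and firing rules:

```python
# Hypothetical tracking plan: event names, required properties, and when each fires.
TRACKING_PLAN = [
    {
        "event": "experiment_exposure",
        "properties": ["experiment_name", "variant", "user_id"],
        "fires_when": "the user first sees the experiment surface",
    },
    {
        "event": "plan_page_option_click",
        "properties": ["variant", "plan_id", "user_id"],
        "fires_when": "the user clicks any plan option on the plan page",
    },
    {
        "event": "purchase_complete",
        "properties": ["variant", "plan_id", "revenue", "user_id"],
        "fires_when": "the order confirmation is returned",
    },
]

def validate_event(name: str, properties: dict) -> bool:
    """Check a reported event against the plan (useful during QA and post-release)."""
    spec = next((e for e in TRACKING_PLAN if e["event"] == name), None)
    return spec is not None and all(p in properties for p in spec["properties"])
```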
5. Definition of Success 🏆
We will determine that the variant is a winner if it [increases/decreases] the primary metric by ≥ x% relative to the control group at statistical significance.
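At analysis time, this criterion translates into comparing the variant and control rates at the chosen significance level. A minimal sketch of a two-proportion z-test (the counts are placeholders; an experimentation platform or stats library will usually do this for you):

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_c: int, n_c: int, conv_v: int, n_v: int):
    """Return (relative lift, two-sided p-value) for variant vs control conversion."""
    p_c, p_v = conv_c / n_c, conv_v / n_v
    pooled = (conv_c + conv_v) / (n_c + n_v)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    z = (p_v - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (p_v - p_c) / p_c, p_value

lift, p = two_proportion_z_test(conv_c=1_580, n_c=40_000, conv_v=1_760, n_v=40_000)
print(f"relative lift {lift:.1%}, p-value {p:.3f}")
# Declare a winner only if the lift meets the pre-registered x% threshold and p < 0.05.
```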
6. Pre-Mortem ⚰️
This section helps improve test preparation. Based on the potential results, what actions will we take?
Implementation plan: How will we take the experiment variation and turn it into a permanent product experience if it wins?
Iteration plan: What about our initial assumption has changed? Will we form a new hypothesis?
Expansion plan: How will we apply the learnings from this experiment widely and double down?
Holdout groups: Could the actual win differ once the variant becomes the permanent product experience, and should you use holdout groups or follow-up data checks?
7. Analysis/Results 📊
Our hypothesis turned out to be [correct / incorrect].
You can post a link to a spreadsheet or other document if results are captured elsewhere and link to funnels demonstrating user behavior, etc.
8. Learnings 🧠
We learned that… What did we learn from running this experiment? How do these learnings impact the next steps? Tie them back to the pre-mortem preparation work and share the learnings widely.
When sharing the learnings widely, it may help to categorize them, e.g. high value vs. informational.
Additional Notes:
Feel free to copy and customize the Experimentation design template.
Experimentation is still expensive, so ensure that one is needed (this is a good post on when you might not need one).
You might also want to evaluate whether the test is complex enough to warrant an A/A test. (Test setups on new surfaces, or across more than one surface, often require both preplanning and an A/A test run to ensure data collection is set up for analysis.)
You can create an experiment writeup document using the template and complete steps 1-6 before development begins.
Depending on the documentation tools you use, e.g. Confluence or Notion, you can enable creating this documentation from this template with the click of a button.
This template is continually evolving, and your input is appreciated. Feel free to share any suggestions or comments you may have, and enjoy using it!