How experiments work
- Define variants: Different configurations to test
- Assign users: Deterministic assignment based on a hash of the user ID (see the sketch after this list)
- Collect metrics: Track engagement per variant
- Analyze results: Statistical significance testing
- Promote winner: Apply winning config to all users
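To make the deterministic assignment step concrete, here is a minimal sketch that hashes the user ID together with an experiment ID and maps the result onto the variants' traffic allocation. The FNV-1a hash, the 100-bucket scheme, and the field names are illustrative assumptions, not the platform's actual implementation.

```typescript
// Sketch: deterministic variant assignment from a user ID hash.
// FNV-1a and the 100-bucket scheme are illustrative, not the platform's actual method.
interface Variant {
  name: string;
  trafficPercent: number; // share of traffic; all variants should sum to 100
}

function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function assignVariant(userId: string, experimentId: string, variants: Variant[]): Variant {
  // Hashing experimentId:userId means the same user always gets the same variant
  // for a given experiment, while different experiments bucket independently.
  const bucket = fnv1a(`${experimentId}:${userId}`) % 100;
  let cumulative = 0;
  for (const variant of variants) {
    cumulative += variant.trafficPercent;
    if (bucket < cumulative) return variant;
  }
  return variants[variants.length - 1]; // guard if percentages sum to less than 100
}

// Example: a 50/50 split that stays stable across sessions for the same user.
assignVariant("user-123", "recency-vs-engagement", [
  { name: "A (Control)", trafficPercent: 50 },
  { name: "B (Test)", trafficPercent: 50 },
]);
```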
Creating an experiment
Step 1: Basic info
Go to Experiments → Create New:
- Name: Descriptive name (e.g., “Recency vs Engagement Weight Test”)
- Description: Hypothesis and goals
- Type: What you’re testing
Experiment types
| Type | What You Can Test |
|---|---|
| Ranking | Signal weights, decay rates, thresholds |
| Layout | Feed orientation, height, adjacent feeds |
| Controls | Enabled controls, auto-hide settings |
| Theme | Colors, fonts, control styling |
| Ad frequency | Ad placement intervals, modes |
| Custom | Any SDK config parameter |
Step 2: Define variants
Add 2-4 variants (ranking, layout, and ad frequency examples follow the same structure). A ranking example:
- Variant A (Control): Current weights
- Variant B (Test): More recency (see the config sketch below)
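As a sketch of how those two variants could be expressed as config objects, assuming hypothetical `recencyWeight` and `engagementWeight` parameters (the real SDK parameter names may differ):

```typescript
// Hypothetical ranking-weight variants; parameter names are illustrative only.
const variants = [
  {
    name: "A (Control)",
    config: { recencyWeight: 0.3, engagementWeight: 0.7 }, // current weights
  },
  {
    name: "B (Test)",
    config: { recencyWeight: 0.5, engagementWeight: 0.5 }, // more recency
  },
];
```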
Step 3: Traffic allocation
Set the percentage of users for each variant:
- Equal split: Automatically divide traffic evenly
- Custom split: Set specific percentages (must sum to 100%); see the allocation sketch below
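A minimal sketch of both allocation options, assuming splits are plain integer percentages that must total 100:

```typescript
// Sketch: traffic allocation helpers. Splits are integer percentages summing to 100.
function equalSplit(variantCount: number): number[] {
  const base = Math.floor(100 / variantCount);
  const split: number[] = new Array(variantCount).fill(base);
  split[0] += 100 - base * variantCount; // give any remainder to the first variant
  return split;
}

function validateCustomSplit(percentages: number[]): void {
  const total = percentages.reduce((sum, p) => sum + p, 0);
  if (total !== 100) {
    throw new Error(`Traffic split must sum to 100%, got ${total}%`);
  }
}

equalSplit(3);                   // [34, 33, 33]
validateCustomSplit([70, 30]);   // ok
// validateCustomSplit([60, 30]); // throws: sums to 90%
```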
Step 4: Success metrics
Select primary and secondary metrics. Primary metrics (choose 1-2):
- Watch time per session
- Session duration
- Videos watched per session
- Completion rate
- Return rate (next day)
- Share rate
- Ad impressions
- Ad revenue
- Rebuffer rate
Step 5: Launch
Review and launch:
- Preview how variants will look
- Set experiment duration (recommended: 2+ weeks)
- Click Launch
Managing experiments
Experiment states
| State | Description |
|---|---|
| Draft | Not yet launched, can edit |
| Running | Actively collecting data |
| Paused | Temporarily stopped |
| Completed | Reached end date or stopped |
| Archived | Historical record |
Monitoring
While running, view:
- Users per variant
- Real-time metrics
- Statistical significance progress
Pausing
Pause an experiment to:
- Investigate unexpected results
- Fix a bug in one variant
- Temporarily stop traffic split
Analyzing results
Results dashboard
View at Experiments → [Experiment] → Results:

| Metric | Control | Test | Lift | Confidence |
|---|---|---|---|---|
| Watch time | 45.2s | 52.1s | +15.3% | 94% |
| Completion rate | 42% | 48% | +14.3% | 97% |
| Session length | 4.2min | 4.8min | +14.3% | 91% |
Statistical significance
Results show:
- Lift: Percentage change vs. control
- Confidence: Probability that the difference is real (not random); see the sketch below for one way these can be computed
- Status: Significant (>95%), trending, or inconclusive
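For a rate metric such as completion rate, lift and confidence can be illustrated with a two-proportion z-test. This is a sketch of one common approach; the dashboard's exact statistical method is not specified here, and the sample sizes in the example are assumptions.

```typescript
// Sketch: lift and confidence for a rate metric (e.g. completion rate),
// using a two-proportion z-test. The dashboard's exact method may differ.
function normalCdf(z: number): number {
  // Abramowitz-Stegun approximation of the standard normal CDF.
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp((-z * z) / 2);
  const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}

function compareRates(controlSuccesses: number, controlTotal: number,
                      testSuccesses: number, testTotal: number) {
  const pC = controlSuccesses / controlTotal;
  const pT = testSuccesses / testTotal;
  const lift = (pT - pC) / pC; // percentage change vs. control
  const pooled = (controlSuccesses + testSuccesses) / (controlTotal + testTotal);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / controlTotal + 1 / testTotal));
  const z = (pT - pC) / se;
  const confidence = 1 - 2 * (1 - normalCdf(Math.abs(z))); // two-sided
  return { lift, confidence };
}

// Example: 4,200/10,000 completions (control) vs. 4,800/10,000 (test)
// gives roughly a +14.3% lift at high confidence (sample sizes assumed for illustration).
compareRates(4200, 10000, 4800, 10000);
```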
Segmented analysis
Break down results by:
- Platform (iOS/Android/Web)
- Region
- User tenure (new vs. returning)
- Device type
Promoting a winner
When you have significant results:
- Go to the experiment results
- Click Promote Variant on the winning variant
- Choose:
  - Apply to 100%: Immediately apply to all users
  - Gradual rollout: Slowly increase from the test percentage (see the rollout sketch below)
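A gradual rollout can be pictured as a schedule that steps the winning variant's allocation up from its test percentage to 100%. The step size and cadence below are illustrative assumptions, not documented defaults.

```typescript
// Sketch: ramp a winning variant from its test allocation to 100% in fixed steps.
// The 20-point step and per-step cadence are illustrative assumptions.
function rolloutSchedule(startPercent: number, stepPercent = 20): number[] {
  const schedule: number[] = [];
  for (let p = startPercent; p < 100; p += stepPercent) {
    schedule.push(p);
  }
  schedule.push(100);
  return schedule;
}

rolloutSchedule(50); // [50, 70, 90, 100] — e.g. one step per day, checking metrics at each stage
```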
Best practices
Experiment design
Test one thing at a time
Changing multiple variables makes it hard to attribute results. If testing both ranking weights and ad frequency, run separate experiments.
Run long enough
Short experiments may show false positives. Run for at least 2 weeks to capture different usage patterns (weekdays vs. weekends, etc.).
Consider sample size
The more users you have, the faster results reach statistical significance. For smaller apps, plan for a longer experiment duration (see the sample-size sketch below).
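As a rough planning aid, the standard two-proportion sample-size formula (95% confidence, 80% power) shows how the required users per variant grows as the lift you want to detect shrinks. Treat these numbers as estimates under assumed baseline rates, not platform guidance.

```typescript
// Sketch: users needed per variant to detect a relative lift in a rate metric
// (two-proportion test, 95% confidence / 80% power: z_alpha = 1.96, z_beta = 0.84).
function usersPerVariant(baselineRate: number, relativeLift: number): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const zAlpha = 1.96;
  const zBeta = 0.84;
  const pBar = (p1 + p2) / 2;
  const numerator = Math.pow(
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
    2
  );
  return Math.ceil(numerator / Math.pow(p2 - p1, 2));
}

// Detecting a small lift needs far more users than detecting a large one,
// which is why smaller apps need longer runs.
usersPerVariant(0.42, 0.05); // ≈ 8,700 users per variant for a 5% relative lift
usersPerVariant(0.42, 0.15); // ≈ 1,000 users per variant for a 15% relative lift
```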
Document hypotheses
Write down what you expect to happen and why. This helps interpret results and plan follow-up tests.
Common experiments
| Goal | What to Test |
|---|---|
| Increase watch time | Ranking weights, content diversity |
| Improve retention | Session start experience, personalization depth |
| Boost completion | Content length filters, quality thresholds |
| Increase ad revenue | Ad frequency, placement timing |
| Reduce rebuffering | ABR settings, prefetch depth |
Experiment history
View all past experiments at Experiments → Completed:
- Search by name, type, or date
- Filter by outcome (positive lift, no change, negative)
- Export results for reporting
