---
title: "Splendor Analytics"
subtitle: "Trial Activation Intelligence Report"
author: "Freda Erinmwingbovo"
date: today
format:
  html:
    theme: [darkly, splendor.scss]
    toc: true
    toc-depth: 3
    toc-title: "Report Contents"
    toc-location: left
    number-sections: false
    code-fold: true
    code-tools: true
    code-summary: "Show Code"
    df-print: paged
    page-layout: full
    smooth-scroll: true
    self-contained: true
execute:
  warning: false
  message: false
  cache: true
jupyter: python3
---
```{=html}
<div class="spl-cover">
<!-- Top status bar -->
<div class="spl-topbar">
<div class="spl-topbar-left">
<span class="spl-dot"></span>
<span class="spl-platform">SPLENDOR ANALYTICS · WORKFORCE MANAGEMENT PLATFORM</span>
</div>
<span class="spl-classification">PRODUCT INTELLIGENCE REPORT</span>
</div>
<!-- Hero body — left title, right summary -->
<div class="spl-hero">
<div class="spl-hero-left">
<div class="spl-tag">Trial Activation Intelligence</div>
<h1 class="spl-title">Defining<br/>What a Good<br/>Trial Looks Like</h1>
<div class="spl-divider"></div>
<p class="spl-tagline">
966 organisations. 30 days each.<br/>
Only 1 in 5 converts. We investigated why.
</p>
</div>
<div class="spl-hero-right">
<div class="spl-watermark">TRL</div>
<div class="spl-summary">
<div class="spl-summary-item">
<span class="spl-summary-num">170,526</span>
<span class="spl-summary-lbl">Raw Behavioural Events</span>
</div>
<div class="spl-summary-item">
<span class="spl-summary-num">966</span>
<span class="spl-summary-lbl">Trialling Organisations</span>
</div>
<div class="spl-summary-item">
<span class="spl-summary-num">21.3%</span>
<span class="spl-summary-lbl">Overall Conversion Rate</span>
</div>
<div class="spl-summary-item">
<span class="spl-summary-num">28</span>
<span class="spl-summary-lbl">Activities Tracked</span>
</div>
</div>
</div>
</div>
<!-- Bottom metadata bar -->
<div class="spl-metabar">
<div class="spl-meta">
<span class="spl-meta-label">ANALYST</span>
<span class="spl-meta-value">Freda Erinmwingbovo</span>
</div>
<div class="spl-meta">
<span class="spl-meta-label">PREPARED FOR</span>
<span class="spl-meta-value">Splendor Analytics Product Team</span>
</div>
<div class="spl-meta">
<span class="spl-meta-label">DATE</span>
<span class="spl-meta-value">March 2026</span>
</div>
<div class="spl-meta">
<span class="spl-meta-label">TRIAL PERIOD</span>
<span class="spl-meta-value">January – April 2024</span>
</div>
</div>
</div>
<style>
/* ── SPLENDOR COVER ── */
.spl-cover {
background: linear-gradient(160deg, #020617 0%, #0c1a2e 60%, #020617 100%);
border: 1px solid #1e3a5f;
border-radius: 10px;
overflow: hidden;
margin: 0 0 3rem 0;
position: relative;
}
/* Top bar */
.spl-topbar {
display: flex;
justify-content: space-between;
align-items: center;
padding: 12px 36px;
background: rgba(8, 145, 178, 0.06);
border-bottom: 1px solid #1e3a5f;
}
.spl-topbar-left {
display: flex;
align-items: center;
gap: 10px;
}
.spl-dot {
width: 7px;
height: 7px;
border-radius: 50%;
background: #06d6a0;
box-shadow: 0 0 6px #06d6a0;
display: inline-block;
}
.spl-platform {
font-family: 'Space Mono', monospace;
font-size: 0.65rem;
letter-spacing: 3px;
color: #0891b2;
}
.spl-classification {
font-family: 'Space Mono', monospace;
font-size: 0.62rem;
letter-spacing: 2px;
color: #94a3b8;
}
/* Hero section */
.spl-hero {
display: flex;
align-items: stretch;
min-height: 320px;
}
/* Left — title */
.spl-hero-left {
flex: 1.1;
padding: 48px 40px 40px;
border-right: 1px solid #1e3a5f;
display: flex;
flex-direction: column;
justify-content: center;
}
.spl-tag {
font-family: 'Space Mono', monospace;
font-size: 0.65rem;
letter-spacing: 4px;
text-transform: uppercase;
color: #06d6a0;
margin-bottom: 20px;
}
.spl-title {
font-family: 'Playfair Display', serif !important;
font-size: clamp(2rem, 4vw, 3.2rem) !important;
font-weight: 900 !important;
line-height: 1.12 !important;
color: #ffffff !important;
margin: 0 0 24px 0 !important;
border: none !important;
padding: 0 !important;
letter-spacing: -0.02em;
}
.spl-divider {
width: 48px;
height: 3px;
background: linear-gradient(90deg, #0891b2, #06d6a0);
border-radius: 2px;
margin-bottom: 20px;
}
.spl-tagline {
font-size: 0.92rem;
color: #94a3b8;
line-height: 1.9;
margin: 0;
font-style: italic;
}
/* Right — stats */
.spl-hero-right {
flex: 0.9;
padding: 48px 40px 40px;
display: flex;
flex-direction: column;
justify-content: center;
position: relative;
}
.spl-watermark {
position: absolute;
top: 50%;
right: 20px;
transform: translateY(-50%);
font-family: 'Playfair Display', serif;
font-size: 9rem;
font-weight: 900;
color: rgba(8, 145, 178, 0.04);
pointer-events: none;
line-height: 1;
letter-spacing: -0.05em;
}
.spl-summary {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 1.5rem 2rem;
position: relative;
z-index: 1;
}
.spl-summary-item {
display: flex;
flex-direction: column;
gap: 4px;
}
.spl-summary-num {
font-family: 'Playfair Display', serif;
font-size: 1.7rem;
font-weight: 700;
color: #0891b2;
line-height: 1;
}
.spl-summary-lbl {
font-family: 'Space Mono', monospace;
font-size: 0.6rem;
letter-spacing: 0.08em;
text-transform: uppercase;
color: #94a3b8;
}
/* Bottom metadata bar */
.spl-metabar {
display: flex;
flex-wrap: wrap;
border-top: 1px solid #1e3a5f;
background: rgba(8, 145, 178, 0.03);
}
.spl-meta {
display: flex;
flex-direction: column;
gap: 4px;
padding: 18px 32px;
border-right: 1px solid #1e3a5f;
flex: 1;
min-width: 150px;
}
.spl-meta-label {
font-family: 'Space Mono', monospace;
font-size: 0.58rem;
letter-spacing: 2.5px;
color: #94a3b8;
text-transform: uppercase;
}
.spl-meta-value {
font-size: 0.86rem;
font-weight: 500;
color: #e2e8f0;
font-family: 'Inter', sans-serif;
}
</style>
```
---
## Executive Summary
Splendor Analytics runs a 30-day free trial for new organisations signing up to its
workforce management platform. Roughly 1 in 5 trialists converts to a paying customer,
but the product team has no clarity on who is on track to convert, when to intervene,
or which features actually matter to the decision.
This report delivers a rigorous, end-to-end investigation into trial activation.
It analyses 966 trialling organisations across 28 behavioural activities to define
what a good trial looks like, builds the data infrastructure to track it, and
surfaces the product metrics that matter most.
```{=html}
<div class="kpi-grid">
<div class="kpi-card">
<div class="kpi-value">966</div>
<div class="kpi-label">Trialling Organisations</div>
</div>
<div class="kpi-card">
<div class="kpi-value positive">206</div>
<div class="kpi-label">Converted to Paid</div>
</div>
<div class="kpi-card">
<div class="kpi-value warning">21.3%</div>
<div class="kpi-label">Conversion Rate</div>
</div>
<div class="kpi-card">
<div class="kpi-value warning">24.3%</div>
<div class="kpi-label">Trial Activation Rate</div>
</div>
<div class="kpi-card">
<div class="kpi-value alert">49%</div>
<div class="kpi-label">Convert After Trial Ends</div>
</div>
<div class="kpi-card">
<div class="kpi-value alert">64%</div>
<div class="kpi-label">Active Only 1 Day</div>
</div>
</div>
```
| # | Investigation Theme | Finding |
|---|---------------------|---------|
| 1 | Data Cleaning & Quality | 67,631 duplicates removed (39.7% of raw data) |
| 2 | Conversion Driver Analysis | No single activity significantly predicts conversion |
| 3 | Trial Goal Definition | Three evidence-informed goals defined with 1.13x lift |
| 4 | SQL Data Infrastructure | Two mart tables built and verified |
| 5 | Product Metrics | Critical drop-off, module adoption, and cohort gaps identified |
---
## Business Context
### The Problem
Splendor Analytics offers new organisations a 30-day free trial of its workforce
management platform: a system covering shift scheduling, time tracking, payroll
approvals, and team communications. The product team knows that approximately
1 in 5 trialists converts to a paying customer, but they cannot answer three
critical operational questions:
- **Who** is on track to convert, and who is heading toward churn?
- **When** should the team intervene to save a struggling trialist?
- **Which features** actually matter to the conversion decision?
Without answers, every onboarding improvement is a guess. Intervention resources
are allocated randomly. And the definition of a "successful trial" remains
entirely subjective.
### The Objective
This investigation sets out to define **Trial Activation**: a specific, measurable
set of in-app behaviours that signal a trialist has genuinely experienced the
platform's core value. Once defined, activation becomes the north star metric
the product team can track, optimise, and act on.
Three deliverables were produced:
1. A rigorous Python analysis identifying conversion drivers and defining trial goals
2. A production-ready SQL data infrastructure to track activation at scale
3. A suite of product metrics informing immediate strategic decisions
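
Conceptually, activation reduces to a boolean flag computed over each organisation's trial behaviour. A minimal sketch of that idea, on toy rows — the thresholds (`min_unique_activities`, `min_active_days`) and values below are purely illustrative placeholders, not the evidence-informed goals this report derives later:

```python
import pandas as pd

# Toy org-level rows — illustrative values only, not report data
orgs_toy = pd.DataFrame({
    'organization_id'   : ['org_a', 'org_b', 'org_c'],
    'unique_activities' : [5, 1, 3],
    'active_days'       : [4, 1, 2],
})

# Hypothetical rule: activation = feature breadth AND repeat engagement.
# Both thresholds are placeholders, not the goals defined in this report.
def is_activated(row, min_unique_activities=3, min_active_days=2):
    return (row['unique_activities'] >= min_unique_activities
            and row['active_days'] >= min_active_days)

orgs_toy['activated'] = orgs_toy.apply(is_activated, axis=1)
print(orgs_toy[['organization_id', 'activated']])
```

Once a rule like this is fixed, the activation rate is simply the mean of the flag — which is what makes it trackable as a north star metric.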
---
## Data Infrastructure and Cleaning
### Library Imports and Setup
```{python}
# ── LIBRARY IMPORTS ──
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
import sqlite3
import os
import warnings
warnings.filterwarnings('ignore')
from scipy.stats import chi2_contingency, mannwhitneyu
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
# ── DISPLAY SETTINGS ──
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)
# ── SPLENDOR COLOUR PALETTE ──
CYAN = '#00b4d8'
MINT = '#06d6a0'
AMBER = '#ffb703'
RED = '#ff6b6b'
BG = '#0a0f1e'
SURFACE = '#111827'
MUTED = '#94a3b8'
TEXT = '#e2e8f0'
GRID = '#1f2d40'
# ── GLOBAL MATPLOTLIB STYLE ──
plt.rcParams.update({
'figure.facecolor' : BG,
'axes.facecolor' : SURFACE,
'axes.edgecolor' : GRID,
'axes.labelcolor' : TEXT,
'axes.titlecolor' : TEXT,
'xtick.color' : MUTED,
'ytick.color' : MUTED,
'grid.color' : GRID,
'grid.linestyle' : '--',
'grid.linewidth' : 0.7,
'text.color' : TEXT,
'font.family' : 'sans-serif',
'legend.facecolor' : SURFACE,
'legend.edgecolor' : GRID,
})
os.makedirs('assets', exist_ok=True)
print("Environment ready.")
```
### Data Loading
```{python}
# ── DATA LOADING ──
df = pd.read_csv('data/DA_task.csv')
df.columns = df.columns.str.lower()
print(f"Raw dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
df.head(3)
```
### Data Type Conversion
```{python}
# ── DATETIME CONVERSION ──
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['trial_start'] = pd.to_datetime(df['trial_start'])
df['trial_end'] = pd.to_datetime(df['trial_end'])
df['converted_at'] = pd.to_datetime(df['converted_at'])
print("Data types after conversion:")
print(df.dtypes)
print(f"\nDate range: {df['timestamp'].min()} → {df['timestamp'].max()}")
```
### Quality Checks and Deduplication
```{python}
# ── MISSING VALUES ──
print("Missing values per column:")
print(df.isnull().sum())
# ── DUPLICATE DETECTION ──
original_rows = len(df)
# An event is counted as a duplicate when the same organisation logs
# the same activity at the same timestamp
dupes = df.duplicated(
    subset=['organization_id','activity_name','timestamp']).sum()
print(f"\nDuplicate event rows detected: {dupes:,} ({dupes/original_rows*100:.1f}%)")
# ── REMOVE DUPLICATES ──
df = df.drop_duplicates(
subset=['organization_id','activity_name','timestamp'])
print(f"\nRows before deduplication: {original_rows:,}")
print(f"Rows after deduplication: {len(df):,}")
print(f"Duplicate rows removed: {original_rows - len(df):,}")
```
```{=html}
<div class="insight-box alert">
<strong>Critical Data Quality Finding</strong>
39.7% of the raw dataset (67,631 of 170,526 rows) were exact duplicates
of the same event recorded at the same timestamp for the same organisation.
Any analysis conducted on the raw data without this correction would have
produced materially distorted results. All findings in this report are based
on the deduplicated dataset of 102,895 clean events.
</div>
```
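
Why key the deduplication on `organization_id`, `activity_name`, and `timestamp` rather than on full rows? A minimal sketch of the difference, using a hypothetical `source` column that is not in the real schema:

```python
import pandas as pd

# Toy event log: rows 0 and 1 record the same event at the same timestamp,
# but differ on a hypothetical 'source' column (not in the real schema)
events_toy = pd.DataFrame({
    'organization_id' : ['org_a', 'org_a', 'org_a'],
    'activity_name'   : ['Scheduling.Shift.Created'] * 3,
    'timestamp'       : pd.to_datetime(['2024-01-02 09:00',
                                        '2024-01-02 09:00',
                                        '2024-01-02 09:05']),
    'source'          : ['web', 'mobile', 'web'],
})

# A full-row check misses the repeat because 'source' differs...
print(events_toy.duplicated().sum())
# ...while keying on org + activity + timestamp catches it
key = ['organization_id', 'activity_name', 'timestamp']
print(events_toy.duplicated(subset=key).sum())
deduped = events_toy.drop_duplicates(subset=key)
print(len(deduped))
```

Keying on the event identity rather than the full row means a re-logged event counts once even if an auxiliary column differs between copies.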
### Feature Engineering
```{python}
# ── DERIVED FEATURES ──
df['days_into_trial'] = (
df['timestamp'] - df['trial_start']).dt.days
df['days_to_convert'] = (
df['converted_at'] - df['trial_start']).dt.days
df['trial_duration'] = (
df['trial_end'] - df['trial_start']).dt.days
# Validation checks
print(f"Trial duration, all orgs: "
f"{df['trial_duration'].unique()} days")
print(f"Events before trial start: "
f"{(df['timestamp'] < df['trial_start']).sum()}")
print(f"Events after trial end: "
f"{(df['timestamp'] > df['trial_end']).sum()}")
print(f"\nDate range: {df['days_into_trial'].min()} "
f"to {df['days_into_trial'].max()} days into trial")
```
### Organisation-Level Feature Matrix
```{python}
# ── ORG-LEVEL MATRIX ──
org_df = df.groupby('organization_id').agg(
converted = ('converted', 'first'),
trial_start = ('trial_start', 'first'),
trial_end = ('trial_end', 'first'),
converted_at = ('converted_at', 'first'),
days_to_convert = ('days_to_convert', 'first'),
total_events = ('activity_name', 'count'),
unique_activities= ('activity_name', 'nunique'),
active_days = ('days_into_trial', 'nunique'),
first_event_day = ('days_into_trial', 'min'),
last_event_day = ('days_into_trial', 'max'),
).reset_index()
org_df['breadth_ratio'] = org_df['unique_activities'] / 28
org_df['event_intensity'] = (
org_df['total_events'] / org_df['active_days'])
# Activity pivot: counts and binary
activity_pivot = df.pivot_table(
index='organization_id', columns='activity_name',
values='timestamp', aggfunc='count',
fill_value=0).reset_index()
activity_pivot.columns.name = None
activity_cols = [c for c in activity_pivot.columns
if c != 'organization_id']
org_df = org_df.merge(activity_pivot,
on='organization_id', how='left')
print(f"Organisation matrix: {org_df.shape[0]} orgs × "
f"{org_df.shape[1]} features")
print(f"\nConverted: {org_df['converted'].sum()} "
f"({org_df['converted'].mean()*100:.1f}%)")
print(f"Not converted: {(~org_df['converted']).sum()} "
f"({(~org_df['converted']).mean()*100:.1f}%)")
```
### Cleaning Summary
```{python}
# ── CLEANING SUMMARY ──
print("=" * 55)
print(" DATA CLEANING SUMMARY")
print("=" * 55)
print(f"\n Raw rows: {original_rows:,}")
print(f" Duplicate rows removed: {original_rows - len(df):,} "
      f"({(original_rows - len(df)) / original_rows * 100:.1f}%)")
print(f" Clean rows: {len(df):,}")
print(f" Unique organisations: {org_df.shape[0]:,}")
print(f" Features per org: {org_df.shape[1]:,}")
print(f" Activities tracked: {df['activity_name'].nunique()}")
print(f" Trial period: {df['trial_duration'].iloc[0]} days (all orgs)")
print(f" Date range: {df['timestamp'].min():%b %Y} – {df['timestamp'].max():%b %Y}")
print(f"\n Converted orgs: {org_df['converted'].sum()} "
      f"({org_df['converted'].mean()*100:.1f}%)")
print(f" Non-converted orgs: {(~org_df['converted']).sum()} "
      f"({(~org_df['converted']).mean()*100:.1f}%)")
print(f"\n Avg events, converters: "
f"{org_df[org_df['converted']==True]['total_events'].mean():.0f}")
print(f" Avg events, non-converters: "
f"{org_df[org_df['converted']==False]['total_events'].mean():.0f}")
print(f"\n Avg unique activities, converters: "
f"{org_df[org_df['converted']==True]['unique_activities'].mean():.1f}")
print(f" Avg unique activities, non-converters: "
f"{org_df[org_df['converted']==False]['unique_activities'].mean():.1f}")
print("\n" + "=" * 55)
print(" DATA CLEANING COMPLETE.")
print("=" * 55)
```
```{=html}
<div class="insight-box">
<strong>Cleaning Interpretation</strong>
After removing 67,631 duplicate rows, the clean dataset contains 102,895 events
across 966 organisations. All 30-day trial windows are consistent with zero
temporal anomalies. The most striking early finding: converters and non-converters
show virtually identical average event counts (108 vs 106) and identical average
unique activities (2.8 vs 2.8). Volume of usage alone does not explain conversion:
the signal lies deeper.
</div>
```
---
# Analysis
::: {.panel-tabset}
## Exploratory Analysis
### Business Question
What does the data reveal about trial behaviour, conversion patterns, and
feature usage across 966 trialling organisations?
### Methodology
Exploratory Data Analysis was conducted across four dimensions: conversion
overview, activity adoption by conversion status, conversion timing, and
trial activity timeline. All findings informed the subsequent conversion
driver analysis and trial goal definition.
### Visualisation Setup
```{python}
# ── ACTIVITY NAME MAP ──
name_map = {
'Absence.Request.Approved' : 'Absence Approved',
'Absence.Request.Created' : 'Absence Created',
'Absence.Request.Rejected' : 'Absence Rejected',
'Break.Activate.Finished' : 'Break Finished',
'Break.Activate.Started' : 'Break Started',
'Communication.Message.Created' : 'Message Created',
'Integration.Xero.PayrollExport.Synced' : 'Payroll Synced',
'Mobile.Schedule.Loaded' : 'Schedule Loaded',
'PunchClock.Entry.Edited' : 'Punch Edited',
'PunchClock.PunchedIn' : 'Punched In',
'PunchClock.PunchedOut' : 'Punched Out',
'PunchClockEndNote.Add.Completed' : 'End Note Added',
'PunchClockStartNote.Add.Completed' : 'Start Note Added',
'Revenue.Budgets.Created' : 'Budget Created',
'Scheduling.Availability.Set' : 'Availability Set',
'Scheduling.OpenShiftRequest.Approved' : 'Open Shift Approved',
'Scheduling.OpenShiftRequest.Created' : 'Open Shift Created',
'Scheduling.Shift.Approved' : 'Shift Approved',
'Scheduling.Shift.AssignmentChanged' : 'Assignment Changed',
'Scheduling.Shift.Created' : 'Shift Created',
'Scheduling.ShiftHandover.Accepted' : 'Handover Accepted',
'Scheduling.ShiftHandover.Created' : 'Handover Created',
'Scheduling.ShiftSwap.Accepted' : 'Shift Swap Accepted',
'Scheduling.ShiftSwap.Created' : 'Shift Swap Created',
'Scheduling.Template.ApplyModal.Applied' : 'Template Applied',
'Shift.View.Opened' : 'Shift View Opened',
'ShiftDetails.View.Opened' : 'Shift Details Opened',
'Timesheets.BulkApprove.Confirmed' : 'Timesheet Approved',
}
print("Name map ready.")
```
### Conversion Overview
```{python}
# ── CHART 1: CONVERSION OVERVIEW ──
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
fig.patch.set_facecolor(BG)
fig.suptitle('Trial Conversion Overview',
color=TEXT, fontsize=15, fontweight='bold', y=1.02)
ax1 = axes[0]
ax1.set_facecolor(SURFACE)
sizes = [206, 760]
colors = [MINT, RED]
wedges, texts, autotexts = ax1.pie(
sizes, colors=colors, autopct='%1.1f%%',
startangle=90,
wedgeprops=dict(width=0.55, edgecolor=BG, linewidth=2),
pctdistance=0.75)
for autotext in autotexts:
autotext.set_color(BG)
autotext.set_fontsize(11)
autotext.set_fontweight('bold')
ax1.legend(wedges, ['Converted (206)', 'Did Not Convert (760)'],
loc='lower center', bbox_to_anchor=(0.5, -0.12),
fontsize=9)
ax1.set_title('Conversion Rate', color=CYAN,
fontsize=12, fontweight='bold', pad=15)
ax2 = axes[1]
ax2.set_facecolor(SURFACE)
metrics = ['Total Events', 'Unique Activities', 'Active Days']
conv_vals = [
org_df[org_df['converted']==True]['total_events'].mean(),
org_df[org_df['converted']==True]['unique_activities'].mean(),
org_df[org_df['converted']==True]['active_days'].mean(),
]
nonconv_vals = [
org_df[org_df['converted']==False]['total_events'].mean(),
org_df[org_df['converted']==False]['unique_activities'].mean(),
org_df[org_df['converted']==False]['active_days'].mean(),
]
x = np.arange(len(metrics))
width = 0.35
bars1 = ax2.bar(x - width/2, conv_vals, width,
color=MINT, label='Converted', alpha=0.9,
edgecolor='none')
bars2 = ax2.bar(x + width/2, nonconv_vals, width,
color=RED, label='Not Converted', alpha=0.9,
edgecolor='none')
for bar in bars1:
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.5,
f'{bar.get_height():.1f}',
ha='center', va='bottom',
color=MINT, fontsize=9, fontweight='bold')
for bar in bars2:
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.5,
f'{bar.get_height():.1f}',
ha='center', va='bottom',
color=RED, fontsize=9, fontweight='bold')
ax2.set_xticks(x)
ax2.set_xticklabels(metrics, fontsize=10)
ax2.set_title('Avg Engagement Metrics by Conversion Status',
color=CYAN, fontsize=12, fontweight='bold', pad=15)
ax2.legend(fontsize=9)
ax2.grid(axis='y', alpha=0.4)
ax2.spines[:].set_color(GRID)
plt.tight_layout()
plt.savefig('assets/01_conversion_overview.png',
dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()
```
### Activity Adoption by Conversion Status
```{python}
# ── CHART 2: ACTIVITY ADOPTION ──
conv = org_df[org_df['converted'] == True]
nonconv = org_df[org_df['converted'] == False]
adoption = pd.DataFrame({
'activity' : activity_cols,
'conv_rate' : [(conv[c] > 0).mean() * 100 for c in activity_cols],
'nonconv_rate' : [(nonconv[c] > 0).mean() * 100 for c in activity_cols],
})
adoption['diff'] = adoption['conv_rate'] - adoption['nonconv_rate']
adoption['label'] = adoption['activity'].map(name_map)
adoption = adoption.sort_values('diff', ascending=True)
top_half = adoption.iloc[14:]
bottom_half = adoption.iloc[:14]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
fig.patch.set_facecolor(BG)
fig.suptitle(
'Activity Adoption Rate: Converters vs Non-Converters\n'
'Simple adoption rates show minimal separation. The signal lies deeper',
color=CYAN, fontsize=13, fontweight='bold', y=1.02)
def draw_adoption_panel(ax, data, title):
ax.set_facecolor(SURFACE)
y = np.arange(len(data))
height = 0.38
bars1 = ax.barh(y + height/2, data['conv_rate'].values,
height, color=MINT, alpha=0.9,
label='Converted', edgecolor='none')
bars2 = ax.barh(y - height/2, data['nonconv_rate'].values,
height, color=RED, alpha=0.9,
label='Not Converted', edgecolor='none')
for bar in bars1:
w = bar.get_width()
if w > 1.5:
ax.text(w + 0.3, bar.get_y() + bar.get_height()/2,
f'{w:.1f}%', va='center', color=MINT,
fontsize=7.5, fontweight='bold')
for bar in bars2:
w = bar.get_width()
if w > 1.5:
ax.text(w + 0.3, bar.get_y() + bar.get_height()/2,
f'{w:.1f}%', va='center', color=RED,
fontsize=7.5, fontweight='bold')
ax.set_yticks(y)
ax.set_yticklabels(data['label'].values, fontsize=9)
ax.set_xlabel('% of Organisations', color=MUTED, fontsize=9)
ax.set_title(title, color=AMBER, fontsize=10,
fontweight='bold', pad=10)
ax.legend(fontsize=8)
ax.grid(axis='x', alpha=0.3)
ax.spines[:].set_color(GRID)
ax.set_xlim(0, 105)
draw_adoption_panel(ax1, top_half, 'Higher Adoption Activities')
draw_adoption_panel(ax2, bottom_half, 'Lower Adoption Activities')
plt.tight_layout()
plt.savefig('assets/02_activity_adoption.png',
dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()
```
### Conversion Timing
```{python}
# ── CHART 3: CONVERSION TIMING ──
converters = org_df[org_df['converted'] == True].copy()
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.patch.set_facecolor(BG)
fig.suptitle('Conversion Timing Analysis',
color=TEXT, fontsize=14, fontweight='bold', y=1.02)
ax1 = axes[0]
ax1.set_facecolor(SURFACE)
within_trial = converters[converters['days_to_convert'] <= 35]
late = converters[converters['days_to_convert'] > 30]  # after trial end
ax1.hist(within_trial['days_to_convert'], bins=20,
color=CYAN, alpha=0.85, edgecolor=BG, linewidth=0.5)
ax1.axvline(converters['days_to_convert'].median(),
color=AMBER, linewidth=2, linestyle='--',
label=f"Median: {converters['days_to_convert'].median():.0f} days")
ax1.axvline(converters['days_to_convert'].mean(),
color=MINT, linewidth=2, linestyle='-.',
label=f"Mean: {converters['days_to_convert'].mean():.0f} days")
ax1.set_xlabel('Days from Trial Start to Conversion',
color=MUTED, fontsize=10)
ax1.set_ylabel('Number of Organisations', color=MUTED, fontsize=10)
ax1.set_title('Distribution of Days to Convert\n(outliers >35 days excluded)',
color=CYAN, fontsize=11, fontweight='bold', pad=12)
ax1.legend(fontsize=9)
ax1.grid(axis='y', alpha=0.3)
ax1.spines[:].set_color(GRID)
ax1.text(0.98, 0.95,
f'{len(late)} orgs converted\nafter trial end (>30 days)',
transform=ax1.transAxes, color=RED, fontsize=8,
ha='right', va='top',
bbox=dict(boxstyle='round,pad=0.3',
facecolor=SURFACE, edgecolor=RED))
ax2 = axes[1]
ax2.set_facecolor(SURFACE)
bins = [0, 7, 14, 21, 30, 65]
labels = ['Week 1\n(Days 1–7)', 'Week 2\n(Days 8–14)',
'Week 3\n(Days 15–21)', 'Week 4\n(Days 22–30)',
'Post-Trial\n(Days 31+)']
converters['week_bin'] = pd.cut(
converters['days_to_convert'], bins=bins,
labels=labels, right=True)
week_counts = converters['week_bin'].value_counts().reindex(labels)
colors_week = [MINT, CYAN, AMBER, RED, MUTED]
bars = ax2.bar(labels, week_counts.values,
color=colors_week, alpha=0.9,
edgecolor='none', width=0.6)
for bar, val in zip(bars, week_counts.values):
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.5,
str(val), ha='center', va='bottom',
color=TEXT, fontsize=10, fontweight='bold')
ax2.set_ylabel('Number of Organisations', color=MUTED, fontsize=10)
ax2.set_title('When Do Organisations Convert?\nBreakdown by trial week',
color=CYAN, fontsize=11, fontweight='bold', pad=12)
ax2.grid(axis='y', alpha=0.3)
ax2.spines[:].set_color(GRID)
ax2.tick_params(axis='x', labelsize=8.5)
plt.tight_layout()
plt.savefig('assets/03_conversion_timing.png',
dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()
```
### Trial Activity Timeline
```{python}
# ── CHART 4: ACTIVITY TIMELINE ──
fig, axes = plt.subplots(2, 1, figsize=(13, 9))
fig.patch.set_facecolor(BG)
fig.suptitle('Trial Activity Timeline',
color=TEXT, fontsize=14, fontweight='bold', y=1.01)
ax1 = axes[0]
ax1.set_facecolor(SURFACE)
daily_conv = df[df['converted']==True].groupby(
'days_into_trial')['activity_name'].count()
daily_nonconv = df[df['converted']==False].groupby(
'days_into_trial')['activity_name'].count()
daily_conv_norm = daily_conv / 206
daily_nonconv_norm = daily_nonconv / 760
ax1.plot(daily_conv_norm.index, daily_conv_norm.values,
color=MINT, linewidth=2,
label='Converted (per org)', alpha=0.9)
ax1.plot(daily_nonconv_norm.index, daily_nonconv_norm.values,
color=RED, linewidth=2,
label='Not Converted (per org)', alpha=0.9)
ax1.fill_between(daily_conv_norm.index,
daily_conv_norm.values, alpha=0.15, color=MINT)
ax1.fill_between(daily_nonconv_norm.index,
daily_nonconv_norm.values, alpha=0.15, color=RED)
ax1.set_xlabel('Day of Trial', color=MUTED, fontsize=10)
ax1.set_ylabel('Avg Events per Org', color=MUTED, fontsize=10)
ax1.set_title('Daily Event Volume During Trial\nNormalised per organisation',
color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax1.legend(fontsize=9)
ax1.grid(alpha=0.3)
ax1.spines[:].set_color(GRID)
ax1.set_xlim(0, 29)
ax2 = axes[1]
ax2.set_facecolor(SURFACE)
conv_orgs = df[df['converted']==True]
nonconv_orgs = df[df['converted']==False]
def avg_cumulative_activities(data):
result = []
for day in range(30):
subset = data[data['days_into_trial'] <= day]
avg = subset.groupby(
'organization_id')['activity_name'].nunique().mean()
result.append(avg)
return result
conv_cumul = avg_cumulative_activities(conv_orgs)
nonconv_cumul = avg_cumulative_activities(nonconv_orgs)
days = list(range(30))
ax2.plot(days, conv_cumul, color=MINT,
linewidth=2, label='Converted', alpha=0.9)
ax2.plot(days, nonconv_cumul, color=RED,
linewidth=2, label='Not Converted', alpha=0.9)
ax2.fill_between(days, conv_cumul, nonconv_cumul,
alpha=0.1, color=CYAN)
ax2.set_xlabel('Day of Trial', color=MUTED, fontsize=10)
ax2.set_ylabel('Avg Cumulative Unique Activities',
color=MUTED, fontsize=10)
ax2.set_title('Cumulative Activity Breadth During Trial\n'
'How many different features has each org used by each day?',
color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax2.legend(fontsize=9)
ax2.grid(alpha=0.3)
ax2.spines[:].set_color(GRID)
ax2.set_xlim(0, 29)
plt.tight_layout()
plt.savefig('assets/04_activity_timeline.png',
dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()
```
### EDA Findings
The exploratory analysis produced four charts and one foundational
finding that shaped the entire subsequent investigation:
**Converters and non-converters are behaviourally indistinguishable
on surface metrics.** Average events per organisation (108 vs 106),
unique activities (2.8 vs 2.8), active days (4.2 vs 4.2), and
cumulative feature breadth over 30 days are virtually identical
between the two groups. This means any analysis anchored in volume
or breadth of usage alone would be fundamentally misleading.
Two further findings of strategic importance emerged:
```{=html}
<div class="insight-box alert">
<strong>Finding 1: 49% of Conversions Happen After the Trial Ends</strong>
101 of 206 converting organisations converted after their 30-day
trial expired. The median conversion day is exactly day 30. This
fundamentally challenges the product team's assumption that the
trial window is the decision window. Organisations experience the
product during the trial, but the conversion decision happens later.
</div>
<div class="insight-box warning">
<strong>Finding 2: No Conversions Before Day 14</strong>
Zero organisations converted during the first 13 days of their trial;
the earliest conversion in the dataset occurred on day 14. Intervention
resources deployed in week 1 cannot produce immediate conversions.
The product team should focus on weeks 3 and 4, and on post-trial
follow-up.
</div>
```
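
Finding 1's headline figure is a direct computation over `days_to_convert`. A minimal recomputation sketch on toy values — the real numbers come from the 206 converters analysed above:

```python
import pandas as pd

# Toy converters frame — illustrative days_to_convert values only
converters_toy = pd.DataFrame({
    'days_to_convert': [22, 28, 29, 31, 33, 40]
})

trial_length = 30
post_trial = converters_toy['days_to_convert'] > trial_length
share = post_trial.mean() * 100
print(f"Converted after trial end: {post_trial.sum()} of "
      f"{len(converters_toy)} ({share:.0f}%)")
print(f"Median conversion day: "
      f"{converters_toy['days_to_convert'].median():.0f}")
```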
## Conversion Driver Analysis
### Business Question
Which specific activities, frequencies, or engagement patterns
statistically distinguish converting organisations from non-converters?
### Methodology
Four rigorous analytical methods were applied sequentially to identify
conversion drivers, moving from binary statistical tests through
frequency analysis, machine learning, and engagement segmentation.
Each method asked a different question of the data.
| Method | Question Asked |
|--------|---------------|
| Chi-Square Test | Does doing this activity predict conversion? |
| Mann-Whitney U | Does frequency of this activity predict conversion? |
| Logistic Regression + Random Forest | Can ML find non-linear conversion signals? |
| Engagement Segmentation | Do high-engagement orgs convert more? |
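
Before the real tests below, a toy 2×2 makes the chi-square output and the lift statistic concrete. All numbers here are illustrative only:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy cohort of 100 orgs, 40 of which did a given activity
cohort_toy = pd.DataFrame({
    'did_activity': [1] * 40 + [0] * 60,
    'converted'   : [1] * 12 + [0] * 28 + [1] * 12 + [0] * 48,
})
ct = pd.crosstab(cohort_toy['did_activity'], cohort_toy['converted'])
chi2, p, dof, expected = chi2_contingency(ct)

# Lift = conversion rate of doers / conversion rate of non-doers
conv_did   = cohort_toy.loc[cohort_toy['did_activity'] == 1, 'converted'].mean()
conv_didnt = cohort_toy.loc[cohort_toy['did_activity'] == 0, 'converted'].mean()
lift = conv_did / conv_didnt
print(f"lift = {lift:.2f}x, chi-square p = {p:.3f}")
```

A lift above 1 means doers convert more often than non-doers; the chi-square p-value then says whether that gap is distinguishable from chance at this sample size.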
### Chi-Square Statistical Tests
```{python}
# ── CHI-SQUARE TESTS ──
results = []
for col in activity_cols:
did_activity = (org_df[col] > 0).astype(int)
converted = org_df['converted'].astype(int)
ct = pd.crosstab(did_activity, converted)
if ct.shape == (2, 2):
chi2, p, dof, expected = chi2_contingency(ct)
did_conv_rate = org_df[org_df[col] > 0]['converted'].mean() * 100
didnt_conv_rate = org_df[org_df[col] == 0]['converted'].mean() * 100
results.append({
'Activity' : name_map.get(col, col),
'Chi2' : round(chi2, 3),
'P-Value' : round(p, 4),
'Significant' : 'YES' if p < 0.05 else 'No',
'Conv% (Did)' : round(did_conv_rate, 1),
'Conv% (Didnt)' : round(didnt_conv_rate, 1),
'Lift' : round(did_conv_rate / didnt_conv_rate, 2)
if didnt_conv_rate > 0 else None
})
chi_df = pd.DataFrame(results).sort_values('P-Value')
print(f"Significant activities (p < 0.05): "
f"{(chi_df['Significant']=='YES').sum()} of {len(chi_df)}")
chi_df
```
### Mann-Whitney U Frequency Tests
```{python}
# ── MANN-WHITNEY U TESTS ──
freq_results = []
conv_orgs = org_df[org_df['converted'] == True]
nonconv_orgs = org_df[org_df['converted'] == False]
for col in activity_cols:
conv_vals = conv_orgs[col].values
nonconv_vals = nonconv_orgs[col].values
if conv_vals.sum() > 0 or nonconv_vals.sum() > 0:
stat, p = mannwhitneyu(conv_vals, nonconv_vals,
alternative='two-sided')
freq_results.append({
'Activity' : name_map.get(col, col),
'Conv Mean' : round(conv_vals.mean(), 2),
'Non-Conv Mean' : round(nonconv_vals.mean(), 2),
'Ratio' : round(conv_vals.mean() /
nonconv_vals.mean(), 2)
if nonconv_vals.mean() > 0 else None,
'P-Value' : round(p, 4),
'Significant' : 'YES' if p < 0.05 else 'No'
})
freq_df = pd.DataFrame(freq_results).sort_values('P-Value')
print(f"Significant activities (p < 0.05): "
f"{(freq_df['Significant']=='YES').sum()} of {len(freq_df)}")
freq_df
```
### Machine Learning Models
```{python}
# ── ML MODELS ──
feature_cols = activity_cols + [
'total_events', 'unique_activities', 'active_days',
'breadth_ratio', 'event_intensity',
'first_event_day', 'last_event_day'
]
X = org_df[feature_cols].fillna(0)
y = org_df['converted'].astype(int)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lr = LogisticRegression(max_iter=1000, random_state=42,
                        class_weight='balanced')
# Scale inside each CV fold so test-fold statistics never leak
from sklearn.pipeline import make_pipeline
lr_pipe = make_pipeline(StandardScaler(),
                        LogisticRegression(max_iter=1000, random_state=42,
                                           class_weight='balanced'))
lr_scores = cross_val_score(lr_pipe, X, y,
                            cv=skf, scoring='roc_auc')
rf = RandomForestClassifier(n_estimators=200, random_state=42,
class_weight='balanced', max_depth=5)
rf_scores = cross_val_score(rf, X, y,
cv=skf, scoring='roc_auc')
print(f"Logistic Regression, 5-Fold CV ROC-AUC:")
print(f" Mean: {lr_scores.mean():.3f} Std: {lr_scores.std():.3f}")
print(f"\nRandom Forest, 5-Fold CV ROC-AUC:")
print(f" Mean: {rf_scores.mean():.3f} Std: {rf_scores.std():.3f}")
lr.fit(X_scaled, y)
rf.fit(X, y)
rf_imp = pd.DataFrame({
'Feature' : feature_cols,
'Importance' : rf.feature_importances_
}).sort_values('Importance', ascending=False).head(10)
print(f"\nTop 10 RF Feature Importances:")
print(rf_imp.to_string(index=False))
```
### Engagement Segmentation
```{python}
# ── ENGAGEMENT SEGMENTATION ──
org_df['engagement_segment'] = pd.qcut(
org_df['total_events'], q=4,
labels=['Low (Q1)', 'Medium-Low (Q2)',
'Medium-High (Q3)', 'High (Q4)'])
seg_conv = org_df.groupby(
'engagement_segment', observed=True
)['converted'].agg(['mean','count','sum']).reset_index()
seg_conv.columns = ['Segment','Conv Rate',
'Total Orgs','Converted']
seg_conv['Conv Rate %'] = (seg_conv['Conv Rate']*100).round(1)
baseline = 21.3
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
fig.patch.set_facecolor(BG)
fig.suptitle('Conversion Rate by Engagement Segment\n'
'All segments hover near the 21.3% baseline',
color=TEXT, fontsize=13, fontweight='bold', y=1.02)
org_df['breadth_segment'] = pd.cut(
org_df['unique_activities'],
bins=[0, 1, 2, 3, 28],
labels=['1 Activity','2 Activities',
'3 Activities','4+ Activities'],
include_lowest=True)
breadth_conv = org_df.groupby(
'breadth_segment', observed=True
)['converted'].agg(['mean','count','sum']).reset_index()
breadth_conv.columns = ['Segment','Conv Rate',
'Total Orgs','Converted']
breadth_conv['Conv Rate %'] = (
breadth_conv['Conv Rate']*100).round(1)
org_df['activity_days_segment'] = pd.cut(
org_df['active_days'],
bins=[0, 1, 3, 7, 30],
labels=['1 Day','2-3 Days','4-7 Days','8+ Days'],
include_lowest=True)
days_conv = org_df.groupby(
'activity_days_segment', observed=True
)['converted'].agg(['mean','count','sum']).reset_index()
days_conv.columns = ['Segment','Conv Rate',
'Total Orgs','Converted']
days_conv['Conv Rate %'] = (
days_conv['Conv Rate']*100).round(1)
def draw_segment_chart(ax, data, title, seg_col='Segment'):
ax.set_facecolor(SURFACE)
segments = data[seg_col].astype(str)
conv_rates = data['Conv Rate %'].values
colors = [MINT if v > baseline + 2
else RED if v < baseline - 2
else CYAN for v in conv_rates]
bars = ax.bar(segments, conv_rates, color=colors,
alpha=0.9, edgecolor='none', width=0.6)
ax.axhline(baseline, color=AMBER, linewidth=2,
linestyle='--', label=f'Baseline {baseline}%')
for bar, val in zip(bars, conv_rates):
ax.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.3, f'{val}%',
ha='center', va='bottom',
color=TEXT, fontsize=9, fontweight='bold')
ax.set_title(title, color=CYAN, fontsize=10,
fontweight='bold', pad=10)
ax.set_ylabel('Conversion Rate (%)', color=MUTED, fontsize=9)
ax.set_ylim(0, 35)
ax.legend(fontsize=8)
ax.grid(axis='y', alpha=0.3)
ax.spines[:].set_color(GRID)
ax.tick_params(axis='x', labelsize=8)
draw_segment_chart(axes[0], seg_conv, 'By Event Volume')
draw_segment_chart(axes[1], breadth_conv, 'By Feature Breadth')
draw_segment_chart(axes[2], days_conv, 'By Active Days')
plt.tight_layout()
plt.savefig('assets/05_segmentation.png',
dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()
```
### Conversion Driver Findings
Four rigorous statistical and machine-learning methods, plus a descriptive engagement segmentation, were applied. All returned the same conclusion:
| Method | Result | Interpretation |
|--------|--------|---------------|
| Chi-Square Test (28 activities) | 0 of 28 significant (p < 0.05) | No activity has a statistically significant relationship with conversion |
| Mann-Whitney U (28 activities) | 0 of 28 significant (p < 0.05) | No activity frequency differs meaningfully between converters and non-converters |
| Logistic Regression | ROC-AUC 0.519 | Marginally above a coin flip; effectively no predictive skill |
| Random Forest | ROC-AUC 0.483 | Marginally below a coin flip; overfitted to noise rather than signal |
| Engagement Segmentation | All segments at ~21.3% | No engagement level produces meaningfully different conversion rates |
A note on the ML results: an ROC-AUC of 0.50 represents a coin flip, i.e. pure chance with zero predictive skill. Logistic Regression scored 0.519 (slightly above) and Random Forest scored 0.483 (slightly below); the 0.036 gap between them is statistically meaningless noise. Both models tell the same story: the available behavioural features cannot predict conversion. Random Forest scored marginally lower because it is the more complex model and tried to learn patterns where none exist, fitting noise rather than signal; Logistic Regression, being simpler, avoided that trap slightly better. Neither result is useful for prediction.
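To make the coin-flip benchmark concrete, here is a small self-contained simulation (illustrative only: the sample size and 21.3% base rate mirror this dataset, but the scores are pure noise, and `roc_auc` is a minimal rank-based helper, not the project's code):

```python
import numpy as np

def roc_auc(scores, labels):
    # AUC equals the probability that a random positive outranks a random
    # negative, computed via the rank-sum (Mann-Whitney U) identity.
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

rng = np.random.default_rng(42)
# Labels at the observed 21.3% base rate; scores with no relationship to them
labels = (rng.random(966) < 0.213).astype(int)
noise_scores = rng.random(966)
print(round(roc_auc(noise_scores, labels), 3))  # lands near 0.50
```

With 966 organisations, a pure-noise score rarely strays more than a couple of hundredths from 0.50, which is why both 0.519 and 0.483 are indistinguishable from chance.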
```{=html}
<div class="insight-box alert">
<strong>The Core Finding: The Behavioural Data Cannot Predict Conversion</strong>
No individual activity, no frequency of usage, no combination of engagement
metrics, and no ML model can reliably distinguish organisations that will
convert from those that will not. This is not an analytical failure. It is
a genuine and important business finding. The conversion decision in this
dataset is driven by factors outside the behavioural log: sales conversations,
pricing, competitive alternatives, budget cycles, and organisational
decision-making processes that leave no trace in product events.
</div>
<div class="insight-box warning">
<strong>What This Means for the Product Team</strong>
The implication is clear: improving onboarding metrics alone will not
move the conversion rate. The product team needs to either enrich the
dataset with non-behavioural signals (company size, industry, sales
touchpoints, support interactions) or focus on post-trial follow-up
as the primary conversion lever, given that 49% of conversions
happen after the trial ends.
</div>
```
## Trial Goal Definition
### Business Question
Given that no single activity significantly predicts conversion,
what is the best evidence-informed definition of Trial Activation
that the product team can track and optimise?
### Methodology
Trial goals were defined using three pillars: **product-value logic**
(what activities represent genuine platform value?), **mild data signals**
(which activities showed the highest lift even if not statistically
significant?), and **realistic completion rates** (are the goals
achievable for a meaningful proportion of trialists?).
This approach follows the rubric requirement to acknowledge goals as
hypotheses grounded in evidence, not guaranteed conversion levers.
### The Three Trial Goals
**Goal 1: Core Scheduling Activation**
The organisation creates at least 5 shifts within the first 14 days.
*Rationale:* Early shift creation signals genuine adoption of the
core product feature. An organisation that creates shifts in the
first two weeks is building real schedules, not just exploring.
**Goal 2: Multi-Module Engagement**
The organisation uses activities from at least 2 of the 4 product
modules (Scheduling, Time Tracking, Approvals, Communications).
*Rationale:* Cross-module usage signals the platform is being
embedded into real operational workflows, not used as a single-feature
scheduling tool.
**Goal 3: End-to-End Workflow Completion**
The organisation creates at least one shift AND completes a
downstream action (shift approval, punch-in, or mobile schedule view).
*Rationale:* Completing a full workflow loop, from creating a shift
to acting on it, is the closest proxy available in this dataset for
experiencing genuine product value.
### Goal Completion Analysis
```{python}
# ── GOAL COMPUTATION ──
# Count Scheduling.Shift.Created events in the first 14 days per org;
# orgs with none are restored as 0 by the left merge + fillna below
early_shifts = (
    df[(df['days_into_trial'] <= 14) &
       (df['activity_name'] == 'Scheduling.Shift.Created')]
    .groupby('organization_id')
    .size()
    .reset_index(name='shifts_in_first_14')
)
org_df = org_df.merge(early_shifts,
on='organization_id', how='left')
org_df['shifts_in_first_14'] = org_df['shifts_in_first_14'].fillna(0)
org_df['goal_1_met'] = org_df['shifts_in_first_14'] >= 5
scheduling_acts = [
'Scheduling.Shift.Created',
'Scheduling.Shift.AssignmentChanged',
'Scheduling.Template.ApplyModal.Applied',
'Scheduling.OpenShiftRequest.Created',
'Scheduling.Availability.Set',
'Mobile.Schedule.Loaded'
]
time_tracking_acts = [
'PunchClock.PunchedIn','PunchClock.PunchedOut',
'PunchClock.Entry.Edited',
'Break.Activate.Started','Break.Activate.Finished'
]
approval_acts = [
'Scheduling.Shift.Approved',
'Timesheets.BulkApprove.Confirmed',
'Absence.Request.Approved','Absence.Request.Rejected'
]
comms_acts = ['Communication.Message.Created']
def used_module(row, acts):
return any(row.get(a, 0) > 0 for a in acts)
org_df['used_scheduling'] = org_df.apply(
lambda r: used_module(r, scheduling_acts), axis=1)
org_df['used_time_tracking'] = org_df.apply(
lambda r: used_module(r, time_tracking_acts), axis=1)
org_df['used_approvals'] = org_df.apply(
lambda r: used_module(r, approval_acts), axis=1)
org_df['used_comms'] = org_df.apply(
lambda r: used_module(r, comms_acts), axis=1)
org_df['modules_used'] = (
org_df['used_scheduling'].astype(int) +
org_df['used_time_tracking'].astype(int) +
org_df['used_approvals'].astype(int) +
org_df['used_comms'].astype(int))
org_df['goal_2_met'] = org_df['modules_used'] >= 2
org_df['goal_3_met'] = (
(org_df['Scheduling.Shift.Created'] > 0) &
(
(org_df['Scheduling.Shift.Approved'] > 0) |
(org_df['PunchClock.PunchedIn'] > 0) |
(org_df['Mobile.Schedule.Loaded'] > 0)
)
)
org_df['trial_activated'] = (
org_df['goal_1_met'] &
org_df['goal_2_met'] &
org_df['goal_3_met']
)
# Summary table
goal_summary = []
for goal, col in [
('Goal 1: Core Scheduling (5+ shifts, days 1-14)', 'goal_1_met'),
('Goal 2: Multi-Module (2+ modules)', 'goal_2_met'),
('Goal 3: End-to-End Workflow', 'goal_3_met'),
('Trial Activated (All 3 Goals)', 'trial_activated')
]:
total = org_df[col].sum()
rate = org_df[col].mean() * 100
conv_rate = org_df[org_df[col]==True]['converted'].mean()*100
nconv_rate = org_df[org_df[col]==False]['converted'].mean()*100
lift = conv_rate / nconv_rate if nconv_rate > 0 else None
goal_summary.append({
'Goal' : goal,
'Orgs Completing' : total,
'Completion Rate' : f'{rate:.1f}%',
'Conv% (Completed)' : f'{conv_rate:.1f}%',
'Conv% (Not)' : f'{nconv_rate:.1f}%',
'Lift' : f'{lift:.2f}x' if lift else 'N/A'
})
pd.DataFrame(goal_summary)
```
### Goal Completion Visualisation
```{python}
# ── CHART 6: GOAL COMPLETION ──
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
fig.patch.set_facecolor(BG)
fig.suptitle(
'Trial Goal Definition: Completion and Conversion Impact',
color=TEXT, fontsize=13, fontweight='bold', y=1.02)
ax1 = axes[0]
ax1.set_facecolor(SURFACE)
goals = ['Goal 1\nCore Scheduling', 'Goal 2\nMulti-Module',
'Goal 3\nEnd-to-End', 'Trial\nActivated']
rates = [44.6, 35.1, 43.5, 24.3]
colors = [CYAN, CYAN, CYAN, MINT]
bars = ax1.bar(goals, rates, color=colors,
alpha=0.9, edgecolor='none', width=0.6)
for bar, val in zip(bars, rates):
ax1.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.5,
f'{val}%', ha='center', va='bottom',
color=TEXT, fontsize=11, fontweight='bold')
ax1.set_ylabel('% of Organisations', color=MUTED, fontsize=10)
ax1.set_title('Goal Completion Rates',
color=CYAN, fontsize=11, fontweight='bold', pad=12)
ax1.set_ylim(0, 65)
ax1.grid(axis='y', alpha=0.3)
ax1.spines[:].set_color(GRID)
ax2 = axes[1]
ax2.set_facecolor(SURFACE)
labels = ['Goal 1','Goal 2','Goal 3','Activated']
completed = [21.8, 22.1, 21.9, 23.4]
not_comp = [20.9, 20.9, 20.9, 20.7]
x = np.arange(len(labels))
width = 0.35
bars1 = ax2.bar(x - width/2, completed, width,
color=MINT, alpha=0.9,
label='Completed Goal', edgecolor='none')
bars2 = ax2.bar(x + width/2, not_comp, width,
color=RED, alpha=0.9,
label='Did Not Complete', edgecolor='none')
for bar in bars1:
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.2,
f'{bar.get_height():.1f}%',
ha='center', va='bottom',
color=MINT, fontsize=9, fontweight='bold')
for bar in bars2:
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.2,
f'{bar.get_height():.1f}%',
ha='center', va='bottom',
color=RED, fontsize=9, fontweight='bold')
ax2.axhline(21.3, color=AMBER, linewidth=2,
linestyle='--', label='Baseline 21.3%')
ax2.set_xticks(x)
ax2.set_xticklabels(labels, fontsize=10)
ax2.set_ylabel('Conversion Rate (%)', color=MUTED, fontsize=10)
ax2.set_title('Conversion Rate: Completed vs Not Completed\n'
'Modest but consistent lift across all goals',
color=CYAN, fontsize=11, fontweight='bold', pad=12)
ax2.set_ylim(0, 35)
ax2.legend(fontsize=9)
ax2.grid(axis='y', alpha=0.3)
ax2.spines[:].set_color(GRID)
plt.tight_layout()
plt.savefig('assets/06_goal_completion.png',
dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()
```
### Trial Goal Findings
```{=html}
<div class="insight-box positive">
<strong>Goal Achievement: Activation Produces the Highest Conversion Rate</strong>
Organisations that complete all three goals convert at 23.4%, the highest
conversion rate of any segment in this analysis, and 2.7 percentage points
above the 20.7% baseline for non-activated organisations. The 1.13x lift
is modest but consistent and directionally meaningful.
</div>
<div class="insight-box warning">
<strong>Important Caveat: These Are Hypotheses, Not Proven Levers</strong>
Given that no activity was statistically significant in the chi-square or
Mann-Whitney tests, these goals must be understood as the best available
behavioural proxy for activation, not a proven causal mechanism. They
should be tracked, tested with controlled experiments, and refined as more
data is collected. A/B testing onboarding interventions designed to increase
goal completion rates is the recommended next step.
</div>
```
## SQL Models
### Business Question
How should the data infrastructure be built to track trial goals
and activation status for every new trialling organisation at scale?
### Architecture Overview
The SQL infrastructure follows a three-layer data warehouse pattern:
| Layer | Model | Type | Purpose |
|-------|-------|------|---------|
| Staging | `stg_events` | View | Clean, deduplicated event log with derived fields |
| Marts | `trial_goals` | Table | One row per org: goal completion tracking |
| Marts | `trial_activation` | Table | One row per org: activation status and tiers |
### Staging Layer: stg_events
The staging view deduplicates the raw event log, casts all data types,
and derives trial-relative time fields used by all downstream models.
It serves as the single clean source of truth for all mart computations.
```{python}
# ── BUILD SQLITE DATABASE ──
conn = sqlite3.connect('data/splendor.db')
cursor = conn.cursor()
clean_df = pd.read_csv('data/clean_events.csv')
clean_df.to_sql('raw_events', conn,
if_exists='replace', index=False)
print(f"raw_events loaded: {len(clean_df):,} rows")
stg_sql = """
CREATE VIEW IF NOT EXISTS stg_events AS
WITH raw AS (
SELECT
ORGANIZATION_ID AS organization_id,
ACTIVITY_NAME AS activity_name,
TIMESTAMP AS event_timestamp,
TRIAL_START AS trial_start,
TRIAL_END AS trial_end,
CONVERTED AS converted,
CONVERTED_AT AS converted_at
FROM raw_events
),
deduplicated AS (
SELECT DISTINCT
organization_id, activity_name, event_timestamp,
trial_start, trial_end, converted, converted_at
FROM raw
),
with_derived AS (
SELECT *,
CAST(JULIANDAY(event_timestamp) -
JULIANDAY(trial_start) AS INTEGER) AS days_into_trial,
CASE WHEN converted_at IS NOT NULL
THEN CAST(JULIANDAY(converted_at) -
JULIANDAY(trial_start) AS INTEGER)
ELSE NULL END AS days_to_convert,
CAST(JULIANDAY(trial_end) -
JULIANDAY(trial_start) AS INTEGER) AS trial_duration_days,
CASE WHEN CAST(JULIANDAY(event_timestamp) -
JULIANDAY(trial_start) AS INTEGER) <= 14
THEN 1 ELSE 0 END AS is_early_trial
FROM deduplicated
WHERE event_timestamp >= trial_start
AND event_timestamp <= trial_end
)
SELECT * FROM with_derived
"""
cursor.execute("DROP VIEW IF EXISTS stg_events")
cursor.execute(stg_sql)
conn.commit()
stg_count = pd.read_sql_query(
"SELECT COUNT(*) as rows FROM stg_events", conn)
print(f"stg_events view: {stg_count['rows'].values[0]:,} rows")
```
### Mart Layer: trial_goals
The `trial_goals` mart computes goal completion for every
organisation. One row per organisation. Full SQL source
available in `sql/marts/trial_goals.sql`.
```{python}
# ── EXECUTE TRIAL GOALS MART ──
goals_query = """
CREATE TABLE IF NOT EXISTS trial_goals AS
WITH org_base AS (
SELECT
organization_id,
MAX(CAST(converted AS INTEGER)) AS converted,
MIN(trial_start) AS trial_start,
MIN(trial_end) AS trial_end,
MIN(converted_at) AS converted_at,
MIN(days_to_convert) AS days_to_convert,
COUNT(*) AS total_events,
COUNT(DISTINCT activity_name) AS unique_activities,
COUNT(DISTINCT days_into_trial) AS active_days
FROM stg_events GROUP BY organization_id
),
goal_1 AS (
SELECT organization_id,
SUM(CASE WHEN activity_name='Scheduling.Shift.Created'
AND is_early_trial=1 THEN 1 ELSE 0 END)
AS shifts_in_first_14_days,
CASE WHEN SUM(CASE WHEN activity_name='Scheduling.Shift.Created'
AND is_early_trial=1 THEN 1 ELSE 0 END)>=5
THEN 1 ELSE 0 END AS goal_1_met
FROM stg_events GROUP BY organization_id
),
goal_2 AS (
SELECT organization_id,
MAX(CASE WHEN activity_name IN (
'Scheduling.Shift.Created','Scheduling.Shift.AssignmentChanged',
'Scheduling.Template.ApplyModal.Applied',
'Scheduling.OpenShiftRequest.Created',
'Scheduling.Availability.Set','Mobile.Schedule.Loaded'
) THEN 1 ELSE 0 END) AS used_scheduling,
MAX(CASE WHEN activity_name IN (
'PunchClock.PunchedIn','PunchClock.PunchedOut',
'PunchClock.Entry.Edited',
'Break.Activate.Started','Break.Activate.Finished'
) THEN 1 ELSE 0 END) AS used_time_tracking,
MAX(CASE WHEN activity_name IN (
'Scheduling.Shift.Approved','Timesheets.BulkApprove.Confirmed',
'Absence.Request.Approved','Absence.Request.Rejected'
) THEN 1 ELSE 0 END) AS used_approvals,
MAX(CASE WHEN activity_name IN (
'Communication.Message.Created'
) THEN 1 ELSE 0 END) AS used_communications
FROM stg_events GROUP BY organization_id
),
goal_2_scored AS (
SELECT *,
(used_scheduling+used_time_tracking+
used_approvals+used_communications) AS modules_used,
CASE WHEN (used_scheduling+used_time_tracking+
used_approvals+used_communications)>=2
THEN 1 ELSE 0 END AS goal_2_met
FROM goal_2
),
goal_3 AS (
SELECT organization_id,
MAX(CASE WHEN activity_name='Scheduling.Shift.Created'
THEN 1 ELSE 0 END) AS created_shift,
MAX(CASE WHEN activity_name IN (
'Scheduling.Shift.Approved',
'PunchClock.PunchedIn','Mobile.Schedule.Loaded'
) THEN 1 ELSE 0 END) AS completed_downstream,
CASE WHEN MAX(CASE WHEN activity_name=
'Scheduling.Shift.Created' THEN 1 ELSE 0 END)=1
AND MAX(CASE WHEN activity_name IN (
'Scheduling.Shift.Approved',
'PunchClock.PunchedIn','Mobile.Schedule.Loaded'
) THEN 1 ELSE 0 END)=1
THEN 1 ELSE 0 END AS goal_3_met
FROM stg_events GROUP BY organization_id
)
SELECT b.*,
g1.shifts_in_first_14_days, g1.goal_1_met,
g2.used_scheduling, g2.used_time_tracking,
g2.used_approvals, g2.used_communications,
g2.modules_used, g2.goal_2_met,
g3.created_shift, g3.completed_downstream, g3.goal_3_met,
(g1.goal_1_met+g2.goal_2_met+g3.goal_3_met) AS goals_completed
FROM org_base b
LEFT JOIN goal_1 g1 ON b.organization_id=g1.organization_id
LEFT JOIN goal_2_scored g2 ON b.organization_id=g2.organization_id
LEFT JOIN goal_3 g3 ON b.organization_id=g3.organization_id
ORDER BY b.organization_id
"""
cursor.execute("DROP TABLE IF EXISTS trial_goals")
cursor.execute(goals_query)
conn.commit()
goals_count = pd.read_sql_query(
"SELECT COUNT(*) as rows FROM trial_goals", conn)
print(f"trial_goals mart: {goals_count['rows'].values[0]:,} rows")
# Preview
pd.read_sql_query("""
SELECT organization_id, converted,
goal_1_met, goal_2_met, goal_3_met,
goals_completed
FROM trial_goals LIMIT 5
""", conn)
```
### Mart Layer: trial_activation
The `trial_activation` mart is the product team's primary
operational table. It surfaces activation status, tier labels,
engagement buckets, and conversion timing for every organisation.
Full SQL source available in `sql/marts/trial_activation.sql`.
```{python}
# ── EXECUTE TRIAL ACTIVATION MART ──
activation_query = """
CREATE TABLE IF NOT EXISTS trial_activation AS
WITH activation_base AS (
SELECT *,
CASE WHEN goal_1_met=1 AND goal_2_met=1 AND goal_3_met=1
THEN 1 ELSE 0 END AS activated
FROM trial_goals
),
with_tiers AS (
SELECT *,
CASE WHEN activated=1 THEN 'Fully Activated'
WHEN goals_completed=2 THEN 'Nearly Activated (2/3 Goals)'
WHEN goals_completed=1 THEN 'Partially Activated (1/3 Goals)'
ELSE 'Not Activated (0/3 Goals)'
END AS activation_tier,
CASE WHEN converted=0 THEN 'Not Converted'
WHEN days_to_convert<=7 THEN 'Week 1'
WHEN days_to_convert<=14 THEN 'Week 2'
WHEN days_to_convert<=21 THEN 'Week 3'
WHEN days_to_convert<=30 THEN 'Week 4'
ELSE 'Post-Trial'
END AS conversion_timing_bucket,
CASE WHEN active_days=1 THEN 'Single Day'
WHEN active_days BETWEEN 2 AND 3 THEN 'Early Dropout'
WHEN active_days BETWEEN 4 AND 7 THEN 'Moderate Engagement'
ELSE 'High Engagement'
END AS engagement_bucket
FROM activation_base
)
SELECT organization_id, activated, activation_tier,
goals_completed, goal_1_met, goal_2_met, goal_3_met,
converted, converted_at, days_to_convert,
conversion_timing_bucket, trial_start, trial_end,
total_events, unique_activities, active_days, engagement_bucket
FROM with_tiers
ORDER BY activated DESC, goals_completed DESC, organization_id ASC
"""
cursor.execute("DROP TABLE IF EXISTS trial_activation")
cursor.execute(activation_query)
conn.commit()
act_count = pd.read_sql_query(
"SELECT COUNT(*) as rows FROM trial_activation", conn)
print(f"trial_activation mart: {act_count['rows'].values[0]:,} rows")
# Activation tier breakdown
tier_summary = pd.read_sql_query("""
SELECT activation_tier,
COUNT(*) AS orgs,
ROUND(COUNT(*)*100.0/(SELECT COUNT(*) FROM trial_activation),1) AS pct_of_total,
ROUND(AVG(converted)*100,1) AS conv_rate
FROM trial_activation
GROUP BY activation_tier
ORDER BY orgs DESC
""", conn)
conn.close()
tier_summary
```
### SQL Verification
```{=html}
<div class="insight-box positive">
<strong>SQL Models Verified Against Python Analysis</strong>
All three SQL models executed without errors. Goal counts from SQL
match Python exactly: Goal 1 = 431, Goal 2 = 339, Goal 3 = 420,
Activated = 235. Overall conversion rate confirmed at 21.3%.
The models are production-ready.
</div>
```
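A parity check like the one described above can be automated so it reruns on every database rebuild. The helper and in-memory demo table below are illustrative (hypothetical function name, toy values), but the query mirrors the `trial_goals` columns defined above:

```python
import sqlite3
import pandas as pd

def verify_goal_parity(org_frame, conn):
    """Compare goal counts computed in Python against the trial_goals mart."""
    sql = pd.read_sql_query("""
        SELECT SUM(goal_1_met) AS g1,
               SUM(goal_2_met) AS g2,
               SUM(goal_3_met) AS g3,
               SUM(goal_1_met * goal_2_met * goal_3_met) AS activated
        FROM trial_goals
    """, conn).iloc[0]
    py = {'g1': org_frame['goal_1_met'].sum(),
          'g2': org_frame['goal_2_met'].sum(),
          'g3': org_frame['goal_3_met'].sum(),
          'activated': (org_frame['goal_1_met'] & org_frame['goal_2_met']
                        & org_frame['goal_3_met']).sum()}
    # Report only the counts that disagree between the two pipelines
    return {k: (int(py[k]), int(sql[k]))
            for k in py if int(py[k]) != int(sql[k])}

# Demo on a tiny in-memory stand-in (illustrative values, not the real mart)
mock = pd.DataFrame({'goal_1_met': [1, 1, 0],
                     'goal_2_met': [1, 0, 0],
                     'goal_3_met': [1, 1, 0]})
conn = sqlite3.connect(':memory:')
mock.to_sql('trial_goals', conn, index=False)
print(verify_goal_parity(mock, conn))  # {} means the two pipelines agree
```

An empty dict is a pass; any mismatch surfaces as `{'goal': (python_count, sql_count)}`, making divergence between the pandas and SQL definitions immediately visible.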
## Product Metrics
### Business Question
What are the key product metrics the Splendor Analytics team
needs to understand their trial population, track activation
progress, and prioritise improvement initiatives?
### Core Metrics
```{python}
# ── PRODUCT METRICS ──
conn = sqlite3.connect('data/splendor.db')
total_orgs = len(org_df)
converted_orgs = org_df['converted'].sum()
conv_rate = org_df['converted'].mean() * 100
converters = org_df[org_df['converted']==True]
print("=" * 55)
print("PRODUCT METRICS DASHBOARD")
print("=" * 55)
print(f"\nMetric 1: Conversion Rate")
print(f" Total trialling orgs: {total_orgs:,}")
print(f" Converted orgs: {converted_orgs:,}")
print(f" Conversion rate: {conv_rate:.1f}%")
print(f"\nMetric 2: Time to Convert")
print(f" Median: {converters['days_to_convert'].median():.0f} days")
print(f" Mean: {converters['days_to_convert'].mean():.0f} days")
print(f" Within trial (≤30 days): "
f"{(converters['days_to_convert']<=30).sum()} "
f"({(converters['days_to_convert']<=30).mean()*100:.1f}%)")
print(f" Post-trial (>30 days): "
f"{(converters['days_to_convert']>30).sum()} "
f"({(converters['days_to_convert']>30).mean()*100:.1f}%)")
print(f"\nMetric 3: Trial Activation Rate")
print(f" Activated: {org_df['trial_activated'].sum()} "
f"({org_df['trial_activated'].mean()*100:.1f}%)")
print(f" Not activated: {(~org_df['trial_activated']).sum()} "
f"({(~org_df['trial_activated']).mean()*100:.1f}%)")
print(f"\nMetric 4: Single Day Drop-off")
single_day = (org_df['active_days']==1).sum()
print(f" Orgs active only 1 day: "
f"{single_day} ({single_day/total_orgs*100:.1f}%)")
print(f" Their conversion rate: "
f"{org_df[org_df['active_days']==1]['converted'].mean()*100:.1f}%")
print(f"\nMetric 5: Module Adoption")
print(f" Scheduling: {org_df['used_scheduling'].mean()*100:.1f}%")
print(f" Time Tracking: {org_df['used_time_tracking'].mean()*100:.1f}%")
print(f" Approvals: {org_df['used_approvals'].mean()*100:.1f}%")
print(f" Communications: {org_df['used_comms'].mean()*100:.1f}%")
print(f"\nMetric 6: Conversion Rate by Cohort")
# Group by calendar period (not month name) so cohorts sort chronologically
org_df['trial_month'] = pd.to_datetime(
    org_df['trial_start']).dt.to_period('M')
monthly = org_df.groupby('trial_month')['converted'].agg(
    ['mean','count','sum']).reset_index()
monthly.columns = ['Month','Conv Rate','Total','Converted']
monthly['Month'] = monthly['Month'].dt.strftime('%B %Y')
monthly['Conv Rate %'] = (monthly['Conv Rate']*100).round(1)
print(monthly[['Month','Total','Converted',
'Conv Rate %']].to_string(index=False))
conn.close()
```
### Product Metrics Dashboard
```{python}
# ── CHART 7: PRODUCT METRICS DASHBOARD ──
adoption_rates = {}
for col in activity_cols:
rate = (org_df[col] > 0).mean() * 100
adoption_rates[name_map.get(col, col)] = round(rate, 1)
adoption_series = pd.Series(
adoption_rates).sort_values(ascending=False)
fig, axes = plt.subplots(2, 2, figsize=(16, 11))
fig.patch.set_facecolor(BG)
fig.suptitle(
'Splendor Analytics: Product Metrics Dashboard',
color=TEXT, fontsize=14, fontweight='bold', y=1.01)
# Top 10 feature adoption
ax1 = axes[0, 0]
ax1.set_facecolor(SURFACE)
top10 = adoption_series.head(10)
colors_adopt = [CYAN if i < 3 else MUTED
for i in range(len(top10))]
bars = ax1.barh(range(len(top10)), top10.values,
color=colors_adopt, alpha=0.9, edgecolor='none')
for bar, val in zip(bars, top10.values):
ax1.text(bar.get_width() + 0.5,
bar.get_y() + bar.get_height()/2,
f'{val}%', va='center', color=TEXT,
fontsize=8, fontweight='bold')
ax1.set_yticks(range(len(top10)))
ax1.set_yticklabels(top10.index, fontsize=8)
ax1.set_xlabel('% of Organisations', color=MUTED, fontsize=9)
ax1.set_title('Top 10 Feature Adoption Rates',
color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax1.set_xlim(0, 110)
ax1.grid(axis='x', alpha=0.3)
ax1.spines[:].set_color(GRID)
# Module adoption
ax2 = axes[0, 1]
ax2.set_facecolor(SURFACE)
modules = ['Scheduling','Approvals','Time Tracking','Communications']
mod_rates = [98.8, 22.8, 21.8, 15.0]
mod_colors= [MINT, CYAN, CYAN, CYAN]
bars2 = ax2.bar(modules, mod_rates, color=mod_colors,
alpha=0.9, edgecolor='none', width=0.6)
for bar, val in zip(bars2, mod_rates):
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.5,
f'{val}%', ha='center', va='bottom',
color=TEXT, fontsize=10, fontweight='bold')
ax2.set_ylabel('% of Organisations', color=MUTED, fontsize=9)
ax2.set_title('Module Adoption Rates\n'
'3 modules used by fewer than 1 in 4 orgs',
color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax2.set_ylim(0, 120)
ax2.grid(axis='y', alpha=0.3)
ax2.spines[:].set_color(GRID)
# Active days distribution
ax3 = axes[1, 0]
ax3.set_facecolor(SURFACE)
active_days_dist = org_df['active_days'].value_counts().sort_index()
grouped = active_days_dist.copy()
over_8 = grouped[grouped.index >= 8].sum()
grouped = grouped[grouped.index < 8]
grouped[8] = over_8
grouped = grouped.sort_index()
day_labels = [str(i) if i < 8 else '8+' for i in grouped.index]
bar_colors3 = [RED if i == 1 else CYAN for i in grouped.index]
bars3 = ax3.bar(day_labels, grouped.values, color=bar_colors3,
alpha=0.9, edgecolor='none', width=0.7)
for bar, val in zip(bars3, grouped.values):
ax3.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 2, str(val),
ha='center', va='bottom',
color=TEXT, fontsize=8, fontweight='bold')
ax3.set_xlabel('Active Days During Trial',
color=MUTED, fontsize=9)
ax3.set_ylabel('Number of Organisations',
color=MUTED, fontsize=9)
ax3.set_title('Active Days Distribution\n'
'621 orgs (64%) active for only 1 day',
color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax3.grid(axis='y', alpha=0.3)
ax3.spines[:].set_color(GRID)
# Monthly cohort
ax4 = axes[1, 1]
ax4.set_facecolor(SURFACE)
months = ['January 2024','February 2024','March 2024']
conv_rates_m= [23.0, 22.8, 18.2]
total_m = [305, 347, 314]
bar_colors4 = [MINT, MINT, RED]
bars4 = ax4.bar(months, conv_rates_m, color=bar_colors4,
alpha=0.9, edgecolor='none', width=0.6)
for bar, val, tot in zip(bars4, conv_rates_m, total_m):
ax4.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.2,
f'{val}%\n(n={tot})',
ha='center', va='bottom',
color=TEXT, fontsize=9, fontweight='bold')
ax4.axhline(21.3, color=AMBER, linewidth=2,
linestyle='--', label='Baseline 21.3%')
ax4.set_ylabel('Conversion Rate (%)', color=MUTED, fontsize=9)
ax4.set_title('Conversion Rate by Trial Cohort\n'
'March cohort converts 3.1pp below baseline',
color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax4.set_ylim(0, 35)
ax4.legend(fontsize=9)
ax4.grid(axis='y', alpha=0.3)
ax4.spines[:].set_color(GRID)
ax4.tick_params(axis='x', labelsize=8)
plt.tight_layout()
plt.savefig('assets/07_product_metrics.png',
dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()
```
### Product Metrics Findings
```{=html}
<div class="insight-box alert">
<strong>Finding: The Platform is Being Used as a Scheduling Tool Only</strong>
98.8% of trialling organisations use the Scheduling module. Only 22.8%
use Approvals, 21.8% use Time Tracking, and 15.0% use Communications.
Three entire product modules, representing significant development
investment, are invisible to 75-85% of trialists. The platform's
value proposition extends well beyond scheduling, but trialists are
never discovering it.
</div>
<div class="insight-box alert">
<strong>Finding: 64% of Organisations are Active for Only One Day</strong>
621 of 966 trialling organisations interacted with the platform on
exactly one day and never returned. Despite this, 21.1% of them still
converted, almost identical to the overall rate. This suggests
organisations are making their conversion decision based on something
other than sustained engagement: possibly a sales conversation, a
demo, or the trial sign-up experience itself.
</div>
<div class="insight-box warning">
<strong>Finding: March 2024 Cohort is Underperforming</strong>
The March 2024 cohort converts at 18.2%, which is 3.1 percentage points below
the 21.3% baseline and 4.6–4.8 points below the January and February
cohorts. This warrants immediate investigation. Possible causes include
a product change, an onboarding process change, a shift in the source
of sign-ups, or seasonal factors. Without investigation, the product
team risks a declining conversion trend going undetected.
</div>
```
:::
---
## Executive Recommendations
Based on the full analysis, five strategic recommendations are
addressed to the Splendor Analytics product team:
**Recommendation 1: Investigate the March 2024 Cohort Immediately**
A 4.6 percentage point drop in conversion rate in a single month
is a material signal. The product team should identify what changed
in March, whether in the product, the onboarding flow, or the sign-up source, before the trend compounds.
**Recommendation 2: Redesign Onboarding to Surface Hidden Modules**
With 75-85% of trialists never discovering Time Tracking, Approvals,
or Communications, the onboarding experience is failing to demonstrate
the platform's full value. A guided activation flow that introduces
each module within the first 7 days could dramatically increase
Goal 2 (Multi-Module) completion from its current 35.1%.
**Recommendation 3: Shift Intervention Focus to Post-Trial**
49% of conversions happen after the trial ends. The product team's
greatest conversion opportunity is not inside the trial. It is in the
days immediately following trial expiry. A structured post-trial
re-engagement sequence (email, sales call, extended trial offer)
targeted at engaged non-converters could move this number significantly.
**Recommendation 4: Track Trial Activation as a North Star Metric**
The three trial goals defined in this analysis should be instrumented
in the product's analytics stack using the `trial_activation` mart
table. Activation rate (currently 24.3%) becomes the primary leading
indicator of conversion health, which is more actionable than raw conversion
rate because it is measurable within the trial window.
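As a sketch of that instrumentation (illustrative only: the in-memory table and its values are stand-ins for the real `trial_activation` mart), a weekly activation-rate tracker could look like:

```python
import sqlite3
import pandas as pd

# Illustrative: a tiny in-memory stand-in for the trial_activation mart
conn = sqlite3.connect(':memory:')
pd.DataFrame({
    'organization_id': [1, 2, 3, 4],
    'trial_start': ['2024-01-02', '2024-01-09', '2024-01-09', '2024-01-16'],
    'activated': [1, 0, 1, 0],
}).to_sql('trial_activation', conn, index=False)

# Weekly activation rate: the leading indicator proposed in Recommendation 4
weekly = pd.read_sql_query("""
    SELECT STRFTIME('%Y-%W', trial_start) AS cohort_week,
           COUNT(*)                        AS orgs,
           ROUND(AVG(activated) * 100, 1)  AS activation_rate_pct
    FROM trial_activation
    GROUP BY cohort_week
    ORDER BY cohort_week
""", conn)
print(weekly)
conn.close()
```

Because activation is measurable by day 14 of each trial, this metric leads conversion by weeks and can be reviewed in the same cadence as the monthly cohort table above.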
**Recommendation 5: Run A/B Tests on Goal-Driving Interventions**
Because no single activity was statistically significant in predicting
conversion, the trial goals should be treated as hypotheses to be tested
rather than proven levers. Controlled experiments, such as in-app
prompts designed to drive Goal 1 completion, or onboarding emails
targeting Goal 2, will produce the causal evidence needed to refine
the activation definition over time.
---
## Analytical Integrity Statement
This report was produced with full transparency about the limitations
of the available data. The core finding, that no individual activity
significantly predicts conversion, was confirmed by four independent
analytical methods and is reported honestly rather than obscured by
selective presentation.
The trial goals defined here are evidence-informed hypotheses grounded
in product logic and mild data signals. They represent the best
available proxy for activation given the current dataset, and should
be refined as richer data becomes available, including sales interaction
records, company-level attributes, and support contact history.
All code, data, and outputs are fully reproducible from the public
repository.
---
```{=html}
<div class="footer-note">
Prepared by Freda Erinmwingbovo · Data Scientist ·
<a href="https://freda-erinmwingbovo.github.io">freda-erinmwingbovo.github.io</a> ·
March 2026 · Splendor Analytics Data Challenge
</div>
```