Trial Activation Intelligence Report

Author: Freda Erinmwingbovo
Published: March 20, 2026

SPLENDOR ANALYTICS · WORKFORCE MANAGEMENT PLATFORM
PRODUCT INTELLIGENCE REPORT
Trial Activation Intelligence

Defining What a Good Trial Looks Like

966 organisations. 30 days each.
Only 1 in 5 converts. We investigated why.

170,526 Raw Behavioural Events · 966 Trialling Organisations · 21.3% Overall Conversion Rate · 28 Activities Tracked

Analyst: Freda Erinmwingbovo
Prepared for: Splendor Analytics Product Team
Date: March 2026
Trial period: January – April 2024

Executive Summary

Splendor Analytics runs a 30-day free trial for new organisations signing up to its workforce management platform. Roughly 1 in 5 trialists converts to a paying customer, but the product team has no clarity on who is on track to convert, when to intervene, or which features actually matter to the decision.

This report delivers a rigorous, end-to-end investigation into trial activation: analysing 966 trialling organisations across 28 behavioural activities to define what a good trial looks like, build the data infrastructure to track it, and surface the product metrics that matter most.

  • 966 Trialling Organisations
  • 206 Converted to Paid
  • 21.3% Conversion Rate
  • 24.3% Trial Activation Rate
  • 49% Convert After Trial Ends
  • 64% Active Only 1 Day
Investigation themes and findings:

  1. Data Cleaning & Quality: 67,631 duplicates removed (39.7% of raw data)
  2. Conversion Driver Analysis: no single activity significantly predicts conversion
  3. Trial Goal Definition: three evidence-informed goals defined with 1.13x lift
  4. SQL Data Infrastructure: two mart tables built and verified
  5. Product Metrics: critical drop-off, module adoption, and cohort gaps identified

Business Context

The Problem

Splendor Analytics offers new organisations a 30-day free trial of its workforce management platform: a system covering shift scheduling, time tracking, payroll approvals, and team communications. The product team knows that approximately 1 in 5 trialists converts to a paying customer, but they cannot answer three critical operational questions:

  • Who is on track to convert, and who is heading toward churn?
  • When should the team intervene to save a struggling trialist?
  • Which features actually matter to the conversion decision?

Without answers, every onboarding improvement is a guess. Intervention resources are allocated randomly. And the definition of a “successful trial” remains entirely subjective.

The Objective

This investigation sets out to define Trial Activation: a specific, measurable set of in-app behaviours that signal a trialist has genuinely experienced the platform’s core value. Once defined, activation becomes the north star metric the product team can track, optimise, and act on.
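To make the definition concrete, here is a minimal sketch of how an activation flag could be computed once goals are chosen. The column names, thresholds, and sample data below are placeholders for illustration only, not the evidence-informed goals this report defines later.

```python
import pandas as pd

# Hypothetical illustration: thresholds and columns are placeholders,
# not the goals defined later in this report.
orgs = pd.DataFrame({
    'organization_id': ['org_a', 'org_b', 'org_c'],
    'shifts_created':  [12, 0, 3],
    'active_days':     [5, 1, 2],
})

# Activation = a boolean over specific, measurable in-app behaviours
orgs['activated'] = (orgs['shifts_created'] >= 5) & (orgs['active_days'] >= 3)

activation_rate = orgs['activated'].mean()
print(f"Trial activation rate: {activation_rate:.1%}")
```

Once encoded this way, the activation rate becomes a single trackable number rather than a subjective judgement.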

Three deliverables were produced:

  1. A rigorous Python analysis identifying conversion drivers and defining trial goals
  2. A production-ready SQL data infrastructure to track activation at scale
  3. A suite of product metrics informing immediate strategic decisions

Data Infrastructure and Cleaning

Library Imports and Setup

# ── LIBRARY IMPORTS ──
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
import sqlite3
import os
import warnings
warnings.filterwarnings('ignore')

from scipy.stats import chi2_contingency, mannwhitneyu
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler

# ── DISPLAY SETTINGS ──
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)

# ── SPLENDOR COLOUR PALETTE ──
CYAN    = '#00b4d8'
MINT    = '#06d6a0'
AMBER   = '#ffb703'
RED     = '#ff6b6b'
BG      = '#0a0f1e'
SURFACE = '#111827'
MUTED   = '#94a3b8'
TEXT    = '#e2e8f0'
GRID    = '#1f2d40'

# ── GLOBAL MATPLOTLIB STYLE ──
plt.rcParams.update({
    'figure.facecolor'  : BG,
    'axes.facecolor'    : SURFACE,
    'axes.edgecolor'    : GRID,
    'axes.labelcolor'   : TEXT,
    'axes.titlecolor'   : TEXT,
    'xtick.color'       : MUTED,
    'ytick.color'       : MUTED,
    'grid.color'        : GRID,
    'grid.linestyle'    : '--',
    'grid.linewidth'    : 0.7,
    'text.color'        : TEXT,
    'font.family'       : 'sans-serif',
    'legend.facecolor'  : SURFACE,
    'legend.edgecolor'  : GRID,
})

os.makedirs('assets', exist_ok=True)
print("Environment ready.")
Environment ready.

Data Loading

# ── DATA LOADING ──
df = pd.read_csv('data/DA_task.csv')
df.columns = df.columns.str.lower()

print(f"Raw dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
df.head(3)
Raw dataset shape: (170526, 7)
Columns: ['organization_id', 'activity_name', 'timestamp', 'converted', 'converted_at', 'trial_start', 'trial_end']
organization_id activity_name timestamp converted converted_at trial_start trial_end
0 0040dd9ab132b92d5d04bc3acf14d2e2 Scheduling.Shift.Created 2024-03-27 11:03:53.000 False NaN 2024-03-27 10:11:39.000 2024-04-26 10:11:39.000
1 0040dd9ab132b92d5d04bc3acf14d2e2 Scheduling.Shift.Created 2024-03-27 11:04:52.000 False NaN 2024-03-27 10:11:39.000 2024-04-26 10:11:39.000
2 0040dd9ab132b92d5d04bc3acf14d2e2 Scheduling.Shift.Created 2024-03-27 11:04:53.000 False NaN 2024-03-27 10:11:39.000 2024-04-26 10:11:39.000

Data Type Conversion

# ── DATETIME CONVERSION ──
df['timestamp']    = pd.to_datetime(df['timestamp'])
df['trial_start']  = pd.to_datetime(df['trial_start'])
df['trial_end']    = pd.to_datetime(df['trial_end'])
df['converted_at'] = pd.to_datetime(df['converted_at'])

print("Data types after conversion:")
print(df.dtypes)
print(f"\nDate range: {df['timestamp'].min()} → {df['timestamp'].max()}")
Data types after conversion:
organization_id            object
activity_name              object
timestamp          datetime64[ns]
converted                    bool
converted_at       datetime64[ns]
trial_start        datetime64[ns]
trial_end          datetime64[ns]
dtype: object

Date range: 2024-01-01 20:52:26 → 2024-04-28 15:10:31

Quality Checks and Deduplication

# ── MISSING VALUES ──
print("Missing values per column:")
print(df.isnull().sum())

# ── DUPLICATE DETECTION ──
original_rows  = len(df)
dupes          = df.duplicated(
    subset=['organization_id','activity_name','timestamp']).sum()
print(f"\nFully duplicate rows detected: {dupes:,} ({dupes/original_rows*100:.1f}%)")

# ── REMOVE DUPLICATES ──
df = df.drop_duplicates(
    subset=['organization_id','activity_name','timestamp'])

print(f"\nRows before deduplication: {original_rows:,}")
print(f"Rows after deduplication:  {len(df):,}")
print(f"Duplicate rows removed:    {original_rows - len(df):,}")
Missing values per column:
organization_id         0
activity_name           0
timestamp               0
converted               0
converted_at       136291
trial_start             0
trial_end               0
dtype: int64

Fully duplicate rows detected: 67,631 (39.7%)

Rows before deduplication: 170,526
Rows after deduplication:  102,895
Duplicate rows removed:    67,631
Critical Data Quality Finding: 39.7% of the raw dataset (67,631 of 170,526 rows) consisted of exact duplicates, i.e. the same event recorded at the same timestamp for the same organisation. Any analysis conducted on the raw data without this correction would have produced materially distorted results. All findings in this report are based on the deduplicated dataset of 102,895 clean events.
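As a guard for future data refreshes, the same duplicate check can be expressed as an assertion. A small sketch with a toy events table (column names assumed to match the dataset above):

```python
import pandas as pd

# Toy events table containing one exact duplicate:
# same organisation, same activity, same timestamp
events = pd.DataFrame({
    'organization_id': ['x1', 'x1', 'y2'],
    'activity_name':   ['Scheduling.Shift.Created'] * 3,
    'timestamp':       ['2024-03-27 11:03:53',
                        '2024-03-27 11:03:53',
                        '2024-03-27 12:00:00'],
})

key = ['organization_id', 'activity_name', 'timestamp']
deduped = events.drop_duplicates(subset=key)

# After deduplication, no key combination should repeat
assert not deduped.duplicated(subset=key).any()
print(f"{len(events)} rows -> {len(deduped)} rows")
```

Running this kind of assertion in the ingestion pipeline would catch the duplication at source rather than at analysis time.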

Feature Engineering

# ── DERIVED FEATURES ──
df['days_into_trial'] = (
    df['timestamp'] - df['trial_start']).dt.days
df['days_to_convert'] = (
    df['converted_at'] - df['trial_start']).dt.days
df['trial_duration']  = (
    df['trial_end'] - df['trial_start']).dt.days

# Validation checks
print(f"Trial duration, all orgs: "
      f"{df['trial_duration'].unique()} days")
print(f"Events before trial start: "
      f"{(df['timestamp'] < df['trial_start']).sum()}")
print(f"Events after trial end:    "
      f"{(df['timestamp'] > df['trial_end']).sum()}")
print(f"\nDate range: {df['days_into_trial'].min()} "
      f"to {df['days_into_trial'].max()} days into trial")
Trial duration, all orgs: [30] days
Events before trial start: 0
Events after trial end:    0

Date range: 0 to 29 days into trial

Organisation-Level Feature Matrix

# ── ORG-LEVEL MATRIX ──
org_df = df.groupby('organization_id').agg(
    converted        = ('converted', 'first'),
    trial_start      = ('trial_start', 'first'),
    trial_end        = ('trial_end', 'first'),
    converted_at     = ('converted_at', 'first'),
    days_to_convert  = ('days_to_convert', 'first'),
    total_events     = ('activity_name', 'count'),
    unique_activities= ('activity_name', 'nunique'),
    active_days      = ('days_into_trial', 'nunique'),
    first_event_day  = ('days_into_trial', 'min'),
    last_event_day   = ('days_into_trial', 'max'),
).reset_index()

org_df['breadth_ratio']   = org_df['unique_activities'] / 28
org_df['event_intensity'] = (
    org_df['total_events'] / org_df['active_days'])

# Activity pivot: counts and binary
activity_pivot = df.pivot_table(
    index='organization_id', columns='activity_name',
    values='timestamp', aggfunc='count',
    fill_value=0).reset_index()
activity_pivot.columns.name = None

activity_cols = [c for c in activity_pivot.columns
                 if c != 'organization_id']
org_df = org_df.merge(activity_pivot,
                      on='organization_id', how='left')

print(f"Organisation matrix: {org_df.shape[0]} orgs × "
      f"{org_df.shape[1]} features")
print(f"\nConverted:     {org_df['converted'].sum()} "
      f"({org_df['converted'].mean()*100:.1f}%)")
print(f"Not converted: {(~org_df['converted']).sum()} "
      f"({(~org_df['converted']).mean()*100:.1f}%)")
Organisation matrix: 966 orgs × 41 features

Converted:     206 (21.3%)
Not converted: 760 (78.7%)

Cleaning Summary

# ── CLEANING SUMMARY ──
print("=" * 55)
print("   DATA CLEANING SUMMARY")
print("=" * 55)
print(f"\n  Raw rows:               170,526")
print(f"  Duplicate rows removed:  67,631  (39.7%)")
print(f"  Clean rows:             {len(df):,}")
print(f"  Unique organisations:   {org_df.shape[0]:,}")
print(f"  Features per org:       {org_df.shape[1]:,}")
print(f"  Activities tracked:     28")
print(f"  Trial period:           30 days (all orgs)")
print(f"  Date range:             Jan 2024 – Apr 2024")
print(f"\n  Converted orgs:         206  (21.3%)")
print(f"  Non-converted orgs:     760  (78.7%)")
print(f"\n  Avg events, converters:     "
      f"{org_df[org_df['converted']==True]['total_events'].mean():.0f}")
print(f"  Avg events, non-converters: "
      f"{org_df[org_df['converted']==False]['total_events'].mean():.0f}")
print(f"\n  Avg unique activities, converters:     "
      f"{org_df[org_df['converted']==True]['unique_activities'].mean():.1f}")
print(f"  Avg unique activities, non-converters: "
      f"{org_df[org_df['converted']==False]['unique_activities'].mean():.1f}")
print("\n" + "=" * 55)
print("  DATA CLEANING COMPLETE.")
print("=" * 55)
=======================================================
   DATA CLEANING SUMMARY
=======================================================

  Raw rows:               170,526
  Duplicate rows removed:  67,631  (39.7%)
  Clean rows:             102,895
  Unique organisations:   966
  Features per org:       41
  Activities tracked:     28
  Trial period:           30 days (all orgs)
  Date range:             Jan 2024 – Apr 2024

  Converted orgs:         206  (21.3%)
  Non-converted orgs:     760  (78.7%)

  Avg events, converters:     108
  Avg events, non-converters: 106

  Avg unique activities, converters:     2.8
  Avg unique activities, non-converters: 2.8

=======================================================
  DATA CLEANING COMPLETE.
=======================================================
Cleaning Interpretation: After removing 67,631 duplicate rows, the clean dataset contains 102,895 events across 966 organisations. Every trial window is a consistent 30 days, with no events before trial start or after trial end. The most striking early finding: converters and non-converters show virtually identical average event counts (108 vs 106) and identical average unique activities (2.8 vs 2.8). Volume of usage alone does not explain conversion; the signal lies deeper.

Analysis

Business Question

What does the data reveal about trial behaviour, conversion patterns, and feature usage across 966 trialling organisations?

Methodology

Exploratory Data Analysis was conducted across four dimensions: conversion overview, activity adoption by conversion status, conversion timing, and trial activity timeline. All findings informed the subsequent conversion driver analysis and trial goal definition.

Visualisation Setup

# ── ACTIVITY NAME MAP ──
name_map = {
    'Absence.Request.Approved'               : 'Absence Approved',
    'Absence.Request.Created'                : 'Absence Created',
    'Absence.Request.Rejected'               : 'Absence Rejected',
    'Break.Activate.Finished'                : 'Break Finished',
    'Break.Activate.Started'                 : 'Break Started',
    'Communication.Message.Created'          : 'Message Created',
    'Integration.Xero.PayrollExport.Synced'  : 'Payroll Synced',
    'Mobile.Schedule.Loaded'                 : 'Schedule Loaded',
    'PunchClock.Entry.Edited'                : 'Punch Edited',
    'PunchClock.PunchedIn'                   : 'Punched In',
    'PunchClock.PunchedOut'                  : 'Punched Out',
    'PunchClockEndNote.Add.Completed'        : 'End Note Added',
    'PunchClockStartNote.Add.Completed'      : 'Start Note Added',
    'Revenue.Budgets.Created'                : 'Budget Created',
    'Scheduling.Availability.Set'            : 'Availability Set',
    'Scheduling.OpenShiftRequest.Approved'   : 'Open Shift Approved',
    'Scheduling.OpenShiftRequest.Created'    : 'Open Shift Created',
    'Scheduling.Shift.Approved'              : 'Shift Approved',
    'Scheduling.Shift.AssignmentChanged'     : 'Assignment Changed',
    'Scheduling.Shift.Created'               : 'Shift Created',
    'Scheduling.ShiftHandover.Accepted'      : 'Handover Accepted',
    'Scheduling.ShiftHandover.Created'       : 'Handover Created',
    'Scheduling.ShiftSwap.Accepted'          : 'Shift Swap Accepted',
    'Scheduling.ShiftSwap.Created'           : 'Shift Swap Created',
    'Scheduling.Template.ApplyModal.Applied' : 'Template Applied',
    'Shift.View.Opened'                      : 'Shift View Opened',
    'ShiftDetails.View.Opened'               : 'Shift Details Opened',
    'Timesheets.BulkApprove.Confirmed'       : 'Timesheet Approved',
}
print("Name map ready.")
Name map ready.

Conversion Overview

# ── CHART 1: CONVERSION OVERVIEW ──
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
fig.patch.set_facecolor(BG)
fig.suptitle('Trial Conversion Overview',
             color=TEXT, fontsize=15, fontweight='bold', y=1.02)

ax1 = axes[0]
ax1.set_facecolor(SURFACE)
sizes  = [206, 760]
colors = [MINT, RED]
wedges, texts, autotexts = ax1.pie(
    sizes, colors=colors, autopct='%1.1f%%',
    startangle=90,
    wedgeprops=dict(width=0.55, edgecolor=BG, linewidth=2),
    pctdistance=0.75)
for autotext in autotexts:
    autotext.set_color(BG)
    autotext.set_fontsize(11)
    autotext.set_fontweight('bold')
ax1.legend(wedges, ['Converted (206)', 'Did Not Convert (760)'],
           loc='lower center', bbox_to_anchor=(0.5, -0.12),
           fontsize=9)
ax1.set_title('Conversion Rate', color=CYAN,
              fontsize=12, fontweight='bold', pad=15)

ax2 = axes[1]
ax2.set_facecolor(SURFACE)
metrics = ['Total Events', 'Unique Activities', 'Active Days']
conv_vals = [
    org_df[org_df['converted']==True]['total_events'].mean(),
    org_df[org_df['converted']==True]['unique_activities'].mean(),
    org_df[org_df['converted']==True]['active_days'].mean(),
]
nonconv_vals = [
    org_df[org_df['converted']==False]['total_events'].mean(),
    org_df[org_df['converted']==False]['unique_activities'].mean(),
    org_df[org_df['converted']==False]['active_days'].mean(),
]
x = np.arange(len(metrics))
width = 0.35
bars1 = ax2.bar(x - width/2, conv_vals, width,
                color=MINT, label='Converted', alpha=0.9,
                edgecolor='none')
bars2 = ax2.bar(x + width/2, nonconv_vals, width,
                color=RED, label='Not Converted', alpha=0.9,
                edgecolor='none')
for bar in bars1:
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.5,
             f'{bar.get_height():.1f}',
             ha='center', va='bottom',
             color=MINT, fontsize=9, fontweight='bold')
for bar in bars2:
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.5,
             f'{bar.get_height():.1f}',
             ha='center', va='bottom',
             color=RED, fontsize=9, fontweight='bold')
ax2.set_xticks(x)
ax2.set_xticklabels(metrics, fontsize=10)
ax2.set_title('Avg Engagement Metrics by Conversion Status',
              color=CYAN, fontsize=12, fontweight='bold', pad=15)
ax2.legend(fontsize=9)
ax2.grid(axis='y', alpha=0.4)
ax2.spines[:].set_color(GRID)
plt.tight_layout()
plt.savefig('assets/01_conversion_overview.png',
            dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()

Activity Adoption by Conversion Status

# ── CHART 2: ACTIVITY ADOPTION ──
conv    = org_df[org_df['converted'] == True]
nonconv = org_df[org_df['converted'] == False]

adoption = pd.DataFrame({
    'activity'     : activity_cols,
    'conv_rate'    : [(conv[c] > 0).mean() * 100    for c in activity_cols],
    'nonconv_rate' : [(nonconv[c] > 0).mean() * 100 for c in activity_cols],
})
adoption['diff']  = adoption['conv_rate'] - adoption['nonconv_rate']
adoption['label'] = adoption['activity'].map(name_map)
adoption = adoption.sort_values('diff', ascending=True)

top_half    = adoption.iloc[14:]
bottom_half = adoption.iloc[:14]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
fig.patch.set_facecolor(BG)
fig.suptitle(
    'Activity Adoption Rate: Converters vs Non-Converters\n'
    'Simple adoption rates show minimal separation. The signal lies deeper',
    color=CYAN, fontsize=13, fontweight='bold', y=1.02)

def draw_adoption_panel(ax, data, title):
    ax.set_facecolor(SURFACE)
    y = np.arange(len(data))
    height = 0.38
    bars1 = ax.barh(y + height/2, data['conv_rate'].values,
                    height, color=MINT, alpha=0.9,
                    label='Converted', edgecolor='none')
    bars2 = ax.barh(y - height/2, data['nonconv_rate'].values,
                    height, color=RED, alpha=0.9,
                    label='Not Converted', edgecolor='none')
    for bar in bars1:
        w = bar.get_width()
        if w > 1.5:
            ax.text(w + 0.3, bar.get_y() + bar.get_height()/2,
                    f'{w:.1f}%', va='center', color=MINT,
                    fontsize=7.5, fontweight='bold')
    for bar in bars2:
        w = bar.get_width()
        if w > 1.5:
            ax.text(w + 0.3, bar.get_y() + bar.get_height()/2,
                    f'{w:.1f}%', va='center', color=RED,
                    fontsize=7.5, fontweight='bold')
    ax.set_yticks(y)
    ax.set_yticklabels(data['label'].values, fontsize=9)
    ax.set_xlabel('% of Organisations', color=MUTED, fontsize=9)
    ax.set_title(title, color=AMBER, fontsize=10,
                 fontweight='bold', pad=10)
    ax.legend(fontsize=8)
    ax.grid(axis='x', alpha=0.3)
    ax.spines[:].set_color(GRID)
    ax.set_xlim(0, 105)

draw_adoption_panel(ax1, top_half, 'Adoption Gap Favours Converters')
draw_adoption_panel(ax2, bottom_half, 'Adoption Gap Favours Non-Converters')
plt.tight_layout()
plt.savefig('assets/02_activity_adoption.png',
            dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()

Conversion Timing

# ── CHART 3: CONVERSION TIMING ──
converters = org_df[org_df['converted'] == True].copy()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.patch.set_facecolor(BG)
fig.suptitle('Conversion Timing Analysis',
             color=TEXT, fontsize=14, fontweight='bold', y=1.02)

ax1 = axes[0]
ax1.set_facecolor(SURFACE)
within_trial = converters[converters['days_to_convert'] <= 35]
late         = converters[converters['days_to_convert'] > 30]
ax1.hist(within_trial['days_to_convert'], bins=20,
         color=CYAN, alpha=0.85, edgecolor=BG, linewidth=0.5)
ax1.axvline(converters['days_to_convert'].median(),
            color=AMBER, linewidth=2, linestyle='--',
            label=f"Median: {converters['days_to_convert'].median():.0f} days")
ax1.axvline(converters['days_to_convert'].mean(),
            color=MINT, linewidth=2, linestyle='-.',
            label=f"Mean: {converters['days_to_convert'].mean():.0f} days")
ax1.set_xlabel('Days from Trial Start to Conversion',
               color=MUTED, fontsize=10)
ax1.set_ylabel('Number of Organisations', color=MUTED, fontsize=10)
ax1.set_title('Distribution of Days to Convert\n(outliers >35 days excluded)',
              color=CYAN, fontsize=11, fontweight='bold', pad=12)
ax1.legend(fontsize=9)
ax1.grid(axis='y', alpha=0.3)
ax1.spines[:].set_color(GRID)
ax1.text(0.98, 0.95,
         f'{len(late)} orgs converted\nafter trial end (>30 days)',
         transform=ax1.transAxes, color=RED, fontsize=8,
         ha='right', va='top',
         bbox=dict(boxstyle='round,pad=0.3',
                   facecolor=SURFACE, edgecolor=RED))

ax2 = axes[1]
ax2.set_facecolor(SURFACE)
bins   = [0, 7, 14, 21, 30, 65]
labels = ['Week 1\n(Days 1–7)', 'Week 2\n(Days 8–14)',
          'Week 3\n(Days 15–21)', 'Week 4\n(Days 22–30)',
          'Post-Trial\n(Days 31+)']
converters['week_bin'] = pd.cut(
    converters['days_to_convert'], bins=bins,
    labels=labels, right=True)
week_counts  = converters['week_bin'].value_counts().reindex(labels)
colors_week  = [MINT, CYAN, AMBER, RED, MUTED]
bars = ax2.bar(labels, week_counts.values,
               color=colors_week, alpha=0.9,
               edgecolor='none', width=0.6)
for bar, val in zip(bars, week_counts.values):
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.5,
             str(val), ha='center', va='bottom',
             color=TEXT, fontsize=10, fontweight='bold')
ax2.set_ylabel('Number of Organisations', color=MUTED, fontsize=10)
ax2.set_title('When Do Organisations Convert?\nBreakdown by trial week',
              color=CYAN, fontsize=11, fontweight='bold', pad=12)
ax2.grid(axis='y', alpha=0.3)
ax2.spines[:].set_color(GRID)
ax2.tick_params(axis='x', labelsize=8.5)
plt.tight_layout()
plt.savefig('assets/03_conversion_timing.png',
            dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()

Trial Activity Timeline

# ── CHART 4: ACTIVITY TIMELINE ──
fig, axes = plt.subplots(2, 1, figsize=(13, 9))
fig.patch.set_facecolor(BG)
fig.suptitle('Trial Activity Timeline',
             color=TEXT, fontsize=14, fontweight='bold', y=1.01)

ax1 = axes[0]
ax1.set_facecolor(SURFACE)
daily_conv    = df[df['converted']==True].groupby(
    'days_into_trial')['activity_name'].count()
daily_nonconv = df[df['converted']==False].groupby(
    'days_into_trial')['activity_name'].count()
daily_conv_norm    = daily_conv / 206
daily_nonconv_norm = daily_nonconv / 760
ax1.plot(daily_conv_norm.index, daily_conv_norm.values,
         color=MINT, linewidth=2,
         label='Converted (per org)', alpha=0.9)
ax1.plot(daily_nonconv_norm.index, daily_nonconv_norm.values,
         color=RED, linewidth=2,
         label='Not Converted (per org)', alpha=0.9)
ax1.fill_between(daily_conv_norm.index,
                 daily_conv_norm.values, alpha=0.15, color=MINT)
ax1.fill_between(daily_nonconv_norm.index,
                 daily_nonconv_norm.values, alpha=0.15, color=RED)
ax1.set_xlabel('Day of Trial', color=MUTED, fontsize=10)
ax1.set_ylabel('Avg Events per Org', color=MUTED, fontsize=10)
ax1.set_title('Daily Event Volume During Trial\nNormalised per organisation',
              color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax1.legend(fontsize=9)
ax1.grid(alpha=0.3)
ax1.spines[:].set_color(GRID)
ax1.set_xlim(0, 29)

ax2 = axes[1]
ax2.set_facecolor(SURFACE)
conv_orgs    = df[df['converted']==True]
nonconv_orgs = df[df['converted']==False]
def avg_cumulative_activities(data):
    result = []
    for day in range(30):
        subset = data[data['days_into_trial'] <= day]
        avg = subset.groupby(
            'organization_id')['activity_name'].nunique().mean()
        result.append(avg)
    return result
conv_cumul    = avg_cumulative_activities(conv_orgs)
nonconv_cumul = avg_cumulative_activities(nonconv_orgs)
days = list(range(30))
ax2.plot(days, conv_cumul, color=MINT,
         linewidth=2, label='Converted', alpha=0.9)
ax2.plot(days, nonconv_cumul, color=RED,
         linewidth=2, label='Not Converted', alpha=0.9)
ax2.fill_between(days, conv_cumul, nonconv_cumul,
                 alpha=0.1, color=CYAN)
ax2.set_xlabel('Day of Trial', color=MUTED, fontsize=10)
ax2.set_ylabel('Avg Cumulative Unique Activities',
               color=MUTED, fontsize=10)
ax2.set_title('Cumulative Activity Breadth During Trial\n'
              'How many different features has each org used by each day?',
              color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax2.legend(fontsize=9)
ax2.grid(alpha=0.3)
ax2.spines[:].set_color(GRID)
ax2.set_xlim(0, 29)
plt.tight_layout()
plt.savefig('assets/04_activity_timeline.png',
            dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()

EDA Findings

The exploratory analysis produced four charts and one foundational finding that shaped the entire subsequent investigation:

Converters and non-converters are behaviourally indistinguishable on surface metrics. Average events per organisation (108 vs 106), unique activities (2.8 vs 2.8), active days (4.2 vs 4.2), and cumulative feature breadth over 30 days are virtually identical between the two groups. This means any analysis anchored in volume or breadth of usage alone would be fundamentally misleading.

Two further findings of strategic importance emerged:

Finding 1: 49% of Conversions Happen After the Trial Ends. 101 of 206 converting organisations converted after their 30-day trial expired, and the median conversion day is exactly day 30. This fundamentally challenges the product team's assumption that the trial window is the decision window: organisations experience the product during the trial, but the conversion decision happens later.

Finding 2: Week 1 Converts Nobody. Zero organisations converted during the first 7 days of their trial; the earliest conversion occurred on day 14. Early intervention resources deployed in Week 1 are therefore wasted. The product team should focus on weeks 3 and 4, and on post-trial follow-up.
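The post-trial share behind Finding 1 is a one-line computation over days to convert. A sketch with invented values (the report's actual series comes from converted_at minus trial_start):

```python
import pandas as pd

# Invented days_to_convert values for illustration only
days_to_convert = pd.Series([14, 22, 28, 29, 30, 31, 45, 60])

# Share of converters whose decision lands after the 30-day window
post_trial_share = (days_to_convert > 30).mean()
print(f"Converted after the 30-day trial ended: {post_trial_share:.0%}")
print(f"Median conversion day: {days_to_convert.median()}")
```

With the report's real series, the same expression yields the 49% post-trial share and the day-30 median quoted above.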

Business Question

Which specific activities, frequencies, or engagement patterns statistically distinguish converting organisations from non-converters?

Methodology

Four rigorous analytical methods were applied sequentially to identify conversion drivers, moving from binary statistical tests through frequency analysis, machine learning, and engagement segmentation. Each method asked a different question of the data.

  • Chi-Square Test: does doing this activity predict conversion?
  • Mann-Whitney U: does frequency of this activity predict conversion?
  • Logistic Regression + Random Forest: can ML find non-linear conversion signals?
  • Engagement Segmentation: do high-engagement orgs convert more?

Chi-Square Statistical Tests

# ── CHI-SQUARE TESTS ──
results = []
for col in activity_cols:
    did_activity  = (org_df[col] > 0).astype(int)
    converted     = org_df['converted'].astype(int)
    ct = pd.crosstab(did_activity, converted)
    if ct.shape == (2, 2):
        chi2, p, dof, expected = chi2_contingency(ct)
        did_conv_rate   = org_df[org_df[col] > 0]['converted'].mean() * 100
        didnt_conv_rate = org_df[org_df[col] == 0]['converted'].mean() * 100
        results.append({
            'Activity'      : name_map.get(col, col),
            'Chi2'          : round(chi2, 3),
            'P-Value'       : round(p, 4),
            'Significant'   : 'YES' if p < 0.05 else 'No',
            'Conv% (Did)'   : round(did_conv_rate, 1),
            'Conv% (Didnt)' : round(didnt_conv_rate, 1),
            'Lift'          : round(did_conv_rate / didnt_conv_rate, 2)
                              if didnt_conv_rate > 0 else None
        })
chi_df = pd.DataFrame(results).sort_values('P-Value')
print(f"Significant activities (p < 0.05): "
      f"{(chi_df['Significant']=='YES').sum()} of {len(chi_df)}")
chi_df
Significant activities (p < 0.05): 0 of 28
Activity Chi2 P-Value Significant Conv% (Did) Conv% (Didnt) Lift
5 Message Created 1.99 0.16 No 16.60 22.20 0.75
16 Open Shift Created 1.61 0.20 No 30.20 20.90 1.45
27 Timesheet Approved 1.60 0.21 No 0.00 21.50 0.00
8 Punch Edited 0.84 0.36 No 0.00 21.50 0.00
19 Shift Created 0.77 0.38 No 21.80 17.80 1.23
24 Template Applied 0.75 0.39 No 25.00 20.90 1.20
21 Handover Created 0.73 0.39 No 36.40 21.20 1.72
22 Shift Swap Accepted 0.61 0.44 No 0.00 21.50 0.00
23 Shift Swap Created 0.56 0.45 No 8.30 21.50 0.39
11 End Note Added 0.47 0.49 No 14.30 21.50 0.66
9 Punched In 0.23 0.63 No 22.70 20.90 1.09
20 Handover Accepted 0.23 0.63 No 33.30 21.20 1.57
13 Budget Created 0.19 0.67 No 0.00 21.40 0.00
2 Absence Rejected 0.11 0.74 No 28.60 21.20 1.35
12 Start Note Added 0.10 0.75 No 15.80 21.40 0.74
15 Open Shift Approved 0.04 0.84 No 0.00 21.40 0.00
10 Punched Out 0.03 0.86 No 12.50 21.40 0.58
17 Shift Approved 0.03 0.87 No 22.00 21.10 1.04
18 Assignment Changed 0.00 0.95 No 21.60 21.20 1.02
26 Shift Details Opened 0.00 0.97 No 20.00 21.40 0.94
3 Break Finished 0.00 1.00 No 16.70 21.40 0.78
7 Schedule Loaded 0.00 1.00 No 21.30 21.40 1.00
6 Payroll Synced 0.00 1.00 No 16.70 21.40 0.78
0 Absence Approved 0.00 1.00 No 20.00 21.40 0.94
14 Availability Set 0.00 1.00 No 22.00 21.30 1.04
4 Break Started 0.00 1.00 No 16.70 21.40 0.78
1 Absence Created 0.00 1.00 No 20.50 21.40 0.96
25 Shift View Opened 0.00 1.00 No 0.00 21.40 0.00

Mann-Whitney U Frequency Tests

# ── MANN-WHITNEY U TESTS ──
freq_results = []
conv_orgs    = org_df[org_df['converted'] == True]
nonconv_orgs = org_df[org_df['converted'] == False]
for col in activity_cols:
    conv_vals    = conv_orgs[col].values
    nonconv_vals = nonconv_orgs[col].values
    if conv_vals.sum() > 0 or nonconv_vals.sum() > 0:
        stat, p = mannwhitneyu(conv_vals, nonconv_vals,
                               alternative='two-sided')
        freq_results.append({
            'Activity'       : name_map.get(col, col),
            'Conv Mean'      : round(conv_vals.mean(), 2),
            'Non-Conv Mean'  : round(nonconv_vals.mean(), 2),
            'Ratio'          : round(conv_vals.mean() /
                               nonconv_vals.mean(), 2)
                               if nonconv_vals.mean() > 0 else None,
            'P-Value'        : round(p, 4),
            'Significant'    : 'YES' if p < 0.05 else 'No'
        })
freq_df = pd.DataFrame(freq_results).sort_values('P-Value')
print(f"Significant activities (p < 0.05): "
      f"{(freq_df['Significant']=='YES').sum()} of {len(freq_df)}")
freq_df
Significant activities (p < 0.05): 0 of 28
Activity Conv Mean Non-Conv Mean Ratio P-Value Significant
27 Timesheet Approved 0.00 0.06 0.00 0.10 No
5 Message Created 1.39 1.70 0.82 0.13 No
16 Open Shift Created 0.47 0.22 2.18 0.15 No
8 Punch Edited 0.00 0.12 0.00 0.17 No
22 Shift Swap Accepted 0.00 0.04 0.00 0.20 No
21 Handover Created 0.03 0.02 1.84 0.22 No
23 Shift Swap Created 0.00 0.04 0.12 0.27 No
13 Budget Created 0.00 0.04 0.00 0.30 No
24 Template Applied 0.62 0.45 1.39 0.31 No
11 End Note Added 0.03 0.17 0.17 0.34 No
15 Open Shift Approved 0.00 0.01 0.00 0.37 No
20 Handover Accepted 0.02 0.02 0.87 0.38 No
25 Shift View Opened 0.00 0.00 0.00 0.46 No
2 Absence Rejected 0.05 0.06 0.88 0.50 No
19 Shift Created 32.44 33.07 0.98 0.53 No
10 Punched Out 0.03 0.15 0.19 0.54 No
12 Start Note Added 0.02 0.07 0.28 0.54 No
9 Punched In 3.97 5.28 0.75 0.67 No
17 Shift Approved 2.84 2.59 1.10 0.71 No
6 Payroll Synced 0.01 0.04 0.25 0.78 No
4 Break Started 0.53 0.07 7.73 0.78 No
3 Break Finished 0.53 0.06 8.28 0.78 No
26 Shift Details Opened 0.91 1.64 0.56 0.82 No
0 Absence Approved 0.43 0.33 1.30 0.87 No
14 Availability Set 0.96 1.17 0.82 0.88 No
18 Assignment Changed 6.67 8.20 0.81 0.90 No
1 Absence Created 0.50 0.37 1.35 0.92 No
7 Schedule Loaded 55.43 50.16 1.10 0.97 No

Machine Learning Models

Show Code
# ── ML MODELS ──
feature_cols = activity_cols + [
    'total_events', 'unique_activities', 'active_days',
    'breadth_ratio', 'event_intensity',
    'first_event_day', 'last_event_day'
]
X = org_df[feature_cols].fillna(0)
y = org_df['converted'].astype(int)

skf    = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scaler = StandardScaler()
# Note: fitting the scaler on the full dataset before CV leaks fold
# information; a Pipeline would scale inside each fold instead. With
# near-chance AUC either way, the conclusion here is unaffected.
X_scaled = scaler.fit_transform(X)

lr = LogisticRegression(max_iter=1000, random_state=42,
                        class_weight='balanced')
lr_scores = cross_val_score(lr, X_scaled, y,
                            cv=skf, scoring='roc_auc')

rf = RandomForestClassifier(n_estimators=200, random_state=42,
                            class_weight='balanced', max_depth=5)
rf_scores = cross_val_score(rf, X, y,
                            cv=skf, scoring='roc_auc')

print(f"Logistic Regression, 5-Fold CV ROC-AUC:")
print(f"  Mean: {lr_scores.mean():.3f}  Std: {lr_scores.std():.3f}")
print(f"\nRandom Forest, 5-Fold CV ROC-AUC:")
print(f"  Mean: {rf_scores.mean():.3f}  Std: {rf_scores.std():.3f}")

lr.fit(X_scaled, y)
rf.fit(X, y)

rf_imp = pd.DataFrame({
    'Feature'    : feature_cols,
    'Importance' : rf.feature_importances_
}).sort_values('Importance', ascending=False).head(10)
print(f"\nTop 10 RF Feature Importances:")
print(rf_imp.to_string(index=False))
Logistic Regression, 5-Fold CV ROC-AUC:
  Mean: 0.519  Std: 0.003

Random Forest, 5-Fold CV ROC-AUC:
  Mean: 0.483  Std: 0.054

Top 10 RF Feature Importances:
                           Feature  Importance
                   event_intensity        0.13
          Scheduling.Shift.Created        0.09
                      total_events        0.09
     Communication.Message.Created        0.07
                       active_days        0.06
              PunchClock.PunchedIn        0.05
            Mobile.Schedule.Loaded        0.05
Scheduling.Shift.AssignmentChanged        0.05
                    last_event_day        0.05
                 unique_activities        0.04

Engagement Segmentation
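The segmentation below relies on `pd.qcut`, which splits organisations into equal-sized quantile buckets by total event volume. A quick illustration of the bucketing behaviour, using toy values rather than the trial data:

```python
import pandas as pd

# Eight toy event totals -> four equal-sized quartile buckets
totals  = pd.Series([1, 5, 10, 50, 100, 500, 1000, 5000])
buckets = pd.qcut(totals, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Each quartile receives exactly two of the eight organisations
print(buckets.value_counts().sort_index())
```

Unlike `pd.cut` (fixed-width bins), `qcut` guarantees roughly equal segment sizes, which is why every engagement quartile below contains about a quarter of the 966 organisations.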

Show Code
# ── ENGAGEMENT SEGMENTATION ──
org_df['engagement_segment'] = pd.qcut(
    org_df['total_events'], q=4,
    labels=['Low (Q1)', 'Medium-Low (Q2)',
            'Medium-High (Q3)', 'High (Q4)'])

seg_conv = org_df.groupby(
    'engagement_segment', observed=True
)['converted'].agg(['mean','count','sum']).reset_index()
seg_conv.columns = ['Segment','Conv Rate',
                    'Total Orgs','Converted']
seg_conv['Conv Rate %'] = (seg_conv['Conv Rate']*100).round(1)

baseline = 21.3
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
fig.patch.set_facecolor(BG)
fig.suptitle('Conversion Rate by Engagement Segment\n'
             'All segments hover near the 21.3% baseline',
             color=TEXT, fontsize=13, fontweight='bold', y=1.02)

org_df['breadth_segment'] = pd.cut(
    org_df['unique_activities'],
    bins=[0, 1, 2, 3, 28],
    labels=['1 Activity','2 Activities',
            '3 Activities','4+ Activities'],
    include_lowest=True)
breadth_conv = org_df.groupby(
    'breadth_segment', observed=True
)['converted'].agg(['mean','count','sum']).reset_index()
breadth_conv.columns = ['Segment','Conv Rate',
                        'Total Orgs','Converted']
breadth_conv['Conv Rate %'] = (
    breadth_conv['Conv Rate']*100).round(1)

org_df['activity_days_segment'] = pd.cut(
    org_df['active_days'],
    bins=[0, 1, 3, 7, 30],
    labels=['1 Day','2-3 Days','4-7 Days','8+ Days'],
    include_lowest=True)
days_conv = org_df.groupby(
    'activity_days_segment', observed=True
)['converted'].agg(['mean','count','sum']).reset_index()
days_conv.columns = ['Segment','Conv Rate',
                     'Total Orgs','Converted']
days_conv['Conv Rate %'] = (
    days_conv['Conv Rate']*100).round(1)

def draw_segment_chart(ax, data, title, seg_col='Segment'):
    ax.set_facecolor(SURFACE)
    segments   = data[seg_col].astype(str)
    conv_rates = data['Conv Rate %'].values
    colors = [MINT if v > baseline + 2
              else RED if v < baseline - 2
              else CYAN for v in conv_rates]
    bars = ax.bar(segments, conv_rates, color=colors,
                  alpha=0.9, edgecolor='none', width=0.6)
    ax.axhline(baseline, color=AMBER, linewidth=2,
               linestyle='--', label=f'Baseline {baseline}%')
    for bar, val in zip(bars, conv_rates):
        ax.text(bar.get_x() + bar.get_width()/2,
                bar.get_height() + 0.3, f'{val}%',
                ha='center', va='bottom',
                color=TEXT, fontsize=9, fontweight='bold')
    ax.set_title(title, color=CYAN, fontsize=10,
                 fontweight='bold', pad=10)
    ax.set_ylabel('Conversion Rate (%)', color=MUTED, fontsize=9)
    ax.set_ylim(0, 35)
    ax.legend(fontsize=8)
    ax.grid(axis='y', alpha=0.3)
    ax.spines[:].set_color(GRID)
    ax.tick_params(axis='x', labelsize=8)

draw_segment_chart(axes[0], seg_conv, 'By Event Volume')
draw_segment_chart(axes[1], breadth_conv, 'By Feature Breadth')
draw_segment_chart(axes[2], days_conv, 'By Active Days')
plt.tight_layout()
plt.savefig('assets/05_segmentation.png',
            dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()

Conversion Driver Findings

Four rigorous methods were applied (chi-square tests, Mann-Whitney U tests, machine-learning models, and engagement segmentation), and all four returned the same conclusion:

Method Result Interpretation
Chi-Square Test (28 activities) 0 of 28 significant (p < 0.05) No activity has a statistically significant relationship with conversion
Mann-Whitney U (28 activities) 0 of 28 significant (p < 0.05) No activity frequency differs meaningfully between converters and non-converters
Logistic Regression ROC-AUC 0.519 Marginally above a coin flip; effectively no predictive skill
Random Forest ROC-AUC 0.483 Marginally below a coin flip; overfitted to noise rather than signal
Engagement Segmentation All segments at ~21.3% No engagement level produces meaningfully different conversion rates

A note on the ML results: a ROC-AUC of 0.50 represents a coin flip, i.e. pure chance with zero predictive skill. Logistic Regression scored 0.519 (slightly above) and Random Forest scored 0.483 (slightly below). The 0.036 gap between them is statistically meaningless noise. Both models tell the same story: the available behavioural features cannot predict conversion. Random Forest scored marginally lower because, as the more complex model, it attempted to learn patterns where none exist, fitting noise rather than signal. Logistic Regression, being simpler, avoided that trap slightly better. Neither result is useful for prediction.
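To make the coin-flip framing concrete, here is a quick simulation (not part of the original analysis): scoring labels with random numbers yields a ROC-AUC of about 0.50, and small deviations on either side are just sampling noise.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y      = rng.integers(0, 2, size=10_000)   # random binary labels
scores = rng.random(10_000)                # scores unrelated to the labels

auc = roc_auc_score(y, scores)
print(f"ROC-AUC of a random scorer: {auc:.3f}")  # close to 0.50
```

Both 0.519 and 0.483 sit comfortably inside the band a random scorer produces at this sample size.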

The Core Finding: The Behavioural Data Cannot Predict Conversion No individual activity, no frequency of usage, no combination of engagement metrics, and no ML model can reliably distinguish organisations that will convert from those that will not. This is not an analytical failure. It is a genuine and important business finding. The conversion decision in this dataset is driven by factors outside the behavioural log: sales conversations, pricing, competitive alternatives, budget cycles, and organisational decision-making processes that leave no trace in product events.
What This Means for the Product Team The implication is clear: improving onboarding metrics alone will not move the conversion rate. The product team needs to either enrich the dataset with non-behavioural signals (company size, industry, sales touchpoints, support interactions) or focus on post-trial follow-up as the primary conversion lever, given that 49% of conversions happen after the trial ends.
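As a sketch of the enrichment step, non-behavioural attributes could be left-joined onto the per-organisation frame before remodelling. The firmographics table and its columns below are hypothetical, purely to illustrate the join pattern:

```python
import pandas as pd

# Hypothetical firmographic attributes keyed by organisation
firmographics = pd.DataFrame({
    'organization_id': ['org_a', 'org_b'],
    'company_size'   : [12, 140],           # hypothetical column
    'industry'       : ['retail', 'care'],  # hypothetical column
})

org_features = pd.DataFrame({
    'organization_id': ['org_a', 'org_b'],
    'total_events'   : [42, 310],
})

# Left join keeps every trialling org even when enrichment is missing
enriched = org_features.merge(firmographics,
                              on='organization_id', how='left')
print(enriched.columns.tolist())
```

With columns like these in place, the same cross-validation harness from the modelling section could be rerun to test whether non-behavioural signals lift AUC above chance.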

Business Question

Given that no single activity significantly predicts conversion, what is the best evidence-informed definition of Trial Activation that the product team can track and optimise?

Methodology

Trial goals were defined using three pillars: product-value logic (what activities represent genuine platform value?), mild data signals (which activities showed the highest lift even if not statistically significant?), and realistic completion rates (are the goals achievable for a meaningful proportion of trialists?).

This approach deliberately frames the goals as hypotheses grounded in evidence, not as guaranteed conversion levers.

The Three Trial Goals

Goal 1: Core Scheduling Activation
The organisation creates at least 5 shifts within the first 14 days.
Rationale: Early shift creation signals genuine adoption of the core product feature. An organisation that creates shifts in the first two weeks is building real schedules, not just exploring.

Goal 2: Multi-Module Engagement
The organisation uses activities from at least 2 of the 4 product modules (Scheduling, Time Tracking, Approvals, Communications).
Rationale: Cross-module usage signals the platform is being embedded into real operational workflows, not used as a single-feature scheduling tool.

Goal 3: End-to-End Workflow Completion
The organisation creates at least one shift AND completes a downstream action (shift approval, punch-in, or mobile schedule view).
Rationale: Completing a full workflow loop, from creating a shift to acting on it, is the closest proxy available in this dataset for experiencing genuine product value.

Goal Completion Analysis

Show Code
# ── GOAL COMPUTATION ──
# Goal 1: shifts created in the first 14 days, per organisation
early_shifts = (
    df[(df['days_into_trial'] <= 14) &
       (df['activity_name'] == 'Scheduling.Shift.Created')]
    .groupby('organization_id').size()
    .rename('shifts_in_first_14')
    .reset_index()
)
org_df = org_df.merge(early_shifts,
                      on='organization_id', how='left')
org_df['shifts_in_first_14'] = org_df['shifts_in_first_14'].fillna(0)
org_df['goal_1_met'] = org_df['shifts_in_first_14'] >= 5

scheduling_acts = [
    'Scheduling.Shift.Created',
    'Scheduling.Shift.AssignmentChanged',
    'Scheduling.Template.ApplyModal.Applied',
    'Scheduling.OpenShiftRequest.Created',
    'Scheduling.Availability.Set',
    'Mobile.Schedule.Loaded'
]
time_tracking_acts = [
    'PunchClock.PunchedIn','PunchClock.PunchedOut',
    'PunchClock.Entry.Edited',
    'Break.Activate.Started','Break.Activate.Finished'
]
approval_acts = [
    'Scheduling.Shift.Approved',
    'Timesheets.BulkApprove.Confirmed',
    'Absence.Request.Approved','Absence.Request.Rejected'
]
comms_acts = ['Communication.Message.Created']

def used_module(row, acts):
    return any(row.get(a, 0) > 0 for a in acts)

org_df['used_scheduling']    = org_df.apply(
    lambda r: used_module(r, scheduling_acts), axis=1)
org_df['used_time_tracking'] = org_df.apply(
    lambda r: used_module(r, time_tracking_acts), axis=1)
org_df['used_approvals']     = org_df.apply(
    lambda r: used_module(r, approval_acts), axis=1)
org_df['used_comms']         = org_df.apply(
    lambda r: used_module(r, comms_acts), axis=1)

org_df['modules_used'] = (
    org_df['used_scheduling'].astype(int) +
    org_df['used_time_tracking'].astype(int) +
    org_df['used_approvals'].astype(int) +
    org_df['used_comms'].astype(int))
org_df['goal_2_met'] = org_df['modules_used'] >= 2

org_df['goal_3_met'] = (
    (org_df['Scheduling.Shift.Created'] > 0) &
    (
        (org_df['Scheduling.Shift.Approved'] > 0) |
        (org_df['PunchClock.PunchedIn'] > 0) |
        (org_df['Mobile.Schedule.Loaded'] > 0)
    )
)

org_df['trial_activated'] = (
    org_df['goal_1_met'] &
    org_df['goal_2_met'] &
    org_df['goal_3_met']
)

# Summary table
goal_summary = []
for goal, col in [
    ('Goal 1: Core Scheduling (5+ shifts, days 1-14)', 'goal_1_met'),
    ('Goal 2: Multi-Module (2+ modules)',              'goal_2_met'),
    ('Goal 3: End-to-End Workflow',                   'goal_3_met'),
    ('Trial Activated (All 3 Goals)',                   'trial_activated')
]:
    total      = org_df[col].sum()
    rate       = org_df[col].mean() * 100
    conv_rate  = org_df[org_df[col]==True]['converted'].mean()*100
    nconv_rate = org_df[org_df[col]==False]['converted'].mean()*100
    lift       = conv_rate / nconv_rate if nconv_rate > 0 else None
    goal_summary.append({
        'Goal'               : goal,
        'Orgs Completing'    : total,
        'Completion Rate'    : f'{rate:.1f}%',
        'Conv% (Completed)'  : f'{conv_rate:.1f}%',
        'Conv% (Not)'        : f'{nconv_rate:.1f}%',
        'Lift'               : f'{lift:.2f}x' if lift else 'N/A'
    })

pd.DataFrame(goal_summary)
Goal Orgs Completing Completion Rate Conv% (Completed) Conv% (Not) Lift
0 Goal 1: Core Scheduling (5+ shifts, days 1-14) 431 44.6% 21.8% 20.9% 1.04x
1 Goal 2: Multi-Module (2+ modules) 339 35.1% 22.1% 20.9% 1.06x
2 Goal 3: End-to-End Workflow 420 43.5% 21.9% 20.9% 1.05x
3 Trial Activated (All 3 Goals) 235 24.3% 23.4% 20.7% 1.13x

Goal Completion Visualisation

Show Code
# ── CHART 6: GOAL COMPLETION ──
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
fig.patch.set_facecolor(BG)
fig.suptitle(
    'Trial Goal Definition: Completion and Conversion Impact',
    color=TEXT, fontsize=13, fontweight='bold', y=1.02)

ax1 = axes[0]
ax1.set_facecolor(SURFACE)
goals  = ['Goal 1\nCore Scheduling', 'Goal 2\nMulti-Module',
          'Goal 3\nEnd-to-End', 'Trial\nActivated']
rates  = [44.6, 35.1, 43.5, 24.3]
colors = [CYAN, CYAN, CYAN, MINT]
bars = ax1.bar(goals, rates, color=colors,
               alpha=0.9, edgecolor='none', width=0.6)
for bar, val in zip(bars, rates):
    ax1.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.5,
             f'{val}%', ha='center', va='bottom',
             color=TEXT, fontsize=11, fontweight='bold')
ax1.set_ylabel('% of Organisations', color=MUTED, fontsize=10)
ax1.set_title('Goal Completion Rates',
              color=CYAN, fontsize=11, fontweight='bold', pad=12)
ax1.set_ylim(0, 65)
ax1.grid(axis='y', alpha=0.3)
ax1.spines[:].set_color(GRID)

ax2 = axes[1]
ax2.set_facecolor(SURFACE)
labels    = ['Goal 1','Goal 2','Goal 3','Activated']
completed = [21.8, 22.1, 21.9, 23.4]
not_comp  = [20.9, 20.9, 20.9, 20.7]
x     = np.arange(len(labels))
width = 0.35
bars1 = ax2.bar(x - width/2, completed, width,
                color=MINT, alpha=0.9,
                label='Completed Goal', edgecolor='none')
bars2 = ax2.bar(x + width/2, not_comp, width,
                color=RED, alpha=0.9,
                label='Did Not Complete', edgecolor='none')
for bar in bars1:
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.2,
             f'{bar.get_height():.1f}%',
             ha='center', va='bottom',
             color=MINT, fontsize=9, fontweight='bold')
for bar in bars2:
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.2,
             f'{bar.get_height():.1f}%',
             ha='center', va='bottom',
             color=RED, fontsize=9, fontweight='bold')
ax2.axhline(21.3, color=AMBER, linewidth=2,
            linestyle='--', label='Baseline 21.3%')
ax2.set_xticks(x)
ax2.set_xticklabels(labels, fontsize=10)
ax2.set_ylabel('Conversion Rate (%)', color=MUTED, fontsize=10)
ax2.set_title('Conversion Rate: Completed vs Not Completed\n'
              'Modest but consistent lift across all goals',
              color=CYAN, fontsize=11, fontweight='bold', pad=12)
ax2.set_ylim(0, 35)
ax2.legend(fontsize=9)
ax2.grid(axis='y', alpha=0.3)
ax2.spines[:].set_color(GRID)
plt.tight_layout()
plt.savefig('assets/06_goal_completion.png',
            dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()

Trial Goal Findings

Goal Achievement: Activation Produces the Highest Conversion Rate Organisations that complete all three goals convert at 23.4%, the highest conversion rate of any segment in this analysis, and 2.7 percentage points above the 20.7% baseline for non-activated organisations. The 1.13x lift is modest but consistent and directionally meaningful.
Important Caveat: These Are Hypotheses, Not Proven Levers Given that no activity was statistically significant in the chi-square or Mann-Whitney tests, these goals must be understood as the best available behavioural proxy for activation, not a proven causal mechanism. They should be tracked, tested with controlled experiments, and refined as more data is collected. A/B testing onboarding interventions designed to increase goal completion rates is the recommended next step.
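One caution before committing to that experiment: the observed lift is small, so the required sample is large. A back-of-envelope power calculation (standard library only, using the arcsine effect size for two proportions; the 20.7% and 23.4% figures come from the goal analysis above) suggests each arm would need far more organisations than a single monthly cohort supplies:

```python
from math import asin, sqrt
from statistics import NormalDist

p_control, p_treated = 0.207, 0.234  # conversion rates from the goal analysis
alpha, power = 0.05, 0.80            # conventional test settings

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
z_beta  = NormalDist().inv_cdf(power)

# Cohen's h: arcsine-transformed difference between two proportions
h = 2 * (asin(sqrt(p_treated)) - asin(sqrt(p_control)))

n_per_arm = ((z_alpha + z_beta) / h) ** 2
print(f"~{n_per_arm:,.0f} organisations per arm")  # roughly 1,800
```

At roughly 300 new trials per month, detecting a 2.7-point lift at conventional power would take many months of data per arm, which argues for testing bolder interventions or pooling cohorts.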

Business Question

How should the data infrastructure be built to track trial goals and activation status for every new trialling organisation at scale?

Architecture Overview

The SQL infrastructure follows a three-layer data warehouse pattern:

Layer Model Type Purpose
Staging stg_events View Clean, deduplicated event log with derived fields
Marts trial_goals Table One row per org: goal completion tracking
Marts trial_activation Table One row per org: activation status and tiers

Staging Layer: stg_events

The staging view deduplicates the raw event log, casts all data types, and derives trial-relative time fields used by all downstream models. It serves as the single clean source of truth for all mart computations.

Show Code
# ── BUILD SQLITE DATABASE ──
conn   = sqlite3.connect('data/splendor.db')
cursor = conn.cursor()

clean_df = pd.read_csv('data/clean_events.csv')
clean_df.to_sql('raw_events', conn,
                if_exists='replace', index=False)
print(f"raw_events loaded: {len(clean_df):,} rows")

stg_sql = """
CREATE VIEW IF NOT EXISTS stg_events AS
WITH raw AS (
    SELECT
        ORGANIZATION_ID   AS organization_id,
        ACTIVITY_NAME     AS activity_name,
        TIMESTAMP         AS event_timestamp,
        TRIAL_START       AS trial_start,
        TRIAL_END         AS trial_end,
        CONVERTED         AS converted,
        CONVERTED_AT      AS converted_at
    FROM raw_events
),
deduplicated AS (
    SELECT DISTINCT
        organization_id, activity_name, event_timestamp,
        trial_start, trial_end, converted, converted_at
    FROM raw
),
with_derived AS (
    SELECT *,
        CAST(JULIANDAY(event_timestamp) -
             JULIANDAY(trial_start) AS INTEGER) AS days_into_trial,
        CASE WHEN converted_at IS NOT NULL
             THEN CAST(JULIANDAY(converted_at) -
                       JULIANDAY(trial_start) AS INTEGER)
             ELSE NULL END                      AS days_to_convert,
        CAST(JULIANDAY(trial_end) -
             JULIANDAY(trial_start) AS INTEGER) AS trial_duration_days,
        CASE WHEN CAST(JULIANDAY(event_timestamp) -
                       JULIANDAY(trial_start) AS INTEGER) <= 14
             THEN 1 ELSE 0 END                 AS is_early_trial
    FROM deduplicated
    WHERE event_timestamp >= trial_start
      AND event_timestamp <= trial_end
)
SELECT * FROM with_derived
"""
cursor.execute("DROP VIEW IF EXISTS stg_events")
cursor.execute(stg_sql)
conn.commit()
stg_count = pd.read_sql_query(
    "SELECT COUNT(*) as rows FROM stg_events", conn)
print(f"stg_events view: {stg_count['rows'].values[0]:,} rows")
raw_events loaded: 102,895 rows
stg_events view: 102,895 rows

Mart Layer: trial_goals

The trial_goals mart computes goal completion for every organisation. One row per organisation. Full SQL source available in sql/marts/trial_goals.sql.

Show Code
# ── EXECUTE TRIAL GOALS MART ──
goals_query = """
CREATE TABLE IF NOT EXISTS trial_goals AS
WITH org_base AS (
    SELECT
        organization_id,
        MAX(CAST(converted AS INTEGER)) AS converted,
        MIN(trial_start)  AS trial_start,
        MIN(trial_end)    AS trial_end,
        MIN(converted_at) AS converted_at,
        MIN(days_to_convert) AS days_to_convert,
        COUNT(*) AS total_events,
        COUNT(DISTINCT activity_name) AS unique_activities,
        COUNT(DISTINCT days_into_trial) AS active_days
    FROM stg_events GROUP BY organization_id
),
goal_1 AS (
    SELECT organization_id,
        SUM(CASE WHEN activity_name='Scheduling.Shift.Created'
                  AND is_early_trial=1 THEN 1 ELSE 0 END)
                                        AS shifts_in_first_14_days,
        CASE WHEN SUM(CASE WHEN activity_name='Scheduling.Shift.Created'
                            AND is_early_trial=1 THEN 1 ELSE 0 END)>=5
             THEN 1 ELSE 0 END         AS goal_1_met
    FROM stg_events GROUP BY organization_id
),
goal_2 AS (
    SELECT organization_id,
        MAX(CASE WHEN activity_name IN (
            'Scheduling.Shift.Created','Scheduling.Shift.AssignmentChanged',
            'Scheduling.Template.ApplyModal.Applied',
            'Scheduling.OpenShiftRequest.Created',
            'Scheduling.Availability.Set','Mobile.Schedule.Loaded'
        ) THEN 1 ELSE 0 END)           AS used_scheduling,
        MAX(CASE WHEN activity_name IN (
            'PunchClock.PunchedIn','PunchClock.PunchedOut',
            'PunchClock.Entry.Edited',
            'Break.Activate.Started','Break.Activate.Finished'
        ) THEN 1 ELSE 0 END)           AS used_time_tracking,
        MAX(CASE WHEN activity_name IN (
            'Scheduling.Shift.Approved','Timesheets.BulkApprove.Confirmed',
            'Absence.Request.Approved','Absence.Request.Rejected'
        ) THEN 1 ELSE 0 END)           AS used_approvals,
        MAX(CASE WHEN activity_name IN (
            'Communication.Message.Created'
        ) THEN 1 ELSE 0 END)           AS used_communications
    FROM stg_events GROUP BY organization_id
),
goal_2_scored AS (
    SELECT *,
        (used_scheduling+used_time_tracking+
         used_approvals+used_communications) AS modules_used,
        CASE WHEN (used_scheduling+used_time_tracking+
                   used_approvals+used_communications)>=2
             THEN 1 ELSE 0 END         AS goal_2_met
    FROM goal_2
),
goal_3 AS (
    SELECT organization_id,
        MAX(CASE WHEN activity_name='Scheduling.Shift.Created'
                 THEN 1 ELSE 0 END)   AS created_shift,
        MAX(CASE WHEN activity_name IN (
            'Scheduling.Shift.Approved',
            'PunchClock.PunchedIn','Mobile.Schedule.Loaded'
        ) THEN 1 ELSE 0 END)          AS completed_downstream,
        CASE WHEN MAX(CASE WHEN activity_name=
                'Scheduling.Shift.Created' THEN 1 ELSE 0 END)=1
              AND MAX(CASE WHEN activity_name IN (
                'Scheduling.Shift.Approved',
                'PunchClock.PunchedIn','Mobile.Schedule.Loaded'
             ) THEN 1 ELSE 0 END)=1
             THEN 1 ELSE 0 END        AS goal_3_met
    FROM stg_events GROUP BY organization_id
)
SELECT b.*,
    g1.shifts_in_first_14_days, g1.goal_1_met,
    g2.used_scheduling, g2.used_time_tracking,
    g2.used_approvals, g2.used_communications,
    g2.modules_used, g2.goal_2_met,
    g3.created_shift, g3.completed_downstream, g3.goal_3_met,
    (g1.goal_1_met+g2.goal_2_met+g3.goal_3_met) AS goals_completed
FROM org_base b
LEFT JOIN goal_1        g1 ON b.organization_id=g1.organization_id
LEFT JOIN goal_2_scored g2 ON b.organization_id=g2.organization_id
LEFT JOIN goal_3        g3 ON b.organization_id=g3.organization_id
ORDER BY b.organization_id
"""
cursor.execute("DROP TABLE IF EXISTS trial_goals")
cursor.execute(goals_query)
conn.commit()
goals_count = pd.read_sql_query(
    "SELECT COUNT(*) as rows FROM trial_goals", conn)
print(f"trial_goals mart: {goals_count['rows'].values[0]:,} rows")

# Preview
pd.read_sql_query("""
    SELECT organization_id, converted,
           goal_1_met, goal_2_met, goal_3_met,
           goals_completed
    FROM trial_goals LIMIT 5
""", conn)
trial_goals mart: 966 rows
organization_id converted goal_1_met goal_2_met goal_3_met goals_completed
0 0040dd9ab132b92d5d04bc3acf14d2e2 0 1 1 1 3
1 00456fd86311b6095ad05f7e31758f0d 0 0 1 1 2
2 007d48a2bc006e6eac0348c788d26dfd 0 0 1 0 1
3 00d6461845d0042b929379c263e9edef 0 0 1 1 2
4 00d812389a3cffdbf014ba194cbe922e 0 1 1 1 3

Mart Layer: trial_activation

The trial_activation mart is the product team’s primary operational table. It surfaces activation status, tier labels, engagement buckets, and conversion timing for every organisation. Full SQL source available in sql/marts/trial_activation.sql.

Show Code
# ── EXECUTE TRIAL ACTIVATION MART ──
activation_query = """
CREATE TABLE IF NOT EXISTS trial_activation AS
WITH activation_base AS (
    SELECT *,
        CASE WHEN goal_1_met=1 AND goal_2_met=1 AND goal_3_met=1
             THEN 1 ELSE 0 END AS activated
    FROM trial_goals
),
with_tiers AS (
    SELECT *,
        CASE WHEN activated=1 THEN 'Fully Activated'
             WHEN goals_completed=2 THEN 'Nearly Activated (2/3 Goals)'
             WHEN goals_completed=1 THEN 'Partially Activated (1/3 Goals)'
             ELSE 'Not Activated (0/3 Goals)'
        END AS activation_tier,
        CASE WHEN converted=0 THEN 'Not Converted'
             WHEN days_to_convert<=7  THEN 'Week 1'
             WHEN days_to_convert<=14 THEN 'Week 2'
             WHEN days_to_convert<=21 THEN 'Week 3'
             WHEN days_to_convert<=30 THEN 'Week 4'
             ELSE 'Post-Trial'
        END AS conversion_timing_bucket,
        CASE WHEN active_days=1 THEN 'Single Day'
             WHEN active_days BETWEEN 2 AND 3 THEN 'Early Dropout'
             WHEN active_days BETWEEN 4 AND 7 THEN 'Moderate Engagement'
             ELSE 'High Engagement'
        END AS engagement_bucket
    FROM activation_base
)
SELECT organization_id, activated, activation_tier,
       goals_completed, goal_1_met, goal_2_met, goal_3_met,
       converted, converted_at, days_to_convert,
       conversion_timing_bucket, trial_start, trial_end,
       total_events, unique_activities, active_days, engagement_bucket
FROM with_tiers
ORDER BY activated DESC, goals_completed DESC, organization_id ASC
"""
cursor.execute("DROP TABLE IF EXISTS trial_activation")
cursor.execute(activation_query)
conn.commit()
act_count = pd.read_sql_query(
    "SELECT COUNT(*) as rows FROM trial_activation", conn)
print(f"trial_activation mart: {act_count['rows'].values[0]:,} rows")

# Activation tier breakdown
tier_summary = pd.read_sql_query("""
    SELECT activation_tier,
           COUNT(*) AS orgs,
           ROUND(COUNT(*)*100.0/966,1) AS pct_of_total,
           ROUND(AVG(converted)*100,1) AS conv_rate
    FROM trial_activation
    GROUP BY activation_tier
    ORDER BY orgs DESC
""", conn)
conn.close()
tier_summary
trial_activation mart: 966 rows
activation_tier orgs pct_of_total conv_rate
0 Not Activated (0/3 Goals) 370 38.30 21.40
1 Partially Activated (1/3 Goals) 237 24.50 20.30
2 Fully Activated 235 24.30 23.40
3 Nearly Activated (2/3 Goals) 124 12.80 19.40

SQL Verification

SQL Models Verified Against Python Analysis All three SQL models (the staging view and both marts) executed without errors. Goal counts from SQL match Python exactly: Goal 1 = 431, Goal 2 = 339, Goal 3 = 420, Activated = 235. Overall conversion rate confirmed at 21.3%. The models are production-ready.
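The verification pattern itself is worth keeping around as a regression check. A self-contained sketch of the idea, using an in-memory database and synthetic rows rather than the real trial_goals mart:

```python
import sqlite3
import pandas as pd

# Synthetic stand-in for the trial_goals mart (illustrative rows only)
goals = pd.DataFrame({
    'organization_id': ['a', 'b', 'c', 'd'],
    'goal_1_met'     : [1, 0, 1, 0],
    'goal_2_met'     : [1, 1, 1, 0],
    'goal_3_met'     : [1, 0, 1, 0],
})

conn = sqlite3.connect(':memory:')
goals.to_sql('trial_goals', conn, index=False)

sql_counts = pd.read_sql_query("""
    SELECT SUM(goal_1_met) AS g1,
           SUM(goal_2_met) AS g2,
           SUM(CASE WHEN goal_1_met=1 AND goal_2_met=1 AND goal_3_met=1
                    THEN 1 ELSE 0 END) AS activated
    FROM trial_goals
""", conn)
conn.close()

# The SQL aggregates must agree with the same counts computed in pandas
assert sql_counts.loc[0, 'g1'] == goals['goal_1_met'].sum()
assert sql_counts.loc[0, 'g2'] == goals['goal_2_met'].sum()
assert sql_counts.loc[0, 'activated'] == (
    (goals[['goal_1_met', 'goal_2_met', 'goal_3_met']] == 1)
    .all(axis=1).sum()
)
print("SQL and pandas agree")
```

Running the equivalent check against the real database after every mart rebuild would catch silent drift between the SQL and Python definitions of each goal.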

Business Question

What are the key product metrics the Splendor Analytics team needs to understand their trial population, track activation progress, and prioritise improvement initiatives?

Core Metrics

Show Code
# ── PRODUCT METRICS ──
conn = sqlite3.connect('data/splendor.db')

total_orgs     = len(org_df)
converted_orgs = org_df['converted'].sum()
conv_rate      = org_df['converted'].mean() * 100
converters     = org_df[org_df['converted']==True]

print("=" * 55)
print("PRODUCT METRICS DASHBOARD")
print("=" * 55)

print(f"\nMetric 1: Conversion Rate")
print(f"  Total trialling orgs:    {total_orgs:,}")
print(f"  Converted orgs:          {converted_orgs:,}")
print(f"  Conversion rate:         {conv_rate:.1f}%")

print(f"\nMetric 2: Time to Convert")
print(f"  Median: {converters['days_to_convert'].median():.0f} days")
print(f"  Mean:   {converters['days_to_convert'].mean():.0f} days")
print(f"  Within trial (≤30 days): "
      f"{(converters['days_to_convert']<=30).sum()} "
      f"({(converters['days_to_convert']<=30).mean()*100:.1f}%)")
print(f"  Post-trial (>30 days):   "
      f"{(converters['days_to_convert']>30).sum()} "
      f"({(converters['days_to_convert']>30).mean()*100:.1f}%)")

print(f"\nMetric 3: Trial Activation Rate")
print(f"  Activated:     {org_df['trial_activated'].sum()} "
      f"({org_df['trial_activated'].mean()*100:.1f}%)")
print(f"  Not activated: {(~org_df['trial_activated']).sum()} "
      f"({(~org_df['trial_activated']).mean()*100:.1f}%)")

print(f"\nMetric 4: Single Day Drop-off")
single_day = (org_df['active_days']==1).sum()
print(f"  Orgs active only 1 day:  "
      f"{single_day} ({single_day/total_orgs*100:.1f}%)")
print(f"  Their conversion rate:   "
      f"{org_df[org_df['active_days']==1]['converted'].mean()*100:.1f}%")

print(f"\nMetric 5: Module Adoption")
print(f"  Scheduling:     {org_df['used_scheduling'].mean()*100:.1f}%")
print(f"  Time Tracking:  {org_df['used_time_tracking'].mean()*100:.1f}%")
print(f"  Approvals:      {org_df['used_approvals'].mean()*100:.1f}%")
print(f"  Communications: {org_df['used_comms'].mean()*100:.1f}%")

print(f"\nMetric 6: Conversion Rate by Cohort")
org_df['trial_month'] = pd.to_datetime(
    org_df['trial_start']).dt.strftime('%B %Y')
# grouping on the month-name string sorts rows alphabetically,
# not chronologically (hence February before January below)
monthly = org_df.groupby('trial_month')['converted'].agg(
    ['mean','count','sum']).reset_index()
monthly.columns = ['Month','Conv Rate','Total','Converted']
monthly['Conv Rate %'] = (monthly['Conv Rate']*100).round(1)
print(monthly[['Month','Total','Converted',
               'Conv Rate %']].to_string(index=False))

conn.close()
=======================================================
PRODUCT METRICS DASHBOARD
=======================================================

Metric 1: Conversion Rate
  Total trialling orgs:    966
  Converted orgs:          206
  Conversion rate:         21.3%

Metric 2: Time to Convert
  Median: 30 days
  Mean:   30 days
  Within trial (≤30 days): 105 (51.0%)
  Post-trial (>30 days):   101 (49.0%)

Metric 3: Trial Activation Rate
  Activated:     235 (24.3%)
  Not activated: 731 (75.7%)

Metric 4: Single Day Drop-off
  Orgs active only 1 day:  621 (64.3%)
  Their conversion rate:   21.1%

Metric 5: Module Adoption
  Scheduling:     98.8%
  Time Tracking:  21.8%
  Approvals:      22.8%
  Communications: 15.0%

Metric 6: Conversion Rate by Cohort
        Month  Total  Converted  Conv Rate %
February 2024    347         79        22.80
 January 2024    305         70        23.00
   March 2024    314         57        18.20

Product Metrics Dashboard

Show Code
# ── CHART 7: PRODUCT METRICS DASHBOARD ──
adoption_rates = {}
for col in activity_cols:
    rate = (org_df[col] > 0).mean() * 100
    adoption_rates[name_map.get(col, col)] = round(rate, 1)
adoption_series = pd.Series(
    adoption_rates).sort_values(ascending=False)

fig, axes = plt.subplots(2, 2, figsize=(16, 11))
fig.patch.set_facecolor(BG)
fig.suptitle(
    'Splendor Analytics: Product Metrics Dashboard',
    color=TEXT, fontsize=14, fontweight='bold', y=1.01)

# Top 10 feature adoption
ax1 = axes[0, 0]
ax1.set_facecolor(SURFACE)
top10 = adoption_series.head(10)
colors_adopt = [CYAN if i < 3 else MUTED
                for i in range(len(top10))]
bars = ax1.barh(range(len(top10)), top10.values,
                color=colors_adopt, alpha=0.9, edgecolor='none')
for bar, val in zip(bars, top10.values):
    ax1.text(bar.get_width() + 0.5,
             bar.get_y() + bar.get_height()/2,
             f'{val}%', va='center', color=TEXT,
             fontsize=8, fontweight='bold')
ax1.set_yticks(range(len(top10)))
ax1.set_yticklabels(top10.index, fontsize=8)
ax1.set_xlabel('% of Organisations', color=MUTED, fontsize=9)
ax1.set_title('Top 10 Feature Adoption Rates',
              color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax1.set_xlim(0, 110)
ax1.grid(axis='x', alpha=0.3)
ax1.spines[:].set_color(GRID)

# Module adoption
ax2 = axes[0, 1]
ax2.set_facecolor(SURFACE)
modules   = ['Scheduling','Approvals','Time Tracking','Communications']
mod_rates = [98.8, 22.8, 21.8, 15.0]
mod_colors= [MINT, CYAN, CYAN, CYAN]
bars2 = ax2.bar(modules, mod_rates, color=mod_colors,
                alpha=0.9, edgecolor='none', width=0.6)
for bar, val in zip(bars2, mod_rates):
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.5,
             f'{val}%', ha='center', va='bottom',
             color=TEXT, fontsize=10, fontweight='bold')
ax2.set_ylabel('% of Organisations', color=MUTED, fontsize=9)
ax2.set_title('Module Adoption Rates\n'
              '3 modules used by fewer than 1 in 4 orgs',
              color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax2.set_ylim(0, 120)
ax2.grid(axis='y', alpha=0.3)
ax2.spines[:].set_color(GRID)

# Active days distribution
ax3 = axes[1, 0]
ax3.set_facecolor(SURFACE)
active_days_dist = org_df['active_days'].value_counts().sort_index()
grouped = active_days_dist.copy()
over_8  = grouped[grouped.index >= 8].sum()
grouped = grouped[grouped.index < 8]
grouped[8] = over_8
grouped = grouped.sort_index()
day_labels  = [str(i) if i < 8 else '8+' for i in grouped.index]
bar_colors3 = [RED if i == 1 else CYAN for i in grouped.index]
bars3 = ax3.bar(day_labels, grouped.values, color=bar_colors3,
                alpha=0.9, edgecolor='none', width=0.7)
for bar, val in zip(bars3, grouped.values):
    ax3.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 2, str(val),
             ha='center', va='bottom',
             color=TEXT, fontsize=8, fontweight='bold')
ax3.set_xlabel('Active Days During Trial',
               color=MUTED, fontsize=9)
ax3.set_ylabel('Number of Organisations',
               color=MUTED, fontsize=9)
ax3.set_title('Active Days Distribution\n'
              '621 orgs (64%) active for only 1 day',
              color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax3.grid(axis='y', alpha=0.3)
ax3.spines[:].set_color(GRID)

# Monthly cohort
ax4 = axes[1, 1]
ax4.set_facecolor(SURFACE)
months      = ['January 2024','February 2024','March 2024']
conv_rates_m= [23.0, 22.8, 18.2]
total_m     = [305, 347, 314]
bar_colors4 = [MINT, MINT, RED]
bars4 = ax4.bar(months, conv_rates_m, color=bar_colors4,
                alpha=0.9, edgecolor='none', width=0.6)
for bar, val, tot in zip(bars4, conv_rates_m, total_m):
    ax4.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.2,
             f'{val}%\n(n={tot})',
             ha='center', va='bottom',
             color=TEXT, fontsize=9, fontweight='bold')
ax4.axhline(21.3, color=AMBER, linewidth=2,
            linestyle='--', label='Baseline 21.3%')
ax4.set_ylabel('Conversion Rate (%)', color=MUTED, fontsize=9)
ax4.set_title('Conversion Rate by Trial Cohort\n'
              'March cohort converts 3.1pp below baseline',
              color=CYAN, fontsize=11, fontweight='bold', pad=10)
ax4.set_ylim(0, 35)
ax4.legend(fontsize=9)
ax4.grid(axis='y', alpha=0.3)
ax4.spines[:].set_color(GRID)
ax4.tick_params(axis='x', labelsize=8)

plt.tight_layout()
plt.savefig('assets/07_product_metrics.png',
            dpi=150, bbox_inches='tight', facecolor=BG)
plt.show()

Product Metrics Findings

Finding: The Platform is Being Used as a Scheduling Tool
98.8% of trialling organisations use the Scheduling module, yet only 22.8% use Approvals, 21.8% use Time Tracking, and 15.0% use Communications. Three entire product modules, representing significant development investment, are invisible to 75-85% of trialists. The platform's value proposition extends well beyond scheduling, but trialists never discover it.
Finding: 64% of Organisations are Active for Only One Day
621 of 966 trialling organisations interacted with the platform on exactly one day and never returned. Despite this, 21.1% of them still converted, almost identical to the overall rate. This suggests organisations are making their conversion decision based on something other than sustained engagement: possibly a sales conversation, a demo, or the trial sign-up experience itself.
Finding: The March 2024 Cohort is Underperforming
The March 2024 cohort converts at 18.2%: 3.1 percentage points below the 21.3% baseline and 4.6–4.8 points below the January and February cohorts. This warrants immediate investigation. Possible causes include a product change, an onboarding process change, a shift in the source of sign-ups, or seasonal factors. Without that investigation, a declining conversion trend risks going undetected.
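A quick back-of-envelope significance check on the cohort gap can be sketched with a pooled two-proportion z-test, using only the counts reported in Metric 6 (this test is an editorial sketch, not part of the original analysis pipeline):

```python
from math import sqrt, erf

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                   # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # pooled standard error
    z = (p1 - p2) / se
    # Two-sided p-value from the normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# March 2024 (57 of 314) vs February 2024 (79 of 347), counts from Metric 6.
z, p = two_proportion_ztest(57, 314, 79, 347)
print(f'z = {z:.2f}, p = {p:.3f}')  # z ≈ -1.47, p ≈ 0.14
```

On these counts alone the gap does not reach significance at α = 0.05, so the dip is a warning signal to investigate rather than a proven decline, consistent with the report's caution elsewhere about non-significant single signals.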

Executive Recommendations

Based on the full analysis, we make five strategic recommendations to the Splendor Analytics product team:

Recommendation 1: Investigate the March 2024 Cohort Immediately
A 4.6 percentage point month-over-month drop in conversion rate is a material signal. The product team should identify what changed in March, whether in the product, the onboarding flow, or the sign-up source, before the trend compounds.

Recommendation 2: Redesign Onboarding to Surface Hidden Modules
With 75-85% of trialists never discovering Time Tracking, Approvals, or Communications, the onboarding experience is failing to demonstrate the platform’s full value. A guided activation flow that introduces each module within the first 7 days could dramatically increase Goal 2 (Multi-Module) completion from its current 35.1%.

Recommendation 3: Shift Intervention Focus to Post-Trial
49% of conversions happen after the trial ends, so the product team's greatest conversion opportunity is not inside the trial but in the days immediately following trial expiry. A structured post-trial re-engagement sequence (email, sales call, extended trial offer) targeted at engaged non-converters could move this number significantly.

Recommendation 4: Track Trial Activation as a North Star Metric
The three trial goals defined in this analysis should be instrumented in the product's analytics stack using the trial_activation mart table. Activation rate (currently 24.3%) would then become the primary leading indicator of conversion health; it is more actionable than raw conversion rate because it is measurable within the trial window.
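Instrumenting this could be as simple as a boolean flag derived from the mart. A minimal sketch, assuming illustrative goal column names and an "all three goals completed" activation rule (the actual rule lives in the trial_activation mart definition):

```python
import pandas as pd

# Illustrative read of the trial_activation mart; column names and the
# all-three-goals rule are assumptions for this sketch.
mart = pd.DataFrame({
    'org_id':              [1, 2, 3, 4],
    'goal_1_setup':        [True, True, False, True],
    'goal_2_multi_module': [True, False, False, True],
    'goal_3_team_usage':   [False, True, False, True],
})

goal_cols = ['goal_1_setup', 'goal_2_multi_module', 'goal_3_team_usage']
mart['activated'] = mart[goal_cols].all(axis=1)  # True only if every goal met

activation_rate = mart['activated'].mean() * 100
print(f'Trial activation rate: {activation_rate:.1f}%')
```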

Recommendation 5: Run A/B Tests on Goal-Driving Interventions
Because no single activity was statistically significant in predicting conversion, the trial goals should be treated as hypotheses to be tested rather than proven levers. Controlled experiments, such as in-app prompts designed to drive Goal 1 completion, or onboarding emails targeting Goal 2, will produce the causal evidence needed to refine the activation definition over time.
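Before running such experiments, it is worth sizing them. A standard two-proportion sample-size approximation, applied as a hypothetical example to Goal 2's 35.1% completion rate (the +5pp target effect is an assumption for illustration):

```python
from math import ceil

def sample_size_per_arm(p1, p2, alpha_z=1.96, power_z=0.8416):
    """Approximate n per arm for a two-proportion test (alpha=0.05, power=0.80)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((alpha_z + power_z) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical example: detect a +5pp lift in Goal 2 completion from 35.1%.
n = sample_size_per_arm(0.351, 0.401)
print(n)  # roughly 1,470 organisations per arm
```

At roughly 300-350 trial sign-ups per month, a test of this sensitivity would take several months to fill both arms, which argues for prioritising interventions expected to produce larger effects or for sequential testing designs.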


Analytical Integrity Statement

This report was produced with full transparency about the limitations of the available data. The core finding, that no individual activity significantly predicts conversion, was confirmed by four independent analytical methods and is reported honestly rather than obscured by selective presentation.

The trial goals defined here are evidence-informed hypotheses grounded in product logic and mild data signals. They represent the best available proxy for activation given the current dataset, and should be refined as richer data becomes available, including sales interaction records, company-level attributes, and support contact history.

All code, data, and outputs are fully reproducible from the public repository.