Tone Quality — Experiment 1A

Hypothesis

If we improve the tone prompts using each user’s LinkedIn writing history, then we will increase average tone match score from 5.26 to 8 out of 10.

Signal

Tone baseline analyses run on 10 users on March 9 showed scores ranging from 2.1 to 6.9 out of 10, averaging 5.26/10. AI-generated content is measurably diverging from how these users actually write.

Belief

The current prompts lack sufficient personalization signal. Each user’s existing LinkedIn post history contains enough stylistic data to close the gap.

Test

Same 10-user cohort. The improved tone prompt, built from each user’s LinkedIn post history, runs live for 14 days. All 10 users are then re-scored using the same methodology as the March 9 baseline. A one-week check-in reviews directional movement before the final read.

Start	Mar 16, 2026	Prompt deployed & live
Check-in	Mar 23, 2026	Monday meeting, directional read
End	Mar 30, 2026	Final re-score & decision

Measure

Tracked in tone_delta_experiments — same scoring model, same 10-user cohort as March 9.

Baseline (Mar 9)	5.26 avg / 10 · range: 2.1–6.9
Target (Mar 30)	8.0 avg / 10 · +52% lift
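
The +52% figure follows directly from the two averages. A minimal Python sketch of the arithmetic, using only the numbers stated above (the function name `relative_lift` is illustrative and not part of any existing tooling):

```python
def relative_lift(baseline: float, target: float) -> float:
    """Return the relative lift from baseline to target as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (target - baseline) / baseline


BASELINE_AVG = 5.26  # cohort average on Mar 9
TARGET_AVG = 8.0     # target cohort average on Mar 30

lift = relative_lift(BASELINE_AVG, TARGET_AVG)
print(f"Required lift: {lift:.0%}")                              # Required lift: 52%
print(f"Absolute gain: {TARGET_AVG - BASELINE_AVG:.2f} points")  # Absolute gain: 2.74 points
```

In absolute terms the cohort average has to rise by 2.74 points on the 10-point scale.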

User	Baseline Score	Posts Analyzed	LinkedIn Posts (Total)
Kamil Rextin	2.1/10	9	463
Daryl Driedger	3.9/10	11	69
Alejandra Céspedes	4.8/10	4	15 △
Matt Dorman	4.8/10	6	223
Eric Voyer	5.7/10	6	105
Olanike A. Mensah	5.9/10	10	0 △
Meeky Hwang	6.2/10	16	165
Rameez Faheem	6.5/10	6	12 △
Bill Wilson	6.8/10	5	710
Renée Cormier	6.9/10	10	422
Cohort Average	5.26/10	—	—

△ Flagged — thin or missing LinkedIn data. Scores for these users should be read with lower confidence.
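
The △ flag can be reproduced mechanically from the "LinkedIn Posts (Total)" column. The sketch below assumes a cutoff of 20 posts. That threshold is an assumption, chosen only because it matches the three flags in the table; the actual flagging rule is not stated here.

```python
# Total LinkedIn posts per user, copied from the table above.
LINKEDIN_POSTS_TOTAL = {
    "Kamil Rextin": 463,
    "Daryl Driedger": 69,
    "Alejandra Céspedes": 15,
    "Matt Dorman": 223,
    "Eric Voyer": 105,
    "Olanike A. Mensah": 0,
    "Meeky Hwang": 165,
    "Rameez Faheem": 12,
    "Bill Wilson": 710,
    "Renée Cormier": 422,
}

# Hypothetical cutoff: not stated in the source, chosen to match the table's flags.
THIN_DATA_THRESHOLD = 20


def thin_data_users(post_counts, threshold=THIN_DATA_THRESHOLD):
    """Return (user, post count) pairs below the threshold, fewest posts first."""
    flagged = [(name, n) for name, n in post_counts.items() if n < threshold]
    return sorted(flagged, key=lambda item: item[1])


for name, n in thin_data_users(LINKEDIN_POSTS_TOTAL):
    print(f"△ {name}: {n} posts")
```

Any cutoff between 16 and 69 yields the same three users, so the flag is not sensitive to the exact value within that range.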

Decide

Hit 8/10	Roll improved prompts out to all content module users.
Miss 8/10	Identify which users diverged furthest and determine whether the gap is a data problem (insufficient post history) or a model problem.
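
The decision rule above can be sketched as follows. Two assumptions are made that the brief does not spell out: a cohort average of exactly 8.0 counts as a hit, and "diverged furthest" is read as "lowest re-scored value". The scores in the usage example are placeholders for illustration, not experiment results.

```python
from statistics import mean

TARGET_AVG = 8.0  # target cohort average from the hypothesis


def decide(rescored, target=TARGET_AVG):
    """Return ("hit", []) or ("miss", users below target, lowest score first)."""
    avg = mean(rescored.values())
    if avg >= target:  # assumption: meeting the target exactly counts as a hit
        return "hit", []
    below = sorted(
        ((name, score) for name, score in rescored.items() if score < target),
        key=lambda item: item[1],
    )
    return "miss", below


# Placeholder scores for illustration only; not experiment results.
example = {"user_a": 8.4, "user_b": 6.1, "user_c": 7.9}
outcome, laggards = decide(example)
print(outcome, laggards)
```

On a miss, the ranked list is the starting point for the data-versus-model question: a low scorer with thin post history points toward a data problem, while a low scorer with ample history points toward the prompt or model.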

Readiness — Not Go

Blocker: Improved prompt has not been deployed

All 10 records in tone_delta_experiments still show generatedAt: 2026-03-09. No re-scoring has run. The agents collection shows zero configuration changes since March 9. The improved prompt exists as a stored field in the database but has not been applied to the live system. The experiment cannot start until the prompt is deployed.
Action → Sanjeevi: deploy the improved tone prompt to the live system before March 16.
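
The re-scoring part of this check amounts to confirming that at least one record carries a `generatedAt` later than the baseline date. A minimal sketch over plain dictionaries: the record shape (one ISO date string under `generatedAt`) is inferred from the field quoted above, and fetching the records from `tone_delta_experiments` is omitted because the storage layer is not described here.

```python
from datetime import date

BASELINE_DATE = date(2026, 3, 9)


def rescoring_has_run(records, baseline=BASELINE_DATE):
    """Return True if any record was generated after the baseline date."""
    return any(date.fromisoformat(r["generatedAt"]) > baseline for r in records)


# Mirrors the state described above: all 10 records still dated 2026-03-09.
records = [{"generatedAt": "2026-03-09"} for _ in range(10)]
print("re-scoring has run:", rescoring_has_run(records))  # re-scoring has run: False
```

If the stored field also carries a time component, the parsing step would need adjusting; the date-only format here is an assumption.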
Issue 2: One cohort user is inactive

cowlickrocks@gmail.com (Daryl Driedger) shows zero content ideas and zero posts in the last 30 days. Every other cohort member has generated 5–21 ideas in that window. A user who is not generating content under the new prompt cannot be re-scored, reducing the effective cohort to 9.
Action → Tukan / Sanjeevi: decide whether to replace this user or carry them. Decision needed before launch.
Issue 3 (caveat): Two users have thin training data

Alejandra Céspedes has 15 LinkedIn posts in the system (4 analyzed). Rameez Faheem has 12 (6 analyzed). Both sit at the low end of the cohort for post history, with baseline scores of 4.8 and 6.5 respectively. Re-scored results for these two will carry lower confidence than the rest of the cohort. Not a blocker, but should be noted on the final read.
Note → Flag these two users explicitly when interpreting the March 30 results.