The standard CRO programme starts strong: a diagnostic audit in month one, a hypothesis backlog, three tests live by week six. By month four, the agency is running the same template tests they ran for last quarter's client (sticky CTA, sticky add-to-cart, reduce checkout fields, swap the hero image), and the wins are flatlining. By month six, you are paying for tests nobody is excited about, the agency is sending you a Looker Studio dashboard nobody opens, and you start wondering whether CRO is even working for your business.
The fix is hypothesis discipline plus continuous diagnostic work. Run the template tests once (they will produce wins on most accounts in the first month) and then move on. The tests that compound come from real research: session recordings of buyers in your specific category, support ticket data, sales call recordings, post-purchase surveys, competitor PDP teardowns. We feed that diagnostic work back into the hypothesis backlog continuously, rather than running the same playbook on autopilot.
The other failure mode is over-testing without enough traffic. CRO on a page getting 800 visits a month is mostly theatre; you cannot reach statistical significance in a sensible window. We are direct about minimum traffic thresholds in the audit. If your page-level traffic is below 5,000 monthly visits, the test cadence will be slower and the recommendations will lean toward qualitative work plus structural changes rather than statistical A/B tests.
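To make the traffic point concrete, here is a rough sketch of the arithmetic behind it, using the textbook sample-size approximation for comparing two conversion rates. The 3% baseline conversion rate, the 20% relative lift, the 5% significance level and the 80% power are illustrative assumptions, not figures from any audit, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided A/B test.

    Uses the standard normal-approximation formula for two proportions.
    All default values are illustrative assumptions.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # conversion rate we hope to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def months_to_finish(monthly_visits, baseline_rate, relative_lift, variants=2):
    """Months of page traffic needed if every visitor enters the test."""
    needed = variants * visitors_per_variant(baseline_rate, relative_lift)
    return needed / monthly_visits


if __name__ == "__main__":
    # Assumed inputs: 3% baseline conversion, hoping to detect a 20% relative lift.
    for visits in (800, 5000):
        months = months_to_finish(visits, baseline_rate=0.03, relative_lift=0.20)
        print(f"{visits} visits/month: about {months:.1f} months per test")
```

Under those assumptions the formula asks for on the order of 14,000 visitors per variant, so a page with 800 monthly visits would need close to three years to finish a single test. Larger expected lifts or higher baseline rates shorten the window, which is why the threshold is a guide rather than a hard rule.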