I love this question. It gets to the heart of a lot of the successes and failures of instanced PvP in WoW.
In my years of twinking, I'd say the sweet spot is for twinks to be 15-25% stronger than levelers. That's strong enough for twinks to stand out on the field, but not so strong that curious players can just blame gear and call it a day. Let's add some nuance to this 15-25% range.
I assume levelers wear some BoAs with enchantments, and twinks arrive decently but not perfectly geared. That should produce a 15-20% power difference between levelers and twinks. There's no bottom to how poorly a leveler can gear (down to naked, I suppose), but I set those cases aside for reasons we'll get into momentarily. On the other end, the difference between a decent twink and a BiS twink should be about 5%. That's enough for twinks to notice (and respect the grind), but not enough that a leveler would see a major difference. Put that together, and we get our 15-25% range.
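A quick way to sanity-check how those two gaps combine is the rough Python sketch below. It takes the estimates above (15-20% between a BoA leveler and a decent twink, about 5% between a decent and a BiS twink) and shows both a simple additive reading, which gives the 15-25% range, and a compounding reading, which lands at roughly 26% on the high end. Treating "power" as a single multiplier and the `advantage_range` name are assumptions made for illustration, not anything the game exposes.

```python
def advantage_range(leveler_gap=(0.15, 0.20), bis_gap=0.05):
    """Return (additive, compounded) (min, max) twink advantages over a BoA leveler.

    Hypothetical model: the low end is a decent twink; the high end is a BiS
    twink, assumed to be bis_gap stronger than a decent one.
    Values are fractions (0.15 == 15%).
    """
    low, high = leveler_gap
    additive = (low, high + bis_gap)                    # just add the percentages
    compounded = (low, (1 + high) * (1 + bis_gap) - 1)  # multiply the two edges
    return additive, compounded


additive, compounded = advantage_range()
print("additive:  ", [f"{x:.0%}" for x in additive])    # ['15%', '25%']
print("compounded:", [f"{x:.0%}" for x in compounded])  # ['15%', '26%']
```

Either reading lands in the same place; compounding only nudges the high end by a point.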
The vast majority of twinks, myself included, play only slightly better than levelers. Likewise, the vast majority of twinks get as much satisfaction out of building their characters as out of playing them, to the point that many players see their playtime on a twink drop precipitously once it finishes gearing. When the power differential drops below 15%, the motivation to build twinks starts to dissipate, which is why Legion's stat templates, with their 2-3% power difference (based solely on item level), drove away so many twinks.
When the power differential exceeds 25% and twinks really pull away from levelers, levelers dismiss both the twinks and the game, and (rightly) blame Blizzard for terrible balancing. New twinks don't just appear out of thin air. Leveling players need to see a twink in action where the gear difference makes an impact but still feels within reach. To be sure, most levelers will blame gear as they fumble through a BG waiting for the dungeon deserter debuff to wear off. But some levelers really do enjoy PvP, and losing a close match because of gear can get that leveler wondering what would possess a player to spend time on such a thing.
Two major factors impact the sweet spot of this 15-25% power differential.
First, players with atypical (i.e. terrible) gear join for a host of reasons. They may be new to the game, or on a new account, or convinced that their skill can make up for the gear gap. This isn't unique to instanced PvP, but Blizzard has an opportunity to improve matchmaking here. Blizzard doesn't take that opportunity, however, for fear of longer queue times. Ask an F2P or a vet whether they've run into the same leveler over and over across an hour or two as that leveler works through a bracket. We all have. That can only happen if a remarkably small share of WoW's playerbase plays leveling battlegrounds. Out of the millions who play WoW, not enough levelers are queued at any given time to avoid repeat meetings with the same leveler? Instanced PvP is small, folks.
To fix the much wider gulf between terribly geared players and twinks, Blizzard could implement a stats floor. Did you join PvP in your pajamas? You're 30% weaker than a top-end twink. Do you have full BoAs with enchantments? Now you're only 15-20% weaker than a decent twink (20-25% weaker than a top-end one), which means you're 5-10% stronger than Dr. Pajamas. That's noticeable, but small enough to contain some of the wild mismatches that can appear in leveling battlegrounds. Scaling everyone to the far end of a bracket helps a little, but it's not enough; it mostly just covers up the need for hit rating that "no longer" (but still very much) exists.
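Here's a minimal Python sketch of how a stats floor like that might work, assuming power can be collapsed into a single score where a BiS twink is 100 and nobody is allowed to drop more than 30% below it. The scores, the `effective_power` and `deficit_pct` names, and the clamp-to-floor rule are all hypothetical; none of it reflects how Blizzard actually scales stats.

```python
# Hypothetical power scores on a 0-100 scale where a BiS twink is 100.
TOP_TWINK = 100
MAX_DEFICIT_PCT = 30  # assumed floor: nobody ends up more than 30% weaker
FLOOR = TOP_TWINK * (100 - MAX_DEFICIT_PCT) // 100


def effective_power(raw_power: int) -> int:
    """Lift anyone below the floor up to it; everyone else keeps their gear's power."""
    return max(raw_power, FLOOR)


def deficit_pct(raw_power: int) -> int:
    """Percent weaker than a BiS twink after the floor is applied."""
    return (TOP_TWINK - effective_power(raw_power)) * 100 // TOP_TWINK


# Made-up raw scores, chosen only to illustrate the clamp.
players = {"Dr. Pajamas (naked)": 40, "BoA leveler": 78, "decent twink": 95, "BiS twink": 100}
for name, raw in players.items():
    print(f"{name}: {deficit_pct(raw)}% weaker than a BiS twink")
```

Anyone above the floor keeps what their gear gives them, so the BoA leveler (22% behind here) still sits a few points ahead of Dr. Pajamas (30% behind) instead of a whole tier ahead.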
Second, the damage-to-health ratio modulates the power differential. When the ratio is high, as in Cata, a 15-25% power differential can be the difference between getting one-shot and two-shot, so it effectively plays as a 100% differential. In that case, the power differential needs to shrink. When the ratio is low, as in TBC, a 15-25% power differential feels like a lot of work for little impact. Back then, twinks could attain a 50% advantage over nontwinks, and it played like the 25% we see in the 20-29 bracket today. Blizzard used damage reduction (i.e. resilience) to help with this for a few years, but that turned into an untenable mess, so Blizzard dumped it and didn't return to it until they changed spirit into versatility.
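To see why the same percentage plays so differently, here's a toy Python model that just counts hits to kill. It assumes every hit deals the same damage and ignores crits, healing, armor, and everything else that matters in a real BG; the health and damage numbers (and the function names) are invented stand-ins for a high ratio and a low ratio, not actual Cata or TBC values.

```python
def hits_to_kill(health: int, damage: int) -> int:
    """Hits needed to drop a target, using integer ceiling division."""
    return -(-health // damage)


def effective_edge(health: int, base_damage: int, advantage_pct: int) -> float:
    """How much faster the stronger player kills, measured in hits to kill.

    advantage_pct is the raw power edge (20 means 20% more damage per hit).
    A result of 1.0 means the stronger player kills twice as fast.
    """
    stronger_damage = base_damage * (100 + advantage_pct) // 100
    return hits_to_kill(health, base_damage) / hits_to_kill(health, stronger_damage) - 1


HEALTH = 1000  # hypothetical target health
for label, base_damage in (("high ratio", 900), ("low ratio", 100)):
    for edge in (10, 20, 50):
        result = effective_edge(HEALTH, base_damage, edge)
        print(f"{label}, {edge}% power edge -> {result:.0%} effective edge")
```

With these made-up numbers, a 20% edge turns a two-hit kill into a one-hit kill at the high ratio (effectively a 100% edge), while a 10% edge does nothing at all; at the low ratio, the same 20% only trims ten hits to nine.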
TL;DR: In my anecdotal experience, 15-25% is the sweet spot for maximizing twink enjoyment and getting other PvPers to consider twinking.