I wonder how we can get them to sleep more.

This simple thought, expressed by my wife, not even a question, became a challenge to me.

Luckily, I knew the perfect system for testing out some ideas in a controlled and measurable setting.

And with twins, testing would be even easier.

Welcome to parenting, A/B testing style.

This post originally appeared on Dad on the Run.

A/B testing is used all over the web.

You likely encounter it dozens if not hundreds of times a day without even noticing it.

Google is famous for testing 41 shades of blue for search results.

Facebook tests different experiences within the feed constantly.

Amazon even changes around the buy buttons and cart layouts fairly often.

A/B testing compares one or more treatments (the experiments) against a control (the existing experience).

Doing it well takes either a solid knowledge of statistics or one of the many powerful testing tools available.

At Audible and Amazon, we test experiences like this all the time.

In any experiment, accurate measurement and data tracking are critical.

Often a success metric is chosen simply because the data is available or easy to measure.

Luckily, measuring sleep is about as easy as it gets.

When they wake up at night, we just write it down.

We've gone through several notebooks already, but it's so easy to track.

For this, we even started importing the data into a spreadsheet to see the impact more visually.
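For the curious, here is a rough sketch of what that spreadsheet step could look like in code. It assumes each night is logged as a bedtime, a list of wake-up times, and a morning wake-up. The function name `longest_stretch_hours` and the sample times are hypothetical, made up for illustration rather than taken from our notebooks.

```python
from datetime import datetime, timedelta

def longest_stretch_hours(bedtime, wake_times, morning):
    """Return the longest uninterrupted sleep stretch in hours.

    All arguments are datetime objects. wake_times is the list of night
    wakings from the notebook (a hypothetical format). Time spent awake
    during each waking is ignored to keep the sketch simple.
    """
    # Boundaries of each stretch: bedtime, every waking, final wake-up.
    points = [bedtime] + sorted(wake_times) + [morning]
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    return max(gaps) / timedelta(hours=1)

# Made-up example night, not real data.
bedtime = datetime(2020, 1, 1, 19, 0)
wakes = [datetime(2020, 1, 1, 23, 30), datetime(2020, 1, 2, 3, 0)]
morning = datetime(2020, 1, 2, 6, 30)
stretch = longest_stretch_hours(bedtime, wakes, morning)
print(stretch)
```

One number per child per night is enough to chart the trend and compare the twin getting the treatment with the twin acting as the control.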

First we tested increasing the amount given at the feeding immediately before bedtime.

Instead of the normal four ounces, we tried five, then six.

While one child had a larger evening feeding, the other would stay at four ounces.

The result: inconclusive.

Both children seemed to start sleeping longer during this period anyway.

They both slept almost the exact same length of time as well.

Okay, maybe it isn't that much of a secret, but it took us a while to try it.

The length of sleep was not impacted much, though.

After gripe water, which became the new control, we tested an extra feeding before bed.

However, it seemed like an opportunity ripe for testing, so we gave it a shot.

We did this feeding about 1.5 to 2 hours after the previous, compared with 3 hours normally.

In this feeding we tried 4 ounces compared with the 4 to 5 they normally take during daytime feedings.

Sometimes they would refuse to take more than 3.

Of all the experiments, this seemed to work best.

It's important to capture both the adjustment period results and the post-adjustment ones, though.

Apple has famously neglected the adjustment period on several product launches, notably Maps.

Last, we tested keeping them awake longer during the day.

Our hypothesis was that they would therefore be more tired at night and would sleep longer as a result.

The lesson for testing: don't sacrifice other metrics for a small gain in one.

Many of these tests were inconclusive.

This is largely due to the sample size.

With twins, it's hard to know what is a real result and what is personality or natural progression.

To test more accurately, we may need to increase the sample size.
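To see why a handful of nights from two babies rarely settles anything, here is a minimal permutation-test sketch. It is an illustration under assumptions, not something we actually ran: `permutation_p_value` is a hypothetical helper, and the hours below are invented.

```python
import random
from statistics import mean

def permutation_p_value(control, treatment, n_shuffles=10_000, seed=0):
    """Estimate how often randomly relabeling the nights gives a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

# Invented hours of sleep per night, one list per twin (not real data).
control_nights = [4.0, 4.5, 5.0, 4.5, 5.5]
treatment_nights = [4.5, 5.0, 5.0, 5.5, 5.0]
p = permutation_p_value(control_nights, treatment_nights)
print(f"p is roughly {p:.2f}")
```

A large value means random relabeling produces a gap that big quite often, so the "result" could easily be noise. Even a small value would not separate the treatment from the twin who received it, which is the personality problem again.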

Triplets would come in handy for this.

Maybe just someone else's triplets, though. We are definitely not ready for that!

This also shows the importance of the test-measure-iterate process.

Though several of the methods didn't show large improvements, put together they may.

By using the treatment as the control when it outperforms the control, small improvements get stacked.
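That stacking idea can be sketched as a simple loop in which a winning treatment is promoted to control. The helper `run_experiments`, the 0.25-hour threshold, the generic treatment names, and the numbers are all hypothetical, chosen only to show the pattern.

```python
def run_experiments(baseline_hours, experiments, min_gain=0.25):
    """Promote a treatment to control whenever it beats the current
    control by at least min_gain hours (a hypothetical threshold)."""
    control_name, control_hours = "baseline", baseline_hours
    history = []
    for name, treatment_hours in experiments:
        if treatment_hours - control_hours >= min_gain:
            # The winner becomes the control for the next experiment.
            control_name, control_hours = name, treatment_hours
        history.append((name, control_name, control_hours))
    return control_name, control_hours, history

# Invented average sleep lengths in hours, not measured results.
experiments = [
    ("treatment_a", 4.1),
    ("treatment_b", 4.5),
    ("treatment_c", 5.0),
    ("treatment_d", 4.8),
]
winner, hours, history = run_experiments(4.0, experiments)
print(winner, hours)
```

Each later treatment is judged against the best setup found so far, which is how small gains accumulate instead of being measured against the original baseline every time.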

You don't need to move the mountain, just move little handfuls of dirt over a long time.

With this approach to parenting, the boys can continuously grow as well.

And with luck, so will our sanity, well-being, and lives as parents.

The highs and lows, tears and smiles of being a dad.

Top photo by Giu Vicente via Unsplash.