You need MULTIPLE tests on each bike for a valid comparison. One test on each does not establish a trend.
I would ignore weather variability, as it's too difficult to control. Just run at least three tests on each bike, in any order, and you'll have more conclusive evidence.
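If it helps, here is a rough sketch of how you might compare the repeated runs once you have them. It assumes you time each run in seconds; the numbers are placeholders, not real measurements, and the helper names (`summarize`, `clearly_different`) are just made up for illustration. The check is a crude rule of thumb (is the gap between the averages bigger than the run-to-run spread?), not a formal statistical test.

```python
from statistics import mean, stdev


def summarize(times):
    """Return (mean, sample standard deviation) for a list of run times."""
    return mean(times), stdev(times)


def clearly_different(times_a, times_b):
    """Crude check: is the gap between the two means larger than
    the combined run-to-run spread of both bikes?"""
    mean_a, sd_a = summarize(times_a)
    mean_b, sd_b = summarize(times_b)
    return abs(mean_a - mean_b) > (sd_a + sd_b)


# Placeholder times in seconds for three runs per bike (hypothetical data).
bike_a = [600.0, 610.0, 605.0]
bike_b = [580.0, 585.0, 590.0]

if __name__ == "__main__":
    print("Bike A (mean, spread):", summarize(bike_a))
    print("Bike B (mean, spread):", summarize(bike_b))
    print("Gap exceeds spread:", clearly_different(bike_a, bike_b))
```

With only one run per bike you can't compute a spread at all, which is why a single test on each can't tell you whether the difference is real.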