Imperfect detection can bias estimates of site occupancy in ecological surveys, but the bias can be corrected by estimating detection probability. Time-to-first-detection (TTD) occupancy models have been proposed as a cost-effective survey method that allows detection probability to be estimated from single site visits. Nevertheless, few studies have validated the performance of occupancy-detection models by creating a situation in which occupancy is known and model outputs can be compared with the truth. We tested the performance of TTD occupancy models in the face of detection heterogeneity using an experiment based on standard survey methods for monitoring koala (Phascolarctos cinereus) populations in Australia. Known numbers of koala faecal pellets were placed under trees, and observers who had not been told which trees had pellets under them carried out a TTD survey. We fitted five TTD occupancy models to the survey data, each making different assumptions about detectability, to evaluate how well each estimated the true occupancy status. Relative to the truth, all five models produced strongly biased estimates, overestimating detection probability and underestimating the number of occupied trees. Despite this, goodness-of-fit tests indicated that some models fitted the data well, with no evidence of model misfit. Hence, TTD occupancy models that appear to perform well with respect to the available data may nonetheless be performing poorly. The reason for the poor model performance was unaccounted-for heterogeneity in detection probability, which is known to bias occupancy-detection models. This poses a problem because the unaccounted-for heterogeneity could not be detected using goodness-of-fit tests and was revealed only because we knew the experimentally determined outcome. A challenge for occupancy-detection models is to find ways to identify and mitigate the impacts of unobserved heterogeneity, which could be biasing many models without the analyst's knowledge.
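
The direction of the bias described above can be illustrated with a small simulation. The following is a minimal sketch, not the models or data used in this study: it assumes a standard exponential TTD model with a single constant detection rate, fitted to simulated sites whose true detection rates differ. The function names (`simulate_ttd`, `fit_constant_rate`), the two detection rates, the occupancy of 0.6, and the unit survey length are all hypothetical values chosen for illustration.

```python
import numpy as np


def simulate_ttd(n_sites, psi, rates, t_max, rng):
    """Simulate one TTD survey with heterogeneous detection rates.

    Returns detection times (np.inf where nothing was detected within
    t_max) and the true occupancy state of every site.
    """
    occupied = rng.random(n_sites) < psi
    # Each site draws its own detection rate: this is the heterogeneity
    # that the fitted model below does not know about.
    lam = rng.choice(rates, size=n_sites)
    times = rng.exponential(1.0 / lam)
    times[~occupied] = np.inf      # unoccupied sites never yield a detection
    times[times > t_max] = np.inf  # survey stops at t_max (right-censoring)
    return times, occupied


def fit_constant_rate(times, t_max):
    """Maximum-likelihood fit of a constant-rate exponential TTD model.

    With q = psi * p and p = 1 - exp(-lam * t_max), the likelihood
    factorises: q is estimated by the detected fraction, and lam by a
    right-truncated exponential fitted to the detection times.
    """
    detected = np.isfinite(times)
    n = detected.sum()
    t_sum = times[detected].sum()
    q_hat = n / times.size

    # Grid search over lam for the truncated-exponential log-likelihood.
    grid = np.linspace(0.01, 50.0, 20000)
    loglik = (n * np.log(grid) - grid * t_sum
              - n * np.log1p(-np.exp(-grid * t_max)))
    lam_hat = grid[np.argmax(loglik)]

    p_hat = 1.0 - np.exp(-lam_hat * t_max)
    # Capping at 1 is a simplification of the constrained estimate.
    psi_hat = min(1.0, q_hat / p_hat)
    return psi_hat, p_hat, lam_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_psi, rates, t_max = 0.6, [5.0, 0.2], 1.0  # hypothetical settings
    times, occupied = simulate_ttd(5000, true_psi, rates, t_max, rng)
    psi_hat, p_hat, lam_hat = fit_constant_rate(times, t_max)
    true_p = np.mean([1.0 - np.exp(-r * t_max) for r in rates])
    print(f"true occupancy {true_psi:.2f}, estimated {psi_hat:.2f}")
    print(f"true mean detection prob. {true_p:.2f}, estimated {p_hat:.2f}")
```

Under these assumed settings, the quickly detected sites dominate the observed detection times, so the constant-rate fit is expected to overestimate detection probability and return an occupancy estimate well below the simulated 0.6. This is the same direction of bias reported above, although the magnitude here depends entirely on the hypothetical parameter values.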