Abstract
Background There is a global need to assess physicians' professional performance in actual clinical practice. Valid and reliable instruments are necessary to support these efforts. This study examines the three instruments used for the multisource assessment of physicians' professional performance in the Netherlands: their reliability and validity, the influence of sociodemographic biasing factors, the associations between self-evaluations and evaluations by others, and the number of evaluations needed to assess a physician reliably.
Methods This observational validation study of three instruments underlying multisource feedback (MSF) was set in 26 non-academic hospitals in the Netherlands. In total, 146 hospital-based physicians took part in the study. Each physician's professional performance was assessed by peers (physician colleagues), co-workers (including nurses, secretary assistants and other healthcare professionals) and patients. Physicians also completed a self-evaluation. MSF ratings from 864 peers, 894 co-workers and 1960 patients were available. We used principal components analysis and methods of classical test theory to evaluate the factor structure, reliability and validity of the instruments. We used Pearson's correlation coefficient and linear mixed models to address the other objectives.
Results The peer, co-worker and patient instruments had six factors, three factors and one factor, respectively, with high internal consistency (Cronbach's alpha 0.95 - 0.96). Only 2 percent of the variance in the mean ratings could be attributed to biasing factors. Self-ratings were not correlated with peer, co-worker or patient ratings. Ratings from peers, co-workers and patients, however, were correlated with one another. Five peer evaluations, five co-worker evaluations and 11 patient evaluations are required to achieve reliable results (reliability coefficient ≥ 0.70).
Conclusions The study demonstrated that the three MSF instruments produced reliable and valid data for evaluating physicians' professional performance in the Netherlands. Scores from peers, co-workers and patients were not correlated with self-evaluations. Future research should examine improvement of performance when using MSF.
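The Results report the number of evaluations needed to reach a reliability coefficient of at least 0.70. The abstract does not state how that number was derived, so the following is only an illustrative sketch of one common approach, the Spearman-Brown formula from classical test theory, which predicts the reliability of the mean of k ratings from the reliability of a single rating. The function names and the single-rating reliability of 0.30 are hypothetical and are not taken from the study.

```python
def spearman_brown(single_rating_reliability: float, n_ratings: int) -> float:
    """Predicted reliability of the mean of n_ratings parallel ratings."""
    r = single_rating_reliability
    return n_ratings * r / (1 + (n_ratings - 1) * r)


def min_ratings(single_rating_reliability: float,
                target: float = 0.70,
                max_ratings: int = 1000) -> int:
    """Smallest number of ratings whose mean reaches the target reliability."""
    if not 0 < single_rating_reliability < 1:
        raise ValueError("single-rating reliability must lie strictly between 0 and 1")
    for k in range(1, max_ratings + 1):
        if spearman_brown(single_rating_reliability, k) >= target:
            return k
    raise ValueError("target reliability not reached within max_ratings")


# Hypothetical input: a single-rating reliability of 0.30 (not a study result).
needed = min_ratings(0.30, target=0.70)
print(needed, round(spearman_brown(0.30, needed), 2))
```

Under this assumption, the lower the reliability of a single rating, the more ratings are required, which is consistent in direction with the patient instrument needing more evaluations than the peer and co-worker instruments.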