BACKGROUND: Supragastric belching (SGB) and rumination are behavioral disorders associated with proton pump inhibitor (PPI) non-response and can be diagnosed using multichannel intraluminal impedance-pH (MII-pH) and post-prandial high-resolution impedance manometry (PPHRIM). This pilot study compared diagnostic yield and inter-rater agreement for SGB and rumination using MII-pH and PPHRIM. METHODS: Three esophageal physiologists performed blinded interpretations of MII-pH and PPHRIM in 22 PPI non-responders. Without clinical context, raters selected from four diagnostic impressions (normal, GERD, behavioral disorders, GERD+behavioral disorders). Primary outcomes were comparisons of diagnostic impressions against the clinical gold-standard impression, between raters, and between test modalities. Following a 28-month wash-out period, raters re-interpreted MII-pH with clinical context and using consensus definitions of the diagnostic criteria. KEY RESULTS: Compared with the gold standard, rater accuracy for the presence of behavioral disorders ranged from 45% to 77% on MII-pH and from 45% to 59% on PPHRIM. On MII-pH, inter-rater agreement was fair for diagnosis (κ = 0.32, p < 0.01) and suboptimal for presence of behavioral disorders (κ = 0.13, p = 0.14). On PPHRIM, inter-rater agreement was suboptimal for both diagnosis (κ = 0.03, p = 0.34) and presence of a behavioral disorder (κ = -0.22, p = 0.96). Inter-rater agreement improved in post hoc MII-pH interpretations. Rumination was more frequently identified on PPHRIM (n = 23, 35%) than on MII-pH (n = 7, 11%). CONCLUSIONS AND INFERENCES: Diagnostic accuracy and inter-rater agreement are higher for MII-pH than for PPHRIM, whereas behavioral disorders are more frequently identified on PPHRIM. Identifying behavioral disorders on MII-pH and PPHRIM has implications for clinical evaluation of PPI non-response; clinical context is essential for accurate study interpretation. Further work is needed to standardize definitions and interpretations.
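The abstract reports inter-rater agreement as kappa values but does not state which kappa statistic was used. With three raters and four categorical impressions, Fleiss' kappa is one common choice, and the sketch below assumes it purely for illustration. The function name `fleiss_kappa`, the category labels, and the toy ratings are hypothetical and are not the study's data or method.

```python
import math
from collections import Counter

# Category labels mirror the four diagnostic impressions named in the abstract.
CATEGORIES = ("normal", "GERD", "behavioral", "GERD+behavioral")


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for a list of subjects, each a list with one label per rater.

    Assumes every subject was rated by the same number of raters (at least two).
    Returns NaN when chance agreement is 1 (all ratings in a single category),
    because kappa is undefined in that case.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    total = n_subjects * n_raters
    p_chance = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if math.isclose(p_chance, 1.0):
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy ratings (three raters, two subjects), not study data.
split_ratings = [
    ["normal", "normal", "GERD"],
    ["GERD", "GERD", "normal"],
]
print(fleiss_kappa(split_ratings))
```

A negative value, as in this toy case, means the raters agreed less often than chance alone would predict, which is how a result such as κ = -0.22 is usually read.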