Excess alcohol use is an important determinant of death and disability. Machine learning (ML)-driven interventions leveraging smart-breathalyzer data may help reduce these harms. We developed a digital phenotype of long-term smart-breathalyzer behavior to predict individuals' breath alcohol concentration (BrAC) levels trained on data from a smart breathalyzer. We analyzed roughly one million datapoints from 33,452 users of a commercial smart-breathalyzer device, collected between 2013 and 2017. For validation, we analyzed the associations between state-level observed smart-breathalyzer BrAC levels and impaired-driving motor vehicle death rates. Behavioral, geolocation-based, and time-series-derived features were fed to an ML algorithm using training (70% of the cohort), development (10% of the cohort), and test (20% of the cohort) sets to predict the likelihood of a BrAC exceeding the legal driving limit (0.08 g/dL). States with higher average BrAC levels had significantly higher alcohol-related driving death rates, adjusted for the number of users per state B (SE) = 91.38 (15.16), p < 0.01. In the independent test set, the ML algorithm predicted the likelihood of a given user-initiated BrAC sample exceeding BrAC ≥ 0.08 g/dL, with an area under the curve (AUC) of 85%. Highly predictive features included users' prior BrAC trends, subjective estimation of their BrAC (or AUC = 82% without the self-estimate), engagement and self-monitoring, time since the last measure, and hour of the day. In conclusion, an ML algorithm successfully quantified a digital phenotype of behavior, predicting naturalistic BrAC levels exceeding 0.08 g/dL (a threshold associated with alcohol-related harm) with good discrimination capability. This result establishes a foundation for future research on precision behavioral medicine digital health interventions using smart breathalyzers and passive monitoring approaches.