The livestock industry plays an important role in the global food chain and provides the main source of protein for human consumption. Pork production provides more than one-third of total meat protein worldwide. There is a gap between the amount of data available in the swine industry and its effective use in analytical models and in the decision-making process of farm management. This work aims to fill this gap by building a data-driven decision framework. The framework enables risk-based, early intervention in the swine industry, which mitigates the overall cost.
First, we focus on Porcine Reproductive and Respiratory Syndrome (PRRS), the most challenging and costly viral infectious disease affecting the swine industry. We build a framework to forecast the risk of a PRRS outbreak on a farm. This forecasting allows for early detection of disease outbreaks and could direct risk-based, and thus more cost-effective, interventions. Machine learning algorithms were trained using multi-scale data (pig group-, farm-, and area-level data). For the first time, on-farm, between-farm, and environmental variables, including farm location, pig movements, production parameters, diagnostic data, and climatic information, were combined for the prediction of PRRS outbreaks. Multi-scale datasets were merged via feature creation, followed by wrapper and filter feature selection, to find the feature subsets with the best forecasting performance. The predictive value of each feature selection mechanism was evaluated in terms of its stability. Numerical results demonstrate good forecasting performance in terms of area under the ROC curve.
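As an illustration of how a filter selection step and its stability might be assessed, the following is a minimal sketch under stated assumptions, not the procedure used in this work. It assumes a correlation-based filter score and measures stability as the mean pairwise Jaccard similarity of the feature subsets selected on bootstrap resamples; the function names (`filter_select`, `selection_stability`) and both of these choices are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def filter_select(rows, labels, k):
    """Assumed filter criterion: keep the k features whose absolute
    correlation with the outbreak label is largest (ties broken by index)."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return frozenset(j for _, j in scores[:k])


def jaccard(a, b):
    """Jaccard similarity between two sets of feature indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(rows, labels, k, n_resamples=20, seed=0):
    """Assumed stability measure: mean pairwise Jaccard similarity of the
    subsets selected on bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(rows)
    subsets = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        subsets.append(
            filter_select([rows[i] for i in idx], [labels[i] for i in idx], k)
        )
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))


if __name__ == "__main__":
    # Toy data: feature 0 tracks the label, feature 1 is constant.
    rows = [[0, 5], [1, 5], [0, 5], [1, 5]]
    labels = [0, 1, 0, 1]
    print(filter_select(rows, labels, k=1))
    print(selection_stability(rows, labels, k=1))
```

A stability value near 1 means the same features are chosen regardless of which samples are drawn, while a value near 0 means the selection depends heavily on the particular resample.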
Furthermore, we leverage a semi-supervised variational auto-encoder (VAE) built on Long Short-Term Memory (LSTM) networks to predict the mortality rates (mummified and stillborn) and the farrowing rate in the production system. PRRS can be one of the underlying mortality factors. The VAE handles missing data by building a probabilistic model. We learn the target variable in two stages: first, the generative model learns a latent representation from samples with unobserved target values; second, a generative semi-supervised model is trained on this representation instead of the raw data.
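For reference, the representation-learning stage of a VAE is usually trained by maximizing the standard evidence lower bound shown below. This is the generic textbook objective, not necessarily the exact loss used in this work. The symbols are assumptions introduced here: $x$ denotes an observed input sequence, $z$ its latent representation, $q_\phi(z \mid x)$ the encoder, $p_\theta(x \mid z)$ the decoder, and $p(z)$ the prior over the latent space.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```

The first term rewards accurate reconstruction of the input from the latent code, and the second keeps the approximate posterior close to the prior. Because this bound does not involve the target variable, it can be evaluated on samples whose target value is unobserved.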
Finally, a factorized generative model is applied to fine-grained semi-synthetic data for the study of the PRRS virus. Using this model, we can predict PRRS outbreaks in all farms of a swine production system by capturing the spatio-temporal dynamics of infection transmission, based on the intra-farm pig-level virus transmission dynamics and the inter-farm pig shipment network. We simulate a PRRS epidemic based on the shipment network and the SEIR epidemic model, using statistics extracted from real data provided by the swine industry. We develop a hierarchical factorized deep generative model that approximates high-dimensional data by a product of time-dependent weights and spatially dependent low-dimensional factors to perform per-farm time series prediction. The prediction results demonstrate the ability of the model to forecast the progression of virus spread, with an average normalized root-mean-square error (NRMSE) of 2.5%.
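The simulation idea described above can be sketched as follows: a minimal, deterministic toy example in which each farm follows discrete-time SEIR dynamics and shipments move a fraction of animals between farms. Everything specific here is an assumption for illustration only: the rate parameters (`beta`, `sigma`, `gamma`), the farm names, the shipment schedule, and the choice to normalize the RMSE by the range of the observed series (other conventions, such as dividing by the mean, also exist). It is not the simulator or the generative model used in this work.

```python
def seir_step(state, beta, sigma, gamma):
    """One deterministic daily SEIR update for a single farm.
    state is a tuple (S, E, I, R) of animal counts."""
    s, e, i, r = state
    n = s + e + i + r
    if n == 0:
        return state
    new_exposed = beta * s * i / n   # S -> E through contact with I
    new_infectious = sigma * e       # E -> I after the latent period
    new_removed = gamma * i          # I -> R
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_removed,
        r + new_removed,
    )


def ship(farms, src, dst, fraction):
    """Move the same fraction of every compartment from farm src to farm dst."""
    moved = tuple(fraction * x for x in farms[src])
    farms[src] = tuple(x - m for x, m in zip(farms[src], moved))
    farms[dst] = tuple(x + m for x, m in zip(farms[dst], moved))


def simulate(farms, shipments, days, beta=0.4, sigma=0.2, gamma=0.1):
    """Return the daily infectious count per farm.
    shipments maps a day index to a list of (src, dst, fraction) tuples."""
    farms = dict(farms)  # work on a copy so the caller's dict is unchanged
    history = {name: [] for name in farms}
    for day in range(days):
        for src, dst, fraction in shipments.get(day, []):
            ship(farms, src, dst, fraction)
        for name in farms:
            farms[name] = seir_step(farms[name], beta, sigma, gamma)
            history[name].append(farms[name][2])
    return history


def nrmse(actual, predicted):
    """RMSE divided by the range of the actual series (assumed convention)."""
    span = max(actual) - min(actual)
    if span == 0:
        raise ValueError("actual series is constant; range-based NRMSE undefined")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return mse ** 0.5 / span


if __name__ == "__main__":
    # Hypothetical two-farm system: farm A starts infected, farm B is naive
    # until it receives a shipment from A on day 5.
    farms = {"A": (990.0, 0.0, 10.0, 0.0), "B": (1000.0, 0.0, 0.0, 0.0)}
    shipments = {5: [("A", "B", 0.1)]}
    history = simulate(farms, shipments, days=30)
    print(history["B"][:8])
```

In this toy setting, farm B stays infection-free until the shipment arrives, which is the mechanism by which a shipment network couples otherwise independent within-farm epidemics. A forecasting model's per-farm predictions could then be scored against such simulated curves with `nrmse`.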