As machine learning models continue to grow in size, recent research has found that advances aimed solely at improving accuracy may not make neural networks applicable in all situations, due to size and speed constraints. To make machine learning practical for real-world applications, smaller models and faster inference are needed.
speed. There are many explicit information and hidden data dependent distributions in
the underlying data mining and machine learning problems. However, past research often
focused on model parameters directly without considering the contextual information in the
underlying problem. In this dissertation, we demonstrate how we can obtain a much more
efficient machine learning systems via leveraging data dependent information. Specifically,
we will show how both explicit and implicit data dependent information can be combined
with many existing methods to obtain a much smaller model size and faster inference time.
In addition, this data dependent information is ubiquitous and we can find it in many
applications such as data mining, natural language processing, information retrieval and
recommender system problems.