We propose a subject-aware contrastive learning deep fusion neural network framework for classifying subjects' confidence levels in the perception of visual stimuli. The framework, called WaveFusion, is composed of lightweight convolutional neural networks that perform per-lead time-frequency analysis and an attention network that integrates the outputs of these per-lead networks for final prediction. To facilitate the training of WaveFusion, we incorporate a subject-aware contrastive learning approach that exploits the heterogeneity within a multi-subject electroencephalogram (EEG) dataset to boost representation learning and classification accuracy. WaveFusion classifies confidence levels with an accuracy of 95.7% while also identifying influential brain regions.