Rich feature extraction is essential for training a good machine learning (ML) framework. Such features are typically extracted separately from each modality. We hypothesize that richer features can be learned when modalities are explored jointly, and that these joint-modality features outperform features extracted from individual modalities. To investigate this hypothesis, we study two physiological signal modalities: electrodermal activity (EDA) and the electrocardiogram (ECG). We pursue three objectives for subject-independent stress detection. First, and for the first time in the literature, we apply our proposed framework in the frequency domain; the frequency-domain decomposition effectively separates each signal into periodic and aperiodic components, allowing us to correlate their behaviour within each band of the signal spectrum. Second, we show that our framework outperforms late fusion, early fusion, and other notable works in the field. Third, we validate our approach on four benchmark datasets to demonstrate its generalization ability.
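To make the band-wise frequency-domain idea concrete, the sketch below estimates per-band spectral power of a physiological signal via Welch's method. It is a minimal illustration, not the paper's actual pipeline: the band edges, sampling rate, and synthetic test signal are all assumptions introduced here for demonstration.

```python
import numpy as np
from scipy.signal import welch

def band_powers(sig, fs, bands):
    """Estimate the power of `sig` within each (low, high) frequency band
    from the Welch power spectral density. `bands` maps names to Hz ranges."""
    freqs, psd = welch(sig, fs=fs, nperseg=min(len(sig), 1024))
    df = freqs[1] - freqs[0]  # frequency resolution of the PSD estimate
    powers = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = psd[mask].sum() * df  # approximate band integral
    return powers

# Synthetic example: a 4 Hz periodic component (assumed, for illustration)
# buried in broadband noise, sampled at an assumed 256 Hz.
fs = 256
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 4 * t) + 0.1 * rng.standard_normal(t.size)

# Hypothetical band layout; real EDA/ECG analyses would choose
# physiologically motivated bands instead.
bands = {"low": (0.0, 2.0), "mid": (2.0, 8.0), "high": (8.0, 32.0)}
p = band_powers(sig, fs, bands)
# The periodic component concentrates in the "mid" band, while the
# aperiodic (noise) component spreads across all bands.
```

Comparing such per-band powers across modalities is one simple way to relate the periodic and aperiodic behaviour of two signals within each portion of the spectrum.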