Stress has a significant negative impact on people, which has made it a primary social concern. Early stress detection is essential for effective stress management. This study proposes a Deep Learning (DL) method for stress detection using two multimodal physiological signals: Electrocardiogram (ECG) and Electrodermal Activity (EDA). The latent feature representations learned by DL models have yet to be fully explored. Hence, this paper proposes a hierarchical AutoEncoder (AE) feature fusion in the frequency domain. The latent representations from different layers of the autoencoder are combined and given as input to the classifier, a Convolutional Recurrent Neural Network with Squeeze and Excitation (CRNN-SE). Two sets of performance comparisons are carried out: (i) performance on frequency band features is compared with performance on raw data; (ii) autoencoders trained with three cost functions, namely Mean Squared Error (MSE), Kullback-Leibler (KL) divergence, and cosine similarity, are compared on both frequency band features and raw data. To verify the generalizability of our approach, we tested it on four benchmark datasets: WAUC, CLAS, MAUS, and ASCERTAIN. Frequency band features yielded better results than raw data by 4-8%. MSE loss produced better results than the other losses by 3-7% for both frequency band features and raw data. The proposed approach outperforms existing subject-independent stress detection models by 1-2%.
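
The hierarchical fusion and the three cost functions named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The layer sizes, the `tanh` activation, the random weights, and the function names (`mse_loss`, `kl_loss`, `cosine_loss`, `encode_hierarchical`) are all hypothetical. Two further assumptions are made: the KL divergence is computed after normalizing each feature vector to sum to one, and the cosine cost is taken as one minus the cosine similarity.

```python
import numpy as np


def mse_loss(x, x_hat):
    """Mean squared reconstruction error."""
    return float(np.mean((x - x_hat) ** 2))


def kl_loss(x, x_hat, eps=1e-8):
    """KL divergence between rows, each normalized to sum to one (assumption)."""
    p = np.abs(x) + eps
    q = np.abs(x_hat) + eps
    p = p / p.sum(axis=1, keepdims=True)
    q = q / q.sum(axis=1, keepdims=True)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))


def cosine_loss(x, x_hat, eps=1e-8):
    """One minus the row-wise cosine similarity (assumed form of the cost)."""
    num = np.sum(x * x_hat, axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1) + eps
    return float(np.mean(1.0 - num / den))


def encode_hierarchical(x, weights):
    """Run x through stacked encoder layers and concatenate every layer's latent code."""
    latents = []
    h = x
    for w in weights:
        h = np.tanh(h @ w)  # hypothetical activation
        latents.append(h)
    # Fusion step: latent codes from all depths form one feature vector per window.
    return np.concatenate(latents, axis=1)


rng = np.random.default_rng(0)
x = rng.random((4, 16))  # 4 windows, 16 hypothetical frequency band features
weights = [rng.normal(size=(16, 8)), rng.normal(size=(8, 4))]  # untrained, for shape only
fused = encode_hierarchical(x, weights)
print(fused.shape)  # (4, 12): 8 + 4 latent units per window
```

In this sketch the fused array plays the role of the classifier input; in the paper that classifier is the CRNN-SE model, which is not reproduced here.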