Although intelligent load forecasting is essential for optimal energy management (EM) in smart cities, little current research explores EM in well-regulated Internet-of-Things (IoT) networks. This article develops a new deep learning (DL) model for efficient forecasting of short-term energy consumption while maintaining effective communication between energy providers and users. The proposed Energy-Net comprises multiple stacked spatiotemporal modules, each consisting of a temporal transformer (TT) submodule and a spatial transformer (ST) submodule. The TT submodule models the temporal relationships in load data, while the ST submodule extracts hidden spatial information by integrating convolutional layers and an improved self-attention mechanism. Experimental evaluation on the IHPEC and independent system operator New England (ISO-NE) data sets demonstrates the superiority of Energy-Net over recent cutting-edge DL models, with root mean-square error (RMSE) values of 0.354 and 0.535, respectively. The computational complexity of Energy-Net is appropriate for dependable resource-constrained IoT devices (i.e., fog nodes or edge nodes) linked to a joint IoT-cloud server that interacts with connected smart grids to handle EM tasks.
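For readers unfamiliar with the reported metric, the following is a minimal sketch of how RMSE is conventionally computed between observed and forecast load values. The function name `rmse` and the sample values are illustrative assumptions, not taken from the article or its data sets, and the article's exact evaluation protocol (normalization, forecast horizon) may differ.

```python
import math


def rmse(actual, predicted):
    """Root mean-square error between two equal-length sequences of loads."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("sequences must not be empty")
    # Mean of the squared forecast errors, then the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical load readings and forecasts (illustrative values only).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
score = rmse(actual, predicted)
```

Lower values indicate forecasts closer to the observed load. Because the errors are squared before averaging, a single large miss (as in the last sample above) weighs more heavily than several small ones.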