Visible light communication (VLC) has been widely investigated over the last decade due to its ability to provide high data rates with low power consumption. Resource management is an important issue in cellular networks and can strongly affect their performance. In this paper, an optimisation problem is formulated to assign each user to an optimal access point and wavelength at a given time. This problem can be solved using mixed integer linear programming (MILP). However, MILP is not considered a practical solution because of its computational complexity and memory requirements, and because it requires accurate system information to perform the resource allocation. Therefore, the optimisation problem is reformulated using reinforcement learning (RL), which has recently received considerable interest due to its ability to learn by interacting with an environment without prior knowledge of it. Specifically, the resource allocation problem in VLC systems is investigated using the basic Q-learning algorithm. Two scenarios are simulated and the results are compared with those of the previously proposed MILP model. The results demonstrate that the Q-learning algorithm can provide near-optimal solutions, close to those of the MILP model, without prior knowledge of the system.
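To make the idea of learning an assignment by interaction concrete, the following is a minimal sketch of tabular Q-learning on a toy assignment task. It is not the system model or the formulation used in this paper: the rate table `RATES`, the collision penalty, the state definition (current user plus the set of resources already taken) and all hyperparameters are hypothetical choices made only for illustration. Each "resource" stands for one access point and wavelength pair, and the agent only observes rewards, never the rate table itself.

```python
import itertools
import random

# Hypothetical achievable rates: RATES[user][resource], where each resource
# index stands for one (access point, wavelength) pair. Illustrative values only.
RATES = [
    [4.0, 1.0, 2.0, 3.0],
    [5.0, 2.0, 1.0, 3.0],
    [1.0, 3.0, 2.0, 6.0],
]
N_USERS = len(RATES)
N_RESOURCES = len(RATES[0])
COLLISION_PENALTY = -10.0  # assumed penalty for picking an occupied resource


def step(user, used, action):
    """Assign `action` to `user`; return (reward, next set of used resources)."""
    if action in used:
        return COLLISION_PENALTY, used
    return RATES[user][action], used | {action}


def train(episodes=20000, alpha=0.5, gamma=1.0, epsilon=0.5, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # maps state (user, used) -> list of action values

    def values(state):
        return q.setdefault(state, [0.0] * N_RESOURCES)

    for _ in range(episodes):
        used = frozenset()
        for user in range(N_USERS):
            state = (user, used)
            if rng.random() < epsilon:
                action = rng.randrange(N_RESOURCES)  # explore
            else:
                action = max(range(N_RESOURCES), key=values(state).__getitem__)
            reward, next_used = step(user, used, action)
            last = user == N_USERS - 1
            future = 0.0 if last else max(values((user + 1, next_used)))
            # Standard Q-learning update toward reward + discounted best next value.
            values(state)[action] += alpha * (
                reward + gamma * future - values(state)[action]
            )
            used = next_used
    return q


def greedy_assignment(q):
    """Follow the learned greedy policy once; return (assignment, total rate)."""
    used, assignment, total = frozenset(), [], 0.0
    for user in range(N_USERS):
        vals = q.get((user, used), [0.0] * N_RESOURCES)
        action = max(range(N_RESOURCES), key=vals.__getitem__)
        reward, used = step(user, used, action)
        assignment.append(action)
        total += reward
    return assignment, total


def brute_force_best():
    """Exhaustive reference optimum over all collision-free assignments."""
    return max(
        sum(RATES[u][r] for u, r in enumerate(perm))
        for perm in itertools.permutations(range(N_RESOURCES), N_USERS)
    )
```

Calling `greedy_assignment(train())` and comparing its total against `brute_force_best()` mirrors, on a very small scale, the kind of comparison against an exact solver described above. The toy table is chosen so that always taking the best immediate rate for each user in turn is suboptimal, which is why the update bootstraps from the value of the next state.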