Customer lifetime value (CLV) is an essential measure of how profitable a customer is to a firm. Customer relationship management treats CLV as the most significant factor for measuring the level of purchases and, consequently, the profitability of a given customer. This motivates researchers to develop competing models that maximize CLV. Dynamic programming models in general, and the Q-learning model specifically, play a significant role in this area of research. Q-learning is a model-free algorithm that maximizes the long-term future reward of an agent given its current state, the set of possible actions, and the resulting next state; here the customer is the agent and CLV is the future reward. However, due to the stochastic nature of this problem, a single crisp value of Q is inaccurate. In this paper, fuzzy logic and neutrosophic logic are used to search for the membership values of Q in order to capture the stochasticity and uncertainty of the problem. Both fuzzy Q-learning and neutrosophic Q-learning were implemented using two membership functions (trapezoidal and triangular) to search for the optimal Q value that maximizes the customer's future rewards. The proposed algorithms were applied to two benchmark datasets: the Knowledge Discovery and Data Mining (KDD) Cup 1998 direct mailing campaign dataset and a second direct mailing campaign dataset from Kaggle. Compared with each other and with traditional deep Q-learning models, the proposed algorithms proved effective and superior to the traditional models.
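
For orientation, the standard building blocks named above can be sketched as follows. This is a minimal illustration, not the fuzzy or neutrosophic Q-learning algorithm proposed in this paper: it shows only the textbook tabular Q-learning update and the usual triangular and trapezoidal membership functions. The function names, the state and action labels, the learning rate `alpha`, the discount factor `gamma`, and the membership breakpoints are all assumptions chosen for illustration, and the breakpoints are assumed to be strictly increasing.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at x == b.

    Assumes a < b < c (hypothetical breakpoints).
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), equal to 1 on [b, c].

    Assumes a < b <= c < d (hypothetical breakpoints).
    """
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Textbook tabular Q-learning update on a nested dict q[state][action].

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    # Hypothetical two-state direct mailing example: mail the customer or skip.
    q_table = {
        "s0": {"mail": 0.0, "skip": 0.0},
        "s1": {"mail": 1.0, "skip": 0.5},
    }
    new_q = q_update(q_table, "s0", "mail", reward=2.0, next_state="s1")
    print("updated Q(s0, mail):", new_q)
    # Degree to which the crisp Q value belongs to two illustrative fuzzy sets.
    print("triangular membership:", triangular(new_q, 0.0, 0.5, 1.0))
    print("trapezoidal membership:", trapezoidal(new_q, 0.0, 0.25, 0.75, 1.0))
```

In this sketch the membership functions are merely evaluated on a crisp Q value after the update; how membership values enter the search for the optimal Q in the proposed algorithms is not specified here and is not reproduced.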