Vision sensors are increasingly used in implementations of Simultaneous Localization and Mapping (SLAM). Although the mathematical framework of SLAM is well understood, considerable issues remain to be resolved when a particular sensing modality is considered. For instance, the observation model of a small-baseline stereo camera is known to be highly nonlinear. As a consequence, state estimates obtained from standard recursive estimators, such as the Extended Kalman Filter, tend to be inconsistent. Furthermore, vision-based approaches suffer from high feature densities and the consequent requirement of maintaining large feature databases for loop closure and data association. This paper proposes a two-tier solution to these issues, inspired by the mechanics of human navigation: it addresses the consistency issue by formulating SLAM as a nonlinear batch optimization, and it manages features through a novel two-tier map representation. Simulations and experiments carried out in an office-like environment validate the performance of the algorithm.