In this study, we describe the procedure for designing a supervisory controller for the Autonomous Transit on Demand Vehicle (ATODV) system. Reinforcement learning is used to reduce the mean waiting time of passengers, and a cost function is introduced to penalize the energy consumption of the electric vehicles. A stochastic simulation environment for an ATODV pilot project is implemented in Python to train the decision process of the autonomous carts, which are modeled as learning agents. Passenger group behavior, get-on and get-off times, and destinations are modeled as random variables. A single Deep Q-Learning Network is trained in a multi-agent setting. We evaluate how well the ATODV system's independent decision making for each cart reduces passenger waiting time while constraining energy consumption and empty-vehicle motion.
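As a rough illustration of the two ideas summarized above, the following minimal sketch shows one way a reward could combine mean waiting time with energy and empty-travel penalties, and how several carts could share a single Q-function. It is not the authors' implementation: the additive cost form, the weights `w_wait`, `w_energy`, and `w_empty`, and the names `step_reward`, `select_actions`, and `toy_q` are all assumptions made for this example, and `toy_q` is only a stand-in for a trained network.

```python
import random


def step_reward(waiting_times, energy_used, empty_distance,
                w_wait=1.0, w_energy=0.1, w_empty=0.05):
    """Return the negative of an assumed additive cost.

    The cost is the mean passenger waiting time plus weighted penalties
    for energy consumption and distance driven without passengers.
    The weights are hypothetical, not values from the study.
    """
    mean_wait = sum(waiting_times) / len(waiting_times) if waiting_times else 0.0
    cost = w_wait * mean_wait + w_energy * energy_used + w_empty * empty_distance
    return -cost


def select_actions(q_function, observations, n_actions, epsilon, rng):
    """Epsilon-greedy action selection with one Q-function shared by all carts.

    Each cart passes its own observation to the same q_function, which is
    how a single network can drive several independent agents.
    """
    actions = []
    for obs in observations:
        if rng.random() < epsilon:
            # Explore: pick a uniformly random action.
            actions.append(rng.randrange(n_actions))
        else:
            # Exploit: pick the action with the highest estimated value.
            q_values = q_function(obs)
            actions.append(max(range(n_actions), key=lambda a: q_values[a]))
    return actions


def toy_q(obs):
    """Hypothetical stand-in for a trained network (integer observations)."""
    return [1.0 if a == obs % 3 else 0.0 for a in range(3)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(step_reward([2.0, 4.0], energy_used=10.0, empty_distance=20.0))
    print(select_actions(toy_q, [0, 1, 2, 4], n_actions=3, epsilon=0.0, rng=rng))
```

Sharing one Q-function keeps the number of trained parameters independent of the fleet size, at the price of assuming that all carts are interchangeable given their observations.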