Steve Brunton. Q-Learning: Model Free Reinforcement Learning and Temporal Difference Learning.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. arXiv preprint arXiv:1312.5602 (2013).
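SARSA
- on-policy: it updates with the next action its current policy actually takes, so it learns the value of the policy it is following (always doing what you currently think is the best thing, i.e. more exploitation)
- tends to collect more cumulative reward during the learning process
- you need to take trajectories in the environment: it learns from experience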
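A minimal tabular SARSA sketch, assuming a hypothetical 5-state chain environment (`step`: action 0 moves left, action 1 moves right, reward 1 for reaching the right end) and illustrative hyperparameters. All names and values here are assumptions for illustration, not taken from the references above. The on-policy detail is in the target: SARSA uses Q(s', a') for the action a' that the ε-greedy policy actually picks next, i.e. Q(s, a) ← Q(s, a) + α[r + γQ(s', a') − Q(s, a)], whereas Q-learning would use the maximum of Q(s', ·) over all actions.

```python
import random


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken randomly)."""
    actions = list(range(len(q[state])))
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[state])
    return rng.choice([a for a in actions if q[state][a] == best])


def step(state, action, n_states):
    """Hypothetical chain environment: 0 = left, 1 = right.
    Reaching the last state gives reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def sarsa(n_states=5, episodes=2000, alpha=0.1, gamma=0.9,
          epsilon=0.1, seed=0, max_steps=100):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(max_steps):
            next_state, reward, done = step(state, action, n_states)
            if done:
                # Terminal transition: nothing to bootstrap from.
                q[state][action] += alpha * (reward - q[state][action])
                break
            # On-policy: choose a' with the same epsilon-greedy policy and use
            # that very action in the target (Q-learning would use max(q[next_state])).
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            target = reward + gamma * q[next_state][next_action]
            q[state][action] += alpha * (target - q[state][action])
            state, action = next_state, next_action
    return q


q = sarsa()
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
print(greedy)
```

Because exploration is part of the policy being evaluated, the learned values already account for occasional random moves, which is why SARSA tends to behave more cautiously than Q-learning while it is still learning.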