SARSA - on-policy: it learns the value of the policy it is actually following, mostly doing what it currently thinks is the best thing (more exploitation) - more cumulative reward during the learning process - you need to sample trajectories in the environment - learn from experience
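
A minimal sketch of tabular SARSA, to make the on-policy idea concrete. Everything here is an assumption for illustration, not something from the notes: the 5-cell corridor environment, the `step` and `epsilon_greedy` helpers, and the values of `alpha`, `gamma` and `epsilon`. The point to notice is that the update target uses `next_action`, the action the agent will really take next, and that all learning comes from sampled transitions.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells, start at cell 0, goal at cell 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Move one cell; reward is -1 per step, the episode ends at the last cell."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, -1.0, done


def epsilon_greedy(q, state, epsilon, rng):
    """Mostly pick the action currently believed best, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def sarsa(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            # Experience comes from actually stepping through the environment.
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            # On-policy target: bootstrap from the action that will be taken next.
            target = reward if done else reward + gamma * q[next_state][next_action]
            q[state][action] += alpha * (target - q[state][action])
            state, action = next_state, next_action
    return q


q_table = sarsa()
# Greedy action per non-terminal cell, read off the learned table.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The name comes from the tuple used in each update: state, action, reward, next state, next action. Swapping `q[next_state][next_action]` for `max(q[next_state])` would turn this sketch into the off-policy Q-learning update.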