Ajay Halthor. Foundation of Q-learning | Temporal Difference Learning explained!
Steve Brunton. Q-Learning: Model Free Reinforcement Learning and Temporal Difference Learning.
---

<!-- _paginate: false -->
---

<style scoped>
h1 { /* text-align: center; */ color: #ffffff }
h3 { /* text-align: center; */ color: #dddddd }
</style>

![bg](styles/bg_inteli_01.png)

### Reflection

# The interest on knowledge
Intuition: maybe the events that happened more recently are the ones most related to the rewards I'm getting now.
---

TD-$\lambda$
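
The idea that recent events deserve more credit for the current reward can be sketched with eligibility traces. This is a minimal tabular sketch, not a reference implementation: the function name `td_lambda_step`, the list-based value table, and the default values for `alpha`, `gamma`, and `lam` are all assumptions made for illustration.

```python
def td_lambda_step(V, traces, s, r, s_next, done,
                   alpha=0.1, gamma=0.99, lam=0.9):
    """One TD(lambda) update with accumulating traces (hypothetical helper)."""
    # TD error: how much better or worse this step was than predicted.
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]
    # Decay every trace, then mark the state that was just visited.
    for i in range(len(traces)):
        traces[i] *= gamma * lam
    traces[s] += 1.0
    # Each state is credited in proportion to how recently it was visited.
    for i in range(len(V)):
        V[i] += alpha * delta * traces[i]
    return delta


# Toy episode (assumed for illustration): state 0 -> state 1 -> terminal,
# with a reward of 1.0 on the last step.
V = [0.0, 0.0, 0.0]
traces = [0.0, 0.0, 0.0]
td_lambda_step(V, traces, s=0, r=0.0, s_next=1, done=False)
td_lambda_step(V, traces, s=1, r=1.0, s_next=2, done=True)
```

After the rewarded step, state 1 (the most recent) receives the largest update, state 0 receives a smaller, decayed share, and the unvisited state 2 is left unchanged.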
SARSA

- on-policy: updates use the action the current policy actually takes
- always doing what you think is the best thing (more exploitation)
- more cumulative reward during the learning process
- you need to run trajectories in the environment: learning from experience
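
A minimal sketch of the SARSA update may make "on-policy" concrete: the target uses `a_next`, the action the agent actually takes next. The function name `sarsa_update`, the nested-list Q-table, and the default `alpha` and `gamma` are assumptions for illustration.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.99):
    """One tabular SARSA update (hypothetical helper)."""
    # On-policy target: bootstrap from the action actually chosen in s_next.
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Toy Q-table with 2 states and 2 actions (values assumed for illustration).
Q = [[0.0, 0.0], [1.0, 5.0]]
# The agent really took action 0 in state 1, so the target uses Q[1][0],
# even though action 1 has the higher estimate.
sarsa_update(Q, s=0, a=0, r=0.0, s_next=1, a_next=0, done=False)
```

Because the update needs `a_next`, the transitions have to come from the agent's own behavior in the environment.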
Q-learning

- can learn by imitation, because it is off-policy
- experience replay
- can explore more
- epsilon-greedy: off-policy search strategies
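
A minimal sketch of why off-policy learning allows replay and imitation: the Q-learning target takes the max over next actions, so it does not matter which policy produced the transition. The names `epsilon_greedy`, `q_learning_update`, and `replay_buffer`, the nested-list Q-table, and the default `alpha` and `gamma` are assumptions for illustration.

```python
import random


def epsilon_greedy(Q, s, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])


def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update (hypothetical helper)."""
    # Off-policy target: bootstrap from the best next action,
    # regardless of what the behavior policy actually did.
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# Toy Q-table with 2 states and 2 actions (values assumed for illustration).
Q = [[0.0, 0.0], [1.0, 5.0]]
# Stored transitions (s, a, r, s_next, done); they could come from an older
# policy or from a demonstrator, since the update never asks who acted.
replay_buffer = [(0, 0, 0.0, 1, False)]
for s, a, r, s_next, done in replay_buffer:
    q_learning_update(Q, s, a, r, s_next, done)

greedy_action = epsilon_greedy(Q, 1, epsilon=0.0)
```

With `epsilon=0.0` the choice is purely greedy; raising `epsilon` makes the behavior more exploratory without changing the update rule.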