Optimising reinforcement learning for neural networks

Evan Hurwitz, Tshilidzi Marwala

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Reinforcement learning traditionally utilises binary encoders and/or linear function approximators to accomplish its Artificial Intelligence goals. The use of nonlinear function approximators such as neural networks is often shunned because they are difficult to implement, usually as a result of stability issues. In this paper, the implementation of reinforcement learning for training a neural network is examined and applied to the problem of learning to play Tic Tac Toe. Methods of ensuring stability are examined, and differing training methodologies are compared in order to optimise the reinforcement learning of the system. TD(1) methods are compared with database methods, as well as with a hybridised system that combines the two and outperforms all of the homogeneous systems.

Original language: English
Title of host publication: 6th International Conference on Intelligent Games and Simulation, GAME-ON 2005
Pages: 13-18
Number of pages: 6
Publication status: Published - 2005
Externally published: Yes
Event: 6th International Conference on Intelligent Games and Simulation, GAME-ON 2005 - Leicester, United Kingdom
Duration: 24 Nov 2005 – 25 Nov 2005

Publication series

Name: 6th International Conference on Intelligent Games and Simulation, GAME-ON 2005

Conference

Conference: 6th International Conference on Intelligent Games and Simulation, GAME-ON 2005
Country/Territory: United Kingdom
City: Leicester
Period: 24/11/05 – 25/11/05

Keywords

  • Difference
  • Learning
  • Network
  • Neural
  • Reinforcement
  • Temporal
  • Tic-tac-toe

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software
