Abstract: Temporal difference (TD) learning family tries to learn a least-squares solution of an approximate Linear value function (LVF) to deal with large scale and/or continuous reinforcement ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results