TRAINING A POLICY NEURAL NETWORK AND A VALUE NEURAL NETWORK

United States of America Patent

Serial No.: 15280711

Abstract

Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a value neural network that is configured to receive an observation characterizing a state of an environment being interacted with by an agent and to process the observation in accordance with parameters of the value neural network to generate a value score. One of the systems performs operations that include training a supervised learning policy neural network; initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data; and training the value neural network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in the state.
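The four operations named in the abstract can be sketched on a toy problem. This is only an illustrative sketch under loud assumptions, not the patented implementation: small tables of numbers stand in for the neural networks, the environment is a hypothetical five-position chain, the reinforcement learning step uses a plain REINFORCE update, and the value function is fit by regression toward observed episode outcomes. None of these choices, and none of the function names, come from the patent text.

```python
import copy
import math
import random

N_STATES = 5      # hypothetical chain with positions 0..4; 0 and 4 are terminal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def play_episode(policy, rng, max_steps=50):
    """Play from the middle of the chain; return ((state, action) pairs, reward)."""
    state, trajectory = 2, []
    for _ in range(max_steps):
        probs = softmax(policy[state])
        action = 0 if rng.random() < probs[0] else 1
        trajectory.append((state, action))
        state += 1 if action == 1 else -1
        if state == N_STATES - 1:
            return trajectory, 1.0   # reached the right end: win
        if state == 0:
            return trajectory, -1.0  # reached the left end: loss
    return trajectory, 0.0           # step limit reached

def train_supervised(data, epochs=200, lr=0.5):
    """Step 1: fit policy logits to (state, expert action) pairs."""
    policy = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(epochs):
        for state, expert_action in data:
            probs = softmax(policy[state])
            for a in ACTIONS:
                target = 1.0 if a == expert_action else 0.0
                # Gradient ascent on the log-likelihood of the expert action.
                policy[state][a] += lr * (target - probs[a])
    return policy

def train_reinforce(policy, rng, episodes=500, lr=0.1):
    """Step 3: REINFORCE update (an assumed stand-in for the RL training)."""
    for _ in range(episodes):
        trajectory, reward = play_episode(policy, rng)
        grads = []  # compute every gradient before applying any of them
        for state, action in trajectory:
            probs = softmax(policy[state])
            for a in ACTIONS:
                indicator = 1.0 if a == action else 0.0
                grads.append((state, a, reward * (indicator - probs[a])))
        for state, a, g in grads:
            policy[state][a] += lr * g
    return policy

def train_value(policy, rng, episodes=2000, lr=0.05):
    """Step 4: regress a per-state value toward the final reward of each episode."""
    value = [0.0] * N_STATES
    for _ in range(episodes):
        trajectory, reward = play_episode(policy, rng)
        for state, _ in trajectory:
            value[state] += lr * (reward - value[state])
    return value

rng = random.Random(0)
expert_data = [(1, 1), (2, 1), (3, 1)]        # hypothetical expert: always step right
sl_policy = train_supervised(expert_data)     # 1. supervised learning policy
rl_policy = copy.deepcopy(sl_policy)          # 2. same architecture, initialized from SL
rl_policy = train_reinforce(rl_policy, rng)   # 3. reinforcement learning policy
value = train_value(rl_policy, rng)           # 4. value estimate per state
```

The deep copy in step 2 mirrors the abstract's requirement that the reinforcement learning policy starts from the trained supervised parameters while remaining a separate set of parameters, so later updates do not alter the supervised policy.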

Patent Owner(s)

Patent Owner | Address
DEEPMIND TECHNOLOGIES LIMITED | LONDON EC4A 3TW

Inventor(s)

Inventor Name | Address | # of Filed Patents | Total Citations
Graepel, Thore Kurt Hartwig | Cambridge, GB | 11 | 265
Guez, Arthur Clement | London, GB | 6 | 63
Huang, Shih-Chieh | London, GB | 51 | 406
Maddison, Christopher | Toronto, CA | 5 | 77
Sifre, Laurent | Paris, FR | 12 | 73
Silver, David | Hitchin, GB | 69 | 1440
Sutskever, Ilya | San Francisco, US | 44 | 522
