DISTRIBUTIONAL REINFORCEMENT LEARNING USING QUANTILE FUNCTION NEURAL NETWORKS

Number of patents in Portfolio can not be more than 2000

United States of America

APP PUB NO 20250117654A1
SERIAL NO

18981840

Stats

ATTORNEY / AGENT: (SPONSORED)

Importance

Loading Importance Indicators... loading....

Abstract

See full text

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.

Loading the Abstract Image... loading....

First Claim

See full text

Family

Loading Family data... loading....

Patent Owner(s)

Patent OwnerAddress
DEEPMIND TECHNOLOGIES LIMITED5 NEW STREET SQUARE LONDON EC4A 3TW

International Classification(s)

  • [Classification Symbol]
  • [Patents Count]

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Dabney, William Clinton London, GB 11 85
Ostrovski, Georg London, GB 5 2

Cited Art Landscape

Load Citation

Patent Citation Ranking

Forward Cite Landscape

Load Citation