TRANSFORMER NETWORK WITH NORMALIZATION INCLUDING SCALING PARAMETER

United States of America Patent

APP PUB NO: 20240320482A1
SERIAL NO: 18176037


Abstract


A computing system is provided, including a processor configured to receive a training data set. Based at least in part on the training data set, the processor is further configured to train a transformer network that includes a plurality of layers. The plurality of layers each respectively include a plurality of sub-layers including an attention sub-layer, a feed-forward sub-layer, and a plurality of normalization sub-layers. The plurality of normalization sub-layers are downstream from corresponding sub-layers of the plurality of sub-layers. Each of the plurality of normalization sub-layers is configured to apply layer normalization to a sum of: a first scaling parameter multiplied by an input vector of the sub-layer; and an output vector of the sub-layer.
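The normalization the abstract describes computes, for each sub-layer, LayerNorm(alpha * x + Sublayer(x)), where x is the sub-layer's input vector and alpha is the first scaling parameter. A minimal numpy sketch of that operation is below; the function names (`layer_norm`, `scaled_residual_norm`) and the choice of numpy are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    # Standard layer normalization over the last (feature) dimension.
    mean = v.mean(axis=-1, keepdims=True)
    var = v.var(axis=-1, keepdims=True)
    return (v - mean) / np.sqrt(var + eps)

def scaled_residual_norm(sublayer_input, sublayer_output, alpha):
    # Apply layer normalization to the sum of:
    #   - the scaling parameter alpha multiplied by the sub-layer's input, and
    #   - the sub-layer's output,
    # as stated in the abstract.
    return layer_norm(alpha * sublayer_input + sublayer_output)
```

With alpha = 1 this reduces to the ordinary post-norm residual connection LayerNorm(x + Sublayer(x)); values of alpha greater than 1 up-weight the residual (identity) path relative to the sub-layer's output before normalizing.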


Patent Owner(s)

Patent Owner: MICROSOFT TECHNOLOGY LICENSING LLC
Address: ONE MICROSOFT WAY, REDMOND, WA 98052

Inventor(s)

Inventor Name    Address      # of Filed Patents   Total Citations
DONG, Li         Beijing, CN  51                   190
HUANG, Shaohan   Beijing, CN  10                   37
MA, Shuming      Beijing, CN  1                    0
WANG, Hongyu     Beijing, CN  165                  2468
WEI, Furu        Beijing, CN  23                   568
ZHANG, Dongdong  Beijing, CN  73                   156
