MULTI-MODAL PRE-TRAINING METHOD AND MULTI-MODAL PRE-TRAINING APPARATUS

United States of America

APP PUB NO 20240378865A1
SERIAL NO 18692000

Abstract

The present disclosure provides a multi-modal pre-training method and a multi-modal pre-training apparatus. The method includes: sampling a video in a video-text pair to obtain a first video frame sequence; performing word segmentation on a text in the video-text pair to obtain a first word segmentation sequence; masking the first video frame sequence to obtain a second video frame sequence; masking the first word segmentation sequence to obtain a second word segmentation sequence; encoding the first video frame sequence to obtain a first video feature, and encoding the first word segmentation sequence to obtain a first word segmentation feature; encoding the second video frame sequence to obtain a second video feature, and encoding the second word segmentation sequence to obtain a second word segmentation feature; and performing multi-modal pre-training by using the first video feature, the first word segmentation feature, the second video feature and the second word segmentation feature.
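
The data-preparation steps in the abstract (sampling, word segmentation, and masking) can be sketched roughly as follows. This is only an illustrative sketch, not the claimed implementation: the function names, the `[MASK]` placeholder, uniform frame sampling, whitespace segmentation, and the 25% mask ratio are all assumptions that the abstract does not specify. The encoding and pre-training steps are left out because the abstract names no encoder or training objective.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder value; the abstract names none


def sample_frames(video, num_frames):
    """Uniformly sample num_frames frames (uniform spacing is an assumption)."""
    if num_frames <= 0 or not video:
        return []
    step = len(video) / num_frames
    return [video[min(int(i * step), len(video) - 1)] for i in range(num_frames)]


def segment_words(text):
    """Whitespace word segmentation (an assumption; no tokenizer is named)."""
    return text.split()


def mask_sequence(seq, ratio, rng, mask_value=MASK):
    """Return a copy of seq with round(ratio * len(seq)) random items masked."""
    n_mask = round(ratio * len(seq))
    positions = set(rng.sample(range(len(seq)), n_mask))
    return [mask_value if i in positions else item for i, item in enumerate(seq)]


def prepare_pair(video, text, num_frames=4, ratio=0.25, seed=0):
    """Build the first (unmasked) and second (masked) sequences for one pair."""
    rng = random.Random(seed)
    first_frames = sample_frames(video, num_frames)          # first video frame sequence
    first_words = segment_words(text)                        # first word segmentation sequence
    second_frames = mask_sequence(first_frames, ratio, rng)  # second video frame sequence
    second_words = mask_sequence(first_words, ratio, rng)    # second word segmentation sequence
    return first_frames, first_words, second_frames, second_words
```

Under these assumptions, `prepare_pair(list(range(16)), "a person slices bread on a board")` yields four sequences. Each would then be passed to a video or text encoder to obtain the first and second video features and word segmentation features used for pre-training.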

Patent Owner(s)

Patent Owner                                             Address
BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY CO LTD   BEIJING 100086

Inventor(s)

Inventor Name   Address       # of Filed Patents   Total Citations
LI, Yehao       BEIJING, CN   3                    1
MEI, Tao        BEIJING, CN   63                   1564
PAN, Yingwei    BEIJING, CN   4                    1
YAO, Ting       BEIJING, CN   70                   451
