IMPROVEMENT OF AUDIO-VISUAL QUESTION ANSWERING

United States of America Patent

APP PUB NO 20250104701A1
SERIAL NO 18472528

Abstract

The present disclosure describes techniques for improving audio-visual question answering. A machine learning model is configured for audio-visual question answering (AVQA). The machine learning model comprises a first sub-model configured to capture semantic audio information and output an audio spatial feature map xas(1). The machine learning model comprises a second sub-model configured to extract visual features xvs and audio features xas and further configured to obtain a question vector xq. The machine learning model comprises a third sub-model configured to capture audio-visual correspondence at a granular level. A balanced AVQA dataset is created. The balanced AVQA dataset comprises a balanced answer distribution in each question category. The machine learning model is trained to answer questions about visual objects, sounds, and their associations in videos using at least a subset of the balanced AVQA dataset.
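
The abstract does not say how the balanced answer distribution is obtained. As a minimal sketch of one possible approach, the Python below assumes balance is reached by downsampling each answer to the count of the rarest answer within its question category. The function name `balance_answers` and the sample keys `"category"` and `"answer"` are hypothetical and are not taken from the disclosure.

```python
import random
from collections import defaultdict


def balance_answers(samples, seed=0):
    """Downsample so every answer is equally frequent within each category.

    `samples` is a list of dicts with the hypothetical keys "category"
    and "answer". This is an assumed balancing strategy, not the method
    described in the patent.
    """
    rng = random.Random(seed)

    # Group samples by question category, then by answer.
    groups = defaultdict(lambda: defaultdict(list))
    for sample in samples:
        groups[sample["category"]][sample["answer"]].append(sample)

    balanced = []
    for by_answer in groups.values():
        # Keep as many samples per answer as the rarest answer provides.
        keep = min(len(items) for items in by_answer.values())
        for items in by_answer.values():
            balanced.extend(rng.sample(items, keep))
    return balanced
```

Downsampling discards data, so a category dominated by one answer shrinks to the size of its rarest answer. Other strategies, such as collecting additional samples for rare answers, would also yield a balanced distribution.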

Patent Owner(s)

Patent Owner    Address
LEMON INC       P O BOX 31119 GRAND PAVILION HIBISCUS WAY 802 WEST BAY ROAD GRAND CAYMAN KY1-1205

Inventor(s)

Inventor Name    Address            # of filed Patents    Total Citations
Dong, Zhikang    Culver City, US    2                     0
Liu, Xiulong     Culver City, US    8                     1
Zhang, Peng      Los Angeles, US    846                   8792
