Data cleansing system and method

United States of America Patent

PATENT NO 7729899
APP PUB NO 20080189316A1
SERIAL NO 11702811

Abstract

An automated system and method is provided for debugging training data used to train an automated language identifier. The system and method collects texts written in a particular language, generates an occurrence count for words in each text by counting the number of times each of the words is found within the text, and generates an occurrence ratio (OR) of each of the words by dividing the occurrence count by the total number of words in each text. Words are then filtered from the texts in which their occurrence ratios are substantially higher than their occurrence ratios in at least one of the other texts, to generate a clean text.
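The filtering step described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the whitespace tokenization, and the `factor` threshold standing in for "substantially higher" are all assumptions made for illustration, as is the choice to compare a word only against other texts in which it also appears.

```python
from collections import Counter


def occurrence_ratios(text):
    """Return {word: occurrence count / total words} for one text.

    Tokenization by lowercased whitespace splitting is an assumption;
    the abstract does not say how words are delimited.
    """
    words = text.lower().split()
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


def clean_texts(texts, factor=5.0):
    """Drop words whose OR in a text is more than `factor` times their OR
    in at least one other text.

    `factor` is a hypothetical threshold for "substantially higher".
    Words absent from every other text are kept (an assumption), because
    comparing against an OR of zero would flag every unique word.
    """
    ratios = [occurrence_ratios(text) for text in texts]
    cleaned = []
    for i, text in enumerate(texts):
        others = [r for j, r in enumerate(ratios) if j != i]
        kept = []
        for word in text.split():
            key = word.lower()
            own = ratios[i][key]
            flagged = any(key in r and own > factor * r[key] for r in others)
            if not flagged:
                kept.append(word)
        cleaned.append(" ".join(kept))
    return cleaned


if __name__ == "__main__":
    samples = [
        "the cat sat on the mat acme acme acme acme",
        "the dog sat on the log with acme nearby today",
    ]
    for line in clean_texts(samples, factor=3.0):
        print(line)
```

In the sample, "acme" makes up 40% of the first text but only 10% of the second, so with `factor=3.0` it is removed from the first text and kept in the second. A real system would need a principled threshold and language-aware tokenization, neither of which the abstract specifies.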

Patent Owner(s)

Patent Owner: BASIS TECHNOLOGY CORPORATION
Address: 150 CAMBRIDGEPARK DRIVE CAMBRIDGE MA 02140

Inventor(s)

Inventor Name: Otsuka, Nobuo
Address: San Francisco, US
# of filed Patents: 21
Total Citations: 1387
