Google has released the Wikilinks Corpus, a collection of 40M disambiguated mentions drawn from 10M web pages and linked to 3M Wikipedia pages. The data can be used to train systems for entity linking and cross-document coreference, problems that Google researchers attacked with an earlier version of this data (see Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models).
You can download the data as ten 175MB files, along with some additional tools, from UMass.
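To give a sense of what "disambiguated mentions" means in practice, here is a minimal sketch in Python of reading such data. It assumes a simple tab-separated layout (one URL line per source page, followed by MENTION lines giving the anchor text, its offset, and the Wikipedia page it resolves to); this layout is an illustration only, so check the actual corpus documentation and tools for the real format.

    from collections import defaultdict

    def parse_mentions(path):
        """Parse mention records from a Wikilinks-style file.

        Assumed layout (hypothetical, for illustration):
            URL<TAB><source-page-url>
            MENTION<TAB><anchor text><TAB><byte offset><TAB><wikipedia-url>
        """
        mentions = []
        page_url = None
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if fields[0] == "URL" and len(fields) >= 2:
                    page_url = fields[1]          # start of a new source page
                elif fields[0] == "MENTION" and len(fields) >= 4:
                    anchor, offset, target = fields[1], fields[2], fields[3]
                    mentions.append({
                        "page": page_url,         # web page the mention came from
                        "anchor": anchor,         # surface text of the mention
                        "offset": int(offset),    # position within the page
                        "entity": target,         # Wikipedia page it resolves to
                    })
        return mentions

    def group_by_entity(mentions):
        """Group mentions by their target Wikipedia page -- the raw material
        for training entity-linking or cross-document coreference models."""
        by_entity = defaultdict(list)
        for m in mentions:
            by_entity[m["entity"]].append(m)
        return by_entity

Grouping the mentions by their Wikipedia target, as in the second function, is how one would turn the corpus into training signal: every mention in a group refers to the same entity, even though the anchor text and source pages differ.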
This is yet another example of the important role that Wikipedia continues to play in building a common, machine-usable semantic substrate for human conceptualizations.