The ARNOR dataset is a large, manually labeled, sentence-level test set for distant supervision relation classification. It contains 3,192 sentences and 9,051 instances covering 11 relation types (including the "None" type), and is carefully annotated to ensure accuracy.
Proactive human-machine conversation is a new conversation task that aims to build a human-like conversational agent endowed with the ability of proactively leading the conversation, such as introducing a new topic or maintaining the current topic.
BSTC (Baidu Speech Translation Corpus) is a large-scale dataset for automatic simultaneous interpretation. BSTC version 1.0 contains 50 hours of real speeches and consists of three parts: the audio files, the transcripts, and the translations.
BROAD (Baidu Research Open-Access Dataset) is designed to help institutions and individual developers train their models and accelerate research on machine reading comprehension, autonomous cars, visual cognition, and other AI-related fields.