The Margarita Dialogue Corpus is a collection of question-answer pairs defined both in isolation, outside any conversation, and in the context of dialogues between one person and several different interlocutors. The corpus is part of a methodology for creating the knowledge base for time-offset interaction applications and unstructured dialogue systems.
The subject of the corpus is Margarita Bicec, a student at New York University Abu Dhabi. She defined a Knowledge Base (KB) of question-answer pairs by brainstorming an initial set of pairs, then expanded it by recording and transcribing multiple conversations with strangers to fill in other possible questions. She then recorded videos of her answers using the TOIA recorder developed by Abu Ali and annotated twenty transcribed dialogues, indicating for each question which answers in the KB, if any, would be appropriate to play. Ten of these dialogues (the 'TRAIN' dialogues) are used to expand the KB, and the other ten (the 'TEST' dialogues) are held out to test answer selection models on unseen dialogues.
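The setup described above (a KB of question-answer pairs, plus dialogue questions annotated with the KB answers that are appropriate to play) can be sketched in Python as follows. This is only an illustrative sketch: the class names, field names, sample questions, placeholder answers, and the word-overlap baseline are all assumptions made for this example and do not reflect the actual file format of the release or any model used by the authors.

```python
import re
from dataclasses import dataclass, field


@dataclass
class QAPair:
    """One KB entry (hypothetical layout): a question and its recorded answer."""
    answer_id: str
    question: str
    answer: str


@dataclass
class AnnotatedTurn:
    """One dialogue question plus the ids of KB answers judged appropriate.

    An empty set means no answer in the KB is appropriate for the question.
    """
    question: str
    appropriate_answer_ids: set = field(default_factory=set)


def tokens(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def select_answer(question, kb):
    """Toy baseline: pick the KB entry whose question shares the most words.

    Returns None when the KB is empty or no entry shares any word.
    """
    q = tokens(question)
    best = max(kb, key=lambda pair: len(q & tokens(pair.question)), default=None)
    if best is None or not (q & tokens(best.question)):
        return None
    return best


def accuracy(turns, kb):
    """Fraction of turns where the selection agrees with the annotation."""
    hits = 0
    for turn in turns:
        chosen = select_answer(turn.question, kb)
        if chosen is None:
            # Correct only if the annotators also found no appropriate answer.
            hits += int(not turn.appropriate_answer_ids)
        else:
            hits += int(chosen.answer_id in turn.appropriate_answer_ids)
    return hits / len(turns)


# Invented sample data, for illustration only (not taken from the corpus).
kb = [
    QAPair("a1", "Where are you from?", "<placeholder answer 1>"),
    QAPair("a2", "What do you study?", "<placeholder answer 2>"),
]
test_turns = [
    AnnotatedTurn("So where are you from originally?", {"a1"}),
    AnnotatedTurn("What do you study at university?", {"a2"}),
    AnnotatedTurn("Is it raining?", set()),
]

if __name__ == "__main__":
    print(accuracy(test_turns, kb))
```

In a setup like this, the KB would be built from the brainstormed pairs and the 'TRAIN' dialogues, and `accuracy` would be computed only on turns from the held-out 'TEST' dialogues.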
This release contains three datasets:
By downloading the Margarita Dialogue Corpus Dataset files from HERE, you agree to the terms of the license below.
// License for The Margarita Dialogue Corpus
Copyright 2019 New York University Abu Dhabi. All Rights Reserved.
A license to use and copy this software, data and its documentation solely for your internal research and evaluation purposes, without fee and without a signed licensing agreement, is hereby granted upon your download of the software, through which you agree to the following: 1) the above copyright notice, this paragraph and the following three paragraphs will prominently appear in all internal copies and modifications; 2) no rights to sublicense or further distribute this software are granted; 3) no rights to modify this software are granted; and 4) no rights to assign this license are granted. Please contact the Office of Industrial Liaison, New York University, One Park Avenue, 6th Floor, New York, NY 10016 (212) 263-8178, for commercial licensing opportunities, or for further distribution, modification or license rights.
Created by Alberto M. Chierici, Nizar Habash and Margarita Bicec at the Computational Approaches to Modeling Language (CAMeL) Lab in New York University Abu Dhabi.
IN NO EVENT SHALL NYU, OR ITS EMPLOYEES, OFFICERS, AGENTS OR TRUSTEES (COLLECTIVELY "NYU PARTIES") BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING LOST PROFITS, ARISING OUT OF ANY CLAIM RESULTING FROM YOUR USE OF THIS SOFTWARE, DATA AND ITS DOCUMENTATION, EVEN IF ANY OF NYU PARTIES HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH CLAIM OR DAMAGE.
NYU SPECIFICALLY DISCLAIMS ANY WARRANTIES OF ANY KIND REGARDING THE SOFTWARE AND DATA, INCLUDING, BUT NOT LIMITED TO, NON-INFRINGEMENT, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, OR THE ACCURACY OR USEFULNESS, OR COMPLETENESS OF THE SOFTWARE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS PROVIDED COMPLETELY "AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE FURTHER DOCUMENTATION, MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
If you use this resource, cite:
Chierici et al. (2020): Alberto M. Chierici, Nizar Habash, and Margarita Bicec (2020). The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems. Retrieved from http://resources.camel-lab.com.