Arabic dialect identification is the task of automatically labeling a segment of speech or text with the dialect it comes from.
This resource includes all of the datasets and code provided for the MADAR Shared Task on Arabic Fine-Grained Dialect Identification, which was part of the Fourth Arabic Natural Language Processing Workshop (ACL 2019). The dataset consists of two parts, one for Subtask 1 and one for Subtask 2.
This resource was developed as part of the Multi-Arabic Dialect Applications and Resources (MADAR) Project, a collaboration between Carnegie Mellon University Qatar and New York University Abu Dhabi.
By downloading the MADAR Shared Task files, you agree to the terms of the two licenses below.
//////////////////////////////////////////////////////////////////////
// License for MADAR Corpus/Lexicon Dataset
//////////////////////////////////////////////////////////////////////
Copyright 2018 Carnegie Mellon University and New York University Abu
Dhabi. All Rights Reserved.
A license to use and copy this dataset and its documentation solely
for your internal research and evaluation purposes, without fee and
without a signed licensing agreement, is hereby granted upon your
download of the dataset, through which you agree to the following: 1)
the above copyright notice, this paragraph and the following three
paragraphs will prominently appear in all internal copies and
modifications; 2) no rights to sublicense or further distribute this
software are granted; 3) no rights to modify this dataset are granted;
and 4) no rights to assign this license are granted. Please Contact
the Carnegie Mellon University "CMU" Center for Technology Transfer
and Enterprise Creation, 4615 Forbes Avenue, Suite 302, Pittsburgh, PA
15213 - phone 412.268.7393, for commercial licensing opportunities, or
for further distribution, modification or license rights.
Created by Houda Bouamor, Nizar Habash, Mohammad Salameh, Wajdi
Zaghouani, Owen Rambow, Dana Abdulrahim, Ossama Obeid, Salam Khalifa,
Fadhl Eryani, Alexander Erdmann and Kemal Oflazer.
IN NO EVENT SHALL CMU OR NYU, OR THEIR EMPLOYEES, OFFICERS, AGENTS OR
TRUSTEES ("COLLECTIVELY "CMU/NYU PARTIES") BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY
KIND, INCLUDING LOST PROFITS, ARISING OUT OF ANY CLAIM RESULTING FROM
YOUR USE OF THIS DATASET AND ITS DOCUMENTATION, EVEN IF ANY OF CMU/NYU
PARTIES HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH CLAIM OR DAMAGE.
CMU/NYU SPECIFICALLY DISCLAIMS ANY WARRANTIES OF ANY KIND REGARDING
THE DATASET, INCLUDING, BUT NOT LIMITED TO, NON-INFRINGEMENT, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE, OR THE ACCURACY OR USEFULNESS, OR COMPLETENESS OF THE
SOFTWARE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY,
PROVIDED HEREUNDER IS PROVIDED COMPLETELY "AS IS". REGENTS HAS NO
OBLIGATION TO PROVIDE FURTHER DOCUMENTATION, MAINTENANCE, SUPPORT,
UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
If you use this resource, cite:
Bouamor, Houda, Nizar Habash, Mohammad Salameh, Wajdi Zaghouani, Owen
Rambow, Dana Abdulrahim, Ossama Obeid, Salam Khalifa, Fadhl Eryani,
Alexander Erdmann and Kemal Oflazer. The MADAR Arabic Dialect Corpus
and Lexicon. In Proceedings of the International Conference on
Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.
//////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////
// License for MADAR Twitter Dataset
//////////////////////////////////////////////////////////////////////
Copyright 2019 Carnegie Mellon University and New York University Abu
Dhabi. All Rights Reserved.
This work is licensed under the Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 International License.
(https://creativecommons.org/licenses/by-nc-nd/4.0/)
If you use this resource, cite:
Bouamor, Houda, Sabit Hassan, and Nizar Habash.
The MADAR Shared Task on Arabic Fine-Grained Dialect Identification.
In Proceedings of the Workshop for Arabic Natural Language Processing.
Florence, Italy, 2019.
//////////////////////////////////////////////////////////////////////