Download Page
The SAMER Readability Lexicon

Summary

The SAMER readability lexicon is a large-scale leveled readability lexicon for Modern Standard Arabic. The lexicon was manually annotated in triplicate by language professionals from three regions in the Arab world. Details of the creation process and analysis of the resulting resource are presented in Al Khalil et al. (2020).

The SAMER readability lexicon is available in two versions: v1.0 and v2.0.

Version 1.0 (v1.0) includes a leveled readability lexicon of 26,000 lemmas for Modern Standard Arabic.
Version 2.0 (v2.0) expands this to 40,000 lemmas, offering broader coverage and improved utility for readability assessment.

This resource was created under the Simplification of Arabic Masterpieces for Extensive Reading (SAMER) project, funded by New York University Abu Dhabi (NYUAD). The SAMER project has three aims. The first is to build a corpus of curriculum reading material. The second is to formulate a graded reader scale for simplifying modern Arabic fiction intended for school-age learners. The third is to use that scale to guide the semi-automated simplification of a number of Arabic works of fiction, a task performed by human simplifiers and facilitated by state-of-the-art computational NLP tools. A project overview is presented in Al Khalil et al. (2017).

Team

Muhamed Al Khalil
Nizar Habash
Zhengyang Jiang
Hind Saddiki

Publications

Al Khalil, Muhamed, Nizar Habash, and Zhengyang Jiang. A Large-Scale Leveled Readability Lexicon for Standard Arabic. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 3053-3062, Marseille, France, 2020. [PDF]
Saddiki, Hind, Nizar Habash, Violetta Cavalli-Sforza, and Muhamed Al-Khalil. Feature Optimization for Predicting Readability of Arabic L1 and L2. In Proceedings of the ACL Workshop on Natural Language Processing Techniques for Educational Applications, Melbourne, Australia, 2018. [PDF]
Al-Khalil, Muhamed, Hind Saddiki, Nizar Habash, and Latifa Alfalasi. A Leveled Reading Corpus of Modern Standard Arabic. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018. [PDF]
Al-Khalil, Mohamed, Nizar Habash, and Hind Saddiki. Simplification of Arabic Masterpieces for Extensive Reading: A Project Overview. In Proceedings of the International Conference on Arabic Computational Linguistics, Dubai, UAE, 2017. [PDF]

Download

By downloading the SAMER Readability Lexicon files, you agree to the terms of the license below.

//////////////////////////////////////////////////////////////////////////////
// License for The SAMER Readability Lexicon
//////////////////////////////////////////////////////////////////////////////

Copyright 2020 New York University Abu Dhabi. All Rights Reserved. A license to use and copy this software, data and its documentation solely for your internal research and evaluation purposes, without fee and without a signed licensing agreement, is hereby granted upon your download of the software, through which you agree to the following: 1) the above copyright notice, this paragraph and the following three paragraphs will prominently appear in all internal copies and modifications; 2) no rights to sublicense or further distribute this software are granted; 3) no rights to modify this software are granted; and 4) no rights to assign this license are granted. Please Contact the Office of Industrial Liaison, New York University, One Park Avenue, 6th Floor, New York, NY 10016 (212) 263-8178, for commercial licensing opportunities, or for further distribution, modification or license rights.

Created by Muhamed Al Khalil, Nizar Habash, and Zhengyang Jiang at the Computational Approaches to Modeling Language (CAMeL) Lab at New York University Abu Dhabi.

IN NO EVENT SHALL NYU, OR ITS EMPLOYEES, OFFICERS, AGENTS OR TRUSTEES ("COLLECTIVELY "NYU PARTIES") BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING LOST PROFITS, ARISING OUT OF ANY CLAIM RESULTING FROM YOUR USE OF THIS SOFTWARE, DATA AND ITS DOCUMENTATION, EVEN IF ANY OF NYU PARTIES HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH CLAIM OR DAMAGE.

NYU SPECIFICALLY DISCLAIMS ANY WARRANTIES OF ANY KIND REGARDING THE SOFTWARE and DATA, INCLUDING, BUT NOT LIMITED TO, NON-INFRINGEMENT, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, OR THE ACCURACY OR USEFULNESS, OR COMPLETENESS OF THE SOFTWARE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS PROVIDED COMPLETELY "AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE FURTHER DOCUMENTATION, MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

Please cite Al Khalil et al. (2020) if you use the SAMER Readability Lexicon in your research:

Al Khalil, Muhamed, Nizar Habash, and Zhengyang Jiang. A Large-Scale Leveled Readability Lexicon for Standard Arabic. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 3053-3062, Marseille, France, 2020.

//////////////////////////////////////////////////////////////////////
