This Gulf Arabic Corpus was created at the Computational Approaches to Modeling Language (CAMeL) Lab at New York University Abu Dhabi (NYUAD).

Gulf Arabic

Strictly speaking, Gulf Arabic refers to the linguistic varieties spoken on the western coast of the Arabian Gulf, in Bahrain, Qatar, and the seven Emirates of the UAE (Qafisheh, 1977), as well as in Kuwait and in Al-Hasā, the eastern region of Saudi Arabia (Holes, 1990). Omani, Hijazi, Najdi, and Baḥārna Arabic, among other dialects spoken in the Arabian Peninsula, are usually not included in grammars of Gulf Arabic because their linguistic features differ considerably from those of the dialects listed above. In this project, we extend the term ‘Gulf Arabic’ to include any Arabic variety spoken by the indigenous populations residing in the six countries of the Gulf Cooperation Council: Bahrain, Kuwait, Oman, UAE, Qatar, and Saudi Arabia.

Corpus Description

Corpus Collection

A genre of written material specific to Gulf Arabic is the long conversational novel, published online, publicly and anonymously. Such novels are usually written as lengthy threads in online forums. We found a large collection of these novels gathered in one place online and automatically downloaded about 1,200 MS Word documents. The data had been compiled into MS Word documents by volunteer forum members and then published by another member in an organized manner.

Corpus Genre

The main theme of most of the novels is romance, with elements of drama and sometimes tragedy. The structure of a novel is simple. It starts with a brief introduction that contains the title of the novel, the writer's pen name (no real names are used), and the country of the novel. The introduction is followed by a prologue that usually contains a short piece of dialectal poetry or a short piece of literary writing, usually in MSA. The prologue also contains a brief description of the novel's characters, though some writers prefer to introduce the characters as they appear. Then comes the main body of the novel, which is often a dialogue between the characters, with passages of narration between conversations in either the dialect or MSA. The last part of the novel usually has some "moral" lessons narrated by the writer. Writers also tend to ask the audience for constructive criticism and opinions, and whether they should continue writing more novels.

The target audience is mainly teenage girls. Publication of the novels is highly interactive and depends on the activity of the audience.

Research Team


We wish to thank all the writers of the novels for sharing them publicly, though all are written under pen names. We would also like to thank the graaam forum members who collected the scattered novels, put them together in MS Word files, and published them online.

We also thank the Curras members for sharing their web interface code that we built on to produce this website.