A new website for searching the ALC has been released. It provides a basic function for searching the entire corpus or a subset of the data using a number of determinants, and any subset of texts selected with those determinants can be downloaded. The website can be accessed at: http://www.alcsearch.com
ALC on Sketch Engine
13 November 2014
The Arabic Learner Corpus is now included in the free corpora on Sketch Engine. It is tagged using the Stanford segmenter and tagger.
Users can search for specific content using the search box at the top of the table, and can sort the table in ascending or descending order by clicking on the column headers.
Validating the XML files against DTD
17 October 2014
The XML files of the Arabic Learner Corpus are now validated against a DTD (Document Type Definition). A DTD defines the permitted structure of an XML document, so a well-formed file can also be checked for validity against it.
As part of automating the generation of the corpus files, the DTD was automatically added to the beginning of each XML file based on its structure. You can download the updated files from the DOWNLOAD page.
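For readers who want to reproduce the idea on their own files, the following is a minimal sketch of placing an internal DTD at the beginning of an XML document. It is not the script used to generate the ALC files. The function name `add_internal_dtd` and the element names in `EXAMPLE_DTD` are hypothetical, chosen only for illustration. Python's standard library parser checks well-formedness but does not validate against a DTD, so full validation would need a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declarations for illustration only; the real ALC DTD
# is not reproduced here.
EXAMPLE_DTD = """\
<!ELEMENT doc (header, text)>
<!ELEMENT header (title)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT text (#PCDATA)>"""

# Matches an optional XML declaration at the start of the document.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def add_internal_dtd(xml_text, declarations):
    """Return xml_text with an internal DTD subset placed before the root element.

    Assumes the input is well formed and has no DOCTYPE yet.
    """
    match = XML_DECL.match(xml_text)
    prolog = match.group(0).strip() + "\n" if match else ""
    body = xml_text[match.end():] if match else xml_text.lstrip()

    # The DOCTYPE name must equal the root element name, so read it from the file.
    root_name = ET.fromstring(body.encode("utf-8")).tag

    doctype = f"<!DOCTYPE {root_name} [\n{declarations}\n]>\n"
    return prolog + doctype + body


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<doc><header><title>Sample</title></header><text>...</text></doc>"
    )
    updated = add_internal_dtd(sample, EXAMPLE_DTD)
    # Parsing again confirms the result is still well formed (not that it is valid).
    ET.fromstring(updated.encode("utf-8"))
    print(updated)
```

Reading the root element name from each file keeps the DOCTYPE consistent with the document even when files differ in structure.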
The main reference about the ALC
21 July 2014
The following paper includes details about the design and content of the ALC:
Alfaifi, A., Atwell, E. and Hedaya, I. (2014). Arabic Learner Corpus (ALC) v2: A New Written and Spoken Corpus of Arabic Learners. In: Ishikawa, S (ed.) Learner corpus studies in Asia and the world. Vol. 2. Papers from LCSAW2014, pp. 77-89. Kobe, Japan: School of Languages and Communication, Kobe University.
Statistics of the visits to the Arabic Learner Corpus website show that the website received more than 500 visits from 39 countries in less than three months (from 24 Feb to 15 May 2014).
More items have been added to the metadata of the corpus files
19 March 2014
In addition to the previous metadata about the corpus learners and texts (23 items), 3 further items have been added: the country and the city where the text was written or recorded, and the year (the Hijri year in the Arabic headers and the Gregorian year in the English headers). The metadata now contains 26 items, and this information helps researchers restrict their search and analysis to specific types of learners or texts in the corpus data. The updated files can be downloaded from the DOWNLOAD page.
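The way such metadata can narrow an analysis might be sketched as follows. This is an illustration only: the `TextRecord` class, its field names, the `select` helper, and the sample values are all hypothetical and do not reflect the actual header names in the ALC files.

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    # Hypothetical field names; the real ALC headers may be named differently.
    text_id: str
    country: str
    city: str
    year: int  # Gregorian year, as in the English headers


def select(records, **criteria):
    """Keep only the records whose metadata match every field=value pair given."""
    return [
        record
        for record in records
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    # Invented placeholder records, not real corpus data.
    records = [
        TextRecord("t1", "Country A", "City X", 2012),
        TextRecord("t2", "Country A", "City Y", 2013),
        TextRecord("t3", "Country B", "City Z", 2012),
    ]
    subset = select(records, country="Country A", year=2012)
    print([record.text_id for record in subset])
```

Passing the criteria as keyword arguments means any combination of the metadata items can be used without changing the function.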
The spoken part of ALC is available in MP3
17 March 2014
The audio recordings of those learners who granted permission to publish their recordings online have been prepared and added to the website. The total duration is 3 hours, 22 minutes, and 59 seconds, and all files can be downloaded in MP3 format from the DOWNLOAD page.
The whole ALC content in one file
10 March 2014
A new button has been added to the DOWNLOAD page, which enables you to download the whole content of the ALC in one file. As with the separate files, there are five formats for the one-file option:
- TXT with no metadata
- TXT with Arabic metadata
- TXT with English metadata
- XML with Arabic metadata
- XML with English metadata
Distinguished Participation Award for ALC at SSC-UK 2014 conference
2 February 2014
The ALC project won a Distinguished Participation Award at the 7th Saudi Students Conference in Edinburgh, UK, 1-2 February 2014. The corpus developer, Abdullah Alfaifi, participated in the conference and introduced the ALC to an audience interested in Arabic language teaching and learning.