Arabic Dialect Identification:

Arabic is a rich language with a wide collection of dialects in addition to Modern Standard Arabic (MSA). These dialects differ from MSA phonologically, morphologically, lexically, and syntactically. Although Arabic dialects can in theory be classified in various ways, existing categorizations remain arbitrary and are based primarily on geographical divisions. Dialect identification is the task of automatically detecting the source variety of a given text or speech segment.

MSA is the only variety that is standardized, regulated, and taught in schools, a status necessitated by its use in written communication and formal venues. The regional dialects, used primarily for day-to-day dealings and spoken communication, appear far less often in writing than MSA does. One domain of written communication in which both MSA and dialectal Arabic are commonly used is the online domain: dialectal Arabic has a strong presence in blogs, forums, chatrooms, and user/reader commentary.


Example Output 1:

Arabic Sentence (sample input) | Dialect (auto generated) | Probability (auto generated)
الله، هو إيه أصله ده | Egyptian - مصري | 68.1%
ليش عم تحكي هيك يا بعدي | Levantine - شامي | 65.3%
ما أبغي أسمع حاجة زود | Peninsular - خليجي | 54.6%
يعيشك، إحنا نحبوك بالزاف | Maghrebi - مغاربي | 57.7%

Example Code 1:

<?php
    // Assumes the ArPHP library has already been loaded (for example, through an autoloader).
    $Arabic = new \ArPHP\I18N\Arabic();

    // Sample sentences, one per dialect group
    $sentences = array('الله، هو إيه أصله ده',
                       'ليش عم تحكي هيك يا بعدي',
                       'ما أبغي أسمع حاجة زود',
                       'يعيشك، إحنا نحبوك بالزاف');

    // Table header (an indented closing heredoc marker requires PHP 7.3 or later)
    echo <<<END
    <center>
      <table border="0" cellspacing="2" cellpadding="5" width="60%" dir="rtl">
        <tr>
          <td bgcolor="#27509D" align="center" width="50%">
            <b><font color="#ffffff">Arabic Sentence (sample input)</font></b>
          </td>
          <td bgcolor="#27509D" align="center" width="25%">
            <b><font color="#ffffff">Dialect (auto generated)</font></b>
          </td>
          <td bgcolor="#27509D" align="center" width="25%">
            <b><font color="#ffffff">Probability (auto generated)</font></b>
          </td>
        </tr>
    END;

    foreach ($sentences as $sentence) {
        // Classify the sentence; the result holds a dialect name and a probability
        $analysis = $Arabic->arDialect($sentence);

        // Map the dialect name to a bilingual display label
        switch ($analysis['dialect']) {
            case 'Egyptian':
                $dialect = 'Egyptian - مصري';
                break;
            case 'Levantine':
                $dialect = 'Levantine - شامي';
                break;
            case 'Maghrebi':
                $dialect = 'Maghrebi - مغاربي';
                break;
            case 'Peninsular':
                $dialect = 'Peninsular - خليجي';
                break;
            default:
                // Fall back to the raw name so $dialect is always defined
                $dialect = $analysis['dialect'];
        }

        // Convert the fractional probability to a percentage with one decimal place
        $probability = sprintf('%0.1f', round(100 * $analysis['probability'], 1));

        // One table row per sentence
        echo '<tr><td bgcolor="#f5f5f5" align="right">';
        echo '<font face="Tahoma">'.$sentence.'</font></td>';
        echo '<td bgcolor="#f5f5f5" align="center">'.$dialect.'</td>';
        echo '<td bgcolor="#f5f5f5" align="center">'.$probability.'%</td></tr>';
    }

    echo '</table></center>';
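
The label mapping and percentage formatting in the example above could also be pulled into a small helper, which keeps the display logic separate from the HTML output. The following is only a sketch: formatDialectResult() is a hypothetical function, not part of ArPHP, and it assumes the result array has the same 'dialect' and 'probability' keys the example reads, with the probability given as a fraction between 0 and 1. The $sample array is a stand-in for a real arDialect() result.

```php
<?php
// Hypothetical helper (not part of ArPHP): turns an arDialect()-style result
// into the bilingual label and percentage string shown in the output table.
function formatDialectResult(array $analysis): array
{
    // Same labels as the switch statement in Example Code 1.
    $labels = [
        'Egyptian'   => 'Egyptian - مصري',
        'Levantine'  => 'Levantine - شامي',
        'Maghrebi'   => 'Maghrebi - مغاربي',
        'Peninsular' => 'Peninsular - خليجي',
    ];

    $name = $analysis['dialect'];

    return [
        // Fall back to the raw name if it is not in the map.
        'label'   => $labels[$name] ?? $name,
        // Assumption: the probability is a fraction in [0, 1].
        'percent' => sprintf('%0.1f%%', 100 * $analysis['probability']),
    ];
}

// Stand-in result; real values would come from $Arabic->arDialect($sentence).
$sample = ['dialect' => 'Egyptian', 'probability' => 0.681];
$row = formatDialectResult($sample);
echo $row['label'], ' ', $row['percent'], PHP_EOL;
```

An array lookup with a fallback avoids the undefined-variable case a switch without a default branch can hit if the classifier ever returns a name outside the four listed here.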

Related Documentation: arDialect