Class: I18N_Arabic_Normalise
Source Location: /Arabic/Normalise.php
This class provides functions that manipulate and normalise Arabic text by applying filters: for example, stripping tatweel and tashkeel, normalising hamza and lamalephs, and unshaping joined Arabic text back into its normalised form.
Copyright:
- 2006-2016 Khaled Al-Sham'aa
Class Details
Class Methods
method charName [line 616]
string charName(string $archar)
Return the name of an Arabic letter, in Arabic.
method isAlef [line 448]
boolean isAlef(string $archar)
Checks for Arabic Alef forms (i.e. ALEF, ALEF MADDA, ALEF HAMZA ABOVE, ALEF HAMZA BELOW, ALEF WASLA, ALEF MAKSURA).
method isHamza [line 426]
boolean isHamza(string $archar)
Checks for Arabic Hamza forms (i.e. HAMZA, WAW HAMZA, YEH HAMZA, HAMZA ABOVE, HAMZA BELOW, ALEF HAMZA BELOW, ALEF HAMZA ABOVE).
method isHaraka [line 340]
boolean isHaraka(string $archar)
Checks for Arabic Harakat marks (i.e. FATHA, DAMMA, KASRA, SUKUN, TANWIN).
method isLigature [line 404]
boolean isLigature(string $archar)
Checks for Arabic Ligatures like LamAlef (i.e. LAM ALEF, LAM ALEF HAMZA ABOVE, LAM ALEF HAMZA BELOW, LAM ALEF MADDA ABOVE).
method isMoon [line 574]
boolean isMoon(string $archar)
Checks for Arabic Moon letters.
method isShortharaka [line 361]
boolean isShortharaka(string $archar)
Checks for Arabic short Harakat marks (i.e. FATHA, DAMMA, KASRA, SUKUN).
method isSmall [line 553]
boolean isSmall(string $archar)
Checks for Arabic Small letters (i.e. SMALL ALEF, SMALL WAW, SMALL YEH).
method isSun [line 595]
boolean isSun(string $archar)
Checks for Arabic Sun letters.
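The page does not list the Sun letters. As a minimal sketch, assuming PHP 7 or later and a single UTF-8 character as input, the check can be written as a lookup against the 14 letters that assimilate the lam of the definite article. The function name is hypothetical and this is not the library's code; isSun may be implemented differently.

```php
<?php
// Hypothetical helper for illustration; not taken from Normalise.php.
function sketchIsSunLetter(string $char): bool
{
    // The 14 sun letters: TEH, THEH, DAL, THAL, REH, ZAIN, SEEN, SHEEN,
    // SAD, DAD, TAH, ZAH, LAM, NOON.
    $sunLetters = [
        "\u{062A}", "\u{062B}", "\u{062F}", "\u{0630}", "\u{0631}",
        "\u{0632}", "\u{0633}", "\u{0634}", "\u{0635}", "\u{0636}",
        "\u{0637}", "\u{0638}", "\u{0644}", "\u{0646}",
    ];

    // Strict comparison so only an exact one-character match counts.
    return in_array($char, $sunLetters, true);
}
```

Under this sketch, every other Arabic consonant would count as a Moon letter.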
method isTanwin [line 382]
boolean isTanwin(string $archar)
Checks for Arabic Tanwin marks (i.e. FATHATAN, DAMMATAN, KASRATAN).
method isTashkeel [line 319]
boolean isTashkeel(string $archar)
Checks for Arabic Tashkeel marks (i.e. FATHA, DAMMA, KASRA, SUKUN, SHADDA, FATHATAN, DAMMATAN, KASRATAN).
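The mark lists given for isTanwin, isShortharaka and isTashkeel correspond to a contiguous run of Unicode code points (U+064B to U+0652), so the checks can be read as range tests. The following is a minimal sketch, assuming PHP 7 or later and a single UTF-8 character as input. The function names are hypothetical and the code is not taken from the library.

```php
<?php
// Hypothetical helpers for illustration; not taken from Normalise.php.

// Tanwin: FATHATAN (U+064B), DAMMATAN (U+064C), KASRATAN (U+064D).
function sketchIsTanwin(string $char): bool
{
    return preg_match('/\A[\x{064B}-\x{064D}]\z/u', $char) === 1;
}

// Short harakat: FATHA (U+064E), DAMMA (U+064F), KASRA (U+0650), SUKUN (U+0652).
function sketchIsShortHaraka(string $char): bool
{
    return preg_match('/\A[\x{064E}-\x{0650}\x{0652}]\z/u', $char) === 1;
}

// Tashkeel: the two groups above plus SHADDA (U+0651).
function sketchIsTashkeel(string $char): bool
{
    return sketchIsTanwin($char)
        || sketchIsShortHaraka($char)
        || $char === "\u{0651}";
}
```

The `\A` and `\z` anchors make the pattern match only when the whole input is exactly one of the listed marks.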
method isTehlike [line 532]
boolean isTehlike(string $archar)
Checks for Arabic Teh forms (i.e. TEH, TEH MARBUTA).
method isWawlike [line 511]
boolean isWawlike(string $archar)
Checks for Arabic Waw-like forms (i.e. WAW, WAW HAMZA, SMALL WAW).
method isWeak [line 469]
boolean isWeak(string $archar)
Checks for Arabic Weak letters (i.e. ALEF, WAW, YEH, ALEF MAKSURA).
method isYehlike [line 490]
boolean isYehlike(string $archar)
Checks for Arabic Yeh forms (i.e. YEH, YEH HAMZA, SMALL YEH, ALEF MAKSURA).
method normalise [line 241]
string normalise(string $text)
Takes a string and applies the various filters in this class to return a Unicode-normalised string suitable for activities such as searching and indexing.
method normaliseHamza [line 165]
string normaliseHamza(string $text)
Normalise all Hamza characters to their corresponding aleph character in an Arabic text.
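The page does not say which characters normaliseHamza maps to which. As one plausible reading, sketched here under that assumption only, the hamza-carrying alef forms are folded to the bare ALEF. The function name and the mapping table are hypothetical, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper; the mapping is an assumption, not the library's table.
function sketchFoldHamzaAlef(string $text): string
{
    return strtr($text, [
        "\u{0623}" => "\u{0627}", // ALEF WITH HAMZA ABOVE -> ALEF
        "\u{0625}" => "\u{0627}", // ALEF WITH HAMZA BELOW -> ALEF
    ]);
}
```

Folding of this kind is typically used so that a search for a word matches both spellings of its initial alef.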
method normaliseLamaleph [line 193]
string normaliseLamaleph(string $text)
Unicode has special characters in which the lamaleph and any hamza above it are combined into one code point. Some input systems use them. This function expands these characters.
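The expansion described above can be sketched as a lookup from each combined code point to a LAM followed by the matching alef. This is a minimal sketch, assuming PHP 7 or later and that the combined characters in question are the lam-alef ligatures of the Arabic Presentation Forms-B block (U+FEF5 to U+FEFC). The function name is hypothetical and the library's own table may differ.

```php
<?php
// Hypothetical helper for illustration; not taken from Normalise.php.
function sketchExpandLamAlef(string $text): string
{
    $lam = "\u{0644}"; // ARABIC LETTER LAM

    // Each ligature has an isolated and a final form; both expand the same way.
    $map = [
        "\u{FEF5}" => $lam . "\u{0622}", // LAM + ALEF WITH MADDA ABOVE (isolated)
        "\u{FEF6}" => $lam . "\u{0622}", // LAM + ALEF WITH MADDA ABOVE (final)
        "\u{FEF7}" => $lam . "\u{0623}", // LAM + ALEF WITH HAMZA ABOVE (isolated)
        "\u{FEF8}" => $lam . "\u{0623}", // LAM + ALEF WITH HAMZA ABOVE (final)
        "\u{FEF9}" => $lam . "\u{0625}", // LAM + ALEF WITH HAMZA BELOW (isolated)
        "\u{FEFA}" => $lam . "\u{0625}", // LAM + ALEF WITH HAMZA BELOW (final)
        "\u{FEFB}" => $lam . "\u{0627}", // LAM + ALEF (isolated)
        "\u{FEFC}" => $lam . "\u{0627}", // LAM + ALEF (final)
    ];

    return strtr($text, $map);
}
```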
method stripTashkeel [line 141]
string stripTashkeel(string $text)
Strip all tashkeel characters from an Arabic text.
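As a minimal sketch of this filter, assuming PHP 7 or later and that "tashkeel" means the eight marks listed under isTashkeel (U+064B to U+0652), the marks can be removed with one Unicode-aware regular expression. The function name is hypothetical and the library may strip a different set.

```php
<?php
// Hypothetical helper for illustration; not taken from Normalise.php.
function sketchStripTashkeel(string $text): string
{
    // U+064B..U+0652: FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA,
    // SHADDA, SUKUN.
    $result = preg_replace('/[\x{064B}-\x{0652}]/u', '', $text);

    // preg_replace returns null on failure (e.g. invalid UTF-8 input);
    // in that case fall back to the unmodified text.
    return $result ?? $text;
}
```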
method stripTatweel [line 128]
string stripTatweel(string $text)
Strip all tatweel characters from an Arabic text.
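Tatweel (kashida) is the single code point U+0640, used only to stretch a word visually. A minimal sketch of the filter, assuming PHP 7 or later, is a plain string replacement. The function name is hypothetical and this is not the library's code.

```php
<?php
// Hypothetical helper for illustration; not taken from Normalise.php.
function sketchStripTatweel(string $text): string
{
    // U+0640 ARABIC TATWEEL carries no meaning, so it is simply dropped.
    return str_replace("\u{0640}", '', $text);
}
```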
method unichr [line 226]
Return the Unicode character for a given code point.
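The signature of unichr is not shown on this page. As a sketch of the general technique only, assuming an integer code point as input and a UTF-8 string as output, the conversion can be done by hand with the standard UTF-8 bit layout. The function name and its signature are hypothetical.

```php
<?php
// Hypothetical helper; the real unichr signature is not documented here.
function sketchCodePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}
```

The sketch does no validation: it does not reject surrogates (U+D800 to U+DFFF) or values above U+10FFFF.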
method unshape [line 271]
string unshape(string $text)
Takes Arabic text in its joined form, untangles the characters, and unshapes them. This can be used to process text that was produced by OCR or extracted from a PDF document. Note that the resulting text may need further processing. In most cases, you will want to use the utf8Strrev function from this class to reverse the string. Most of the work of setting up the characters for this function is done through the ArUnicode.constants.php constants and the constructor loading.
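To illustrate what unshaping means, the sketch below maps a few positional glyphs from the Arabic Presentation Forms-B block back to their base letters. It assumes PHP 7 or later, and the table is deliberately partial (ALEF, BEH and LAM only). The function name is hypothetical, and the library builds its own table from ArUnicode.constants.php, which is not reproduced here.

```php
<?php
// Hypothetical helper with a partial table; not taken from Normalise.php.
function sketchUnshape(string $text): string
{
    $map = [
        // ALEF: isolated, final
        "\u{FE8D}" => "\u{0627}", "\u{FE8E}" => "\u{0627}",
        // BEH: isolated, final, initial, medial
        "\u{FE8F}" => "\u{0628}", "\u{FE90}" => "\u{0628}",
        "\u{FE91}" => "\u{0628}", "\u{FE92}" => "\u{0628}",
        // LAM: isolated, final, initial, medial
        "\u{FEDD}" => "\u{0644}", "\u{FEDE}" => "\u{0644}",
        "\u{FEDF}" => "\u{0644}", "\u{FEE0}" => "\u{0644}",
    ];

    // Characters outside the table pass through unchanged.
    return strtr($text, $map);
}
```

Text extracted in visual order would still need to be reversed after this step, as the description above notes.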
method utf8Strrev [line 284]
string utf8Strrev(string $str, [boolean $reverse_numbers = false])
Take a UTF-8 string and reverse it.
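PHP's built-in strrev works on bytes and would corrupt multi-byte UTF-8 characters, so a reversal has to operate on whole code points. The following is a minimal sketch, assuming PHP 7 or later. The function name is hypothetical, and the behaviour of the $reverse_numbers parameter is not described on this page, so it is not modelled.

```php
<?php
// Hypothetical helper for illustration; not taken from Normalise.php.
function sketchUtf8Reverse(string $str): string
{
    // With the /u modifier, an empty pattern splits between code points.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure (e.g. invalid UTF-8 input);
    // in that case return the input unchanged.
    if ($chars === false) {
        return $str;
    }

    return implode('', array_reverse($chars));
}
```

Because the sketch reverses code points rather than grapheme clusters, a combining mark such as a haraka ends up before its base letter in the output.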