Arabic

Class: ArAutoSummarize

Source Location: /sub/ArAutoSummarize.class.php

Class Overview


This PHP class performs automatic keyphrase extraction to provide a quick mini-summary of a long Arabic document.


Author(s):

  • Khaled Al-Shamaa

Copyright:

  • 2009 Khaled Al-Shamaa




Class Details

[line 144]



Tags:

author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
copyright:  2009 Khaled Al-Shamaa
link:  http://www.ar-php.org
license:  LGPL




Class Variables

$_commonChars = array('ة','ه','ي','ن','و','ت','ل','ا','س','م', 'e', 't', 'a', 'o', 'i', 'n', 's')

[line 149]



Tags:

access:  protected

Type:   mixed



$_commonWords = array()

[line 153]



Tags:

access:  protected

Type:   mixed



$_importantWords = array()

[line 154]



Tags:

access:  protected

Type:   mixed



$_normalizeAlef = array('أ','إ','آ')

[line 146]



Tags:

access:  protected

Type:   mixed



$_normalizeDiacritics = array('َ','ً','ُ','ٌ','ِ','ٍ','ْ','ّ')

[line 147]



Tags:

access:  protected

Type:   mixed



$_separators = array('.',"\n",'،','؛','(','[','{',')',']','}',',',';')

[line 151]



Tags:

access:  protected

Type:   mixed





Class Methods


constructor __construct [line 159]

ArAutoSummarize __construct( )

Load initial values



Tags:

access:  public



method cleanCommon [line 480]

string cleanCommon( string $str, [string $inputCharset = null], [string $outputCharset = null], [object $main = null])

Remove common Arabic words (roughly) from the input Arabic string (document content)



Tags:

return:  Arabic document as a string free of common words (roughly)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input normalized Arabic document as a string
string   $inputCharset   (optional) Input charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set input charset)
string   $outputCharset   (optional) Output charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set output charset)
object   $main   (optional) Main Ar-PHP object used to access the charset converter options
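A minimal sketch of rough stop-word removal, assuming the input is already normalized and whitespace-separated. The function name cleanCommonSketch and its $stopWords parameter (standing in for the $_commonWords list) are hypothetical; this is not the library's own implementation.

```php
<?php
// Hypothetical sketch: drop stop words from a normalized UTF-8 string.
// $stopWords stands in for the class's $_commonWords list (assumption).
function cleanCommonSketch(string $str, array $stopWords): string
{
    // Split on runs of whitespace; the /u modifier treats the input as UTF-8.
    $words = preg_split('/\s+/u', trim($str), -1, PREG_SPLIT_NO_EMPTY);
    // Keep only the words that are not in the stop-word list.
    $kept = array_filter($words, fn(string $w): bool => !in_array($w, $stopWords, true));
    return implode(' ', $kept);
}

echo cleanCommonSketch('الولد في المدرسة', ['في', 'من']), "\n"; // الولد المدرسة
```

Rejoining with single spaces discards the original spacing, which is acceptable when the result only feeds further ranking steps.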


method doRateSummarize [line 311]

string doRateSummarize( string $str, integer $rate, string $keywords, [string $inputCharset = null], [string $outputCharset = null], [object $main = null])

Summarize the input Arabic string (document content), keeping a given percentage of its sentences in the output



Tags:

return:  Output summary requested
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $rate   Number of sentences in the output summary, as a percentage of the sentences in the input Arabic string (document content)
string   $keywords   List of keywords highlighted by the search process
string   $inputCharset   (optional) Input charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set input charset)
string   $outputCharset   (optional) Output charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set output charset)
object   $main   (optional) Main Ar-PHP object used to access the charset converter options
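A rate-based summary has to turn $rate into a concrete sentence count at some point. One plausible conversion is sketched below; rounding to the nearest integer, clamping to the document size, and keeping at least one sentence are assumptions, and rateToSentenceCount is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert a percentage rate into a sentence count.
// Assumptions: round to nearest, never exceed the total, keep at least one.
function rateToSentenceCount(int $rate, int $totalSentences): int
{
    if ($totalSentences === 0) {
        return 0; // nothing to summarize
    }
    $count = (int) round($rate * $totalSentences / 100);
    return max(1, min($count, $totalSentences));
}
```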


method doSummarize [line 288]

string doSummarize( string $str, integer $int, string $keywords, [string $inputCharset = null], [string $outputCharset = null], [object $main = null])

Summarize the input Arabic string (document content) into a specific number of sentences



Tags:

return:  Output summary requested
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $int   Number of sentences required in output summary
string   $keywords   List of keywords highlighted by the search process
string   $inputCharset   (optional) Input charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set input charset)
string   $outputCharset   (optional) Output charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set output charset)
object   $main   (optional) Main Ar-PHP object used to access the charset converter options
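Once every sentence has a rank, a fixed-size summary can be produced by keeping the $n highest-ranked sentences and emitting them in document order. The sketch below uses a sort-based selection and a hypothetical function name, pickTopSentences; the class itself documents a threshold-based helper (_minAcceptedRank), so treat this as an illustration of the idea rather than its code.

```php
<?php
// Sketch (not the library's code): keep the $n highest-ranked sentences,
// then emit them in their original document order.
function pickTopSentences(array $sentences, array $ranks, int $n): array
{
    // arsort keeps each sentence index as key while ordering ranks high to low.
    arsort($ranks, SORT_NUMERIC);
    $chosen = array_slice(array_keys($ranks), 0, $n);
    sort($chosen); // restore document order
    return array_map(fn(int $i): string => $sentences[$i], $chosen);
}
```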


method getMetaKeywords [line 384]

string getMetaKeywords( string $str, integer $int, [string $inputCharset = null], [string $outputCharset = null], [object $main = null])

Extract keywords from a given Arabic string (document content)



Tags:

return:  List of the keywords extracted from the input Arabic string (document content)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $int   Number of keywords to extract from the input string (document content)
string   $inputCharset   (optional) Input charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set input charset)
string   $outputCharset   (optional) Output charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set output charset)
object   $main   (optional) Main Ar-PHP object used to access the charset converter options
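Given a word-frequency table like the one _rankWords is documented to return, keyword extraction can be sketched as taking the $int most frequent entries. The comma-separated output format and the name topKeywords are assumptions.

```php
<?php
// Sketch: pick the $int highest-ranked words from a word => rank table.
// The ", "-separated output format is an assumption, not the library's.
function topKeywords(array $wordRanks, int $int): string
{
    // arsort keeps the word keys while ordering ranks from high to low.
    arsort($wordRanks, SORT_NUMERIC);
    return implode(', ', array_slice(array_keys($wordRanks), 0, $int));
}
```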


method highlightRateSummary [line 361]

string highlightRateSummary( string $str, integer $rate, string $keywords, string $style, [string $inputCharset = null], [string $outputCharset = null], [object $main = null])

Highlight key sentences (the summary), as a percentage of the input string (document content), using CSS and return the result.



Tags:

return:  Output with the key sentences (summary) highlighted using CSS
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $rate   Number of key sentences to highlight, as a percentage of the sentences in the input Arabic string (document content)
string   $keywords   List of keywords highlighted by the search process
string   $style   Name of the CSS class you would like to apply
string   $inputCharset   (optional) Input charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set input charset)
string   $outputCharset   (optional) Output charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set output charset)
object   $main   (optional) Main Ar-PHP object used to access the charset converter options


method highlightSummary [line 336]

string highlightSummary( string $str, integer $int, string $keywords, string $style, [string $inputCharset = null], [string $outputCharset = null], [object $main = null])

Highlight the key sentences (summary) of the input string (document content) using CSS and return the result



Tags:

return:  Output with the key sentences (summary) highlighted using CSS
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $int   Number of key sentences to highlight in the input string (document content)
string   $keywords   List of keywords highlighted by the search process
string   $style   Name of the CSS class you would like to apply
string   $inputCharset   (optional) Input charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set input charset)
string   $outputCharset   (optional) Output charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set output charset)
object   $main   (optional) Main Ar-PHP object used to access the charset converter options
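The highlight variants return the document with the chosen sentences marked up with a CSS class. A sketch, assuming the key sentences have already been selected by index; highlightSentences is a hypothetical name, and escaping with htmlspecialchars is a safety choice the library may not make.

```php
<?php
// Sketch: wrap the selected sentences in <span class="..."> and leave the rest as-is.
// Which sentences are "key" is decided elsewhere; here it is a list of indexes.
function highlightSentences(array $sentences, array $keyIndexes, string $style): string
{
    $class = htmlspecialchars($style, ENT_QUOTES, 'UTF-8');
    $out = [];
    foreach ($sentences as $i => $sentence) {
        $escaped = htmlspecialchars($sentence, ENT_QUOTES, 'UTF-8');
        $out[] = in_array($i, $keyIndexes, true)
            ? '<span class="' . $class . '">' . $escaped . '</span>'
            : $escaped;
    }
    return implode(' ', $out);
}
```

Joining with a single space discards the original separators; a fuller version would carry them through alongside each sentence.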


method loadExtra [line 182]

void loadExtra( )

Load an enhanced Arabic stop-word list



Tags:

access:  public



method _acceptedWord [line 661]

boolean _acceptedWord( string $word)

Check some conditions to determine whether a given string is a formally valid word



Tags:

return:  True if the passed string is accepted as a valid word, otherwise False
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $word   String to be checked if it is a valid word or not
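The exact conditions are not documented here. As an illustration only, the sketch below accepts a string if it is at least three characters long and consists solely of Arabic-script characters; both rules, and the name isAcceptedWord, are assumptions.

```php
<?php
// Hypothetical acceptance rule (assumption, not the library's exact conditions):
// at least $minLength characters, all of them in the Arabic script.
function isAcceptedWord(string $word, int $minLength = 3): bool
{
    // \p{Arabic} matches Arabic-script code points; /u treats input as UTF-8.
    if (!preg_match('/^\p{Arabic}+$/u', $word)) {
        return false;
    }
    // Count characters rather than bytes (Arabic letters are 2 bytes in UTF-8).
    return preg_match_all('/./u', $word) >= $minLength;
}
```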


method _doNormalize [line 458]

string _doNormalize( string $str)

Normalize an Arabic document



Tags:

return:  Normalized Arabic document
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $str   Input Arabic document as a string
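Judging from the $_normalizeAlef and $_normalizeDiacritics variables, normalization likely maps the hamza and madda forms of alef to a bare alef and strips short-vowel diacritics. A UTF-8 sketch under that assumption; normalizeArabic is a hypothetical name.

```php
<?php
// Sketch (assumption): unify alef forms, then strip short-vowel diacritics,
// following the character lists in $_normalizeAlef and $_normalizeDiacritics.
function normalizeArabic(string $str): string
{
    // Alef with hamza above, alef with hamza below, alef with madda.
    $alefForms = ["\u{0623}", "\u{0625}", "\u{0622}"];
    // Fatha, fathatan, damma, dammatan, kasra, kasratan, sukun, shadda.
    $diacritics = ["\u{064E}", "\u{064B}", "\u{064F}", "\u{064C}",
                   "\u{0650}", "\u{064D}", "\u{0652}", "\u{0651}"];
    $str = str_replace($alefForms, "\u{0627}", $str); // bare alef
    return str_replace($diacritics, '', $str);
}
```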


method _draftStem [line 507]

string _draftStem( string $str)

Remove less significant Arabic letters from the given string (document content).

Note that the output is not human-readable.




Tags:

return:  Output string after removing less significant Arabic letters (not human-readable)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $str   Input Arabic document as a string
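One reading of this method, consistent with the $_commonChars variable, is that it deletes the most frequent (and therefore least distinctive) letters so that related word forms collapse to the same key. The sketch assumes exactly that; draftStem and its hard-coded list are illustrative.

```php
<?php
// Sketch (assumption): a rough "draft stem" made by deleting the letters
// listed in $_commonChars. UTF-8 is self-synchronizing, so byte-wise
// str_replace on whole characters is safe here.
function draftStem(string $str): string
{
    $commonChars = ['ة', 'ه', 'ي', 'ن', 'و', 'ت', 'ل', 'ا', 'س', 'م',
                    'e', 't', 'a', 'o', 'i', 'n', 's'];
    return str_replace($commonChars, '', $str);
}
```

Both المكتبة (the library) and الكتاب (the book) reduce to كب, which is why the output is unreadable yet still useful for matching related words.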


method _minAcceptedRank [line 639]

integer _minAcceptedRank( array $arr, integer $int)

Calculate the minimum rank for sentences to be included in the summary



Tags:

return:  Minimum accepted sentence rank (sentences ranked higher than this value will be listed in the document summary)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

array   $arr   Sentences ranks
integer   $int   Number of sentences you need to include in your summary
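A sketch of one way to compute this threshold: sort the ranks from high to low and take the value at 0-based position $int, so that only sentences ranked strictly higher pass. The fallback for documents with too few sentences and the name minAcceptedRank are assumptions.

```php
<?php
// Sketch (assumption): threshold = rank at position $int after sorting high to low.
// Sentences whose rank is strictly greater than the threshold make the summary.
function minAcceptedRank(array $ranks, int $int): float
{
    rsort($ranks, SORT_NUMERIC);
    // Too few sentences: return a value below the lowest rank so all of them pass.
    return $ranks[$int] ?? (count($ranks) > 0 ? min($ranks) - 1 : 0.0);
}
```

With ranks [3, 9, 5, 7] and $int = 2 the threshold is 5, leaving the two sentences ranked 9 and 7. Tied ranks at the threshold are excluded, so a summary can come out shorter than requested.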


method _rankSentences [line 564]

array _rankSentences( array $sentences, array $stemmedSentences, array $arr)

Ranks sentences in a given Arabic string (document content).



Tags:

return:  Two-dimensional array: the first item is an array of the document sentences, the second is an array of their ranks.
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

array   $sentences   Sentences of the input Arabic document as an array
array   $stemmedSentences   Stemmed sentences of the input Arabic document as an array
array   $arr   Word ranks array (keys are words, values are word frequencies)
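A sketch of sentence ranking, assuming a sentence's rank is the sum of the ranks of the words in its stemmed form; the scoring rule and the name rankSentences are assumptions. The return shape mirrors the documented two-item array.

```php
<?php
// Sketch (assumption): rank of a sentence = sum of its stemmed words' ranks.
// Words missing from $wordRanks contribute nothing.
function rankSentences(array $sentences, array $stemmedSentences, array $wordRanks): array
{
    $ranks = [];
    foreach ($stemmedSentences as $i => $stemmed) {
        $words = preg_split('/\s+/u', trim($stemmed), -1, PREG_SPLIT_NO_EMPTY);
        $ranks[$i] = 0;
        foreach ($words as $w) {
            $ranks[$i] += $wordRanks[$w] ?? 0;
        }
    }
    // Same shape as the documented return value: [sentences, ranks].
    return [$sentences, $ranks];
}
```

A plain sum favors long sentences; dividing by the word count is a common alternative if that bias is unwanted.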


method _rankWords [line 523]

hash _rankWords( string $str)

Ranks words in a given Arabic string (document content). A word's rank is its frequency, i.e. the number of times it appears in the document.



Tags:

return:  Associative array whose keys are the document words and whose values are their ranks.
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $str   Input Arabic document as a string
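Counting word frequencies is the core of this step. A sketch assuming plain whitespace tokenization; the real method may also normalize, stem, or filter words (for example with _acceptedWord) first, and rankWords is a hypothetical name.

```php
<?php
// Sketch: rank = number of times each word occurs in the document.
function rankWords(string $str): array
{
    $words = preg_split('/\s+/u', trim($str), -1, PREG_SPLIT_NO_EMPTY);
    // array_count_values maps each distinct word to its number of occurrences.
    return array_count_values($words);
}
```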


method _summarize [line 206]

string _summarize( string $str, string $keywords, integer $int, string $mode, string $output, [string $style = null], [string $inputCharset = null], [string $outputCharset = null], [object $main = null])

Core summarize function that implements the required steps of the algorithm



Tags:

return:  Output summary requested
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
string   $keywords   List of keywords highlighted by the search process
integer   $int   Sentence count or percentage rate, depending on $mode
string   $mode   How $int is interpreted [number|rate]
string   $output   Output mode [summary|highlight]
string   $style   (optional) Name of the CSS class to apply in highlight mode
string   $inputCharset   (optional) Input charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set input charset)
string   $outputCharset   (optional) Output charset [utf-8|windows-1256|iso-8859-6]; defaults to NULL (use the currently set output charset)
object   $main   (optional) Main Ar-PHP object used to access the charset converter options
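A natural first step in a pipeline like this is splitting the document into sentences. The sketch below assumes the characters listed in $_separators act as sentence boundaries; the name splitSentences is hypothetical, and ranking and selection would follow on the resulting array.

```php
<?php
// Sketch (assumption): split a document into sentences on the characters
// listed in $_separators (period, newline, Arabic comma/semicolon, brackets, ',', ';').
function splitSentences(string $str): array
{
    $separators = ['.', "\n", '،', '؛', '(', '[', '{', ')', ']', '}', ',', ';'];
    // Escape each separator for use inside a regex character class.
    $class = implode('', array_map(fn(string $s): string => preg_quote($s, '/'), $separators));
    $parts = preg_split('/[' . $class . ']+/u', $str);
    // Trim each piece and drop the empty ones.
    return array_values(array_filter(array_map('trim', $parts), fn(string $p): bool => $p !== ''));
}
```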



Documentation generated on Fri, 12 Mar 2010 01:01:42 +0300 by phpDocumentor 1.4.0