Al-Kashi was one of the greatest mathematicians of the Islamic world.
In French, the law of cosines is called Théorème d'Al-Kashi (Theorem of Al-Kashi),
as al-Kashi was the first to give an explicit statement of the law of cosines
in a form suitable for triangulation.
In one of his numerical approximations of π, he correctly computed 2π to 16 decimal
places, far more accurately than any earlier estimate.
The Al-Kashi project aims to provide a rich PHP package of statistical functions useful
for online business intelligence and data mining. Possible applications include online
log file analysis, ad and campaign statistics, and on-the-fly analysis of survey or voting results.
It is published under the GPL license;
you can download it from the PHPClasses.org website
and check the change log here.
Khaled Al-Sham'aa
Would you like to know more about the statistical concepts and procedures implemented in this project?
Please download this free electronic book,
assembled from Wikipedia articles, for detailed background information.
For more information about this project in Arabic, please refer to
these blog posts.
Example Data
The data was extracted from the 1974 Motor Trend US magazine,
and comprises fuel consumption and 10 aspects of automobile
design and performance for 32 automobiles (1973-74 models).
You can download the example data file from here.
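// Read the tab-separated example data; the first line holds the column titles.
$sep = "\t"; $nl = "\n";
$content = file_get_contents('data.txt');
$records = explode($nl, $content);
$header = explode($sep, trim(array_shift($records)));
// Collect the values of each column in its own array, keyed by title (e.g. $data['mpg']).
$data = array_fill_keys($header, array());
foreach ($records as $record) {
    $record = trim($record);
    if ($record == '') continue; // skip empty lines, such as a trailing newline
    $fields = explode($sep, $record);
    $titles = $header;
    foreach ($fields as $field) {
        $title = array_shift($titles);
        $data[$title][] = $field;
    }
}
// x = car weight (wt), y = fuel consumption (mpg)
$x = $data['wt'];
$y = $data['mpg'];
// Load the library; $kashi is used in all of the examples below.
require('kashi.php');
$kashi = new Kashi();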
Summary Statistics:
Mean (x) | 3.21725
Mean (x, "geometric") | 3.0701885671208
Mean (x, "harmonic") | 2.9182632148104
Median (x) | 3.325
Mode (x) | Array ( [0] => 3.44 )
Variance (x) | 0.95737896774194
SD (x) | 0.9784574429897
%CV (x) | 30.412850819479
Skewness (x) | 0.46591610679299
Is it significant (i.e. test it against 0)? | bool(false)
Kurtosis (x) | 0.41659466963493
Is it significant (i.e. test it against 0)? | bool(false)
Rank (x) | 9, 12, 7, 16, 18, 21, 23, 15, 13, 18, 18, 29, 25, 26, 30, 32, 31, 6, 2, 3, 8, 22, 17, 27, 28, 4, 5, 1, 14, 10, 23, 11
// $x is an array of values
echo 'Arithmetic Mean: ' . $kashi->mean($x) . ' ';
echo 'Geometric Mean: ' . $kashi->mean($x, "geometric") . ' ';
echo 'Harmonic Mean: ' . $kashi->mean($x, "harmonic") . ' ';
// mode() returns an array; passing true makes print_r() return the text instead of printing it
echo 'Mode: ' . print_r($kashi->mode($x), true) . ' ';
echo 'Median: ' . $kashi->median($x) . ' ';
echo 'Variance: ' . $kashi->variance($x) . ' ';
echo 'SD: ' . $kashi->sd($x) . ' ';
echo '%CV: ' . $kashi->cv($x) . ' ';
echo 'Skewness: ' . $kashi->skew($x) . ' ';
echo 'Is it significant (i.e. test it against 0)? ';
var_dump($kashi->isSkew($x));
echo 'Kurtosis: ' . $kashi->kurt($x) . ' ';
echo 'Is it significant (i.e. test it against 0)? ';
var_dump($kashi->isKurt($x));
echo 'Rank (x): ';
echo implode(', ', $kashi->rank($x)) . ' ';
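To check these numbers without the library, here is a minimal plain-PHP sketch of the arithmetic, geometric and harmonic means, the variance, the SD and the %CV. It is not Kashi's own code, and the function names are hypothetical. It assumes the sample (n - 1) variance, which agrees with the Variance (x) value above. The $wt array holds the wt column of the example data, the same values printed as y in the Normal Q-Q Plot output.

```php
<?php
// Plain-PHP sketch of the summary statistics (not the Kashi implementation).
// Assumption: the variance uses the sample (n - 1) denominator.

function arithmeticMean(array $x): float
{
    return array_sum($x) / count($x);
}

// exp(mean of the logs); every value must be positive.
function geometricMean(array $x): float
{
    $sumLog = 0.0;
    foreach ($x as $v) {
        $sumLog += log($v);
    }
    return exp($sumLog / count($x));
}

// n divided by the sum of reciprocals; every value must be non-zero.
function harmonicMean(array $x): float
{
    $sumInverse = 0.0;
    foreach ($x as $v) {
        $sumInverse += 1 / $v;
    }
    return count($x) / $sumInverse;
}

// Sum of squared deviations from the mean, divided by n - 1.
function sampleVariance(array $x): float
{
    $mean = arithmeticMean($x);
    $squares = 0.0;
    foreach ($x as $v) {
        $squares += ($v - $mean) ** 2;
    }
    return $squares / (count($x) - 1);
}

// wt column of the example data (car weight in 1000 lb).
$wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424,
       5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78];

$mean = arithmeticMean($wt);
$variance = sampleVariance($wt);
$sd = sqrt($variance);
$cv = 100 * $sd / $mean; // coefficient of variation, in percent

printf("Mean %.5f, geometric %.5f, harmonic %.5f\n", $mean, geometricMean($wt), harmonicMean($wt));
printf("Variance %.5f, SD %.5f, %%CV %.3f\n", $variance, $sd, $cv);
```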
Statistical Graphics:
Boxplot | Array
(
[min] => 1.513
[q1] => 2.62
[median] => 3.325
[q3] => 3.73
[max] => 5.282
[outliers] => Array
(
[0] => 5.345
[1] => 5.424
)
)
Histogram | Array
(
[1.513-2.002] => 4
[2.002-2.491] => 4
[2.491-2.98] => 4
[2.98-3.469] => 9
[3.469-3.957] => 7
[3.957-4.446] => 1
[4.446-4.935] => 0
[4.935-5.424] => 3
)
Normal Q-Q Plot | x = -0.62609901275838, -0.36012989155586, -0.83051087731871, -0.039176085543034, 0.11776987461046, 0.36012989155586, 0.62609901275838, -0.11776987461046, -0.27769043950814, 0.19709908415753, 0.27769043950814, 1.2298587580185, 0.72451438304624, 0.83051087731871, 1.417797139161, 2.1538746917937, 1.6759397215193, -0.94678175657479, -1.6759397215193, -1.417797139161, -0.72451438304624, 0.44509652516901, 0.039176085543034, 0.94678175657479, 1.0775155681381, -1.2298587580185, -1.0775155681381, -2.1538746917937, -0.19709908415753, -0.53340970683585, 0.53340970683585, -0.44509652516901
y = 2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78
Ternary Plot | x = 0.729, 0.722, 0.734, 0.706, 0.695, 0.675, 0.659, 0.723, 0.701, 0.692, 0.679, 0.663, 0.676, 0.654, 0.577, 0.574, 0.625, 0.779, 0.785, 0.788, 0.716, 0.667, 0.664, 0.645, 0.691, 0.763, 0.766, 0.796, 0.689, 0.723, 0.672, 0.718
y = 0.356, 0.36, 0.369, 0.382, 0.376, 0.419, 0.407, 0.364, 0.406, 0.387, 0.408, 0.398, 0.395, 0.422, 0.463, 0.459, 0.403, 0.312, 0.317, 0.31, 0.394, 0.407, 0.417, 0.41, 0.368, 0.34, 0.323, 0.3, 0.375, 0.354, 0.381, 0.377
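echo "Boxplot:\n";
// boxplot() returns the min, q1, median, q3 and max values for the box and whiskers, plus any outliers
print_r($kashi->boxplot($x));
echo ' ';
echo "Histogram:\n";
// hist($x, 8) counts how many values fall into each of 8 bins
print_r($kashi->hist($x, 8));
echo ' ';
echo 'Normal Q-Q Plot: ';
// qqnorm() pairs theoretical normal quantiles (x) with the sample values (y)
$qq = $kashi->qqnorm($x);
echo 'x = ' . implode(', ', $qq['x']) . ' ';
echo 'y = ' . implode(', ', $qq['y']) . ' ';
echo 'Ternary Plot: ';
// ternary() returns 2-D plot coordinates for three variables (here wt, mpg and qsec)
$xy = $kashi->ternary($data['wt'], $data['mpg'], $data['qsec']);
echo 'x = ' . implode(', ', $xy['x']) . ' ';
echo 'y = ' . implode(', ', $xy['y']) . ' ';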
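The Histogram output above is consistent with 8 equal-width bins running from min(x) to max(x), with the maximum value counted in the last bin. The sketch below reproduces that binning in plain PHP under that assumption. The function name is hypothetical, and the bin labels are rounded to three decimals only for display.

```php
<?php
// Equal-width histogram sketch (assumption: the same binning as hist() above).
function equalWidthHistogram(array $x, int $bins): array
{
    $min = min($x);
    $max = max($x);
    if ($max == $min) {
        // All values are equal: a single bin holds everything.
        return ["$min-$max" => count($x)];
    }
    $width = ($max - $min) / $bins;
    $counts = array_fill(0, $bins, 0);
    foreach ($x as $v) {
        // Bin index from the distance to the minimum; clamp so the maximum
        // (and any floating-point overshoot) is counted in the last bin.
        $i = min($bins - 1, (int) floor(($v - $min) / $width));
        $counts[$i]++;
    }
    // Label each bin "lower-upper", rounded to 3 decimals for display.
    $histogram = [];
    for ($i = 0; $i < $bins; $i++) {
        $lower = round($min + $i * $width, 3);
        $upper = round($min + ($i + 1) * $width, 3);
        $histogram["$lower-$upper"] = $counts[$i];
    }
    return $histogram;
}

// wt column of the example data (car weight in 1000 lb).
$wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424,
       5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78];

print_r(equalWidthHistogram($wt, 8));
```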
Correlation, Regression, and t-Test:
Covariance (x, y) | -5.1166846774194
Correlation (x, y) | -0.86765937651723
Significance of Correlation (p-value) | 1.2939593840855E-10
Path Analysis | Array
(
[1] => -0.70763801614376
[2] => -0.20274707094052
[3] => 0.15145821845688
)
Regression (y = a + b*x) | Array
(
[intercept] => 37.285126167342
[slope] => -5.3444715727227
[r-square] => 0.75283279365826
[adj-r-square] => 0.74459388678021
[intercept-se] => 1.8776273372559
[intercept-2.5%] => 33.450499570026
[intercept-97.5%] => 41.119752764658
[slope-se] => 0.55910104509932
[slope-2.5%] => -6.486308238383
[slope-97.5%] => -4.2026349070623
[F-statistic] => 91.375325003762
[p-value] => 1.2939604943085E-10
)
Multiple Regression (y = a + b1*x1 + b2*x2) | Array
(
[intercept] => 37.227270116447
[b1] => -3.8778307424046
[b2] => -0.031772946982161
[r-square] => 0.82678545188279
[adj-r-square] => 0.81483962097816
[intercept-se] => 0
[intercept-2.5%] => 37.227270116447
[intercept-97.5%] => 37.227270116447
[b1-se] => 0
[b1-2.5%] => -3.8778307424046
[b1-97.5%] => -3.8778307424046
[b2-se] => 0
[b2-2.5%] => -0.031772946982161
[b2-97.5%] => -0.031772946982161
[F-statistic] => 69.211213391777
[p-value] => 9.1090543852236E-12
)
t-Test unpaired | -15.632569384303
Test of null hypothesis that mean of x = mean of y (assumed equal variances) | 2.2204460492503E-16
Test of null hypothesis that mean of x = mean of y (assumed unequal variances) | 0
t-Test paired | -13.847209446072
Test of null hypothesis that mean of x - y = 0 (probability) | 8.1046280797636E-15
echo 'Covariance: ' . $kashi->cov($x, $y) . ' ';
echo 'Correlation: ' . $kashi->cor($x, $y) . ' ';
$r = $kashi->cor($x, $y);
$n = count($x);
// corTest() takes the correlation coefficient and the number of observations
echo 'Significance of Correlation: ' . $kashi->corTest($r, $n) . ' ';
echo 'Path Analysis: ' . print_r($kashi->path($y, array(1=>$x, $data['hp'], $data['qsec'])), true) . ' ';
echo 'Regression: ' . print_r($kashi->lm($y, $x), true) . ' ';
// Multiple regression of mpg on wt and hp
echo 'Multiple Regression: ' . print_r($kashi->lm($data['mpg'], $data['wt'], $data['hp']), true) . ' ';
echo 't-Test unpaired: ' . $kashi->tTest($x, $y, false) . ' ';
echo 'Test (assumed equal variances): ' . $kashi->tDist($kashi->tTest($x, $y, false), $kashi->tTestDf($x, $y, true, false)) . ' ';
echo 'Test (assumed unequal variances): ' . $kashi->tDist($kashi->tTest($x, $y, false), $kashi->tTestDf($x, $y, false, false)) . ' ';
echo 't-Test paired: ' . $kashi->tTest($x, $y, true) . ' ';
echo 'Test: ' . $kashi->tDist($kashi->tTest($x, $y, true), $kashi->tTestDf($x, $y, false, true)) . ' ';
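The covariance, correlation, regression line and unpaired t statistic above follow textbook formulas. Below is a plain-PHP sketch of them. The function names are hypothetical and this is not Kashi's code. It assumes n - 1 denominators throughout. The $mpg array is the mpg column of the Motor Trend example data, in the same car order as $wt. With equal group sizes, as here, the pooled and Welch (separate-variance) t statistics are identical.

```php
<?php
// Sketch: sample covariance, Pearson correlation, least-squares line and the
// unpaired t statistic (n - 1 denominators); not the Kashi implementation.

function meanOf(array $v): float
{
    return array_sum($v) / count($v);
}

function sampleCovariance(array $x, array $y): float
{
    $mx = meanOf($x);
    $my = meanOf($y);
    $sum = 0.0;
    foreach ($x as $i => $xi) {
        $sum += ($xi - $mx) * ($y[$i] - $my);
    }
    return $sum / (count($x) - 1);
}

function pearsonCorrelation(array $x, array $y): float
{
    return sampleCovariance($x, $y) / sqrt(sampleCovariance($x, $x) * sampleCovariance($y, $y));
}

// Least-squares fit of y = a + b*x: b = cov(x, y) / var(x), a = mean(y) - b * mean(x).
function simpleRegression(array $y, array $x): array
{
    $slope = sampleCovariance($x, $y) / sampleCovariance($x, $x);
    $r = pearsonCorrelation($x, $y);
    return [
        'intercept' => meanOf($y) - $slope * meanOf($x),
        'slope' => $slope,
        'r-square' => $r * $r,
    ];
}

// Unpaired t statistic with separate variances (Welch form).
function unpairedT(array $x, array $y): float
{
    $se = sqrt(sampleCovariance($x, $x) / count($x) + sampleCovariance($y, $y) / count($y));
    return (meanOf($x) - meanOf($y)) / $se;
}

$wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424,
       5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78];
$mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4,
        14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4];

$fit = simpleRegression($mpg, $wt);
printf("cov %.4f, cor %.4f\n", sampleCovariance($wt, $mpg), pearsonCorrelation($wt, $mpg));
printf("a %.4f, b %.4f, R^2 %.4f, t %.4f\n", $fit['intercept'], $fit['slope'], $fit['r-square'], unpairedT($wt, $mpg));
```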
Distributions:
Normal density (x=0.5, mean=0, sd=1) | 0.3520653267643
Probability for the Student t-distribution (t=3, n=10) one-tailed | 0.01334365502257
Probability for the Student t-distribution (t=3, n=10) two-tailed | 0.0066718275112848
Probability for F distribution (f=2, df1=12, df2=15) | 0.10268840717083
Inverse of the standard normal cumulative distribution (p=0.95) | 1.6448536251337
t-value of the Student's t-distribution for probability $p and $n degrees of freedom (p=0.05, n=29) | 2.0452296438589
Standardize (x) (mean=0 & variance=1) | -0.61039956748153, -0.34978526910097, -0.91700462439985, -0.002299537926887, 0.22765425476185, 0.24809459188973, 0.36051644609311, -0.027849959336746, -0.068730633592521, 0.22765425476185, 0.22765425476185, 0.8715248742903, 0.52403914311621, 0.57513998593593, 2.0775047648356, 2.2553356978483, 2.1745963661931, -1.0396466471672, -1.6375265081579, -1.4126827997511, -0.76881218022266, 0.3094156032734, 0.22254417047987, 0.63646099731959, 0.64157108160156, -1.3104811141117, -1.1009676585508, -1.7417722275101, -0.048290296464633, -0.45709703902238, 0.36051644609311, -0.44687687045844
echo 'Normal distribution (x=0.5, mean=0, sd=1): ' . $kashi->norm(0.5, 0, 1) . ' ';
echo 'Probability for the Student t-distribution (t=3, n=10) one-tailed: ';
echo $kashi->tDist(3, 10, 1) . ' ';
echo 'Probability for the Student t-distribution (t=3, n=10) two-tailed: ';
echo $kashi->tDist(3, 10, 2) . ' ';
echo 'F probability distribution (f=2, df1=12, df2=15): ' . $kashi->fDist(2, 12, 15) . ' ';
echo 'Inverse of the standard normal cumulative distribution (p=0.95): ';
echo $kashi->inverseNormCDF(0.95) . ' ';
echo 't-value of the Student\'s t-distribution (p=0.05, n=29): ';
echo $kashi->inverseTCDF(0.05, 29) . ' ';
echo 'Standardize (x) (i.e. mean=0 & variance=1): ';
echo implode(', ', $kashi->standardize($x)) . ' ';
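Two of these results are easy to reproduce by hand. The value 0.3520653... returned by norm(0.5, 0, 1) equals the standard normal density φ(0.5) = exp(-0.125) / √(2π), and the standardize() output matches z = (x - mean) / SD using the sample SD. Here is a small plain-PHP sketch under those assumptions. The function names are hypothetical.

```php
<?php
// Normal probability density at $x for the given mean and standard deviation.
function normalDensity(float $x, float $mean = 0.0, float $sd = 1.0): float
{
    $z = ($x - $mean) / $sd;
    return exp(-0.5 * $z * $z) / ($sd * sqrt(2 * M_PI));
}

// z-scores: subtract the mean, then divide by the sample (n - 1) standard deviation.
function standardizeValues(array $x): array
{
    $n = count($x);
    $mean = array_sum($x) / $n;
    $squares = 0.0;
    foreach ($x as $v) {
        $squares += ($v - $mean) ** 2;
    }
    $sd = sqrt($squares / ($n - 1));
    return array_map(fn($v) => ($v - $mean) / $sd, $x);
}

// wt column of the example data (car weight in 1000 lb).
$wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424,
       5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78];

echo normalDensity(0.5, 0, 1), "\n";
echo implode(', ', standardizeValues($wt)), "\n";
```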
Chi-square test or Contingency tables (A/B testing):
Probability that the distribution of the number of cylinders is the same in automatic and manual transmission cars | 0.012646605046107
$table['Automatic'] = array('4 Cylinders' => 3, '6 Cylinders' => 4, '8 Cylinders' => 12);
$table['Manual'] = array('4 Cylinders' => 8, '6 Cylinders' => 3, '8 Cylinders' => 2);
// chiTest() returns the chi-square statistic ('chi') and its degrees of freedom ('df')
$result = $kashi->chiTest($table);
$probability = $kashi->chiDist($result['chi'], $result['df']);
echo 'Chi-square test probability: ' . $probability . ' ';
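To cross-check chiTest() and chiDist(), here is a minimal plain-PHP sketch of Pearson's chi-square test of independence. It derives the expected counts from the row and column totals and sums (observed - expected)² / expected. For the upper-tail probability it uses a closed form that only exists for an even number of degrees of freedom. This 2 × 3 table has (2 - 1)(3 - 1) = 2, so the form applies here, but it is not a general replacement for chiDist(). The function names are hypothetical.

```php
<?php
// Pearson chi-square statistic for $table[row][column] => observed count.
function chiSquareStatistic(array $table): array
{
    $rowTotals = [];
    $colTotals = [];
    $n = 0;
    foreach ($table as $r => $row) {
        foreach ($row as $c => $count) {
            $rowTotals[$r] = ($rowTotals[$r] ?? 0) + $count;
            $colTotals[$c] = ($colTotals[$c] ?? 0) + $count;
            $n += $count;
        }
    }
    $chi = 0.0;
    foreach ($table as $r => $row) {
        foreach ($row as $c => $count) {
            // Expected count under independence: row total * column total / N.
            $expected = $rowTotals[$r] * $colTotals[$c] / $n;
            $chi += ($count - $expected) ** 2 / $expected;
        }
    }
    $df = (count($rowTotals) - 1) * (count($colTotals) - 1);
    return ['chi' => $chi, 'df' => $df];
}

// P(X > chi) for an EVEN number of degrees of freedom:
// exp(-chi/2) * sum over i < df/2 of (chi/2)^i / i!
function chiSquareUpperTailEvenDf(float $chi, int $df): float
{
    if ($df <= 0 || $df % 2 !== 0) {
        throw new InvalidArgumentException('This sketch only handles even df.');
    }
    $half = $chi / 2;
    $term = 1.0;
    $sum = 1.0;
    for ($i = 1; $i < $df / 2; $i++) {
        $term *= $half / $i;
        $sum += $term;
    }
    return exp(-$half) * $sum;
}

$table = [
    'Automatic' => ['4 Cylinders' => 3, '6 Cylinders' => 4, '8 Cylinders' => 12],
    'Manual' => ['4 Cylinders' => 8, '6 Cylinders' => 3, '8 Cylinders' => 2],
];
$stat = chiSquareStatistic($table);
printf("chi = %.4f, df = %d, p = %.6f\n", $stat['chi'], $stat['df'], chiSquareUpperTailEvenDf($stat['chi'], $stat['df']));
```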
Diversity index:
Shannon index for number of forward gears | 1.0130227035447
Simpson index for number of cylinders | 0.357421875
$gear = array('3' => 15, '4' => 12, '5' => 5);
$cyl = array('4' => 11, '6' => 7, '8' => 14);
echo 'Shannon index for gear: ' . $kashi->diversity($gear) . ' ';
echo 'Simpson index for cyl: ' . $kashi->diversity($cyl, 'simpson') . ' ';
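Both indices are simple functions of the category proportions p_i = n_i / N. The values above match H = -Σ p_i ln p_i (natural logarithm) for the Shannon index and D = Σ p_i² for the Simpson index. Some texts instead report 1 - D, or use n_i(n_i - 1) / (N(N - 1)). Here is a plain-PHP sketch with hypothetical function names.

```php
<?php
// Shannon index H = -sum(p_i * ln p_i), with p_i = n_i / N.
function shannonIndex(array $counts): float
{
    $total = array_sum($counts);
    $h = 0.0;
    foreach ($counts as $count) {
        if ($count > 0) { // 0 * ln(0) is taken as 0
            $p = $count / $total;
            $h -= $p * log($p);
        }
    }
    return $h;
}

// Simpson index D = sum(p_i^2): the chance that two draws (with replacement)
// fall into the same category.
function simpsonIndex(array $counts): float
{
    $total = array_sum($counts);
    $d = 0.0;
    foreach ($counts as $count) {
        $d += ($count / $total) ** 2;
    }
    return $d;
}

$gear = ['3' => 15, '4' => 12, '5' => 5];
$cyl = ['4' => 11, '6' => 7, '8' => 14];
printf("Shannon (gear): %.6f, Simpson (cyl): %.6f\n", shannonIndex($gear), simpsonIndex($cyl));
```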
Analysis of Variance (ANOVA):
Typical output of the analysis of variance procedure (mpg ~ cyl):
ANOVA table
Variate: mpg
Source of variation | d.f. | s.s. | m.s. | v.r. | F pr.
cyl | 2 | 824.78 | 412.39 | 39.70 | <.001
Residual | 29 | 301.26 | 10.39
Total | 31 | 1126.05
Tables of means
Grand mean | 20.09
cyl | 4 | 6 | 8
mean | 26.66 | 19.74 | 15.10
rep. | 11 | 7 | 14
Standard errors of means
e.s.e. | 1.218 min.rep | 0.861 max.rep
Standard errors of differences of means
s.e.d. | 1.723X min.rep | 1.218X max.rep
Least significant differences of means (5% level)
l.s.d. | 3.524X min.rep | 2.492X max.rep
Stratum standard errors and coefficients of variation
d.f. | s.e. | cv%
29 | 3.223 | 16.0
Array
(
[TDF] => 2
[EDF] => 29
[TotDF] => 31
[SST] => 824.7845900974
[SSE] => 301.2625974026
[SSTot] => 1126.0471875
[MST] => 412.3922950487
[MSE] => 10.388365427676
[VRT] => 39.697515255869
[F] => 4.9789191744003E-9
[Mean] => 20.090625
[Means] => Array
(
[4] => 26.6636364
[6] => 19.7428571
[8] => 15.1000000
)
[Reps] => Array
(
[4] => 11
[6] => 7
[8] => 14
)
[SE] => Array
(
[min] => 1.2182168131961
[max] => 0.86140936956643
)
[SED] => Array
(
[min] => 1.7228187391329
[max] => 1.2182168131961
)
[LSD] => Array
(
[min] => 3.5235599562701
[max] => 2.491533138996
)
[CV] => 16.042799717154
)
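require('kashi_anova.php');
// The constructor takes database connection details:
// $obj = new KashiANOVA($dbname, $dbuser, $dbpass, $dbhost);
$obj = new KashiANOVA('test', 'root', '', 'localhost');
// Read the ANOVA data set from a text file and load it
$str = file_get_contents('anova_data.txt');
$obj->loadString($str);
// One-way ANOVA of mpg by number of cylinders (mpg ~ cyl)
$result = $obj->anova('cyl', 'mpg');
print_r($result);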
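KashiANOVA needs a database connection. To show the arithmetic behind the table above, here is a minimal in-memory sketch of one-way ANOVA in plain PHP. It splits the total sum of squares into a between-groups (treatment) part and a within-groups (residual) part, then forms the variance ratio MST / MSE. The keys TDF, EDF, SST, SSE, MST, MSE and VRT are borrowed from the output above for easy comparison. The function name is hypothetical. The mpg values are those of the Motor Trend example cars, grouped by cyl. The p-value (F pr.) is left out because it needs an F distribution routine.

```php
<?php
// One-way ANOVA sketch: $groups maps a group label to its list of values.
function oneWayAnova(array $groups): array
{
    $all = array_merge(...array_values($groups));
    $n = count($all);
    $k = count($groups);
    $grandMean = array_sum($all) / $n;

    $ssBetween = 0.0; // treatment sum of squares
    $ssWithin = 0.0;  // residual sum of squares
    foreach ($groups as $values) {
        $groupMean = array_sum($values) / count($values);
        $ssBetween += count($values) * ($groupMean - $grandMean) ** 2;
        foreach ($values as $v) {
            $ssWithin += ($v - $groupMean) ** 2;
        }
    }

    $msBetween = $ssBetween / ($k - 1);
    $msWithin = $ssWithin / ($n - $k);
    return [
        'TDF' => $k - 1,
        'EDF' => $n - $k,
        'SST' => $ssBetween,
        'SSE' => $ssWithin,
        'MST' => $msBetween,
        'MSE' => $msWithin,
        'VRT' => $msBetween / $msWithin, // the F statistic
    ];
}

// mpg of the example cars, grouped by number of cylinders.
$mpgByCyl = [
    4 => [22.8, 24.4, 22.8, 32.4, 30.4, 33.9, 21.5, 27.3, 26.0, 30.4, 21.4],
    6 => [21.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7],
    8 => [18.7, 14.3, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 15.5, 15.2, 13.3, 19.2, 15.8, 15.0],
];
print_r(oneWayAnova($mpgByCyl));
```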
Cluster Analysis:
K-Means Clustering | Array
(
[Chrysler Imperial] => 0
[Pontiac Firebird] => 0
[AMC Javelin] => 0
[Camaro Z28] => 0
[Lincoln Continental] => 0
[Cadillac Fleetwood] => 0
[Merc 450SLC] => 0
[Merc 450SL] => 0
[Merc 450SE] => 0
[Dodge Challenger] => 0
[Duster 360] => 0
[Ford Pantera L] => 0
[Hornet Sportabout] => 0
[Maserati Bora] => 0
[Toyota Corona] => 1
[Porsche 914-2] => 1
[Lotus Europa] => 1
[Ferrari Dino] => 1
[Fiat X1-9] => 1
[Mazda RX4] => 1
[Toyota Corolla] => 1
[Honda Civic] => 1
[Fiat 128] => 1
[Mazda RX4 Wag] => 1
[Merc 280C] => 1
[Merc 280] => 1
[Merc 230] => 1
[Merc 240D] => 1
[Valiant] => 1
[Hornet 4 Drive] => 1
[Datsun 710] => 1
[Volvo 142E] => 1
)
Hierarchical Clustering |
32 15 14 0.034867528963888
33 12 11 0.046511652279906
34 1 0 0.048063902847295
35 10 9 0.048146270217687
36 33 13 0.048374485470338
37 24 4 0.06456633193609
38 19 17 0.067898627038737
39 22 21 0.092305891561629
40 39 37 0.11301195978463
41 32 16 0.11529825256692
42 31 2 0.1155541020107
43 5 3 0.11717892926293
44 40 36 0.11995870908923
45 23 6 0.12445889917409
46 38 25 0.12703468709516
47 46 42 0.19819935352147
48 8 7 0.20845446781686
49 48 20 0.22553907135502
50 45 44 0.23476357897562
51 47 18 0.24068916220486
52 50 41 0.25528946686225
53 34 29 0.26595333894602
54 51 27 0.27674027068183
55 54 26 0.28056404941297
56 49 43 0.28521660028422
57 56 35 0.30779338554525
58 30 28 0.35715746216011
59 55 53 0.37801491177356
60 59 57 0.42234403985919
61 60 52 0.52592878486916
62 61 58 0.49319668374021
require('kashi_cluster.php');
$obj = new KashiCluster();
$obj->dataLoad($data);
$result = $obj->kMean(2);
print_r($result);
// The hierarchical tree output has no header and consists of four columns. In each row, the first column is
// the identifier of the node, the second and third columns are the identifiers of its child nodes, and the
// fourth column is used to determine the height of the node when rendering the tree.
$tree = $obj->hClust();
echo "$tree ";
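The page does not say which k-means variant kMean() uses. For illustration only, here is the standard Lloyd iteration in plain PHP. Each point is assigned to its nearest centroid by squared Euclidean distance, each centroid then moves to the mean of its points, and the loop stops when no assignment changes. The function name, the choice of the first k points as starting centroids, and the toy data are all hypothetical.

```php
<?php
// Lloyd's k-means sketch (not KashiCluster's code). $points is a list of
// equal-length numeric feature vectors; returns a 0-based cluster label per point.
function kMeansSketch(array $points, int $k, int $maxIterations = 100): array
{
    $points = array_values($points);
    // Hypothetical initialization: the first $k points become the centroids.
    $centroids = array_slice($points, 0, $k);
    $labels = array_fill(0, count($points), -1);
    $dimensions = count($points[0]);

    for ($iteration = 0; $iteration < $maxIterations; $iteration++) {
        // Assignment step: nearest centroid by squared Euclidean distance.
        $changed = false;
        foreach ($points as $i => $point) {
            $best = 0;
            $bestDistance = INF;
            foreach ($centroids as $c => $centroid) {
                $distance = 0.0;
                foreach ($point as $d => $value) {
                    $distance += ($value - $centroid[$d]) ** 2;
                }
                if ($distance < $bestDistance) {
                    $bestDistance = $distance;
                    $best = $c;
                }
            }
            if ($labels[$i] !== $best) {
                $labels[$i] = $best;
                $changed = true;
            }
        }
        if (!$changed) {
            break; // converged: no point switched cluster
        }
        // Update step: move each centroid to the mean of its points.
        $sums = array_fill(0, $k, array_fill(0, $dimensions, 0.0));
        $counts = array_fill(0, $k, 0);
        foreach ($points as $i => $point) {
            $counts[$labels[$i]]++;
            foreach ($point as $d => $value) {
                $sums[$labels[$i]][$d] += $value;
            }
        }
        for ($c = 0; $c < $k; $c++) {
            if ($counts[$c] > 0) { // an empty cluster keeps its old centroid
                for ($d = 0; $d < $dimensions; $d++) {
                    $centroids[$c][$d] = $sums[$c][$d] / $counts[$c];
                }
            }
        }
    }
    return $labels;
}

// Hypothetical toy data: two well-separated groups of 2-D points.
$points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]];
print_r(kMeansSketch($points, 2));
```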
Time Series Analysis:
Moving Average | 2.894, 3.062, 3.201, 3.375, 3.362, 3.362, 3.358, 3.458, 3.566, 3.692, 4.054, 4.4508, 4.7058, 4.3998, 3.9668, 3.2838, 2.692, 2.327, 2.574, 3.019, 3.421, 3.315, 3.039, 2.6546, 2.5206, 2.3056, 2.6326, 2.7606
echo 'Moving Average for x: ' . implode(', ', $kashi->movingAvg($x, 5)) . ' ';
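The output above is a simple (unweighted) moving average. Each value is the mean of 5 consecutive elements of x, so 32 inputs give 32 - 5 + 1 = 28 averages. The first is (2.62 + 2.875 + 2.32 + 3.215 + 3.44) / 5 = 2.894. Here is a plain-PHP sketch that keeps a running sum. The function name is hypothetical.

```php
<?php
// Simple moving average: mean of each run of $window consecutive values.
function simpleMovingAverage(array $x, int $window): array
{
    $x = array_values($x);
    $averages = [];
    $sum = 0.0;
    foreach ($x as $i => $v) {
        $sum += $v;                    // add the newest value
        if ($i >= $window) {
            $sum -= $x[$i - $window];  // drop the value that left the window
        }
        if ($i >= $window - 1) {
            $averages[] = $sum / $window;
        }
    }
    return $averages;
}

// wt column of the example data (car weight in 1000 lb).
$wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424,
       5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78];

echo implode(', ', simpleMovingAverage($wt, 5)), "\n";
```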
Matrix Functions:
A = | 1 2 |, B = | 5 7 |
    | 3 4 |      | 6 8 |
A + B = | 6  9 |
        | 9 12 |
B - A = | 4 5 |
        | 3 4 |
A * 2 = | 2 4 |
        | 6 8 |
A * B = | 17 23 |
        | 39 53 |
Transpose of B, t(B) = | 5 6 |
                       | 7 8 |
Determinant of A, |A| = -2
Cofactor of A = |  4 -3 |
                | -2  1 |
Adjoint of A = |  4 -2 |
               | -3  1 |
Inverse of A = |  -2    1 |
               | 1.5 -0.5 |
// Matrices are stored as nested arrays with 1-based indices: $M[row][column]
$A[1][1] = 1;
$A[1][2] = 2;
$A[2][1] = 3;
$A[2][2] = 4;
$B[1][1] = 5;
$B[1][2] = 7;
$B[2][1] = 6;
$B[2][2] = 8;
echo 'A + B = ', print_r($kashi->mAddition($A, $B), true), ' ';
echo 'B - A = ', print_r($kashi->mSubtraction($B, $A), true), ' ';
echo 'A * 2 = ', print_r($kashi->mMultiplication($A, 2), true), ' ';
echo 'A * B = ', print_r($kashi->mMultiplication($A, $B), true), ' ';
echo 'Transpose of B, t(B) = ', print_r($kashi->mTranspose($B), true), ' ';
echo 'Determinant of A, |A| = ', print_r($kashi->mDeterminant($A), true), ' ';
echo 'Cofactor of A = ', print_r($kashi->mCofactor($A), true), ' ';
echo 'Adjoint of A = ', print_r($kashi->mAdjoint($A), true), ' ';
echo 'Inverse of A = ', print_r($kashi->mInverse($A), true), ' ';
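The cofactor, adjoint and inverse results above are linked by the identity A⁻¹ = adj(A) / |A|, where adj(A) is the transpose of the cofactor matrix. The following plain-PHP sketch uses the same 1-based $M[row][column] layout as $A and $B. The function names are hypothetical. It computes the determinant by cofactor expansion along the first row. That expansion costs O(n!) time, so it is only sensible for small matrices.

```php
<?php
// Matrices are 1-based nested arrays, $M[row][column].

function matrixMultiply(array $a, array $b): array
{
    $product = [];
    $rows = count($a);
    $inner = count($b);
    $cols = count($b[1]);
    for ($i = 1; $i <= $rows; $i++) {
        for ($j = 1; $j <= $cols; $j++) {
            $sum = 0;
            for ($k = 1; $k <= $inner; $k++) {
                $sum += $a[$i][$k] * $b[$k][$j];
            }
            $product[$i][$j] = $sum;
        }
    }
    return $product;
}

// Copy of $m without row $row and column $col, re-indexed from 1.
function minorMatrix(array $m, int $row, int $col): array
{
    $minor = [];
    $r = 1;
    foreach ($m as $i => $cells) {
        if ($i === $row) {
            continue;
        }
        $c = 1;
        foreach ($cells as $j => $value) {
            if ($j !== $col) {
                $minor[$r][$c++] = $value;
            }
        }
        $r++;
    }
    return $minor;
}

// Determinant by cofactor expansion along the first row.
function determinant(array $m): float
{
    $n = count($m);
    if ($n === 1) {
        return (float) $m[1][1];
    }
    $det = 0.0;
    for ($j = 1; $j <= $n; $j++) {
        $det += (-1) ** (1 + $j) * $m[1][$j] * determinant(minorMatrix($m, 1, $j));
    }
    return $det;
}

// C[i][j] = (-1)^(i+j) * determinant of the minor without row i and column j.
function cofactorMatrix(array $m): array
{
    $n = count($m);
    $cofactors = [];
    for ($i = 1; $i <= $n; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $cofactors[$i][$j] = (-1) ** ($i + $j) * determinant(minorMatrix($m, $i, $j));
        }
    }
    return $cofactors;
}

// The adjoint (adjugate) is the transpose of the cofactor matrix.
function adjointMatrix(array $m): array
{
    $adjoint = [];
    foreach (cofactorMatrix($m) as $i => $row) {
        foreach ($row as $j => $value) {
            $adjoint[$j][$i] = $value;
        }
    }
    return $adjoint;
}

// Inverse = adjoint / determinant (fails for singular matrices).
function inverseMatrix(array $m): array
{
    $det = determinant($m);
    if (abs($det) < 1e-12) {
        throw new InvalidArgumentException('Matrix is singular.');
    }
    $inverse = adjointMatrix($m);
    foreach ($inverse as $i => $row) {
        foreach ($row as $j => $value) {
            $inverse[$i][$j] = $value / $det;
        }
    }
    return $inverse;
}

$A = [1 => [1 => 1, 2 => 2], 2 => [1 => 3, 2 => 4]];
$B = [1 => [1 => 5, 2 => 7], 2 => [1 => 6, 2 => 8]];
print_r(matrixMultiply($A, $B));
echo determinant($A), "\n";
print_r(inverseMatrix($A));
```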
Solve System of Linear Equations:
2*X2 + X3 = 9
X1 + 4*X2 - X3 = 23
-X1 + X2 + X3 = 2
Solution | Array
(
[1] => 2
[2] => 5
[3] => -1
)
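// Row i of $X holds the coefficients of X1, X2 and X3 in equation i; $Y holds the right-hand sides
$X[1] = array(1=>0, 2, 1);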
$X[2] = array(1=>1, 4, -1);
$X[3] = array(1=>-1, 1, 1);
$Y = array(1=>9, 23, 2);
print_r($kashi->solve($X, $Y));
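The page does not describe how solve() works internally. One standard method for such a system is Gaussian elimination with partial pivoting, sketched below in plain PHP with the same 1-based $X and $Y layout. The function name is hypothetical. Pivoting matters in this very example: the coefficient of X1 in the first equation is 0, so elimination without row swaps would divide by zero.

```php
<?php
// Solve A·x = b by Gaussian elimination with partial pivoting (1-based arrays).
function solveLinearSystem(array $a, array $b): array
{
    $n = count($b);
    for ($k = 1; $k <= $n; $k++) {
        // Choose the row with the largest |a[i][k]| as the pivot row.
        $pivot = $k;
        for ($i = $k + 1; $i <= $n; $i++) {
            if (abs($a[$i][$k]) > abs($a[$pivot][$k])) {
                $pivot = $i;
            }
        }
        if (abs($a[$pivot][$k]) < 1e-12) {
            throw new InvalidArgumentException('The system is singular.');
        }
        if ($pivot !== $k) {
            [$a[$k], $a[$pivot]] = [$a[$pivot], $a[$k]];
            [$b[$k], $b[$pivot]] = [$b[$pivot], $b[$k]];
        }
        // Eliminate column $k from the rows below the pivot.
        for ($i = $k + 1; $i <= $n; $i++) {
            $factor = $a[$i][$k] / $a[$k][$k];
            for ($j = $k; $j <= $n; $j++) {
                $a[$i][$j] -= $factor * $a[$k][$j];
            }
            $b[$i] -= $factor * $b[$k];
        }
    }
    // Back substitution on the upper-triangular system.
    $x = [];
    for ($i = $n; $i >= 1; $i--) {
        $sum = $b[$i];
        for ($j = $i + 1; $j <= $n; $j++) {
            $sum -= $a[$i][$j] * $x[$j];
        }
        $x[$i] = $sum / $a[$i][$i];
    }
    ksort($x);
    return $x;
}

$X[1] = array(1 => 0, 2, 1);
$X[2] = array(1 => 1, 4, -1);
$X[3] = array(1 => -1, 1, 1);
$Y = array(1 => 9, 23, 2);
$solution = solveLinearSystem($X, $Y);
print_r($solution);
```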
To-do list:
Principal Component Analysis (PCA)
Multiple Linear Regression and Relative Weights
Analysis of Covariance
Extra Clustering Methods (i.e. Linkage Criteria)
Eigenvalues and Eigenvectors of Matrices
Export Graphics in SVG Format
Example Data Description (Motor Trend Car Road Tests):
Format: A data frame with 32 observations on 12 variables.
ID | Title | Description
1 | model | Car models
2 | mpg | Miles/(US) gallon
3 | cyl | Number of cylinders
4 | disp | Displacement (cu.in.)
5 | hp | Gross horsepower
6 | drat | Rear axle ratio
7 | wt | Weight (lb/1000)
8 | qsec | 1/4 mile time
9 | vs | V/S
10 | am | Transmission (0 = automatic, 1 = manual)
11 | gear | Number of forward gears
12 | carb | Number of carburetors
You can download the example data file from here.