Histograms For All The World
Description:
A certain university has been making histograms of its students' results at the end of each course. We are going to explain the statistical procedure used by this institution, using the results of a small class of advanced students. The following are the results (rounded percentages, no decimals) of 40 students:
73 82 70 74 87 69 22 49 73 52
86 45 19 15 2 51 3 23 42 50
69 58 89 71 59 70 47 41 51 71
67 69 60 38 74 56 67 56 46 70
- The results should be sorted:
2 3 15 19 22 23 38 41 42 45 46 47 49 50 51 51 52 56 56 58 59 60 67 67 69 69 69 70 70 70 71 71 73 73 74 74 82 86 87 89
- We count the number of elements, n.
In our case n = 40
- We determine the minimum and maximum results, and then the range of results.
In our case, min = 2 and max = 89
Then the range is:
range = max - min = 89 - 2 = 87
- The number of intervals or classes, denoted as k, is determined using Sturges' rule, which employs the base-10 logarithm (log10):
k = 1 + 3.32 * log10(n)
In our case:
k = 1 + 3.32 * log10(40) = 6.3188
The value of k should be rounded to an integer, so k = 6.
- The amplitude, A, of each class or interval is determined by:
A = range / k
In our case:
A = 87 / 6 = 14.5
We take the ceiling of A, so A = 15.
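As a rough illustration, the steps above could be sketched in Python like this. The function name class_parameters is our own invention (it is not part of the task), and using round() for k is an assumption: the text only says that k is rounded to an integer.

```python
import math

def class_parameters(results):
    """Return (n, min, max, range, k, A) for a list of integer percentages."""
    ordered = sorted(results)             # sort the results
    n = len(ordered)                      # count the elements
    minimum, maximum = ordered[0], ordered[-1]
    value_range = maximum - minimum       # range = max - min
    # Sturges' rule with the 3.32 constant used in the text.
    # Assumption: round() to the nearest integer.
    k = round(1 + 3.32 * math.log10(n))
    # Amplitude of each class, rounded up to the next integer.
    amplitude = math.ceil(value_range / k)
    return n, minimum, maximum, value_range, k, amplitude

students_results = [73, 82, 70, 74, 87, 69, 22, 49, 73, 52,
                    86, 45, 19, 15, 2, 51, 3, 23, 42, 50,
                    69, 58, 89, 71, 59, 70, 47, 41, 51, 71,
                    67, 69, 60, 38, 74, 56, 67, 56, 46, 70]

print(class_parameters(students_results))  # (40, 2, 89, 87, 6, 15)
```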
We determine the values for the classes:
min -----> min + A -----> min + 2A -----> min + 3A -----> ... -----> min + kA
In our case this will be:
2 -----> 17 -----> 32 -----> 47 -----> 62 -----> 77 -----> 92
Now we have to determine the lower and upper bounds of each interval. Our variable is discrete (rounded percentages), so they would be:
lower | upper | dif |
2 | 16 | 14 |
17 | 31 | 14 |
32 | 46 | 14 |
47 | 61 | 14 |
62 | 76 | 14 |
77 | 92 | 15 |
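The bounds listed above could be generated with a small helper such as the one below. This is only a sketch: class_bounds is a hypothetical name, and the rule it applies (every class ends one unit before the next one starts, except the last class, which keeps the final value min + kA) is inferred from the worked example rather than stated explicitly.

```python
def class_bounds(minimum, k, amplitude):
    """Return inclusive [lower, upper] pairs for k classes of integer values."""
    bounds = []
    for i in range(k):
        lower = minimum + i * amplitude
        if i < k - 1:
            # The next class starts at lower + amplitude, so stop one unit before.
            upper = lower + amplitude - 1
        else:
            # The last class keeps the final edge min + k * A.
            upper = minimum + k * amplitude
        bounds.append([lower, upper])
    return bounds

print(class_bounds(2, 6, 15))
# [[2, 16], [17, 31], [32, 46], [47, 61], [62, 76], [77, 92]]
```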
Now we have to determine the frequency, F, of each class or interval:
Num. of class | lower | upper | values | F |
1 | 2 | 16 | 2, 3, 15 | 3 |
2 | 17 | 31 | 19, 22, 23 | 3 |
3 | 32 | 46 | 38, 41, 42, 45, 46 | 5 |
4 | 47 | 61 | 47, 49, 50, 51, 51, 52, 56, 56, 58, 59, 60 | 11 |
5 | 62 | 76 | 67, 67, 69, 69, 69, 70, 70, 70, 71, 71, 73, 73, 74, 74 | 14 |
6 | 77 | 92 | 82, 86, 87, 89 | 4 |
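The counting step might look like this in Python. The helper name class_frequencies is hypothetical, and the bounds are treated as inclusive on both sides, as in the worked example.

```python
def class_frequencies(results, bounds):
    """Count how many results fall inside each inclusive [lower, upper] class."""
    return [sum(lower <= r <= upper for r in results) for lower, upper in bounds]

students_results = [73, 82, 70, 74, 87, 69, 22, 49, 73, 52,
                    86, 45, 19, 15, 2, 51, 3, 23, 42, 50,
                    69, 58, 89, 71, 59, 70, 47, 41, 51, 71,
                    67, 69, 60, 38, 74, 56, 67, 56, 46, 70]
bounds = [[2, 16], [17, 31], [32, 46], [47, 61], [62, 76], [77, 92]]

print(class_frequencies(students_results, bounds))  # [3, 3, 5, 11, 14, 4]
```

This simple version scans the whole list once per class, which is fine here because Sturges' rule keeps the number of classes small.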
Finally, the accumulated frequency, Fa+, should be calculated for each class (last column):
Num. of class | lower | upper | F | Fa+ |
1 | 2 | 16 | 3 | 3 |
2 | 17 | 31 | 3 | 6 |
3 | 32 | 46 | 5 | 11 |
4 | 47 | 61 | 11 | 22 |
5 | 62 | 76 | 14 | 36 |
6 | 77 | 92 | 4 | 40 |
The accumulated frequency for a class is the number of values that are at or below the upper bound of this class.
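Since Fa+ is just a running total of F, one way to obtain it in Python is itertools.accumulate from the standard library. The variable names below are ours.

```python
from itertools import accumulate

frequencies = [3, 3, 5, 11, 14, 4]              # F for classes 1..6
accumulated = list(accumulate(frequencies))     # running total of F

print(accumulated)  # [3, 6, 11, 22, 36, 40]
```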
This university has prepared a MOOC and estimates that 80000 students worldwide will take the course. They have been doing this work manually, but now they need code for it.
Could you help by writing code to make the histograms for thousands of students?
For that task we need a function, let's name it hist_maker(), that will receive an array with the result of each student and will have to output a list of lists, each of these lists holding the data of one class, in order to make a graph.
The data for each class:
[Num. of class, [lower, upper], F, Fa+]
For arrays with a lot of elements, the lower bound of a class may be equal to its upper bound (the class holds a single value). In that case the output simply shows that percentage value twice:
[Num. of class, [value, value], F, Fa+]
For our case it will be:
students_results = [73, 82, 70, 74, 87, 69, 22, 49, 73, 52,
86, 45, 19, 15, 2, 51, 3, 23, 42, 50,
69, 58, 89, 71, 59, 70, 47, 41, 51, 71,
67, 69, 60, 38, 74, 56, 67, 56, 46, 70]
hist_maker(students_results) == [[1, [2, 16], 3, 3], [2, [17, 31], 3, 6], [3, [32, 46], 5, 11], [4, [47, 61], 11, 22], [5, [62, 76], 14, 36], [6, [77, 92], 4, 40]]
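Putting the whole procedure together, one possible sketch of hist_maker() is shown below. It is not a reference solution. It assumes that k is rounded to the nearest integer with round(), that the last class keeps the final edge min + kA as in the worked example, and that upper bounds above 100 are cut down to 100. Cases the description does not define (for instance, all results being equal) are not handled.

```python
import math
from bisect import bisect_right

def hist_maker(results):
    """Return [class number, [lower, upper], F, Fa+] for each class."""
    ordered = sorted(results)
    n = len(ordered)
    minimum, maximum = ordered[0], ordered[-1]
    k = round(1 + 3.32 * math.log10(n))             # assumption: nearest integer
    amplitude = math.ceil((maximum - minimum) / k)

    histogram = []
    previous_fa = 0
    for i in range(k):
        lower = minimum + i * amplitude
        if i < k - 1:
            upper = lower + amplitude - 1           # stop before the next class
        else:
            upper = minimum + k * amplitude         # last class keeps the final edge
        upper = min(upper, 100)                     # percentages cannot exceed 100
        # Fa+ is the number of sorted values that are <= upper.
        fa = bisect_right(ordered, upper)
        histogram.append([i + 1, [lower, upper], fa - previous_fa, fa])
        previous_fa = fa
    return histogram

students_results = [73, 82, 70, 74, 87, 69, 22, 49, 73, 52,
                    86, 45, 19, 15, 2, 51, 3, 23, 42, 50,
                    69, 58, 89, 71, 59, 70, 47, 41, 51, 71,
                    67, 69, 60, 38, 74, 56, 67, 56, 46, 70]

for row in hist_maker(students_results):
    print(row)
```

Sorting once and using a binary search for each upper bound keeps the work close to a single sort, which matters for inputs of up to 10^5 results.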
Three important tips:
- If ΣF ≠ n, something has to be wrong.
In our case ΣF = 3 + 3 + 5 + 11 + 14 + 4 = 40
- If Fa+ for the last class is different from n (the number of elements introduced), something has to be wrong.
In our case, Fa+ for the 6th class is 40.
- The upper bound can't be higher than 100, so your code should correct every upper bound above 100. Suppose that the interval for the last class obtained using Sturges' rule is [97, 102]; it should be [97, 100].
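These three tips can be turned into quick self-checks while developing a solution. The helpers check_histogram and clamp_upper below are hypothetical names used only for illustration.

```python
def check_histogram(histogram, n):
    """Return True when the three tips hold for a finished histogram."""
    total_f = sum(row[2] for row in histogram)               # tip 1: sum of F
    last_fa = histogram[-1][3]                               # tip 2: last Fa+
    uppers_ok = all(row[1][1] <= 100 for row in histogram)   # tip 3: cap at 100
    return total_f == n and last_fa == n and uppers_ok

def clamp_upper(interval, cap=100):
    """Cut the upper bound of [lower, upper] down to cap if it is higher."""
    lower, upper = interval
    return [lower, min(upper, cap)]

expected = [[1, [2, 16], 3, 3], [2, [17, 31], 3, 6], [3, [32, 46], 5, 11],
            [4, [47, 61], 11, 22], [5, [62, 76], 14, 36], [6, [77, 92], 4, 40]]

print(check_histogram(expected, 40))   # True
print(clamp_upper([97, 102]))          # [97, 100]
```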
10 <= len(students_results) <= 10^5
Enjoy it!!
Stats:
Created | Nov 18, 2015 |
Published | Nov 19, 2015 |