From Test-Scratch-Wiki

Latest revision as of 16:30, 26 March 2018

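Standard deviation (in Chinese often also called 均方差, literally "mean square deviation") reflects how spread out a data set is. Two data sets with the same mean do not necessarily have the same standard deviation. It is a measure of how far the values in a data set are dispersed around their mean: a large standard deviation means most values differ considerably from the mean, while a small one means the values lie close to the mean.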

How to do it in Scratch

Definition of Standard Deviation

Standard deviation is the non-negative square root of variance.

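What is variance? Variance (here, the population variance) is the average of the squared differences between each value and the mean of all the values. In probability theory, variance measures how far a random variable deviates from its expected value (its mean). An example makes this clearer.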

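Suppose we have 5 flowers whose heights are 25 centimeters, 60 centimeters, 40 centimeters, 45 centimeters, and 55 centimeters. Their average height is: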

(25 + 60 + 40 + 45 + 55) / 5 = 45

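This tells us that the average height of the flowers is 45 centimeters. What, then, is the variance of the heights? Taking the flowers in order of height: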

Flower #1: ((25) - (45))^2 = (-20)^2 = 400
Flower #2: ((40) - (45))^2 = (-5)^2 = 25
Flower #3: ((45) - (45))^2 = (0)^2 = 0
Flower #4: ((55) - (45))^2 = (10)^2 = 100
Flower #5: ((60) - (45))^2 = (15)^2 = 225

(400 + 25 + 0 + 100 + 225) / 5 = 150

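So the variance of the flower heights is 150 square centimeters. The standard deviation is therefore the square root of 150, or about 12.247 centimeters.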
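The arithmetic above can be checked outside Scratch. The following is a minimal Python sketch, not part of the original tutorial; the variable names are chosen here for illustration only.

```python
import math

# Flower heights in centimeters, taken from the example above.
heights = [25, 60, 40, 45, 55]

# Mean of the heights.
mean = sum(heights) / len(heights)

# Squared difference of each height from the mean.
squared_diffs = [(h - mean) ** 2 for h in heights]

# Population variance: the average of the squared differences.
population_variance = sum(squared_diffs) / len(heights)

# Standard deviation: the square root of the variance.
population_std = math.sqrt(population_variance)

print(mean)                      # 45.0
print(population_variance)       # 150.0
print(round(population_std, 3))  # 12.247
```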

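There are two types of standard deviation:

1. Population standard deviation, which is the standard deviation of an entire population. For example, if there were only 5 flowers in the world, then 12.247 would be the population standard deviation of the flowers' heights.

2. Sample standard deviation, which is the standard deviation computed from only part of a population. Take the five flowers: there are obviously more than five flowers in the world, so these five are only a sample. When a sample is used to estimate the spread of the whole population, the computed value has to be scaled up slightly.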
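In symbols, the two definitions can be written as follows. The notation is an assumption introduced here and does not appear in the original tutorial: x_i are the data values, N is the population size, n is the sample size, \mu is the population mean, and \bar{x} is the sample mean.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\text{(population)}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad\text{(sample)}
```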

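The only difference between the two types of standard deviation is how the variance is calculated. Population standard deviation follows the rules described above. Sample standard deviation instead takes the sum of the squared differences from the mean and divides it by the number of data points minus one. For example, let's return to the flowers and recalculate their variance: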

(400 + 25 + 0 + 100 + 225) / (5 - 1) = 187.5

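Here, 5 is the number of flowers with known heights. 1 is subtracted from it because this is a sample standard deviation. The last step in calculating the sample standard deviation is to take the square root of 187.5, which is about 13.693.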
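As a cross-check, Python's standard `statistics` module provides both versions: `variance` and `stdev` divide by the number of data points minus one, while `pstdev` divides by the number of data points. This sketch is an illustration added here, not part of the original tutorial.

```python
import statistics

# Flower heights in centimeters, taken from the example above.
heights = [25, 60, 40, 45, 55]

# Sample variance and sample standard deviation (divide by n - 1).
sample_variance = statistics.variance(heights)
sample_std = statistics.stdev(heights)

# Population standard deviation (divide by n), for comparison.
population_std = statistics.pstdev(heights)

print(sample_variance)           # 187.5
print(round(sample_std, 3))      # 13.693
print(round(population_std, 3))  # 12.247
```

The sample value is always the larger of the two for the same data, which matches the "scaled up slightly" remark above.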

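Variables

A list will be needed during this tutorial:

  • Data

This list will hold all the data samples, such as the heights of the flowers.

Meanwhile, seven variables will also be needed: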

  • Average
  • Sum
  • Variance
  • Standard Deviation
  • Number
  • Sum2
  • Number2

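Coding

This tutorial starts with sample standard deviation.

The first step in calculating sample standard deviation is calculating the average of the numbers. The script is shown below: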

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)//The variable (Number) keeps track of which list item the script is on.
end
set [Average v] to ((Sum) / (length of [Data v]))
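For readers who want to trace the loop step by step, here is a rough Python mirror of the script above. It is only a sketch: `data`, `total`, and `number` are stand-ins chosen here for the Scratch list Data and the variables Sum and Number. Scratch lists count from 1 while Python lists count from 0, hence the `number - 1`.

```python
# Stand-in for the Scratch list "Data" (flower heights from the example).
data = [25, 60, 40, 45, 55]

total = 0   # plays the role of the Scratch variable Sum
number = 1  # plays the role of Number: a 1-based position, as in Scratch

# Equivalent of "repeat (length of [Data v])".
for _ in range(len(data)):
    total += data[number - 1]  # item (Number) of Data, shifted to 0-based
    number += 1                # move on to the next item

average = total / len(data)
print(average)  # 45.0
```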

The second step in calculating sample standard deviation is calculating the variance:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))

The final step in calculating sample standard deviation is taking the square root of the variance:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))
set [Standard Deviation v] to ([sqrt v] of (Variance))

Only a small tweak is needed to calculate population standard deviation: the sum of the squared differences is divided by the number of data points rather than by that number minus one. The code is:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / (length of [Data v]))
set [Standard Deviation v] to ([sqrt v] of (Variance))

Final Product

The code for calculating sample standard deviation is:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))
set [Standard Deviation v] to ([sqrt v] of (Variance))

The code for calculating population standard deviation is:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (((item (Number) of [Data v]) - (Average)) * ((item (Number) of [Data v]) - (Average)))
  change [Number v] by (1)
end
set [Variance v] to ((Sum) / (length of [Data v]))
set [Standard Deviation v] to ([sqrt v] of (Variance))
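The two final scripts differ only in the divisor, so they can be summarized in one function. The Python sketch below is an illustration under assumptions, not the tutorial's own method: the function name `standard_deviation` and its `sample` flag are hypothetical, and the check for too few data points is an addition that the Scratch scripts above do not perform.

```python
import math


def standard_deviation(data, sample=True):
    """Return the sample (default) or population standard deviation.

    `data` stands in for the Scratch list "Data"; the name and the
    `sample` flag are hypothetical, chosen for this illustration.
    """
    n = len(data)
    # Sample version divides by n - 1, population version by n.
    divisor = n - 1 if sample else n
    if divisor <= 0:
        # Added guard: the Scratch scripts do not check for this case.
        raise ValueError("not enough data points")

    average = sum(data) / n                        # first loop
    sum2 = sum((x - average) ** 2 for x in data)   # second loop
    variance = sum2 / divisor
    return math.sqrt(variance)


heights = [25, 60, 40, 45, 55]
print(round(standard_deviation(heights), 3))                # 13.693
print(round(standard_deviation(heights, sample=False), 3))  # 12.247
```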

See Also

  • Finding the Mode of Numbers