From Test-Scratch-Wiki

Revision as of 16:30, 26 March 2018 by HY2009 (talk | contribs)

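Standard deviation, often also called "mean square deviation" (均方差) in Chinese-language contexts, measures how dispersed a data set is around its mean. Two data sets with the same mean do not necessarily have the same standard deviation. A large standard deviation means that most values differ considerably from the mean; a small standard deviation means that the values lie close to the mean.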

How to do it in Scratch

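Definition of standard deviation

The standard deviation is the non-negative square root of the variance.

What is variance? The variance of a data set is the mean of the squared differences between each value and the mean of all the values. In probability theory, variance measures how far a random variable deviates from its mathematical expectation (its mean). An example makes this concrete.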

Suppose we have 5 flowers whose heights are 25 cm, 60 cm, 40 cm, 45 cm, and 55 cm. Their mean height is:

(25 + 60 + 40 + 45 + 55) / 5 = 45

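This tells us that the mean height of the flowers is 45 cm. So what is the variance of the flower heights? First take the squared difference between each height (listed here in ascending order) and the mean, then average the results: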

Flower #1: ((25) - (45))^2 = (-20)^2 = 400
Flower #2: ((40) - (45))^2 = (-5)^2 = 25
Flower #3: ((45) - (45))^2 = (0)^2 = 0
Flower #4: ((55) - (45))^2 = (10)^2 = 100
Flower #5: ((60) - (45))^2 = (15)^2 = 225

(400 + 25 + 0 + 100 + 225) / 5 = 150

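So the variance of the flower heights is 150 cm² (variance carries squared units). The standard deviation is therefore the square root of 150, approximately 12.247 cm.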
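The arithmetic above can be reproduced with a short Python sketch. It is only an illustration of the worked example, not part of the Scratch project, and the variable names are chosen here for readability.

```python
import math

# Flower heights in centimetres, taken from the example above.
heights = [25, 60, 40, 45, 55]

# Mean height: sum of the values divided by how many there are.
mean = sum(heights) / len(heights)

# Squared difference between each height and the mean.
squared_diffs = [(h - mean) ** 2 for h in heights]

# Population variance: the mean of the squared differences.
variance = sum(squared_diffs) / len(heights)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(mean)               # 45.0
print(variance)           # 150.0
print(round(std_dev, 3))  # 12.247
```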

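There are two kinds of standard deviation:

1. Population standard deviation, which describes the spread of an entire population. For example, if there were only 5 flowers in the world, then 12.247 would be the population standard deviation of flower height.

2. Sample standard deviation, which is calculated from only part of the data. For example, take our five flowers: there are obviously more than five flowers in the world, so they are only a sample of the full data. When a sample is used to estimate the spread of the whole population, the computed value has to be enlarged slightly to compensate.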

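The only difference between the two kinds of standard deviation is how the variance is calculated. The population standard deviation follows the rule in the example above. The sample standard deviation instead takes the sum of the squared differences from the mean and divides it by the number of data points minus 1. For example, let us return to the flowers and recompute their variance: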

(400 + 25 + 0 + 100 + 225) / (5 - 1) = 187.5

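Here, 5 is the number of flowers whose heights are known. We subtract 1 from it because this is a sample standard deviation. The final step is to take the square root of 187.5, which is approximately 13.693.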
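Written as formulas, the two calculations differ only in the divisor. The symbols are assumptions introduced here for illustration: x_i are the n data values, x̄ is their mean, σ is the population standard deviation, and s is the sample standard deviation.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```

For the five flowers, n = 5 and the sum of squared differences is 750, so σ = √(750 / 5) ≈ 12.247 and s = √(750 / 4) ≈ 13.693.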

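Variables

This tutorial requires one list:

  • Data (the data set list)

This list holds all the data samples, such as the flower heights.

Seven variables are also needed: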

  • Average
  • Sum
  • Variance
  • Standard Deviation
  • Number
  • Sum2
  • Number2

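Code

This tutorial first demonstrates how to calculate the sample standard deviation.

The first step in calculating the sample standard deviation is to find the mean of the numbers. The script looks like this: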

when gf clicked
set [Sum v] to (0)//Initialize the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)//Number is the index of the current item in the Data list.
end
set [Average v] to ((Sum) / (length of [Data v]))
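For readers who find text code easier to follow, the same loop can be sketched in Python. This is a rough equivalent rather than a translation of Scratch semantics: the sample values are assumed, and the names `total`, `number`, and `average` stand in for the Sum, Number, and Average variables.

```python
# Assumed sample data; in the Scratch project this is the Data list.
data = [25, 60, 40, 45, 55]

total = 0    # plays the role of the Sum variable
number = 1   # plays the role of Number; Scratch lists start at item 1

for _ in range(len(data)):       # repeat (length of Data)
    total += data[number - 1]    # Python lists start at 0, hence the -1
    number += 1

average = total / len(data)
print(average)  # 45.0
```

The index variable is kept only to mirror the Scratch script; idiomatic Python would write `sum(data) / len(data)`.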

The second step in calculating the sample standard deviation is to compute the variance:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))

The final step in calculating the sample standard deviation is to take the square root of the variance:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))
set [Standard Deviation v] to ([sqrt v] of (Variance))
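The script makes two passes over the list: one for the average and one for the sum of squared differences. A minimal Python sketch of the same two-pass approach follows. The function name `sample_std_dev` and the check for short lists are additions made here for illustration; the Scratch script has no such check.

```python
import math

def sample_std_dev(data):
    """Two-pass sample standard deviation, following the script's steps."""
    n = len(data)
    if n < 2:
        # With fewer than two items the divisor n - 1 would be zero or negative.
        raise ValueError("need at least two data points")

    # First pass: the average.
    average = sum(data) / n

    # Second pass: sum of squared differences from the average (Sum2).
    sum2 = 0
    for value in data:
        sum2 += (value - average) * (value - average)

    variance = sum2 / (n - 1)
    return math.sqrt(variance)

print(round(sample_std_dev([25, 60, 40, 45, 55]), 3))  # 13.693
```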

Calculating the population standard deviation requires only a small adjustment. The code is as follows:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / (length of [Data v]))//Population variance: divide by the number of items, without subtracting 1.
set [Standard Deviation v] to ([sqrt v] of (Variance))

Complete code

The code for calculating the sample standard deviation is:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))
set [Standard Deviation v] to ([sqrt v] of (Variance))

The code for calculating the population standard deviation is:

when gf clicked
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum v] to (0)//Resetting the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (((item (Number) of [Data v]) - (Average)) * ((item (Number) of [Data v]) - (Average)))
  change [Number v] by (1)
end
set [Variance v] to ((Sum) / (length of [Data v]))
set [Standard Deviation v] to ([sqrt v] of (Variance))
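Because the two scripts differ only in the divisor, they can be folded into one routine with a switch. The Python sketch below illustrates that idea and compares its results with Python's standard `statistics` module. The function name `std_dev` and its `sample` parameter are hypothetical and do not correspond to anything in the Scratch project.

```python
import math
import statistics

def std_dev(data, sample=True):
    """Standard deviation of data; sample=True divides by n - 1, else by n."""
    n = len(data)
    average = sum(data) / n
    sum_sq = sum((x - average) ** 2 for x in data)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_sq / divisor)

heights = [25, 60, 40, 45, 55]

# Compare against Python's standard library as an independent check.
assert math.isclose(std_dev(heights, sample=True), statistics.stdev(heights))
assert math.isclose(std_dev(heights, sample=False), statistics.pstdev(heights))

print(round(std_dev(heights, sample=True), 3))   # 13.693
print(round(std_dev(heights, sample=False), 3))  # 12.247
```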

Related links