(1) Introduction to Ensemble Learning in Machine Learning

(2) The Bagging Method

(3) Random Forest Algorithms for Trading with Python

(4) Implementing and Interpreting Random Forests in Python

(5) How to Implement the Bagging Algorithm from Scratch in Python

(6) How to Implement the Random Forest Algorithm from Scratch in Python

### Algorithm

The steps below walk through the AdaBoost procedure by hand on a small toy dataset.

Each row describes one person; the final column, attractive, is the label the trees try to predict.

| weight | smart | polite | fit | attractive |
|--------|-------|--------|-----|------------|
| 180 | no | no | no | no |
| 150 | yes | yes | no | no |
| 175 | yes | yes | yes | yes |
| 165 | yes | yes | yes | yes |
| 190 | no | yes | no | no |
| 201 | yes | yes | yes | yes |
| 185 | yes | yes | no | yes |
| 168 | yes | no | yes | yes |

#### Step 1: Initialize the sample weights

| weight | smart | polite | fit | attractive | sample weight |
|--------|-------|--------|-----|------------|---------------|
| 180 | no | no | no | no | 1/8 |
| 150 | yes | yes | no | no | 1/8 |
| 175 | yes | yes | yes | yes | 1/8 |
| 165 | yes | yes | yes | yes | 1/8 |
| 190 | no | yes | no | no | 1/8 |
| 201 | yes | yes | yes | yes | 1/8 |
| 185 | yes | yes | no | yes | 1/8 |
| 168 | yes | no | yes | yes | 1/8 |
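In code, Step 1 is a one-liner; a minimal NumPy sketch (the 8 entries correspond to the 8 rows of the table above):

```python
import numpy as np

n_samples = 8
# Every sample starts with the same weight, 1/n = 0.125.
sample_weights = np.full(n_samples, 1 / n_samples)
print(sample_weights)
```

The weights always sum to 1, which the later normalization step preserves.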

#### Step 3: Compute each tree's importance (significance) in the final classification

$$\text{significance} = \frac{1}{2}\log\left(\frac{1-\text{total error}}{\text{total error}}\right)$$

Here, total error is the sum of the weights of the misclassified samples, and log is the natural logarithm. Back to our example, with one misclassified sample the total error is:

$$\text{total error} = \sum \text{weights of misclassified samples} = \frac{1}{8} \times 1 = \frac{1}{8}$$

$$\text{significance} = \frac{1}{2}\log\left(\frac{1-\frac{1}{8}}{\frac{1}{8}}\right) = \frac{1}{2}\log(7) \approx 0.97$$
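The significance computation above can be checked with a small helper (a sketch; `np.log` is the natural logarithm the formula assumes):

```python
import numpy as np

def significance(total_error):
    # A tree's "amount of say": 0.5 * ln((1 - error) / error).
    return 0.5 * np.log((1 - total_error) / total_error)

print(round(significance(1 / 8), 2))  # 0.97
```

Note the boundary behavior: a tree that is right half the time gets significance 0, and the value grows as the total error approaches 0.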

#### Step 4: Update the sample weights so that the next tree takes the previous tree's errors into account

For a sample the previous tree classified incorrectly, the weight is increased:

$$\text{new sample weight} = \text{sample weight} \times e^{\text{significance}} = \frac{1}{8} \times e^{0.97} = \frac{1}{8} \times 2.64 \approx 0.33$$

For a correctly classified sample, the weight is decreased:

$$\text{new sample weight} = \text{sample weight} \times e^{-\text{significance}} = \frac{1}{8} \times e^{-0.97} = \frac{1}{8} \times 0.38 \approx 0.05$$

Finally, the new weights are normalized so they again sum to 1:

| weight | smart | polite | fit | attractive | sample weight | new weight | normalized weight |
|--------|-------|--------|-----|------------|---------------|------------|-------------------|
| 180 | no | no | no | no | 1/8 | 0.05 | 0.07 |
| 150 | yes | yes | no | no | 1/8 | 0.33 | 0.49 |
| 175 | yes | yes | yes | yes | 1/8 | 0.05 | 0.07 |
| 165 | yes | yes | yes | yes | 1/8 | 0.05 | 0.07 |
| 190 | no | yes | no | no | 1/8 | 0.05 | 0.07 |
| 201 | yes | yes | yes | yes | 1/8 | 0.05 | 0.07 |
| 185 | yes | yes | no | yes | 1/8 | 0.05 | 0.07 |
| 168 | yes | no | yes | yes | 1/8 | 0.05 | 0.07 |
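The Step 4 update can be sketched in NumPy; index 1 marks the 150-lb row, the one the table shows as misclassified:

```python
import numpy as np

weights = np.full(8, 1 / 8)
significance = 0.97
# Only the second sample (the 150-lb row) was misclassified in this example.
misclassified = np.zeros(8, dtype=bool)
misclassified[1] = True

# Boost the weight of the misclassified sample, shrink the others.
new_weights = np.where(misclassified,
                       weights * np.exp(significance),
                       weights * np.exp(-significance))
# Normalize so the weights sum to 1 again.
normalized = new_weights / new_weights.sum()
print(np.round(new_weights, 2))  # [0.05 0.33 0.05 0.05 0.05 0.05 0.05 0.05]
print(np.round(normalized, 2))
```

Computed exactly, the normalized weight of the misclassified sample is about 0.50; the table's 0.49 comes from normalizing the already-rounded 0.33 and 0.05 values.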

### Code

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Load the breast cancer dataset as a DataFrame with named feature columns.
breast_cancer = load_breast_cancer()
X = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
y = pd.Categorical.from_codes(breast_cancer.target, breast_cancer.target_names)

# Encode the string class labels as 0/1.
encoder = LabelEncoder()
binary_encoded_y = pd.Series(encoder.fit_transform(y))

train_X, test_X, train_y, test_y = train_test_split(
    X, binary_encoded_y, random_state=1)

# AdaBoost over 200 depth-1 decision stumps.
classifier = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200)
classifier.fit(train_X, train_y)

predictions = classifier.predict(test_X)

confusion_matrix(test_y, predictions)
```

```
[[86, 2], [3, 52]]
```
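As a quick sanity check, the accuracy implied by the confusion matrix shown above is its trace (correct predictions on the diagonal) divided by its total:

```python
import numpy as np

cm = np.array([[86, 2], [3, 52]])  # confusion matrix from the run above
accuracy = np.trace(cm) / cm.sum()
print(round(accuracy, 3))  # 0.965
```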
