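Data Preparation (Sample)
(Data preparation / data sampling / data acquisition / data collection)
Sample the data by extracting and preparing a modeling sample from one or more data tables.
Sampling includes operations that define or subset the rows of the data. The sample should be large enough to contain the important information effectively.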
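The row-level operations described above can be sketched with plain pandas. This is a minimal illustration on a hypothetical toy DataFrame (the values and the 70/30 split ratio are assumptions, not part of westat or the original notebook): it subsets rows by a condition, then draws a reproducible random sample and keeps the remainder as a hold-out set.

```python
import pandas as pd

# Hypothetical toy data; the column names only mimic the credit card dataset.
df = pd.DataFrame({
    "ID": range(1, 11),
    "AGE": [24, 26, 34, 37, 57, 37, 29, 23, 28, 35],
    "y": [1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
})

# Subset rows by a condition (one way of defining which rows enter the sample).
age_30_plus = df[df["AGE"] >= 30]

# Draw a reproducible 70% random sample of rows for model building;
# the rows that were not drawn form the hold-out set.
train = df.sample(frac=0.7, random_state=42)
test = df.drop(train.index)

print(len(age_30_plus), len(train), len(test))
```

Fixing `random_state` keeps the split repeatable across runs, which matters when later steps are compared against the same sample.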
[1]:
from westat import *
# westat ships with two datasets, GiveMeSomeCredit and UCI_Credit_Card; load them with GiveMeSomeCredit() or credit_card()
# data = GiveMeSomeCredit()
# data_train = data.train
# data_test = data.test
data = credit_card()
data.head()
[1]:
| | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 20000.00 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | -2 | -2 | 3913.00 | 3102.00 | 689.00 | 0.00 | 0.00 | 0.00 | 0.00 | 689.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
1 | 2 | 120000.00 | 2 | 2 | 2 | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682.00 | 1725.00 | 2682.00 | 3272.00 | 3455.00 | 3261.00 | 0.00 | 1000.00 | 1000.00 | 1000.00 | 0.00 | 2000.00 | 1 |
2 | 3 | 90000.00 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239.00 | 14027.00 | 13559.00 | 14331.00 | 14948.00 | 15549.00 | 1518.00 | 1500.00 | 1000.00 | 1000.00 | 1000.00 | 5000.00 | 0 |
3 | 4 | 50000.00 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 46990.00 | 48233.00 | 49291.00 | 28314.00 | 28959.00 | 29547.00 | 2000.00 | 2019.00 | 1200.00 | 1100.00 | 1069.00 | 1000.00 | 0 |
4 | 5 | 50000.00 | 1 | 2 | 1 | 57 | -1 | 0 | -1 | 0 | 0 | 0 | 8617.00 | 5670.00 | 35835.00 | 20940.00 | 19146.00 | 19131.00 | 2000.00 | 36681.00 | 10000.00 | 9000.00 | 689.00 | 679.00 | 0 |
[2]:
# Rename the target variable to "y"
data.rename(columns={'target': 'y'}, inplace=True)
data.head()
[2]:
| | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 20000.00 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | -2 | -2 | 3913.00 | 3102.00 | 689.00 | 0.00 | 0.00 | 0.00 | 0.00 | 689.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1 |
1 | 2 | 120000.00 | 2 | 2 | 2 | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682.00 | 1725.00 | 2682.00 | 3272.00 | 3455.00 | 3261.00 | 0.00 | 1000.00 | 1000.00 | 1000.00 | 0.00 | 2000.00 | 1 |
2 | 3 | 90000.00 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239.00 | 14027.00 | 13559.00 | 14331.00 | 14948.00 | 15549.00 | 1518.00 | 1500.00 | 1000.00 | 1000.00 | 1000.00 | 5000.00 | 0 |
3 | 4 | 50000.00 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 46990.00 | 48233.00 | 49291.00 | 28314.00 | 28959.00 | 29547.00 | 2000.00 | 2019.00 | 1200.00 | 1100.00 | 1069.00 | 1000.00 | 0 |
4 | 5 | 50000.00 | 1 | 2 | 1 | 57 | -1 | 0 | -1 | 0 | 0 | 0 | 8617.00 | 5670.00 | 35835.00 | 20940.00 | 19146.00 | 19131.00 | 2000.00 | 36681.00 | 10000.00 | 9000.00 | 689.00 | 679.00 | 0 |
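As a small companion to the rename step, the sketch below shows the non-in-place form of `DataFrame.rename` and a quick look at the class balance of the renamed target. The two-row frame is hypothetical and only stands in for the real data; nothing here is specific to westat.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the credit card data.
df = pd.DataFrame({"ID": [1, 2], "target": [1, 0]})

# Without inplace=True, rename returns a new DataFrame and leaves df unchanged.
renamed = df.rename(columns={"target": "y"})

# Share of each class in the target column, useful for spotting imbalance.
class_share = renamed["y"].value_counts(normalize=True)
print(class_share)
```

Returning a new frame avoids silently mutating `data` when a cell is re-run, at the cost of holding a second object in memory.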