fix: backend/core/generate_evaluation_plots.py

fix: script entry path
fix:generate_evaluation_plots
2026-03-20 17:05:02 +08:00 · 2026-03-20 17:03:27 +08:00 · 2026-03-20 17:01:30 +08:00 · 2026-03-20 16:52:24 +08:00 · 2026-03-20 16:30:08 +08:00 · 2026-03-20 16:14:11 +08:00
46 changed files with 5398 additions and 6554 deletions
--- a/README.md
+++ b/README.md
@@ -1,253 +1,279 @@
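-# Employee Absenteeism Analysis and Prediction System Based on Multi-Dimensional Feature Mining
+# Chinese Enterprise Employee Absenteeism Analysis and Prediction System
 ## Project Overview
-This system is built on the UCI Absenteeism dataset. It applies machine learning algorithms to analyze employee attendance data in depth, mines the multi-dimensional features that influence absenteeism, and builds an absenteeism prediction model to give enterprise human resources management scientific, objective decision support.
+This project targets enterprise human resources management and operations analysis. Built around employee absence events, it is a graduation-project system that combines data analysis, risk prediction, group profiling, and visualization. The system supports absence trend analysis, influencing-factor mining, prediction of the duration of a single absence, multi-model comparison, and display of employee group clustering.
-## Features
+The backend uses `Flask + scikit-learn + PyTorch`, and the frontend uses `Vue 3 + Element Plus + ECharts`. The current version supports both traditional machine learning models and the `LSTM+MLP` deep learning model.
-### F01 Data Overview and Global Statistics
+## Functional Modules
- Basic statistical indicators (total samples, total employees, total absence hours, etc.)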
+
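 ### 1. Data Overview
 - Basic statistical indicators
 - Monthly absence trend analysis
 - Day-of-week distribution analysis
- Absence reason distribution analysis
+- Leave type and reason distribution analysis
 - Seasonal distribution analysis
-### F02 Multi-Dimensional Feature Mining and Influencing-Factor Analysis
+### 2. Influencing-Factor Analysis
 - Feature importance ranking (based on random forest)
 - Correlation heatmap analysis
 - Group comparison analysis (drinking / smoking / education / children, etc.)
-### F03 Employee Absence Risk Prediction
+- Feature importance ranking
- Single absence prediction
+- Correlation heatmap
- Risk level assessment (low / medium / high)
+- Multi-dimensional group comparison analysis
 - Model performance display (R², MSE, RMSE, MAE)
-### F04 Employee Profiles and Group Clustering
+### 3. Absence Prediction
- K-Means clustering result display
+
- Employee group radar chart
+- Prediction of the duration of a single absence
- Cluster scatter plot visualization
+- Risk level assessment
 - Multi-model result comparison
 - Switching between traditional and deep learning models
 ### 4. Employee Profiles
 - Clustering result display
 - Group profile analysis
 - Group scatter plot visualization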
 ## Tech Stack
 ### Backend
 - Python 3.11
 - Flask 2.3.3
- scikit-learn 1.3.0
+- Flask-CORS 4.0.0
 - XGBoost 1.7.6
 - LightGBM 4.1.0
 - pandas 2.0.3
 - numpy 1.24.3
 - scikit-learn 1.3.0
 - xgboost 1.7.6
 - lightgbm 4.1.0
 - PyTorch 2.6.0
 ### Frontend
- Vue 3.4
+
- Element Plus 2.4
+- Vue 3
- ECharts 5.4
+- Vite
- Axios 1.6
+- Element Plus
- Vue Router 4.2
+- ECharts
- Vite 5.0
+- Axios
 - Vue Router
 ## Project Structure
 ```text
 forsetsystem/
 ├── backend/
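 │   ├── api/                      # API layer
 │   ├── core/                     # Data generation, feature engineering, training, clustering, deep learning
 │   ├── services/                 # Business service layer
 │   ├── data/
 │   │   └── raw/
 │   │       └── china_enterprise_absence_events.csv
 │   ├── models/                   # Model files and training artifacts
 │   ├── app.py                    # Backend entry point
 │   ├── config.py                 # Project configuration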
 │   └── requirements.txt
 ├── frontend/
 │   ├── src/
 │   │   ├── api/
 │   │   ├── router/
 │   │   ├── styles/
 │   │   ├── views/
 │   │   ├── App.vue
 │   │   └── main.js
 │   ├── package.json
 │   └── vite.config.js
 ├── docs/                         # System docs, thesis docs, and installation guide
 └── README.md
 ```
 ## Requirements
 | Item | Requirement |
 |------|------|
-| Operating system | Windows 10/11, Linux, macOS |
+| Operating system | Windows 10 / Windows 11 |
 | Python | 3.11 |
 | Node.js | 16.0+ |
 | Conda | Anaconda or Miniconda |
-| pnpm | 8.0+ |
+| Node.js | 16+ |
 | pnpm | 8+ |
 | CUDA | Should match the PyTorch `cu124` wheels |
 ## Installation and Deployment
-### 1. Clone the Project
+A `conda` virtual environment is recommended; install the official GPU build of `PyTorch` first.
-```bash
+### 1. Create and Activate the conda Environment
 git clone <repository-url>
 cd forsetsystem
 ```
 ### 2. Backend Environment Setup
 #### Create the Conda Environment
 ```powershell
 conda create -n forsetenv python=3.11 -y
 conda activate forsetenv
 ```
-#### Install Machine Learning Libraries (via conda-forge)
+### 2. Install the GPU Build of PyTorch
 ```powershell
-conda install -c conda-forge pandas=2.0.3 numpy=1.24.3 scikit-learn=1.3.0 xgboost=1.7.6 lightgbm=4.1.0 joblib=1.3.1 -y
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
 ```
-#### Install the Web Framework
+### 3. Install the Remaining Backend Dependencies
 ```powershell
 pip install Flask==2.3.3 Flask-CORS==4.0.0 python-dotenv==1.0.0
 pip install pandas==2.0.3 numpy==1.24.3 scikit-learn==1.3.0 joblib==1.3.1
 pip install xgboost==1.7.6 lightgbm==4.1.0
 ```
-#### Verify the Installation
+To install from the dependency file instead, run the following after installing the GPU build of `PyTorch`:
 ```powershell
-python -c "import pandas,numpy,sklearn,xgboost,lightgbm,flask;print('All libraries installed successfully')"
+pip install -r backend/requirements.txt
 ```
-#### Train the Models
+### 4. Install Frontend Dependencies
 ```powershell
 cd backend
 python core/train_model.py
 ```
 ### 3. Frontend Environment Setup
 ```bash
 cd frontend
 pnpm install
 ```
-## Running
+## Getting Started
-### Start the Backend Service
+### 1. Generate the Dataset
 ```powershell
 conda activate forsetenv
 cd backend
 python core/generate_dataset.py
 ```
 ### 2. Train the Models
 ```powershell
 python core/train_model.py
 ```
 ### 3. Start the Backend
 ```powershell
 python app.py
 ```
-The backend service runs at http://localhost:5000
+Default backend address:
-### Start the Frontend Service
+```text
 http://127.0.0.1:5000
 ```
-```bash
+### 4. Start the Frontend
-cd frontend
+
 ```powershell
 cd ..\frontend
 pnpm dev
 ```
-The frontend service runs at http://localhost:5173
+Default frontend address:
-### Access the System
+```text
-
+http://127.0.0.1:5173
 Open http://localhost:5173 in a browser
 ## Project Structure
 ```
 forsetsystem/
 ├── backend/                      # Backend project
 │   ├── api/                      # API layer
 │   │   ├── overview_routes.py    # Data overview endpoints
 │   │   ├── analysis_routes.py    # Influencing-factor analysis endpoints
 │   │   ├── predict_routes.py     # Prediction endpoints
 │   │   └── cluster_routes.py     # Clustering endpoints
 │   ├── services/                 # Business logic layer
 │   ├── core/                     # Core algorithm layer
 │   │   ├── preprocessing.py      # Data preprocessing
 │   │   ├── feature_mining.py     # Feature mining
 │   │   ├── train_model.py        # Model training
 │   │   └── clustering.py         # Cluster analysis
 │   ├── data/                     # Data storage
 │   ├── models/                   # Model storage
 │   ├── utils/                    # Utility functions
 │   ├── app.py                    # Application entry point
 │   ├── config.py                 # Configuration file
 │   └── requirements.txt          # Dependency list
 │
 ├── frontend/                     # Frontend project
 │   ├── src/
 │   │   ├── api/                  # API calls
 │   │   ├── views/                # Page components
 │   │   ├── router/               # Router configuration
 │   │   ├── App.vue               # Root component
 │   │   └── main.js               # Entry file
 │   ├── index.html
 │   ├── package.json
 │   └── vite.config.js
 │
 ├── data/                         # Raw data
 │   └── Absenteeism_at_work.csv
 │
 ├── docs/                         # Project documentation
 │   ├── 00_需求规格说明书.md
 │   ├── 01_系统架构设计.md
 │   ├── 02_接口设计文档.md
 │   ├── 03_数据设计文档.md
 │   └── 04_UI原型设计.md
 │
 └── README.md
 ```
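-## API Endpoints
+## Models
-### Data Overview Module
+The system currently supports the following model types:
 | Endpoint | Method | Description |
 |------|------|------|
 | /api/overview/stats | GET | Basic statistics |
 | /api/overview/trend | GET | Monthly absence trend |
 | /api/overview/weekday | GET | Distribution by day of week |
 | /api/overview/reasons | GET | Distribution of absence reasons |
 | /api/overview/seasons | GET | Distribution by season |
-### Influencing-Factor Analysis Module
+- `random_forest`
-| Endpoint | Method | Description |
+- `gradient_boosting`
-|------|------|------|
+- `extra_trees`
-| /api/analysis/importance | GET | Feature importance |
+- `xgboost`
-| /api/analysis/correlation | GET | Correlation matrix |
+- `lightgbm`
-| /api/analysis/compare | GET | Group comparison |
+- `lstm_mlp`
-### Prediction Module
+Notes:
 | Endpoint | Method | Description |
 |------|------|------|
 | /api/predict/single | POST | Single prediction |
 | /api/predict/model-info | GET | Model information |
-### Clustering Module
+- Traditional models suit interpretation of structured features and feature-importance analysis
-| Endpoint | Method | Description |
+- `LSTM+MLP` suits prediction that combines event sequences with static features
 |------|------|------|
 | /api/cluster/result | GET | Clustering result |
 | /api/cluster/profile | GET | Group profile |
 | /api/cluster/scatter | GET | Scatter data |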
-## Author
+## Data and Training Files
- **Author**: 张硕
+Commonly used paths:
 - **School**: School of Software, Henan Agricultural University
 - **Project type**: Undergraduate graduation project
 - **Completed**: March 2026
-## Roadmap
+- Dataset file: [china_enterprise_absence_events.csv](D:/VScodeProject/forsetsystem/backend/data/raw/china_enterprise_absence_events.csv)
 - Configuration file: [config.py](D:/VScodeProject/forsetsystem/backend/config.py)
 - Dataset generation script: [generate_dataset.py](D:/VScodeProject/forsetsystem/backend/core/generate_dataset.py)
 - Model training script: [train_model.py](D:/VScodeProject/forsetsystem/backend/core/train_model.py)
 - Deep learning script: [deep_learning_model.py](D:/VScodeProject/forsetsystem/backend/core/deep_learning_model.py)
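-### Model Optimization
+## API Overview
 - [ ] Introduce deep learning models (such as LSTM) to handle temporal features
 - [ ] Add model interpretability analysis (SHAP value visualization)
 - [ ] Implement automatic hyperparameter tuning (Optuna/Hyperopt)
 - [ ] Support multi-model ensemble prediction
-### Feature Extensions
+### Data Overview
 - [ ] Add user authentication and access control
 - [ ] Support uploading and analyzing custom datasets
 - [ ] Add data export (Excel/PDF reports)
 - [ ] Implement batch export of prediction results
 - [ ] Add a large-screen data visualization dashboard
-### Technical Improvements
+- `GET /api/overview/stats`
- [ ] Migrate the backend to FastAPI for better performance
+- `GET /api/overview/trend`
- [ ] Introduce Redis to cache frequent query results
+- `GET /api/overview/weekday`
- [ ] Deploy in Docker containers
+- `GET /api/overview/reasons`
- [ ] Add CI/CD for automated testing and deployment
+- `GET /api/overview/seasons`
 - [ ] Migrate frontend state management to Pinia
-### Data Layer
+### Influencing-Factor Analysis
 - [ ] Support database storage (MySQL/PostgreSQL)
 - [ ] Implement incremental data updates
 - [ ] Add data quality checks and cleaning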
-## References
+- `GET /api/analysis/importance`
 - `GET /api/analysis/correlation`
 - `GET /api/analysis/compare`
- [UCI Machine Learning Repository - Absenteeism at work Data Set](https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work)
+### Absence Prediction
- [Flask official documentation](https://flask.palletsprojects.com/)
+
- [Vue 3 official documentation](https://vuejs.org/)
+- `GET /api/predict/models`
- [Element Plus component library](https://element-plus.org/)
+- `GET /api/predict/model-info`
- [ECharts charting library](https://echarts.apache.org/)
+- `POST /api/predict/single`
 - `POST /api/predict/compare`
 ### Employee Profiles
 - `GET /api/cluster/result`
 - `GET /api/cluster/profile`
 - `GET /api/cluster/scatter`
 ## Documentation
 See the detailed design documents:
 - [docs/README.md](D:/VScodeProject/forsetsystem/docs/README.md)
 - [09_环境配置与安装说明.md](D:/VScodeProject/forsetsystem/docs/09_环境配置与安装说明.md)
 ## FAQ
 ### 1. `flask_cors` is missing
 Run:
 ```powershell
 pip install Flask-CORS
 ```
 ### 2. `xgboost` or `lightgbm` is missing
 Run:
 ```powershell
 pip install xgboost==1.7.6 lightgbm==4.1.0
 ```
 ### 3. PyTorch was installed as the CPU build
 Re-run the official GPU installation command:
 ```powershell
 pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
 ```
 ### 4. How to confirm the conda environment is in use
 ```powershell
 conda info --envs
 where python
 ```
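The two FAQ checks above (a CPU-only PyTorch install and an inactive conda environment) can also be made from Python. The following is a minimal sketch and not part of the repository; the function name `describe_environment` is hypothetical. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation and on `torch.cuda.is_available()`.

```python
import importlib.util
import os
import sys


def describe_environment() -> dict:
    """Collect basic facts about the active Python environment."""
    info = {
        "executable": sys.executable,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # stays None when PyTorch is not installed
    }
    if info["torch_installed"]:
        import torch  # imported lazily so the script also runs without PyTorch

        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `conda_env` is not `forsetenv` or `cuda_available` is `False`, the commands in FAQ items 3 and 4 are the place to start.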
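 ## Project Information
 - Author: 张硕
 - School: School of Software, Henan Agricultural University
 - Project type: Undergraduate graduation project
 - Completed: March 2026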
--- a/backend/.gitignore
+++ b/backend/.gitignore
@@ -66,4 +66,6 @@ Thumbs.db
 .mypy_cache/
 .dmypy.json
 dmypy.json
-models
+models
 data
--- a/backend/api/analysis_routes.py
+++ b/backend/api/analysis_routes.py
@@ -8,7 +8,7 @@ analysis_bp = Blueprint('analysis', __name__, url_prefix='/api/analysis')
@analysis_bp.route('/importance', methods=['GET'])
 def get_importance():
    try:
-        model_type = request.args.get('model', 'rf')
+        model_type = request.args.get('model', 'random_forest')
        result = analysis_service.get_feature_importance(model_type)
        return jsonify({
            'code': 200,
@@ -43,7 +43,7 @@ def get_correlation():
@analysis_bp.route('/compare', methods=['GET'])
 def get_compare():
    try:
-        dimension = request.args.get('dimension', 'drinker')
+        dimension = request.args.get('dimension', 'industry')
        result = analysis_service.get_group_comparison(dimension)
        return jsonify({
            'code': 200,
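Review note: the default `model` query value changes here from `'rf'` to `'random_forest'`, so a client that still sends `model=rf` depends on how the service handles the old short name, which this diff does not show. One possible approach is to normalize the value before calling the service. The sketch below is an assumption, not code from the commit: `normalize_model_type` and `LEGACY_ALIASES` are hypothetical names, and the supported set is copied from the README's model list (whether each name is valid for the importance endpoint is not confirmed by the diff).

```python
# Model names listed in the README's model section.
SUPPORTED_MODELS = {
    "random_forest", "gradient_boosting", "extra_trees",
    "xgboost", "lightgbm", "lstm_mlp",
}

# Hypothetical alias table: only "rf" appears in the old code.
LEGACY_ALIASES = {"rf": "random_forest"}


def normalize_model_type(raw, default="random_forest"):
    """Map a raw query value to a supported model name, or fall back to the default."""
    if not raw:
        return default
    key = raw.strip().lower()
    name = LEGACY_ALIASES.get(key, key)
    return name if name in SUPPORTED_MODELS else default
```

Falling back to the default for unknown names keeps the endpoint permissive; returning an error for unknown names would be an equally reasonable choice.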
--- a/backend/api/cluster_routes.py
+++ b/backend/api/cluster_routes.py
@@ -49,8 +49,8 @@ def get_profile():
 def get_scatter():
    try:
        n_clusters = request.args.get('n_clusters', 3, type=int)
-        x_axis = request.args.get('x_axis', 'Age')
+        x_axis = request.args.get('x_axis', '月均加班时长')
-        y_axis = request.args.get('y_axis', 'Absenteeism time in hours')
+        y_axis = request.args.get('y_axis', '缺勤时长（小时）')
        n_clusters = max(2, min(10, n_clusters))
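Review note: the route clamps `n_clusters` to the range 2 to 10, and the analyzer in `backend/core/clustering.py` (same commit) replaces an unknown axis with the new default column. The combined behaviour can be sketched as a single pure function. `resolve_scatter_params` is a hypothetical helper written for illustration; the default column names are the ones in this hunk.

```python
DEFAULT_X_AXIS = "月均加班时长"
DEFAULT_Y_AXIS = "缺勤时长（小时）"


def resolve_scatter_params(n_clusters, x_axis, y_axis, available_columns):
    """Clamp the cluster count to [2, 10] and replace unknown axes with the defaults."""
    n_clusters = max(2, min(10, int(n_clusters)))
    if x_axis not in available_columns:
        x_axis = DEFAULT_X_AXIS
    if y_axis not in available_columns:
        y_axis = DEFAULT_Y_AXIS
    return n_clusters, x_axis, y_axis
```

With this shape, a request that still passes the old English column name `Age` silently receives the default axis instead of raising a `KeyError`.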
--- a/backend/app.py
+++ b/backend/app.py
@@ -15,7 +15,7 @@ def create_app():
    def index():
        return {
            'code': 200,
-            'message': 'Employee Absenteeism Analysis System API',
+            'message': 'China Enterprise Absence Analysis System API',
            'data': {
                'version': '1.0.0',
                'endpoints': {
--- a/backend/config.py
+++ b/backend/config.py
@@ -1,14 +1,15 @@
 import os
 BASE_DIR = os.path.dirname(os.path.abspath(__file__))
 DATA_DIR = os.path.join(BASE_DIR, 'data')
 RAW_DATA_DIR = os.path.join(DATA_DIR, 'raw')
 PROCESSED_DATA_DIR = os.path.join(DATA_DIR, 'processed')
 MODELS_DIR = os.path.join(BASE_DIR, 'models')
-RAW_DATA_PATH = os.path.join(RAW_DATA_DIR, 'Absenteeism_at_work.csv')
+RAW_DATA_FILENAME = 'china_enterprise_absence_events.csv'
 RAW_DATA_PATH = os.path.join(RAW_DATA_DIR, RAW_DATA_FILENAME)
 CLEAN_DATA_PATH = os.path.join(PROCESSED_DATA_DIR, 'clean_data.csv')
 RF_MODEL_PATH = os.path.join(MODELS_DIR, 'rf_model.pkl')
@@ -17,132 +18,133 @@ KMEANS_MODEL_PATH = os.path.join(MODELS_DIR, 'kmeans_model.pkl')
 SCALER_PATH = os.path.join(MODELS_DIR, 'scaler.pkl')
 ENCODER_PATH = os.path.join(MODELS_DIR, 'encoder.pkl')
-CSV_SEPARATOR = ';'
+CSV_SEPARATOR = ','
 RANDOM_STATE = 42
 TEST_SIZE = 0.2
-FEATURE_NAMES = [
+TARGET_COLUMN = '缺勤时长（小时）'
-    'ID',
+EMPLOYEE_ID_COLUMN = '员工编号'
-    'Reason for absence',
+COMPANY_ID_COLUMN = '企业编号'
-    'Month of absence',
+EVENT_SEQUENCE_COLUMN = '事件序号'
-    'Day of the week',
+EVENT_DATE_INDEX_COLUMN = '事件日期索引'
    'Seasons',
    'Transportation expense',
    'Distance from Residence to Work',
    'Service time',
    'Age',
    'Work load Average/day ',
    'Hit target',
    'Disciplinary failure',
    'Education',
    'Son',
    'Social drinker',
    'Social smoker',
    'Pet',
    'Weight',
    'Height',
    'Body mass index',
    'Absenteeism time in hours'
 ]
 CATEGORICAL_FEATURES = [
    'Reason for absence',
    'Month of absence',
    'Day of the week',
    'Seasons',
    'Disciplinary failure',
    'Education',
    'Social drinker',
    'Social smoker'
 ]
 NUMERICAL_FEATURES = [
    'Transportation expense',
    'Distance from Residence to Work',
    'Service time',
    'Age',
    'Work load Average/day ',
    'Hit target',
    'Son',
    'Pet',
    'Body mass index'
 ]
 REASON_NAMES = {
    0: '未知原因',
    1: '传染病',
    2: '肿瘤',
    3: '血液疾病',
    4: '内分泌疾病',
    5: '精神行为障碍',
    6: '神经系统疾病',
    7: '眼部疾病',
    8: '耳部疾病',
    9: '循环系统疾病',
    10: '呼吸系统疾病',
    11: '消化系统疾病',
    12: '皮肤疾病',
    13: '肌肉骨骼疾病',
    14: '泌尿生殖疾病',
    15: '妊娠相关',
    16: '围产期疾病',
    17: '先天性畸形',
    18: '症状体征',
    19: '损伤中毒',
    20: '外部原因',
    21: '健康因素',
    22: '医疗随访',
    23: '医疗咨询',
    24: '献血',
    25: '实验室检查',
    26: '无故缺勤',
    27: '理疗',
    28: '牙科咨询'
 }
 WEEKDAY_NAMES = {
-    2: '周一',
+    1: '周一',
-    3: '周二',
+    2: '周二',
-    4: '周三',
+    3: '周三',
-    5: '周四',
+    4: '周四',
-    6: '周五'
+    5: '周五',
    6: '周六',
    7: '周日',
 }
 SEASON_NAMES = {
-    1: '夏季',
+    1: '冬季',
-    2: '秋季',
+    2: '春季',
-    3: '冬季',
+    3: '夏季',
-    4: '春季'
+    4: '秋季',
 }
-EDUCATION_NAMES = {
+INDUSTRY_NAMES = [
-    1: '高中',
+    '制造业',
-    2: '本科',
+    '互联网',
-    3: '研究生',
+    '零售连锁',
-    4: '博士'
+    '物流运输',
-}
+    '金融服务',
    '医药健康',
    '建筑工程',
 ]
 LEAVE_TYPE_NAMES = [
    '病假',
    '事假',
    '年假',
    '调休',
    '婚假',
    '丧假',
    '产检育儿假',
    '工伤假',
    '其他',
 ]
 REASON_CATEGORY_NAMES = [
    '身体不适',
    '家庭事务',
    '子女照护',
    '交通受阻',
    '突发事件',
    '职业疲劳',
    '就医复查',
 ]
 FEATURE_NAME_CN = {
-    'ID': '员工标识',
+    '企业编号': '企业编号',
-    'Reason for absence': '缺勤原因',
+    '所属行业': '所属行业',
-    'Month of absence': '缺勤月份',
+    '企业规模': '企业规模',
-    'Day of the week': '星期几',
+    '所在城市等级': '所在城市等级',
-    'Seasons': '季节',
+    '用工类型': '用工类型',
-    'Transportation expense': '交通费用',
+    '部门条线': '部门条线',
-    'Distance from Residence to Work': '通勤距离',
+    '岗位序列': '岗位序列',
-    'Service time': '工龄',
+    '岗位级别': '岗位级别',
-    'Age': '年龄',
+    '员工编号': '员工编号',
-    'Work load Average/day ': '日均工作负荷',
+    '性别': '性别',
-    'Hit target': '达标率',
+    '年龄': '年龄',
-    'Disciplinary failure': '违纪记录',
+    '司龄年数': '司龄年数',
-    'Education': '学历',
+    '最高学历': '最高学历',
-    'Son': '子女数量',
+    '婚姻状态': '婚姻状态',
-    'Social drinker': '饮酒习惯',
+    '是否本地户籍': '是否本地户籍',
-    'Social smoker': '吸烟习惯',
+    '子女数量': '子女数量',
-    'Pet': '宠物数量',
+    '是否独生子女家庭负担': '独生子女家庭负担',
-    'Weight': '体重',
+    '居住类型': '居住类型',
-    'Height': '身高',
+    '班次类型': '班次类型',
-    'Body mass index': 'BMI指数',
+    '是否夜班岗位': '是否夜班岗位',
-    'Absenteeism time in hours': '缺勤时长'
+    '月均加班时长': '月均加班时长',
    '近30天出勤天数': '近30天出勤天数',
    '近90天缺勤次数': '近90天缺勤次数',
    '近180天请假总时长': '近180天请假总时长',
    '通勤时长分钟': '通勤时长分钟',
    '通勤距离公里': '通勤距离公里',
    '是否跨城通勤': '是否跨城通勤',
    '绩效等级': '绩效等级',
    '近12月违纪次数': '近12月违纪次数',
    '团队人数': '团队人数',
    '直属上级管理跨度': '直属上级管理跨度',
    'BMI': 'BMI',
    '是否慢性病史': '是否慢性病史',
    '年度体检异常标记': '年度体检异常',
    '近30天睡眠时长均值': '睡眠时长',
    '每周运动频次': '运动频次',
    '是否吸烟': '是否吸烟',
    '是否饮酒': '是否饮酒',
    '心理压力等级': '心理压力等级',
    '是否长期久坐岗位': '是否久坐岗位',
    '缺勤月份': '缺勤月份',
    '星期几': '星期几',
    '是否节假日前后': '节假日前后',
    '季节': '季节',
    '请假申请渠道': '请假申请渠道',
    '请假类型': '请假类型',
    '请假原因大类': '请假原因大类',
    '是否提供医院证明': '医院证明',
    '是否临时请假': '临时请假',
    '是否连续缺勤': '连续缺勤',
    '前一工作日是否加班': '前一工作日加班',
    '事件日期': '事件日期',
    '事件日期索引': '事件日期索引',
    '事件序号': '事件序号',
    '员工历史事件数': '员工历史事件数',
    '缺勤时长（小时）': '缺勤时长',
    '加班通勤压力指数': '加班通勤压力指数',
    '家庭负担指数': '家庭负担指数',
    '健康风险指数': '健康风险指数',
    '岗位稳定性指数': '岗位稳定性指数',
    '节假日风险标记': '节假日风险标记',
    '排班压力标记': '排班压力标记',
    '缺勤历史强度': '缺勤历史强度',
    '生活规律指数': '生活规律指数',
    '管理负荷指数': '管理负荷指数',
    '工龄分层': '工龄分层',
    '年龄分层': '年龄分层',
    '通勤分层': '通勤分层',
    '加班分层': '加班分层',
 }
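Review note: the lookup tables in the new `config.py` are used with the `dict.get(key, key)` pattern (see `config.FEATURE_NAME_CN.get(x_axis, x_axis)` in `clustering.py`), so unmapped columns fall back to their raw name. A minimal sketch of that pattern follows; the dictionaries are short excerpts of the tables above, and `display_name` and `weekday_label` are hypothetical helper names.

```python
# Excerpts copied from the new config.py; the full tables are longer.
FEATURE_NAME_CN = {
    "年度体检异常标记": "年度体检异常",
    "近30天睡眠时长均值": "睡眠时长",
    "缺勤时长（小时）": "缺勤时长",
}
WEEKDAY_NAMES = {1: "周一", 2: "周二", 3: "周三", 4: "周四", 5: "周五", 6: "周六", 7: "周日"}


def display_name(column: str) -> str:
    """Return the short display label for a column, or the column itself if unmapped."""
    return FEATURE_NAME_CN.get(column, column)


def weekday_label(code: int) -> str:
    """Translate a 1-7 weekday code into its label; unknown codes are shown as-is."""
    return WEEKDAY_NAMES.get(code, str(code))
```

Note that the weekday codes now start at 1 for Monday and cover all seven days, whereas the old table used 2 to 6.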
--- a/backend/core/clustering.py
+++ b/backend/core/clustering.py
@@ -1,9 +1,6 @@
 import pandas as pd
 import numpy as np
 from sklearn.cluster import KMeans
 from sklearn.preprocessing import MinMaxScaler
 import joblib
 import os
 import config
 from core.preprocessing import get_clean_data
@@ -14,216 +11,123 @@ class KMeansAnalyzer:
        self.n_clusters = n_clusters
        self.model = None
        self.scaler = MinMaxScaler()
        self.data = None
        self.data_scaled = None
        self.labels = None
-        
+        self.feature_cols = [
-    def _get_feature_columns(self, df):
+            '年龄',
-        df.columns = [col.strip() for col in df.columns]
+            '司龄年数',
-        
+            '月均加班时长',
-        feature_map = {
+            '通勤时长分钟',
-            'Age': None,
+            'BMI',
-            'Service time': None,
+            '缺勤时长（小时）',
-            'Work load Average/day': None,
+        ]
-            'Body mass index': None,
+
            'Absenteeism time in hours': None
        }
        for key in feature_map:
            if key in df.columns:
                feature_map[key] = key
            else:
                for col in df.columns:
                    if key.replace(' ', '').lower() == col.replace(' ', '').lower():
                        feature_map[key] = col
                        break
        actual_features = [v for v in feature_map.values() if v is not None]
        return actual_features
    def fit(self, n_clusters=None):
        if n_clusters:
            self.n_clusters = n_clusters
-        
+        df = get_clean_data().reset_index(drop=True)
-        df = get_clean_data()
+        data = df[self.feature_cols].values
-        df = df.reset_index(drop=True)
+        data_scaled = self.scaler.fit_transform(data)
-        
+        self.model = KMeans(n_clusters=self.n_clusters, random_state=config.RANDOM_STATE, n_init=10)
-        feature_cols = self._get_feature_columns(df)
+        self.labels = self.model.fit_predict(data_scaled)
        if not feature_cols:
            feature_cols = ['Age', 'Service time', 'Body mass index', 'Absenteeism time in hours']
            feature_cols = [c for c in feature_cols if c in df.columns]
        self.data = df[feature_cols].values
        self.scaler = MinMaxScaler()
        self.data_scaled = self.scaler.fit_transform(self.data)
        self.model = KMeans(
            n_clusters=self.n_clusters,
            random_state=config.RANDOM_STATE,
            n_init=10
        )
        self.labels = self.model.fit_predict(self.data_scaled)
        return self.model
-    
+
    def get_cluster_results(self, n_clusters=3):
        if self.model is None or self.n_clusters != n_clusters:
            self.fit(n_clusters)
        centers = self.scaler.inverse_transform(self.model.cluster_centers_)
        unique, counts = np.unique(self.labels, return_counts=True)
        total = len(self.labels)
-        
+        names = self._generate_cluster_names(centers)
        cluster_names = self._generate_cluster_names(centers)
        feature_cols = self._get_feature_columns(get_clean_data())
        clusters = []
-        for i, (cluster_id, count) in enumerate(zip(unique, counts)):
+        for cluster_id, count in zip(unique, counts):
-            center_dict = {}
+            center = centers[int(cluster_id)]
            for j, fname in enumerate(feature_cols):
                if j < len(centers[i]):
                    center_dict[fname] = round(centers[i][j], 2)
            clusters.append({
                'id': int(cluster_id),
-                'name': cluster_names.get(cluster_id, f'群体{cluster_id+1}'),
+                'name': names.get(int(cluster_id), f'群体{int(cluster_id) + 1}'),
                'member_count': int(count),
                'percentage': round(count / total * 100, 1),
-                'center': center_dict,
+                'center': {
-                'description': self._generate_description(cluster_names.get(cluster_id, ''))
+                    feature: round(float(value), 2)
                    for feature, value in zip(self.feature_cols, center)
                },
                'description': self._generate_description(names.get(int(cluster_id), '')),
            })
-        
+        return {'n_clusters': self.n_clusters, 'clusters': clusters}
-        return {
+
            'n_clusters': self.n_clusters,
            'clusters': clusters
        }
    def get_cluster_profile(self, n_clusters=3):
        if self.model is None or self.n_clusters != n_clusters:
            self.fit(n_clusters)
        centers_scaled = self.model.cluster_centers_
-        
+        names = self._generate_cluster_names(self.scaler.inverse_transform(centers_scaled))
        df = get_clean_data()
        df.columns = [col.strip() for col in df.columns]
        feature_cols = self._get_feature_columns(df)
        dimensions = ['年龄', '工龄', '工作负荷', 'BMI', '缺勤倾向'][:len(feature_cols)]
        cluster_names = self._generate_cluster_names(
            self.scaler.inverse_transform(centers_scaled)
        )
        clusters = []
        for i in range(self.n_clusters):
            clusters.append({
                'id': i,
                'name': cluster_names.get(i, f'群体{i+1}'),
                'values': [round(v, 2) for v in centers_scaled[i]]
            })
        return {
-            'dimensions': dimensions,
+            'dimensions': ['年龄', '司龄', '加班', '通勤', 'BMI', '缺勤'],
-            'dimension_keys': feature_cols,
+            'dimension_keys': self.feature_cols,
-            'clusters': clusters
+            'clusters': [
                {
                    'id': idx,
                    'name': names.get(idx, f'群体{idx + 1}'),
                    'values': [round(float(v), 2) for v in centers_scaled[idx]],
                }
                for idx in range(self.n_clusters)
            ],
        }
-    
+
-    def get_scatter_data(self, n_clusters=3, x_axis='Age', y_axis='Absenteeism time in hours'):
+    def get_scatter_data(self, n_clusters=3, x_axis='月均加班时长', y_axis='缺勤时长（小时）'):
        if self.model is None or self.n_clusters != n_clusters:
            self.fit(n_clusters)
-        
+        df = get_clean_data().reset_index(drop=True)
-        df = get_clean_data()
+        if x_axis not in df.columns:
-        df = df.reset_index(drop=True)
+            x_axis = '月均加班时长'
-        df.columns = [col.strip() for col in df.columns]
+        if y_axis not in df.columns:
-        
+            y_axis = config.TARGET_COLUMN
        x_col = None
        y_col = None
        for col in df.columns:
            if x_axis.replace(' ', '').lower() in col.replace(' ', '').lower():
                x_col = col
            if y_axis.replace(' ', '').lower() in col.replace(' ', '').lower():
                y_col = col
        if x_col is None:
            x_col = df.columns[0]
        if y_col is None:
            y_col = df.columns[-1]
        points = []
        for idx in range(min(len(df), len(self.labels))):
            row = df.iloc[idx]
            points.append({
-                'employee_id': int(row['ID']),
+                'employee_id': str(row[config.EMPLOYEE_ID_COLUMN]),
-                'x': float(row[x_col]),
+                'x': float(row[x_axis]),
-                'y': float(row[y_col]),
+                'y': float(row[y_axis]),
-                'cluster_id': int(self.labels[idx])
+                'cluster_id': int(self.labels[idx]),
            })
        cluster_colors = {
            '0': '#67C23A',
            '1': '#E6A23C',
            '2': '#F56C6C',
            '3': '#909399',
            '4': '#409EFF'
        }
        return {
-            'x_axis': x_col,
+            'x_axis': x_axis,
-            'x_axis_name': config.FEATURE_NAME_CN.get(x_col, x_col),
+            'x_axis_name': config.FEATURE_NAME_CN.get(x_axis, x_axis),
-            'y_axis': y_col,
+            'y_axis': y_axis,
-            'y_axis_name': config.FEATURE_NAME_CN.get(y_col, y_col),
+            'y_axis_name': config.FEATURE_NAME_CN.get(y_axis, y_axis),
            'points': points[:500],
-            'cluster_colors': cluster_colors
+            'cluster_colors': {
                '0': '#5B8FF9',
                '1': '#61DDAA',
                '2': '#F6BD16',
                '3': '#E8684A',
                '4': '#6DC8EC',
            },
        }
-    
+
    def _generate_cluster_names(self, centers):
        names = {}
-        
+        for idx, center in enumerate(centers):
-        for i, center in enumerate(centers):
+            _, tenure, overtime, commute, bmi, absence = center
-            if len(center) >= 5:
+            if overtime > 38 and commute > 55 and absence > 8:
-                service_time = center[1]
+                names[idx] = '高压通勤型'
-                work_load = center[2]
+            elif bmi > 27 and absence > 8:
-                bmi = center[3]
+                names[idx] = '健康波动型'
-                absent = center[4]
+            elif tenure > 8 and absence < 6:
                names[idx] = '稳定低风险型'
            elif overtime > 28 and absence > 7:
                names[idx] = '轮班负荷型'
            else:
-                service_time = center[1] if len(center) > 1 else 0
+                names[idx] = f'群体{idx + 1}'
                work_load = 0
                bmi = center[2] if len(center) > 2 else 0
                absent = center[3] if len(center) > 3 else 0
            if service_time > 15 and absent < 3:
                names[i] = '模范型员工'
            elif work_load > 260 and absent > 5:
                names[i] = '压力型员工'
            elif bmi > 28:
                names[i] = '生活习惯型员工'
            else:
                names[i] = f'群体{i+1}'
        return names
-    
+
    def _generate_description(self, name):
        descriptions = {
-            '模范型员工': '工龄长、工作稳定、缺勤率低',
+            '高压通勤型': '加班和通勤压力都高，缺勤时长偏长。',
-            '压力型员工': '工作负荷大、缺勤较多',
+            '健康波动型': '健康相关风险更高，需要重点关注。',
-            '生活习惯型员工': 'BMI偏高、需关注健康'
+            '稳定低风险型': '司龄较长，缺勤水平稳定且偏低。',
            '轮班负荷型': '排班和工作负荷较重，缺勤风险较高。',
        }
-        return descriptions.get(name, '常规员工群体')
+        return descriptions.get(name, '常规员工群体。')
    def save_model(self):
        os.makedirs(config.MODELS_DIR, exist_ok=True)
        joblib.dump(self.model, config.KMEANS_MODEL_PATH)
    def load_model(self):
        if os.path.exists(config.KMEANS_MODEL_PATH):
            self.model = joblib.load(config.KMEANS_MODEL_PATH)
            self.n_clusters = self.model.n_clusters
 kmeans_analyzer = KMeansAnalyzer()
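Review note: the new `_generate_cluster_names` assigns labels with an ordered chain of threshold rules, so the first matching rule wins. The standalone sketch below reproduces the thresholds from the added lines so the behaviour can be checked without loading data. `name_cluster` is a hypothetical name, the units are inferred from the column names, and the sample centers are made-up illustration values, not results from the dataset.

```python
def name_cluster(center, index):
    """Assign a label to one cluster center given in original (unscaled) units.

    `center` follows the analyzer's feature order:
    age, tenure (years), monthly overtime, commute (minutes), BMI, absence (hours).
    """
    _, tenure, overtime, commute, bmi, absence = center
    if overtime > 38 and commute > 55 and absence > 8:
        return "高压通勤型"
    if bmi > 27 and absence > 8:
        return "健康波动型"
    if tenure > 8 and absence < 6:
        return "稳定低风险型"
    if overtime > 28 and absence > 7:
        return "轮班负荷型"
    return f"群体{index + 1}"


# Made-up centers for illustration only.
centers = [
    (35.0, 6.0, 42.0, 60.0, 24.0, 9.5),
    (41.0, 12.0, 15.0, 30.0, 23.0, 4.0),
    (29.0, 3.0, 20.0, 25.0, 22.0, 6.5),
]
names = {idx: name_cluster(center, idx) for idx, center in enumerate(centers)}
```

Because the rules are evaluated in order, two clusters can receive the same label, and a center that satisfies several rules only gets the first one.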
--- a/backend/core/deep_learning_model.py
+++ b/backend/core/deep_learning_model.py
@@ -0,0 +1,712 @@
 from __future__ import annotations
 import copy
 import os
 from typing import Dict, List, Optional, Tuple
 import numpy as np
 import pandas as pd
 from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
 import config
 from core.model_features import engineer_features
 try:
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, Dataset
 except ImportError:
    torch = None
    nn = None
    DataLoader = None
    Dataset = object
 WINDOW_SIZE = 8
 SEQUENCE_FEATURES = [
    '缺勤月份',
    '星期几',
    '是否节假日前后',
    '请假类型',
    '请假原因大类',
    '是否提供医院证明',
    '是否临时请假',
    '是否连续缺勤',
    '前一工作日是否加班',
    '月均加班时长',
    '通勤时长分钟',
    '是否夜班岗位',
    '是否慢性病史',
    '加班通勤压力指数',
    '缺勤历史强度',
 ]
 STATIC_FEATURES = [
    '所属行业',
    '婚姻状态',
    '岗位序列',
    '岗位级别',
    '年龄',
    '司龄年数',
    '子女数量',
    '班次类型',
    '绩效等级',
    'BMI',
    '健康风险指数',
    '家庭负担指数',
    '岗位稳定性指数',
 ]
 DEFAULT_EPOCHS = 80
 DEFAULT_BATCH_SIZE = 128
 EARLY_STOPPING_PATIENCE = 16
 TRANSFORMER_D_MODEL = 160
 TRANSFORMER_HEADS = 5
 TRANSFORMER_LAYERS = 3
 BaseTorchModule = nn.Module if nn is not None else object
 class SequenceStaticDataset(Dataset):
    def __init__(
        self,
        seq_num: np.ndarray,
        seq_cat: np.ndarray,
        static_num: np.ndarray,
        static_cat: np.ndarray,
        targets: np.ndarray,
    ):
        self.seq_num = torch.tensor(seq_num, dtype=torch.float32)
        self.seq_cat = torch.tensor(seq_cat, dtype=torch.long)
        self.static_num = torch.tensor(static_num, dtype=torch.float32)
        self.static_cat = torch.tensor(static_cat, dtype=torch.long)
        self.targets = torch.tensor(targets, dtype=torch.float32)
    def __len__(self) -> int:
        return len(self.targets)
    def __getitem__(self, index: int):
        return (
            self.seq_num[index],
            self.seq_cat[index],
            self.static_num[index],
            self.static_cat[index],
            self.targets[index],
        )
 class LearnedAttentionPooling(BaseTorchModule):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
    def forward(self, sequence_x: torch.Tensor) -> torch.Tensor:
        attn_scores = self.score(sequence_x).squeeze(-1)
        attn_weights = torch.softmax(attn_scores, dim=1)
        return torch.sum(sequence_x * attn_weights.unsqueeze(-1), dim=1)
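Review note: `LearnedAttentionPooling` scores each time step, applies a softmax over the sequence dimension, and returns the weighted sum of the step vectors. The plain-Python sketch below shows only the pooling arithmetic for a single sequence. It is an illustration under assumptions: the scores are passed in directly instead of being produced by the two-layer scoring network, and `attention_pool` is a hypothetical name.

```python
import math


def attention_pool(sequence, scores):
    """Softmax the per-step scores and return the weighted sum of the step vectors."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sequence[0])
    pooled = [
        sum(w * step[d] for w, step in zip(weights, sequence))
        for d in range(dim)
    ]
    return pooled, weights
```

With equal scores the weights are uniform and the result is the mean of the steps; a much larger score on one step makes the output approach that step's vector.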
 class GatedResidualBlock(BaseTorchModule):
    def __init__(self, input_dim: int, hidden_dim: int, dropout: float = 0.15):
        super().__init__()
        self.proj = nn.Linear(input_dim, hidden_dim) if input_dim != hidden_dim else nn.Identity()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.gate = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.Sigmoid(),
        )
        self.out_norm = nn.LayerNorm(hidden_dim)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.proj(x)
        transformed = self.net(x)
        gate = self.gate(torch.cat([residual, transformed], dim=-1))
        return self.out_norm(residual + transformed * gate)
 class TemporalFusionRegressor(BaseTorchModule):
    def __init__(
        self,
        seq_num_dim: int,
        static_num_dim: int,
        seq_cat_cardinalities: List[int],
        static_cat_cardinalities: List[int],
    ):
        super().__init__()
        self.seq_cat_embeddings = nn.ModuleList(
            [nn.Embedding(cardinality, _embedding_dim(cardinality)) for cardinality in seq_cat_cardinalities]
        )
        self.static_cat_embeddings = nn.ModuleList(
            [nn.Embedding(cardinality, _embedding_dim(cardinality)) for cardinality in static_cat_cardinalities]
        )
        seq_cat_dim = sum(embedding.embedding_dim for embedding in self.seq_cat_embeddings)
        static_cat_dim = sum(embedding.embedding_dim for embedding in self.static_cat_embeddings)
        seq_input_dim = seq_num_dim + seq_cat_dim
        static_input_dim = static_num_dim + static_cat_dim
        self.position_embedding = nn.Parameter(torch.randn(WINDOW_SIZE, TRANSFORMER_D_MODEL) * 0.02)
        self.seq_projection = nn.Sequential(
            nn.Linear(seq_input_dim, TRANSFORMER_D_MODEL),
            nn.LayerNorm(TRANSFORMER_D_MODEL),
            nn.GELU(),
            nn.Dropout(0.12),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=TRANSFORMER_D_MODEL,
            nhead=TRANSFORMER_HEADS,
            dim_feedforward=TRANSFORMER_D_MODEL * 3,
            dropout=0.15,
            activation='gelu',
            batch_first=True,
            norm_first=True,
        )
        self.sequence_encoder = nn.TransformerEncoder(
            encoder_layer,
            num_layers=TRANSFORMER_LAYERS,
        )
        self.sequence_pool = LearnedAttentionPooling(TRANSFORMER_D_MODEL)
        self.sequence_head = nn.Sequential(
            nn.Linear(TRANSFORMER_D_MODEL * 3, 192),
            nn.LayerNorm(192),
            nn.GELU(),
            nn.Dropout(0.18),
            nn.Linear(192, 128),
            nn.GELU(),
        )
        self.static_net = nn.Sequential(
            GatedResidualBlock(static_input_dim, 128, dropout=0.15),
            GatedResidualBlock(128, 96, dropout=0.12),
        )
        self.context_gate = nn.Sequential(
            nn.Linear(128 + 96, 128 + 96),
            nn.Sigmoid(),
        )
        self.fusion = nn.Sequential(
            GatedResidualBlock(128 + 96, 160, dropout=0.18),
            nn.Dropout(0.12),
            nn.Linear(160, 96),
            nn.GELU(),
            nn.Dropout(0.08),
            nn.Linear(96, 1),
        )
        self.shortcut_head = nn.Sequential(
            nn.Linear(seq_num_dim + static_num_dim, 64),
            nn.LayerNorm(64),
            nn.GELU(),
            nn.Dropout(0.08),
            nn.Linear(64, 1),
        )
    def _embed_categorical(self, inputs: torch.Tensor, embeddings: nn.ModuleList) -> Optional[torch.Tensor]:
        if not embeddings:
            return None
        parts = [embedding(inputs[..., index]) for index, embedding in enumerate(embeddings)]
        return torch.cat(parts, dim=-1)
    def forward(self, seq_num_x, seq_cat_x, static_num_x, static_cat_x):
        seq_parts = [seq_num_x]
        seq_embedded = self._embed_categorical(seq_cat_x, self.seq_cat_embeddings)
        if seq_embedded is not None:
            seq_parts.append(seq_embedded)
        seq_input = torch.cat(seq_parts, dim=-1)
        seq_input = self.seq_projection(seq_input)
        seq_input = seq_input + self.position_embedding.unsqueeze(0)
        sequence_context = self.sequence_encoder(seq_input)
        sequence_last = sequence_context[:, -1, :]
        sequence_mean = sequence_context.mean(dim=1)
        sequence_attended = self.sequence_pool(sequence_context)
        sequence_repr = self.sequence_head(torch.cat([sequence_last, sequence_mean, sequence_attended], dim=1))
        static_parts = [static_num_x]
        static_embedded = self._embed_categorical(static_cat_x, self.static_cat_embeddings)
        if static_embedded is not None:
            static_parts.append(static_embedded)
        static_input = torch.cat(static_parts, dim=-1)
        static_repr = self.static_net(static_input)
        fused = torch.cat([sequence_repr, static_repr], dim=1)
        fused = fused * self.context_gate(fused)
        shortcut = self.shortcut_head(torch.cat([seq_num_x[:, -1, :], static_num_x], dim=1))
        return (self.fusion(fused) + shortcut).squeeze(1)
 class LSTMMLPRegressor(TemporalFusionRegressor):
    pass
 def is_available() -> bool:
    return torch is not None
 def _embedding_dim(cardinality: int) -> int:
    return int(min(24, max(4, round(cardinality ** 0.35 * 2))))
 def _split_feature_types(df: pd.DataFrame, features: List[str]) -> Tuple[List[str], List[str]]:
    categorical = []
    numerical = []
    for feature in features:
        if feature not in df.columns:
            continue
        if pd.api.types.is_numeric_dtype(df[feature]):
            numerical.append(feature)
        else:
            categorical.append(feature)
    return categorical, numerical
 def _fit_category_maps(df: pd.DataFrame, features: List[str]) -> Dict[str, Dict[str, int]]:
    category_maps = {}
    for feature in features:
        if feature not in df.columns:
            continue
        values = sorted(df[feature].astype(str).fillna('__MISSING__').unique().tolist())
        category_maps[feature] = {value: idx + 1 for idx, value in enumerate(values)}
    return category_maps
 def _encode_categorical_series(values: pd.Series, mapping: Dict[str, int]) -> np.ndarray:
    return values.astype(str).fillna('__MISSING__').map(lambda value: mapping.get(value, 0)).to_numpy(dtype=np.int64)
 def _safe_standardize(values: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    if values.shape[1] == 0:
        return np.zeros((0,), dtype=np.float32), np.ones((0,), dtype=np.float32)
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    std = np.where(std < 1e-6, 1.0, std)
    return mean.astype(np.float32), std.astype(np.float32)
 def _build_feature_layout(train_df: pd.DataFrame) -> Dict[str, List[str]]:
    used_features = sorted(set(SEQUENCE_FEATURES + STATIC_FEATURES))
    seq_cat_features, seq_num_features = _split_feature_types(train_df, SEQUENCE_FEATURES)
    static_cat_features, static_num_features = _split_feature_types(train_df, STATIC_FEATURES)
    all_cat_features = sorted(set(seq_cat_features + static_cat_features))
    return {
        'used_features': used_features,
        'seq_cat_features': seq_cat_features,
        'seq_num_features': seq_num_features,
        'static_cat_features': static_cat_features,
        'static_num_features': static_num_features,
        'all_cat_features': all_cat_features,
    }
 def _build_sequence_arrays(
    df: pd.DataFrame,
    feature_layout: Dict[str, List[str]],
    category_maps: Dict[str, Dict[str, int]],
    target_transform: str,
 ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    df = engineer_features(df.copy())
    for feature in feature_layout['used_features']:
        if feature not in df.columns:
            df[feature] = 0
    df = df.sort_values(
        [config.EMPLOYEE_ID_COLUMN, config.EVENT_DATE_INDEX_COLUMN, config.EVENT_SEQUENCE_COLUMN]
    ).reset_index(drop=True)
    sequence_num_samples = []
    sequence_cat_samples = []
    static_num_samples = []
    static_cat_samples = []
    targets = []
    for _, group in df.groupby(config.EMPLOYEE_ID_COLUMN, sort=False):
        seq_num_values = group[feature_layout['seq_num_features']].astype(float).to_numpy(dtype=np.float32)
        static_num_values = group[feature_layout['static_num_features']].astype(float).to_numpy(dtype=np.float32)
        target_values = group[config.TARGET_COLUMN].astype(float).to_numpy(dtype=np.float32)
        if feature_layout['seq_cat_features']:
            seq_cat_values = np.column_stack(
                [
                    _encode_categorical_series(group[feature], category_maps[feature])
                    for feature in feature_layout['seq_cat_features']
                ]
            ).astype(np.int64)
        else:
            seq_cat_values = np.zeros((len(group), 0), dtype=np.int64)
        if feature_layout['static_cat_features']:
            static_cat_values = np.column_stack(
                [
                    _encode_categorical_series(group[feature], category_maps[feature])
                    for feature in feature_layout['static_cat_features']
                ]
            ).astype(np.int64)
        else:
            static_cat_values = np.zeros((len(group), 0), dtype=np.int64)
        for index in range(len(group)):
            start_index = max(0, index - WINDOW_SIZE + 1)
            num_slice = seq_num_values[start_index: index + 1]
            cat_slice = seq_cat_values[start_index: index + 1]
            num_window = np.zeros((WINDOW_SIZE, len(feature_layout['seq_num_features'])), dtype=np.float32)
            cat_window = np.zeros((WINDOW_SIZE, len(feature_layout['seq_cat_features'])), dtype=np.int64)
            num_window[-len(num_slice):] = num_slice
            if len(feature_layout['seq_cat_features']) > 0:
                cat_window[-len(cat_slice):] = cat_slice
            sequence_num_samples.append(num_window)
            sequence_cat_samples.append(cat_window)
            static_num_samples.append(static_num_values[index].astype(np.float32))
            static_cat_samples.append(static_cat_values[index].astype(np.int64))
            targets.append(float(target_values[index]))
    targets_array = np.array(targets, dtype=np.float32)
    if target_transform == 'log1p':
        targets_array = np.log1p(np.clip(targets_array, a_min=0, a_max=None)).astype(np.float32)
    return (
        np.array(sequence_num_samples, dtype=np.float32),
        np.array(sequence_cat_samples, dtype=np.int64),
        np.array(static_num_samples, dtype=np.float32),
        np.array(static_cat_samples, dtype=np.int64),
        targets_array,
    )
 def _train_validation_split(train_df: pd.DataFrame, validation_ratio: float = 0.15) -> Tuple[pd.DataFrame, pd.DataFrame]:
    employee_ids = train_df[config.EMPLOYEE_ID_COLUMN].dropna().astype(str).unique().tolist()
    rng = np.random.default_rng(config.RANDOM_STATE)
    rng.shuffle(employee_ids)
    validation_count = max(1, int(len(employee_ids) * validation_ratio))
    validation_ids = set(employee_ids[:validation_count])
    validation_df = train_df[train_df[config.EMPLOYEE_ID_COLUMN].astype(str).isin(validation_ids)].copy()
    fit_df = train_df[~train_df[config.EMPLOYEE_ID_COLUMN].astype(str).isin(validation_ids)].copy()
    if fit_df.empty or validation_df.empty:
        split_index = max(1, int(len(train_df) * (1 - validation_ratio)))
        fit_df = train_df.iloc[:split_index].copy()
        validation_df = train_df.iloc[split_index:].copy()
    return fit_df, validation_df
 def _prepare_inference_window(
    df: pd.DataFrame,
    feature_layout: Dict[str, List[str]],
    category_maps: Dict[str, Dict[str, int]],
    default_sequence_num_prefix: np.ndarray,
    default_sequence_cat_prefix: np.ndarray,
 ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    df = engineer_features(df.copy())
    for feature in feature_layout['used_features']:
        if feature not in df.columns:
            df[feature] = 0
    row = df.iloc[0]
    seq_num_row = row[feature_layout['seq_num_features']].astype(float).to_numpy(dtype=np.float32)
    static_num_row = row[feature_layout['static_num_features']].astype(float).to_numpy(dtype=np.float32)
    if feature_layout['seq_cat_features']:
        seq_cat_row = np.array(
            [category_maps[feature].get(str(row[feature]), 0) for feature in feature_layout['seq_cat_features']],
            dtype=np.int64,
        )
    else:
        seq_cat_row = np.zeros((0,), dtype=np.int64)
    if feature_layout['static_cat_features']:
        static_cat_row = np.array(
            [category_maps[feature].get(str(row[feature]), 0) for feature in feature_layout['static_cat_features']],
            dtype=np.int64,
        )
    else:
        static_cat_row = np.zeros((0,), dtype=np.int64)
    sequence_num_window = np.vstack([default_sequence_num_prefix, seq_num_row.reshape(1, -1)]).astype(np.float32)
    if len(feature_layout['seq_cat_features']) > 0:
        sequence_cat_window = np.vstack([default_sequence_cat_prefix, seq_cat_row.reshape(1, -1)]).astype(np.int64)
    else:
        sequence_cat_window = np.zeros((WINDOW_SIZE, 0), dtype=np.int64)
    return sequence_num_window, sequence_cat_window, static_num_row, static_cat_row
 def _evaluate_model(
    model: nn.Module,
    loader: DataLoader,
    device: torch.device,
    target_transform: str,
 ) -> Tuple[float, Dict[str, float]]:
    model.eval()
    predictions = []
    targets = []
    with torch.no_grad():
        for batch_seq_num, batch_seq_cat, batch_static_num, batch_static_cat, batch_target in loader:
            batch_seq_num = batch_seq_num.to(device)
            batch_seq_cat = batch_seq_cat.to(device)
            batch_static_num = batch_static_num.to(device)
            batch_static_cat = batch_static_cat.to(device)
            batch_predictions = model(batch_seq_num, batch_seq_cat, batch_static_num, batch_static_cat)
            predictions.append(batch_predictions.cpu().numpy())
            targets.append(batch_target.numpy())
    y_pred = np.concatenate(predictions) if predictions else np.array([], dtype=np.float32)
    y_true = np.concatenate(targets) if targets else np.array([], dtype=np.float32)
    if target_transform == 'log1p':
        y_pred_eval = np.expm1(y_pred)
        y_true_eval = np.expm1(y_true)
    else:
        y_pred_eval = y_pred
        y_true_eval = y_true
    y_pred_eval = np.clip(y_pred_eval, a_min=0, a_max=None)
    mse = mean_squared_error(y_true_eval, y_pred_eval)
    metrics = {
        'r2': float(r2_score(y_true_eval, y_pred_eval)),
        'mse': float(mse),
        'rmse': float(np.sqrt(mse)),
        'mae': float(mean_absolute_error(y_true_eval, y_pred_eval)),
    }
    return metrics['rmse'], metrics
 def _compute_sample_weights(targets: torch.Tensor, target_transform: str) -> torch.Tensor:
    if target_transform == 'log1p':
        base_targets = torch.expm1(targets)
    else:
        base_targets = targets
    normalized = torch.clamp(base_targets / 12.0, min=0.0, max=2.0)
    return 1.0 + normalized * 0.8
 def train_lstm_mlp(
    train_df: pd.DataFrame,
    test_df: pd.DataFrame,
    model_path: str,
    target_transform: str = 'log1p',
    epochs: int = DEFAULT_EPOCHS,
    batch_size: int = DEFAULT_BATCH_SIZE,
 ) -> Optional[Dict]:
    if torch is None:
        return None
    fit_df, validation_df = _train_validation_split(train_df)
    feature_layout = _build_feature_layout(fit_df)
    category_maps = _fit_category_maps(fit_df, feature_layout['all_cat_features'])
    train_seq_num, train_seq_cat, train_static_num, train_static_cat, y_train = _build_sequence_arrays(
        fit_df, feature_layout, category_maps, target_transform
    )
    val_seq_num, val_seq_cat, val_static_num, val_static_cat, y_val = _build_sequence_arrays(
        validation_df, feature_layout, category_maps, target_transform
    )
    test_seq_num, test_seq_cat, test_static_num, test_static_cat, y_test_aligned = _build_sequence_arrays(
        test_df, feature_layout, category_maps, target_transform
    )
    seq_mean, seq_std = _safe_standardize(train_seq_num.reshape(-1, train_seq_num.shape[-1]))
    static_mean, static_std = _safe_standardize(train_static_num)
    train_seq_num = ((train_seq_num - seq_mean) / seq_std).astype(np.float32)
    val_seq_num = ((val_seq_num - seq_mean) / seq_std).astype(np.float32)
    test_seq_num = ((test_seq_num - seq_mean) / seq_std).astype(np.float32)
    train_static_num = ((train_static_num - static_mean) / static_std).astype(np.float32)
    val_static_num = ((val_static_num - static_mean) / static_std).astype(np.float32)
    test_static_num = ((test_static_num - static_mean) / static_std).astype(np.float32)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    if device.type == 'cuda':
        print(f'[lstm_mlp] Training device: CUDA ({torch.cuda.get_device_name(device)})')
    else:
        print('[lstm_mlp] Training device: CPU')
    model = TemporalFusionRegressor(
        seq_num_dim=train_seq_num.shape[-1],
        static_num_dim=train_static_num.shape[-1],
        seq_cat_cardinalities=[len(category_maps[feature]) + 1 for feature in feature_layout['seq_cat_features']],
        static_cat_cardinalities=[len(category_maps[feature]) + 1 for feature in feature_layout['static_cat_features']],
    ).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=9e-4, weight_decay=3e-4)
    criterion = nn.SmoothL1Loss(beta=0.28, reduction='none')
    train_loader = DataLoader(
        SequenceStaticDataset(train_seq_num, train_seq_cat, train_static_num, train_static_cat, y_train),
        batch_size=batch_size,
        shuffle=True,
        drop_last=False,
    )
    val_loader = DataLoader(
        SequenceStaticDataset(val_seq_num, val_seq_cat, val_static_num, val_static_cat, y_val),
        batch_size=batch_size,
        shuffle=False,
    )
    total_steps = max(20, epochs * max(1, len(train_loader)))
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=0.0014,
        total_steps=total_steps,
        pct_start=0.12,
        div_factor=12.0,
        final_div_factor=40.0,
    )
    best_state = None
    best_metrics = None
    best_val_rmse = float('inf')
    stale_epochs = 0
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for batch_seq_num, batch_seq_cat, batch_static_num, batch_static_cat, batch_target in train_loader:
            batch_seq_num = batch_seq_num.to(device)
            batch_seq_cat = batch_seq_cat.to(device)
            batch_static_num = batch_static_num.to(device)
            batch_static_cat = batch_static_cat.to(device)
            batch_target = batch_target.to(device)
            optimizer.zero_grad(set_to_none=True)
            predictions = model(batch_seq_num, batch_seq_cat, batch_static_num, batch_static_cat)
            sample_weights = _compute_sample_weights(batch_target, target_transform)
            loss = criterion(predictions, batch_target)
            loss = (loss * sample_weights).mean()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            scheduler.step()
            running_loss += float(loss.item()) * len(batch_target)
        train_loss = running_loss / max(1, len(train_loader.dataset))
        val_rmse, val_metrics = _evaluate_model(model, val_loader, device, target_transform)
        improved = val_rmse + 1e-4 < best_val_rmse
        if improved:
            best_val_rmse = val_rmse
            best_metrics = val_metrics
            best_state = copy.deepcopy(model.state_dict())
            stale_epochs = 0
        else:
            stale_epochs += 1
        if epoch == 0 or (epoch + 1) % 5 == 0 or improved:
            print(
                f'[lstm_mlp] epoch={epoch + 1:02d} train_loss={train_loss:.4f} '
                f'val_r2={val_metrics["r2"]:.4f} val_rmse={val_metrics["rmse"]:.4f}'
            )
        if stale_epochs >= EARLY_STOPPING_PATIENCE:
            print(f'[lstm_mlp] Early stopping at epoch {epoch + 1}')
            break
    if best_state is None:
        best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    model.eval()
    with torch.no_grad():
        predictions = model(
            torch.tensor(test_seq_num, dtype=torch.float32).to(device),
            torch.tensor(test_seq_cat, dtype=torch.long).to(device),
            torch.tensor(test_static_num, dtype=torch.float32).to(device),
            torch.tensor(test_static_cat, dtype=torch.long).to(device),
        ).cpu().numpy()
    if target_transform == 'log1p':
        y_pred = np.expm1(predictions)
        y_true = np.expm1(y_test_aligned)
    else:
        y_pred = predictions
        y_true = y_test_aligned
    y_pred = np.clip(y_pred, a_min=0, a_max=None)
    mse = mean_squared_error(y_true, y_pred)
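    # train_seq_num is already standardized at this point. Convert the mean prefix
    # back to raw scale, because predict_lstm_mlp standardizes the whole window again.
    default_sequence_num_prefix = (
        train_seq_num[:, :-1, :].mean(axis=0) * seq_std + seq_mean
    ).astype(np.float32)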
    if train_seq_cat.shape[-1] > 0:
        default_sequence_cat_prefix = np.rint(train_seq_cat[:, :-1, :].mean(axis=0)).astype(np.int64)
    else:
        default_sequence_cat_prefix = np.zeros((WINDOW_SIZE - 1, 0), dtype=np.int64)
    bundle = {
        'state_dict': model.state_dict(),
        'architecture': 'temporal_fusion_transformer',
        'window_size': WINDOW_SIZE,
        'target_transform': target_transform,
        'feature_layout': feature_layout,
        'category_maps': category_maps,
        'seq_mean': seq_mean,
        'seq_std': seq_std,
        'static_mean': static_mean,
        'static_std': static_std,
        'default_sequence_num_prefix': default_sequence_num_prefix,
        'default_sequence_cat_prefix': default_sequence_cat_prefix,
        'seq_num_dim': train_seq_num.shape[-1],
        'static_num_dim': train_static_num.shape[-1],
        'seq_cat_cardinalities': [len(category_maps[feature]) + 1 for feature in feature_layout['seq_cat_features']],
        'static_cat_cardinalities': [len(category_maps[feature]) + 1 for feature in feature_layout['static_cat_features']],
        'best_validation_metrics': best_metrics,
    }
    torch.save(bundle, model_path)
    return {
        'metrics': {
            'r2': round(float(r2_score(y_true, y_pred)), 4),
            'mse': round(float(mse), 4),
            'rmse': round(float(np.sqrt(mse)), 4),
            'mae': round(float(mean_absolute_error(y_true, y_pred)), 4),
        },
        'metadata': {
            'sequence_window_size': WINDOW_SIZE,
            'sequence_feature_names': SEQUENCE_FEATURES,
            'static_feature_names': STATIC_FEATURES,
            'deep_learning_architecture': 'temporal_fusion_transformer',
            'deep_validation_r2': round(float(best_metrics['r2']), 4) if best_metrics else None,
        },
    }
 def load_lstm_mlp_bundle(model_path: str) -> Optional[Dict]:
    if torch is None or not os.path.exists(model_path):
        return None
    bundle = torch.load(model_path, map_location='cpu', weights_only=False)
    model = LSTMMLPRegressor(
        seq_num_dim=bundle['seq_num_dim'],
        static_num_dim=bundle['static_num_dim'],
        seq_cat_cardinalities=bundle['seq_cat_cardinalities'],
        static_cat_cardinalities=bundle['static_cat_cardinalities'],
    )
    model.load_state_dict(bundle['state_dict'])
    model.eval()
    bundle['model'] = model
    return bundle
 def predict_lstm_mlp(bundle: Dict, current_df: pd.DataFrame) -> float:
    sequence_num_window, sequence_cat_window, static_num_row, static_cat_row = _prepare_inference_window(
        current_df,
        bundle['feature_layout'],
        bundle['category_maps'],
        bundle['default_sequence_num_prefix'],
        bundle['default_sequence_cat_prefix'],
    )
    sequence_num_window = ((sequence_num_window - bundle['seq_mean']) / bundle['seq_std']).astype(np.float32)
    static_num_row = ((static_num_row - bundle['static_mean']) / bundle['static_std']).astype(np.float32)
    with torch.no_grad():
        prediction = bundle['model'](
            torch.tensor(sequence_num_window, dtype=torch.float32).unsqueeze(0),
            torch.tensor(sequence_cat_window, dtype=torch.long).unsqueeze(0),
            torch.tensor(static_num_row, dtype=torch.float32).unsqueeze(0),
            torch.tensor(static_cat_row, dtype=torch.long).unsqueeze(0),
        ).cpu().numpy()[0]
    if bundle.get('target_transform') == 'log1p':
        prediction = np.expm1(prediction)
    return float(max(0.5, prediction))
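The left-padded history window built per event in `_build_sequence_arrays` can be sketched in isolation. This is a minimal NumPy illustration under stated assumptions, not the project's code: the helper name `left_padded_windows` and the toy input are hypothetical, and only the numeric branch is shown.

```python
import numpy as np


def left_padded_windows(values: np.ndarray, window_size: int) -> np.ndarray:
    """Return one (window_size, n_features) window per row, zero-padded on the left.

    Hypothetical helper: it mirrors the slicing logic of the sequence builder
    for a single employee's numeric features.
    """
    n_rows, n_features = values.shape
    windows = np.zeros((n_rows, window_size, n_features), dtype=values.dtype)
    for index in range(n_rows):
        start = max(0, index - window_size + 1)
        history = values[start: index + 1]  # at most window_size rows, never empty
        windows[index, -len(history):] = history  # right-align, zeros stay on the left
    return windows


# Toy input: four events for one employee, one numeric feature each.
events = np.arange(1, 5, dtype=np.float32).reshape(-1, 1)
windows = left_padded_windows(events, window_size=3)
print(windows[:, :, 0])
```

Early events see mostly padding, while later events see only the most recent `window_size` rows, so every sample has the same shape regardless of how much history the employee has.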
--- a/backend/core/feature_mining.py
+++ b/backend/core/feature_mining.py
@@ -1,4 +1,3 @@
 import pandas as pd
 import numpy as np
 import config
@@ -7,145 +6,67 @@ from core.preprocessing import get_clean_data
 def calculate_correlation():
    df = get_clean_data()
    numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
-    
-    if 'ID' in numeric_cols:
-        numeric_cols.remove('ID')
-    
-    corr_matrix = df[numeric_cols].corr()
-    return corr_matrix
+    for candidate in [config.EMPLOYEE_ID_COLUMN]:
+        if candidate in numeric_cols:
+            numeric_cols.remove(candidate)
+    return df[numeric_cols].corr()
 def get_correlation_for_heatmap():
    corr_matrix = calculate_correlation()
    key_features = [
-        'Age',
+        '月均加班时长',
-        'Service time',
+        '通勤时长分钟',
-        'Distance from Residence to Work',
+        '近90天缺勤次数',
-        'Work load Average/day ',
+        'BMI',
-        'Body mass index',
+        '近30天睡眠时长均值',
-        'Absenteeism time in hours'
+        '缺勤时长（小时）',
    ]
    key_features = [f for f in key_features if f in corr_matrix.columns]
    sub_matrix = corr_matrix.loc[key_features, key_features]
-    
-    result = {
+    return {
        'features': [config.FEATURE_NAME_CN.get(f, f) for f in key_features],
-        'matrix': sub_matrix.values.round(2).tolist()
+        'matrix': sub_matrix.values.round(2).tolist(),
    }
-    return result
 def calculate_feature_importance(model, feature_names):
    if hasattr(model, 'feature_importances_'):
        importance = model.feature_importances_
    else:
        raise ValueError("Model does not have feature_importances_ attribute")
    importance_dict = dict(zip(feature_names, importance))
    sorted_importance = sorted(importance_dict.items(), key=lambda x: x[1], reverse=True)
    return sorted_importance
 def get_feature_importance_from_model(model_path, feature_names):
    import joblib
    model = joblib.load(model_path)
    return calculate_feature_importance(model, feature_names)
 def group_comparison(dimension):
    df = get_clean_data()
    dimension_map = {
-        'drinker': ('Social drinker', {0: '不饮酒', 1: '饮酒'}),
+        'industry': ('所属行业', None, '所属行业'),
-        'smoker': ('Social smoker', {0: '不吸烟', 1: '吸烟'}),
+        'shift_type': ('班次类型', None, '班次类型'),
-        'education': ('Education', {1: '高中', 2: '本科', 3: '研究生', 4: '博士'}),
+        'job_family': ('岗位序列', None, '岗位序列'),
-        'children': ('Son', {0: '无子女'}, lambda x: x > 0, '有子女'),
+        'marital_status': ('婚姻状态', None, '婚姻状态'),
-        'pet': ('Pet', {0: '无宠物'}, lambda x: x > 0, '有宠物')
+        'chronic_disease': ('是否慢性病史', {0: '无慢性病史', 1: '有慢性病史'}, '慢性病史'),
    }
    if dimension not in dimension_map:
        raise ValueError(f"Invalid dimension: {dimension}")
-    
+
-    col, value_map = dimension_map[dimension][0], dimension_map[dimension][1]
-    
-    if dimension in ['children', 'pet']:
-        threshold_fn = dimension_map[dimension][2]
-        other_label = dimension_map[dimension][3]
-        
-        groups = []
-        for val in [0]:
-            group_df = df[df[col] == val]
-            if len(group_df) > 0:
-                groups.append({
-                    'name': value_map.get(val, str(val)),
-                    'value': val,
-                    'avg_hours': round(group_df['Absenteeism time in hours'].mean(), 2),
-                    'count': len(group_df),
-                    'percentage': round(len(group_df) / len(df) * 100, 1)
-                })
-        
-        group_df = df[df[col].apply(threshold_fn)]
-        if len(group_df) > 0:
-            groups.append({
-                'name': other_label,
-                'value': 1,
-                'avg_hours': round(group_df['Absenteeism time in hours'].mean(), 2),
-                'count': len(group_df),
-                'percentage': round(len(group_df) / len(df) * 100, 1)
-            })
-    else:
-        groups = []
-        for val in sorted(df[col].unique()):
-            group_df = df[df[col] == val]
-            if len(group_df) > 0:
-                groups.append({
-                    'name': value_map.get(val, str(val)),
-                    'value': int(val),
-                    'avg_hours': round(group_df['Absenteeism time in hours'].mean(), 2),
-                    'count': len(group_df),
-                    'percentage': round(len(group_df) / len(df) * 100, 1)
-                })
-    if len(groups) >= 2:
-        diff_value = abs(groups[0]['avg_hours'] - groups[1]['avg_hours'])
-        base = min(groups[0]['avg_hours'], groups[1]['avg_hours'])
-        diff_percentage = round(diff_value / base * 100, 1) if base > 0 else 0
-    else:
-        diff_value = 0
-        diff_percentage = 0
+    column, value_map, dimension_name = dimension_map[dimension]
+    groups = []
+    for value in sorted(df[column].unique()):
+        group_df = df[df[column] == value]
+        groups.append({
+            'name': value_map.get(value, value) if value_map else str(value),
+            'value': int(value) if isinstance(value, (int, np.integer)) else str(value),
+            'avg_hours': round(group_df[config.TARGET_COLUMN].mean(), 2),
+            'count': int(len(group_df)),
+            'percentage': round(len(group_df) / len(df) * 100, 1),
+        })
+
+    groups.sort(key=lambda item: item['avg_hours'], reverse=True)
+    top = groups[0]['avg_hours'] if groups else 0
+    bottom = groups[-1]['avg_hours'] if len(groups) > 1 else 0
+    diff_value = round(top - bottom, 2)
+    diff_percentage = round(diff_value / bottom * 100, 1) if bottom else 0
+
    return {
        'dimension': dimension,
-        'dimension_name': {
-            'drinker': '饮酒习惯',
-            'smoker': '吸烟习惯',
-            'education': '学历',
-            'children': '子女',
-            'pet': '宠物'
-        }.get(dimension, dimension),
+        'dimension_name': dimension_name,
        'groups': groups,
        'difference': {
            'value': diff_value,
-            'percentage': diff_percentage
+            'percentage': diff_percentage,
-        }
+        },
    }
 if __name__ == '__main__':
    print("Correlation matrix:")
    corr = get_correlation_for_heatmap()
    print(corr)
-    print("\nGroup comparison (drinker):")
-    comp = group_comparison('drinker')
+    print("\nGroup comparison (industry):")
+    comp = group_comparison('industry')
    print(comp)
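The ranking and top-versus-bottom gap computed by the rewritten `group_comparison` can be illustrated without pandas. The sketch below is a simplified stand-in, not the module's code: `compare_groups`, the `shift` and `hours` keys, and the sample rows are all hypothetical, and it assumes a single group should report a zero gap.

```python
from collections import defaultdict


def compare_groups(rows, column, target):
    """Group rows by `column`, average `target`, and report the top-vs-bottom gap."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[column]].append(row[target])

    groups = [
        {
            'name': str(value),
            'avg_hours': round(sum(hours) / len(hours), 2),
            'count': len(hours),
            'percentage': round(len(hours) / len(rows) * 100, 1),
        }
        for value, hours in buckets.items()
    ]
    # Highest average first, so the first and last entries bound the gap.
    groups.sort(key=lambda item: item['avg_hours'], reverse=True)

    top = groups[0]['avg_hours'] if groups else 0
    bottom = groups[-1]['avg_hours'] if groups else 0  # assumption: one group means zero gap
    diff_value = round(top - bottom, 2)
    diff_percentage = round(diff_value / bottom * 100, 1) if bottom else 0
    return {
        'groups': groups,
        'difference': {'value': diff_value, 'percentage': diff_percentage},
    }


rows = [
    {'shift': 'day', 'hours': 4.0},
    {'shift': 'day', 'hours': 6.0},
    {'shift': 'night', 'hours': 8.0},
    {'shift': 'night', 'hours': 12.0},
]
result = compare_groups(rows, 'shift', 'hours')
print(result['difference'])
```

Sorting before taking the difference makes the gap independent of the order in which category values appear, which matters once a dimension has more than two groups.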
--- a/backend/core/generate_dataset.py
+++ b/backend/core/generate_dataset.py
@@ -0,0 +1,405 @@
 import os
 import sys
 import numpy as np
 import pandas as pd
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 import config
 INDUSTRIES = {
    '制造业': {'shift_bias': 0.9, 'overtime_bias': 0.8, 'night_bias': 0.8},
    '互联网': {'shift_bias': 0.2, 'overtime_bias': 1.0, 'night_bias': 0.2},
    '零售连锁': {'shift_bias': 0.7, 'overtime_bias': 0.5, 'night_bias': 0.3},
    '物流运输': {'shift_bias': 0.9, 'overtime_bias': 0.7, 'night_bias': 0.9},
    '金融服务': {'shift_bias': 0.1, 'overtime_bias': 0.7, 'night_bias': 0.1},
    '医药健康': {'shift_bias': 0.6, 'overtime_bias': 0.6, 'night_bias': 0.5},
    '建筑工程': {'shift_bias': 0.5, 'overtime_bias': 0.8, 'night_bias': 0.3},
 }
 def season_from_month(month):
    if month in [12, 1, 2]:
        return 1
    if month in [3, 4, 5]:
        return 2
    if month in [6, 7, 8]:
        return 3
    return 4
 def weighted_choice(rng, items, probs):
    probs = np.array(probs, dtype=float)
    probs = probs / probs.sum()
    return rng.choice(items, p=probs)
 def build_company_pool(rng, company_count=180):
    industries = list(INDUSTRIES.keys())
    scales = ['100人以下', '100-499人', '500-999人', '1000-4999人', '5000人及以上']
    city_tiers = ['一线', '新一线', '二线', '三线及以下']
    companies = []
    for idx in range(company_count):
        industry = weighted_choice(rng, industries, [0.22, 0.14, 0.14, 0.14, 0.1, 0.12, 0.14])
        companies.append({
            '企业编号': f'C{idx + 1:03d}',
            '所属行业': industry,
            '企业规模': weighted_choice(rng, scales, [0.15, 0.28, 0.2, 0.24, 0.13]),
            '所在城市等级': weighted_choice(rng, city_tiers, [0.18, 0.34, 0.3, 0.18]),
        })
    return companies
 def build_employee_pool(rng, companies, employee_count=2600):
    genders = ['男', '女']
    employment_types = ['正式员工', '劳务派遣', '外包驻场', '实习生']
    departments = ['生产', '研发', '销售', '客服', '职能', '仓储物流', '门店运营']
    job_families = ['管理', '专业技术', '销售业务', '生产操作', '行政支持', '客服坐席']
    job_levels = ['初级', '中级', '高级', '主管', '经理及以上']
    educations = ['中专及以下', '大专', '本科', '硕士', '博士']
    marital = ['未婚', '已婚', '离异/其他']
    housing = ['自有住房', '租房', '宿舍']
    shifts = ['标准白班', '两班倒', '三班倒', '弹性班']
    performance = ['A', 'B', 'C', 'D']
    stress = ['低', '中', '高']
    employees = []
    for idx in range(employee_count):
        company = companies[rng.integers(0, len(companies))]
        industry = company['所属行业']
        age = int(np.clip(rng.normal(33, 7), 20, 55))
        tenure = round(float(np.clip(age - 21 + rng.normal(0, 2), 0.2, 32)), 1)
        family_bias = 0.6 if age >= 30 else 0.25
        married = weighted_choice(rng, marital, [0.45, 0.48, 0.07] if age < 30 else [0.18, 0.72, 0.1])
        children = int(np.clip(rng.poisson(0.4 if married == '未婚' else family_bias), 0, 3))
        industry_profile = INDUSTRIES[industry]
        shift = weighted_choice(
            rng,
            shifts,
            [
                max(0.1, 1 - industry_profile['shift_bias']),
                0.35 * industry_profile['shift_bias'],
                0.25 * industry_profile['shift_bias'],
                0.2,
            ],
        )
        night_flag = int(shift == '三班倒' or (shift == '两班倒' and rng.random() < industry_profile['night_bias']))
        overtime = float(np.clip(rng.normal(22 + 18 * industry_profile['overtime_bias'], 10), 0, 90))
        commute_minutes = float(np.clip(rng.normal(42, 18), 8, 130))
        commute_km = float(np.clip(commute_minutes * rng.uniform(0.35, 0.75), 2, 65))
        performance_level = weighted_choice(rng, performance, [0.18, 0.46, 0.26, 0.1])
        chronic_flag = int(rng.random() < max(0.05, (age - 26) * 0.01))
        check_abnormal = int(chronic_flag == 1 or rng.random() < 0.14)
        sleep_hours = round(float(np.clip(rng.normal(6.9 - 0.35 * night_flag, 0.8), 4.5, 9.0)), 1)
        exercise = int(np.clip(rng.poisson(2.2), 0, 7))
        smoking = int(rng.random() < (0.22 if rng.random() < 0.55 else 0.08))
        drinking = int(rng.random() < 0.27)
        stress_level = weighted_choice(
            rng,
            stress,
            [0.22, 0.52, 0.26 + min(0.15, overtime / 120)],
        )
        bmi = round(float(np.clip(rng.normal(24.2, 3.2), 17.5, 36.5)), 1)
        history_count = int(np.clip(rng.poisson(1.2 + chronic_flag * 0.6 + children * 0.15), 0, 8))
        history_hours = float(np.clip(rng.normal(18 + chronic_flag * 10 + history_count * 3, 10), 0, 120))
        discipline = int(np.clip(rng.poisson(0.2), 0, 4))
        team_size = int(np.clip(rng.normal(11, 5), 3, 40))
        manager_span = int(np.clip(team_size + rng.normal(3, 2), 4, 60))
        local_hukou = int(rng.random() < 0.58)
        cross_city = int(commute_minutes > 65 or (local_hukou == 0 and rng.random() < 0.35))
        sedentary = int(weighted_choice(rng, [0, 1], [0.45, 0.55]) if company['所属行业'] in ['互联网', '金融服务'] else rng.random() < 0.3)
        employees.append({
            '企业编号': company['企业编号'],
            '所属行业': industry,
            '企业规模': company['企业规模'],
            '所在城市等级': company['所在城市等级'],
            '用工类型': weighted_choice(rng, employment_types, [0.74, 0.12, 0.1, 0.04]),
            '部门条线': weighted_choice(rng, departments, [0.18, 0.16, 0.14, 0.11, 0.12, 0.14, 0.15]),
            '岗位序列': weighted_choice(rng, job_families, [0.08, 0.24, 0.16, 0.2, 0.12, 0.2]),
            '岗位级别': weighted_choice(rng, job_levels, [0.34, 0.32, 0.18, 0.11, 0.05]),
            '员工编号': f'E{idx + 1:05d}',
            '性别': weighted_choice(rng, genders, [0.56, 0.44]),
            '年龄': age,
            '司龄年数': tenure,
            '最高学历': weighted_choice(rng, educations, [0.14, 0.28, 0.4, 0.15, 0.03]),
            '婚姻状态': married,
            '是否本地户籍': local_hukou,
            '子女数量': children,
            '是否独生子女家庭负担': int(children >= 2 or (married == '已婚' and rng.random() < 0.18)),
            '居住类型': weighted_choice(rng, housing, [0.38, 0.48, 0.14]),
            '班次类型': shift,
            '是否夜班岗位': night_flag,
            '月均加班时长': round(overtime, 1),
            '近30天出勤天数': int(np.clip(rng.normal(21.5, 2.2), 14, 27)),
            '近90天缺勤次数': history_count,
            '近180天请假总时长': round(history_hours, 1),
            '通勤时长分钟': round(commute_minutes, 1),
            '通勤距离公里': round(commute_km, 1),
            '是否跨城通勤': cross_city,
            '绩效等级': performance_level,
            '近12月违纪次数': discipline,
            '团队人数': team_size,
            '直属上级管理跨度': manager_span,
            'BMI': bmi,
            '是否慢性病史': chronic_flag,
            '年度体检异常标记': check_abnormal,
            '近30天睡眠时长均值': sleep_hours,
            '每周运动频次': exercise,
            '是否吸烟': smoking,
            '是否饮酒': drinking,
            '心理压力等级': stress_level,
            '是否长期久坐岗位': sedentary,
        })
    return employees
 def sample_event(rng, employee):
    month = int(rng.integers(1, 13))
    weekday = int(rng.integers(1, 8))
    near_holiday = int(rng.random() < (0.3 if month in [1, 2, 4, 5, 9, 10] else 0.16))
    leave_type_items = ['病假', '事假', '年假', '调休', '婚假', '丧假', '产检育儿假', '工伤假', '其他']
    leave_probs = [0.26, 0.22, 0.11, 0.14, 0.03, 0.02, 0.07, 0.03, 0.12]
    if employee['是否慢性病史'] == 1 or employee['年度体检异常标记'] == 1:
        leave_probs = [0.34, 0.18, 0.08, 0.1, 0.02, 0.02, 0.08, 0.04, 0.14]
    elif employee['子女数量'] >= 2:
        leave_probs = [0.22, 0.24, 0.1, 0.12, 0.03, 0.02, 0.12, 0.02, 0.13]
    leave_type = weighted_choice(rng, leave_type_items, leave_probs)
    if leave_type in ['病假', '工伤假']:
        reason_category = weighted_choice(rng, ['身体不适', '就医复查', '职业疲劳'], [0.52, 0.3, 0.18])
    elif leave_type == '产检育儿假':
        reason_category = weighted_choice(rng, ['子女照护', '家庭事务', '身体不适'], [0.6, 0.25, 0.15])
    elif leave_type in ['婚假', '丧假']:
        reason_category = weighted_choice(rng, ['家庭事务', '突发事件'], [0.72, 0.28])
    elif leave_type in ['年假', '调休']:
        reason_category = weighted_choice(rng, ['职业疲劳', '家庭事务', '交通受阻'], [0.52, 0.28, 0.2])
    else:
        reason_category = weighted_choice(
            rng,
            ['身体不适', '家庭事务', '子女照护', '交通受阻', '突发事件', '职业疲劳'],
            [0.2, 0.22, 0.14, 0.12, 0.12, 0.2],
        )
    medical_certificate = int(
        leave_type in ['病假', '工伤假']
        or reason_category in ['身体不适', '就医复查']
        or (employee['是否慢性病史'] == 1 and leave_type == '其他')
    )
    urgent_leave = int(
        leave_type in ['病假', '工伤假']
        or reason_category in ['突发事件', '身体不适']
        or (near_holiday == 0 and leave_type == '事假' and rng.random() < 0.35)
    )
    continuous_absence = int(
        leave_type in ['病假', '工伤假', '产检育儿假']
        and (employee['近90天缺勤次数'] >= 2 or employee['近180天请假总时长'] >= 28)
    )
    previous_overtime = int(
        employee['月均加班时长'] >= 30
        or (employee['月均加班时长'] >= 24 and weekday in [1, 2, 5])
        or (employee['是否夜班岗位'] == 1 and rng.random() < 0.65)
    )
    season = season_from_month(month)
    channel = weighted_choice(rng, ['系统申请', '主管代提', '临时电话报备'], [0.68, 0.18, 0.14])
    pressure_score = (
        employee['月均加班时长'] * 0.032
        + employee['通勤时长分钟'] * 0.018
        + employee['是否夜班岗位'] * 0.75
        + employee['是否跨城通勤'] * 0.32
        + previous_overtime * 0.35
    )
    health_score = (
        employee['是否慢性病史'] * 1.2
        + employee['年度体检异常标记'] * 0.55
        + (employee['BMI'] >= 28) * 0.3
        + (employee['近30天睡眠时长均值'] < 6.4) * 0.45
    )
    family_score = employee['子女数量'] * 0.18 + employee['是否独生子女家庭负担'] * 0.28
    resilience_score = (
        (0.55 if employee['绩效等级'] == 'A' else 0.25 if employee['绩效等级'] == 'B' else 0.0)
        + min(employee['司龄年数'] / 26, 0.65)
        + min(employee['每周运动频次'] * 0.06, 0.25)
    )
    base = 0.35
    base += pressure_score
    base += health_score
    base += family_score
    base += 0.4 if employee['心理压力等级'] == '高' else (0.18 if employee['心理压力等级'] == '中' else -0.05)
    base += 0.18 if near_holiday else 0.0
    base += 0.35 if continuous_absence else 0.0
    base += 0.28 if employee['近90天缺勤次数'] >= 3 else 0.0
    base += 0.18 if employee['近180天请假总时长'] >= 36 else 0.0
    base -= resilience_score
    leave_bonus = {
        '病假': 2.1,
        '事假': 0.8,
        '年假': 0.15,
        '调休': 0.1,
        '婚假': 3.1,
        '丧假': 2.8,
        '产检育儿假': 2.35,
        '工伤假': 3.9,
        '其他': 0.55,
    }
    reason_bonus = {
        '身体不适': 1.0,
        '家庭事务': 0.55,
        '子女照护': 0.75,
        '交通受阻': 0.2,
        '突发事件': 0.6,
        '职业疲劳': 0.7,
        '就医复查': 1.15,
    }
    industry_bonus = {
        '制造业': 0.42,
        '互联网': 0.22,
        '零售连锁': 0.28,
        '物流运输': 0.5,
        '金融服务': 0.12,
        '医药健康': 0.24,
        '建筑工程': 0.4,
    }
    season_bonus = {1: 0.35, 2: 0.0, 3: 0.15, 4: 0.05}
    weekday_bonus = {1: 0.05, 2: 0.0, 3: 0.0, 4: 0.05, 5: 0.15, 6: 0.25, 7: 0.3}
    duration = base
    duration += leave_bonus[leave_type]
    duration += reason_bonus[reason_category]
    duration += industry_bonus[employee['所属行业']]
    duration += season_bonus[season]
    duration += weekday_bonus[weekday]
    duration += 0.55 if medical_certificate else 0.0
    duration += 0.28 if urgent_leave else -0.06
    if leave_type == '病假' and employee['是否慢性病史'] == 1:
        duration += 0.85
    if leave_type == '工伤假':
        duration += 1.0 + employee['是否夜班岗位'] * 0.3
    if leave_type in ['婚假', '丧假']:
        duration += 0.7 + 0.18 * near_holiday
    if leave_type == '产检育儿假':
        duration += 0.55 + employee['子女数量'] * 0.12
    if leave_type in ['年假', '调休']:
        duration *= 0.82 if near_holiday == 0 else 0.9
    duration = round(float(np.clip(duration + rng.normal(0, 0.35), 0.5, 18.0)), 1)
    event = employee.copy()
    event.update({
        '缺勤月份': month,
        '星期几': weekday,
        '是否节假日前后': near_holiday,
        '季节': season,
        '请假申请渠道': channel,
        '请假类型': leave_type,
        '请假原因大类': reason_category,
        '是否提供医院证明': medical_certificate,
        '是否临时请假': urgent_leave,
        '是否连续缺勤': continuous_absence,
        '前一工作日是否加班': previous_overtime,
        '缺勤时长（小时）': duration,
    })
    return event
 def attach_event_timeline(df):
    df = df.copy()
    rng = np.random.default_rng(config.RANDOM_STATE)
    base_date = np.datetime64('2025-01-01')
    timelines = []
    for employee_id, group in df.groupby('员工编号', sort=False):
        group = group.copy().reset_index(drop=True)
        event_count = len(group)
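        # Dates are drawn uniformly over 2025 and sorted per employee; they are sampled
        # independently of the '缺勤月份' / '星期几' values already set in sample_event.
        offsets = np.sort(rng.integers(0, 365, size=event_count))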
        group['事件日期'] = [
            str(pd.Timestamp(base_date + np.timedelta64(int(offset), 'D')).date())
            for offset in offsets
        ]
        group['事件日期索引'] = offsets.astype(int)
        group['事件序号'] = np.arange(1, event_count + 1)
        group['员工历史事件数'] = event_count
        timelines.append(group)
    return pd.concat(timelines, ignore_index=True)
 def validate_dataset(df):
    required_columns = [
        '员工编号',
        '所属行业',
        '岗位序列',
        '月均加班时长',
        '通勤时长分钟',
        '是否慢性病史',
        '请假类型',
        '事件序号',
        '事件日期索引',
        '员工历史事件数',
        '是否夜班岗位',
        '是否提供医院证明',
        '缺勤时长（小时）',
    ]
    for column in required_columns:
        if column not in df.columns:
            raise ValueError(f'Missing required column: {column}')
    if len(df) < 10000:
        raise ValueError('Synthetic dataset is smaller than expected')
    if df['员工编号'].nunique() < 2000:
        raise ValueError('Employee coverage is too small')
    high_risk_ratio = (df['缺勤时长（小时）'] > 8).mean()
    if not 0.15 <= high_risk_ratio <= 0.4:
        raise ValueError(f'High risk ratio out of range: {high_risk_ratio:.3f}')
    medical_mean = df[df['是否提供医院证明'] == 1]['缺勤时长（小时）'].mean()
    no_medical_mean = df[df['是否提供医院证明'] == 0]['缺勤时长（小时）'].mean()
    if medical_mean <= no_medical_mean:
        raise ValueError('Medical certificate signal is not effective')
    night_mean = df[df['是否夜班岗位'] == 1]['缺勤时长（小时）'].mean()
    day_mean = df[df['是否夜班岗位'] == 0]['缺勤时长（小时）'].mean()
    if night_mean <= day_mean:
        raise ValueError('Night shift signal is not effective')
 def generate_dataset(output_path=None, sample_count=12000, random_state=None):
    rng = np.random.default_rng(config.RANDOM_STATE if random_state is None else random_state)
    companies = build_company_pool(rng)
    employees = build_employee_pool(rng, companies)
    events = []
    employee_idx = rng.integers(0, len(employees), size=sample_count)
    for idx in employee_idx:
        events.append(sample_event(rng, employees[int(idx)]))
    df = attach_event_timeline(pd.DataFrame(events))
    validate_dataset(df)
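    if output_path:
        # A bare filename has an empty dirname, and os.makedirs('') raises FileNotFoundError.
        output_dir = os.path.dirname(output_path)
        if output_dir:
            os.makedirs(output_dir, exist_ok=True)
        df.to_csv(output_path, index=False, encoding='utf-8-sig')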
    return df
 def ensure_dataset():
    if not os.path.exists(config.RAW_DATA_PATH):
        generate_dataset(config.RAW_DATA_PATH)
        return
    try:
        df = pd.read_csv(config.RAW_DATA_PATH)
        validate_dataset(df)
    except Exception:
        generate_dataset(config.RAW_DATA_PATH)
 if __name__ == '__main__':
    dataset = generate_dataset(config.RAW_DATA_PATH)
    print(f'Generated dataset: {config.RAW_DATA_PATH}')
    print(dataset.head())
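`weighted_choice` is called throughout the generator but is defined outside this excerpt, and some call sites (the shift weights in `build_employee_pool`, the stress-level weights) pass weights that do not sum to 1. A minimal sketch of a helper that would make those calls valid, assuming it normalises the weights before sampling. The function body and the `shift_bias` value are illustrative assumptions, not the project's implementation:

```python
import numpy as np


def weighted_choice(rng, items, weights):
    """Hypothetical stand-in: pick one item with probability proportional to its weight."""
    probs = np.asarray(weights, dtype=float)
    probs = probs / probs.sum()  # normalise, so raw weights need not sum to 1
    return items[int(rng.choice(len(items), p=probs))]


rng = np.random.default_rng(42)
shifts = ['标准白班', '两班倒', '三班倒', '弹性班']
shift_bias = 0.6  # assumed value; the real one comes from INDUSTRIES[industry]
weights = [max(0.1, 1 - shift_bias), 0.35 * shift_bias, 0.25 * shift_bias, 0.2]

draws = [weighted_choice(rng, shifts, weights) for _ in range(2000)]
shares = {shift: draws.count(shift) / len(draws) for shift in shifts}
print(round(sum(weights), 2), shares)
```

If the real helper passes the weights straight to `rng.choice(..., p=...)` without normalising, NumPy would reject these unnormalised lists, so the normalisation step is the part worth checking against the actual definition.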
--- a/backend/core/generate_evaluation_plots.py
+++ b/backend/core/generate_evaluation_plots.py
@@ -0,0 +1,330 @@
 import json
 import os
 import sys
 from pathlib import Path
 import joblib
 import matplotlib
 import matplotlib.pyplot as plt
 import numpy as np
 import pandas as pd
 from sklearn.metrics import confusion_matrix
 from sklearn.model_selection import train_test_split
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 import config
 from core.deep_learning_model import (
    _build_sequence_arrays,
    load_lstm_mlp_bundle,
 )
 from core.model_features import (
    NUMERICAL_OUTLIER_COLUMNS,
    TARGET_COLUMN,
    apply_outlier_bounds,
    engineer_features,
    fit_outlier_bounds,
    make_target_bins,
    normalize_columns,
 )
 from core.preprocessing import get_clean_data
 matplotlib.rcParams['font.sans-serif'] = [
    'Microsoft YaHei',
    'SimHei',
    'Noto Sans CJK SC',
    'Arial Unicode MS',
    'DejaVu Sans',
 ]
 matplotlib.rcParams['axes.unicode_minus'] = False
 BASE_DIR = Path(config.BASE_DIR)
 MODELS_DIR = Path(config.MODELS_DIR)
 OUTPUT_DIR = BASE_DIR / 'outputs' / 'eval_figures'
 PREDICTION_CSV = OUTPUT_DIR / 'lstm_predictions.csv'
 SUMMARY_JSON = OUTPUT_DIR / 'evaluation_summary.json'
 MODEL_DISPLAY_NAMES = {
    'lstm_mlp': '时序注意力融合网络',
    'xgboost': 'XGBoost',
    'gradient_boosting': 'GBDT',
    'random_forest': '随机森林',
    'extra_trees': '极端随机树',
    'lightgbm': 'LightGBM',
 }
 def ensure_output_dir():
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
 def load_metrics():
    metrics_path = MODELS_DIR / 'model_metrics.pkl'
    if not metrics_path.exists():
        raise FileNotFoundError(f'未找到模型评估文件: {metrics_path}')
    metrics = joblib.load(metrics_path)
    return dict(sorted(metrics.items(), key=lambda item: item[1].get('r2', -999), reverse=True))
 def get_test_split():
    raw_df = normalize_columns(get_clean_data())
    target_bins = make_target_bins(raw_df[TARGET_COLUMN].values)
    raw_train_df, raw_test_df = train_test_split(
        raw_df,
        test_size=config.TEST_SIZE,
        random_state=config.RANDOM_STATE,
        stratify=target_bins,
    )
    return raw_train_df.reset_index(drop=True), raw_test_df.reset_index(drop=True)
 def classify_risk(values):
    values = np.asarray(values, dtype=float)
    return np.where(values < 4, '低风险', np.where(values <= 8, '中风险', '高风险'))
 def load_lstm_predictions():
    model_path = MODELS_DIR / 'lstm_mlp_model.pt'
    if not model_path.exists():
        raise FileNotFoundError(f'未找到深度学习模型文件: {model_path}')
    bundle = load_lstm_mlp_bundle(str(model_path))
    if bundle is None:
        raise RuntimeError('无法加载深度学习模型，请确认 torch 环境和模型文件正常。')
    raw_train_df, raw_test_df = get_test_split()
    outlier_bounds = fit_outlier_bounds(raw_train_df, NUMERICAL_OUTLIER_COLUMNS)
    fit_df = apply_outlier_bounds(raw_train_df, outlier_bounds)
    test_df = apply_outlier_bounds(raw_test_df, outlier_bounds)
    feature_layout = bundle['feature_layout']
    category_maps = bundle['category_maps']
    target_transform = bundle['target_transform']
    _, _, _, _, _ = _build_sequence_arrays(
        fit_df,
        feature_layout,
        category_maps,
        target_transform,
    )
    test_seq_num, test_seq_cat, test_static_num, test_static_cat, y_test = _build_sequence_arrays(
        test_df,
        feature_layout,
        category_maps,
        target_transform,
    )
    test_seq_num = ((test_seq_num - bundle['seq_mean']) / bundle['seq_std']).astype(np.float32)
    test_static_num = ((test_static_num - bundle['static_mean']) / bundle['static_std']).astype(np.float32)
    import torch
    model = bundle['model']
    model.eval()
    with torch.no_grad():
        predictions = model(
            torch.tensor(test_seq_num, dtype=torch.float32),
            torch.tensor(test_seq_cat, dtype=torch.long),
            torch.tensor(test_static_num, dtype=torch.float32),
            torch.tensor(test_static_cat, dtype=torch.long),
        ).cpu().numpy()
    if target_transform == 'log1p':
        y_true = np.expm1(y_test)
        y_pred = np.expm1(predictions)
    else:
        y_true = y_test
        y_pred = predictions
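    # Flatten both arrays so a column-shaped (N, 1) model output cannot broadcast
    # against y_true into an (N, N) residual matrix.
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.clip(np.asarray(y_pred, dtype=float).reshape(-1), a_min=0.0, a_max=None)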
    residuals = y_pred - y_true
    prediction_df = pd.DataFrame({
        '真实值': np.round(y_true, 4),
        '预测值': np.round(y_pred, 4),
        '残差': np.round(residuals, 4),
        '真实风险等级': classify_risk(y_true),
        '预测风险等级': classify_risk(y_pred),
    })
    prediction_df.to_csv(PREDICTION_CSV, index=False, encoding='utf-8-sig')
    return prediction_df
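When `target_transform` is `'log1p'`, `load_lstm_predictions` maps both targets and predictions back to hours with `np.expm1` and then clips negative predictions to zero. A small sketch of that round trip; the sample values and the column-shaped `raw_pred` are made up for illustration and do not come from the trained model:

```python
import numpy as np

# Targets in hours, encoded the way a log1p-trained model would see them.
hours = np.array([0.5, 4.0, 18.0])
encoded = np.log1p(hours)
decoded = np.expm1(encoded)  # inverse transform recovers the original hours

# Hypothetical raw model output with shape (N, 1), still in log1p space.
raw_pred = np.array([[-0.2], [1.6]])
restored = np.clip(np.expm1(raw_pred).reshape(-1), 0.0, None)

print(decoded, restored)
```

A negative value in log space maps to a negative duration after `expm1`, which is why the clip to a minimum of 0 follows the inverse transform rather than preceding it.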
 def plot_model_comparison(metrics):
    model_names = [MODEL_DISPLAY_NAMES.get(name, name) for name in metrics.keys()]
    r2_values = [metrics[name]['r2'] for name in metrics]
    rmse_values = [metrics[name]['rmse'] for name in metrics]
    mae_values = [metrics[name]['mae'] for name in metrics]
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    bar_colors = ['#0f766e' if name == 'lstm_mlp' else '#94a3b8' for name in metrics.keys()]
    axes[0].bar(model_names, r2_values, color=bar_colors)
    axes[0].set_title('模型R2对比')
    axes[0].set_ylabel('R2')
    axes[0].tick_params(axis='x', rotation=20)
    axes[1].bar(model_names, rmse_values, color=bar_colors)
    axes[1].set_title('模型RMSE对比')
    axes[1].set_ylabel('RMSE')
    axes[1].tick_params(axis='x', rotation=20)
    axes[2].bar(model_names, mae_values, color=bar_colors)
    axes[2].set_title('模型MAE对比')
    axes[2].set_ylabel('MAE')
    axes[2].tick_params(axis='x', rotation=20)
    fig.tight_layout()
    fig.savefig(OUTPUT_DIR / '01_模型性能对比.png', dpi=220, bbox_inches='tight')
    plt.close(fig)
 def plot_actual_vs_pred(prediction_df):
    y_true = prediction_df['真实值'].to_numpy()
    y_pred = prediction_df['预测值'].to_numpy()
    max_value = max(float(y_true.max()), float(y_pred.max()))
    fig, ax = plt.subplots(figsize=(7, 7))
    ax.scatter(y_true, y_pred, s=18, alpha=0.55, color='#0f766e', edgecolors='none')
    ax.plot([0, max_value], [0, max_value], color='#dc2626', linestyle='--', linewidth=1.5)
    ax.set_title('LSTM模型真实值与预测值对比')
    ax.set_xlabel('真实缺勤时长（小时）')
    ax.set_ylabel('预测缺勤时长（小时）')
    fig.tight_layout()
    fig.savefig(OUTPUT_DIR / '02_LSTM真实值_vs_预测值.png', dpi=220, bbox_inches='tight')
    plt.close(fig)
 def plot_residuals(prediction_df):
    y_pred = prediction_df['预测值'].to_numpy()
    residuals = prediction_df['残差'].to_numpy()
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    axes[0].hist(residuals, bins=30, color='#2563eb', alpha=0.85, edgecolor='white')
    axes[0].axvline(0, color='#dc2626', linestyle='--', linewidth=1.2)
    axes[0].set_title('LSTM残差分布')
    axes[0].set_xlabel('残差（预测值 - 真实值）')
    axes[0].set_ylabel('样本数')
    axes[1].scatter(y_pred, residuals, s=18, alpha=0.55, color='#7c3aed', edgecolors='none')
    axes[1].axhline(0, color='#dc2626', linestyle='--', linewidth=1.2)
    axes[1].set_title('LSTM残差散点图')
    axes[1].set_xlabel('预测缺勤时长（小时）')
    axes[1].set_ylabel('残差')
    fig.tight_layout()
    fig.savefig(OUTPUT_DIR / '03_LSTM残差分析.png', dpi=220, bbox_inches='tight')
    plt.close(fig)
 def plot_confusion_matrix(prediction_df):
    labels = ['低风险', '中风险', '高风险']
    cm = confusion_matrix(
        prediction_df['真实风险等级'],
        prediction_df['预测风险等级'],
        labels=labels,
    )
    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(cm, cmap='GnBu')
    ax.set_title('LSTM风险等级混淆矩阵')
    ax.set_xlabel('预测类别')
    ax.set_ylabel('真实类别')
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    for row in range(cm.shape[0]):
        for col in range(cm.shape[1]):
            ax.text(col, row, int(cm[row, col]), ha='center', va='center', color='#111827')
    fig.colorbar(image, ax=ax, fraction=0.046, pad=0.04)
    fig.tight_layout()
    fig.savefig(OUTPUT_DIR / '04_LSTM风险等级混淆矩阵.png', dpi=220, bbox_inches='tight')
    plt.close(fig)
 def plot_feature_importance():
    candidate_files = [
        ('xgboost', MODELS_DIR / 'xgboost_model.pkl'),
        ('random_forest', MODELS_DIR / 'random_forest_model.pkl'),
        ('extra_trees', MODELS_DIR / 'extra_trees_model.pkl'),
    ]
    selected_features_path = MODELS_DIR / 'selected_features.pkl'
    feature_names_path = MODELS_DIR / 'feature_names.pkl'
    selected_features = joblib.load(selected_features_path) if selected_features_path.exists() else None
    feature_names = joblib.load(feature_names_path) if feature_names_path.exists() else None
    for model_name, model_path in candidate_files:
        if not model_path.exists():
            continue
        model = joblib.load(model_path)
        if not hasattr(model, 'feature_importances_'):
            continue
        importances = model.feature_importances_
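        # Avoid `a or b` here: it raises on array-like feature lists (NumPy array, pandas Index).
        candidates = [c for c in (selected_features, feature_names) if c is not None and len(c) > 0]
        names = list(candidates[0]) if candidates else []
        if len(names) != len(importances):
            names = [f'feature_{idx}' for idx in range(len(importances))]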
        top_items = sorted(zip(names, importances), key=lambda item: item[1], reverse=True)[:15]
        top_items.reverse()
        fig, ax = plt.subplots(figsize=(8, 6))
        ax.barh(
            [config.FEATURE_NAME_CN.get(name, name) for name, _ in top_items],
            [float(value) for _, value in top_items],
            color='#0f766e',
        )
        ax.set_title(f'{MODEL_DISPLAY_NAMES.get(model_name, model_name)}特征重要性 Top15')
        ax.set_xlabel('重要性')
        fig.tight_layout()
        fig.savefig(OUTPUT_DIR / '05_特征重要性_Top15.png', dpi=220, bbox_inches='tight')
        plt.close(fig)
        return model_name
    return None
 def save_summary(metrics, prediction_df, feature_model_name):
    residuals = prediction_df['残差'].to_numpy()
    summary = {
        'best_model': next(iter(metrics), None),
        'metrics': metrics,
        'lstm_prediction_summary': {
            'prediction_count': int(len(prediction_df)),
            'residual_mean': round(float(residuals.mean()), 4),
            'residual_std': round(float(residuals.std()), 4),
            'risk_accuracy': round(
                float((prediction_df['真实风险等级'] == prediction_df['预测风险等级']).mean()),
                4,
            ),
        },
        'feature_importance_model': feature_model_name,
        'generated_files': sorted([file.name for file in OUTPUT_DIR.iterdir() if file.is_file()]),
    }
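    # default=float converts NumPy scalars (e.g. np.float32) that json cannot serialise natively.
    SUMMARY_JSON.write_text(
        json.dumps(summary, ensure_ascii=False, indent=2, default=float),
        encoding='utf-8',
    )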
 def main():
    ensure_output_dir()
    metrics = load_metrics()
    prediction_df = load_lstm_predictions()
    plot_model_comparison(metrics)
    plot_actual_vs_pred(prediction_df)
    plot_residuals(prediction_df)
    plot_confusion_matrix(prediction_df)
    feature_model_name = plot_feature_importance()
    save_summary(metrics, prediction_df, feature_model_name)
    print(f'评估图片已生成: {OUTPUT_DIR}')
    print(f'LSTM预测明细: {PREDICTION_CSV}')
    print(f'评估摘要: {SUMMARY_JSON}')
 if __name__ == '__main__':
    main()
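The risk labels in `classify_risk` and the stratification bins in `make_target_bins` (from `core/model_features.py`) both cut at 4 and 8 hours, but they treat the 4-hour boundary differently. The sketch below copies both functions from the diff and evaluates them on boundary values; only the sample `hours` list is invented:

```python
import numpy as np
import pandas as pd


def classify_risk(values):
    values = np.asarray(values, dtype=float)
    return np.where(values < 4, '低风险', np.where(values <= 8, '中风险', '高风险'))


def make_target_bins(y):
    bins = pd.cut(
        pd.Series(y),
        bins=[0, 4, 8, 12, np.inf],
        labels=['low', 'medium', 'high', 'extreme'],
        include_lowest=True,
    )
    return bins.astype(str)


hours = [3.9, 4.0, 8.0, 8.1]
risk = classify_risk(hours).tolist()
strata = make_target_bins(hours).tolist()
print(list(zip(hours, risk, strata)))
# Exactly 4.0 hours is '中风险' for classify_risk but 'low' for make_target_bins,
# because pd.cut intervals are right-closed: [0, 4], (4, 8], (8, 12], (12, inf].
```

The difference only affects samples that sit exactly on the boundary, but since durations are rounded to one decimal place such samples can occur, so it may be worth aligning the two definitions if the strata are meant to mirror the risk levels.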
--- a/backend/core/model_features.py
+++ b/backend/core/model_features.py
@@ -0,0 +1,326 @@
 import numpy as np
 import pandas as pd
 from sklearn.preprocessing import LabelEncoder
 import config
 TARGET_COLUMN = config.TARGET_COLUMN
 ID_COLUMN = config.EMPLOYEE_ID_COLUMN
 COMPANY_COLUMN = config.COMPANY_ID_COLUMN
 LEAKY_COLUMNS = [ID_COLUMN, COMPANY_COLUMN]
 ORDINAL_COLUMNS = [
    '企业规模',
    '所在城市等级',
    '岗位级别',
    '最高学历',
    '绩效等级',
    '心理压力等级',
    '工龄分层',
    '年龄分层',
    '通勤分层',
    '加班分层',
 ]
 NUMERICAL_OUTLIER_COLUMNS = [
    '年龄',
    '司龄年数',
    '月均加班时长',
    '近30天出勤天数',
    '近90天缺勤次数',
    '近180天请假总时长',
    '通勤时长分钟',
    '通勤距离公里',
    '团队人数',
    '直属上级管理跨度',
    'BMI',
    '近30天睡眠时长均值',
    '每周运动频次',
 ]
 DEFAULT_PREDICTION_INPUT = {
    'industry': '制造业',
    'company_size': '1000-4999人',
    'city_tier': '新一线',
    'age': 31,
    'tenure_years': 4.5,
    'education_level': '本科',
    'marital_status': '已婚',
    'job_family': '专业技术',
    'job_level': '中级',
    'employment_type': '正式员工',
    'shift_type': '标准白班',
    'is_night_shift': 0,
    'monthly_overtime_hours': 26,
    'attendance_days_30d': 22,
    'absence_count_90d': 1,
    'leave_hours_180d': 18,
    'commute_minutes': 42,
    'commute_km': 18,
    'cross_city_commute': 0,
    'performance_level': 'B',
    'disciplinary_count_12m': 0,
    'team_size': 10,
    'manager_span': 14,
    'bmi': 24.5,
    'chronic_disease_flag': 0,
    'annual_check_abnormal_flag': 0,
    'sleep_hours': 7.1,
    'exercise_frequency': 2,
    'smoking_flag': 0,
    'drinking_flag': 0,
    'stress_level': '中',
    'sedentary_job_flag': 1,
    'local_hukou_flag': 1,
    'children_count': 1,
    'single_child_burden_flag': 0,
    'absence_month': 5,
    'weekday': 2,
    'near_holiday_flag': 0,
    'leave_channel': '系统申请',
    'leave_type': '病假',
    'leave_reason_category': '身体不适',
    'medical_certificate_flag': 1,
    'urgent_leave_flag': 1,
    'continuous_absence_flag': 0,
    'previous_day_overtime_flag': 1,
 }
 def make_target_bins(y):
    y_series = pd.Series(y)
    bins = pd.cut(
        y_series,
        bins=[0, 4, 8, 12, np.inf],
        labels=['low', 'medium', 'high', 'extreme'],
        include_lowest=True,
    )
    return bins.astype(str)
 def normalize_columns(df):
    df = df.copy()
    df.columns = [str(col).strip() for col in df.columns]
    return df
 def prepare_modeling_dataframe(df):
    df = normalize_columns(df)
    drop_cols = [col for col in LEAKY_COLUMNS if col in df.columns]
    if drop_cols:
        df = df.drop(columns=drop_cols)
    return df
 def fit_outlier_bounds(df, columns, lower_pct=1, upper_pct=99):
    bounds = {}
    for col in columns:
        if col in df.columns and pd.api.types.is_numeric_dtype(df[col]):
            bounds[col] = (
                float(df[col].quantile(lower_pct / 100)),
                float(df[col].quantile(upper_pct / 100)),
            )
    return bounds
 def apply_outlier_bounds(df, bounds):
    df = df.copy()
    for col, (lower, upper) in bounds.items():
        if col in df.columns:
            df[col] = df[col].clip(lower, upper)
    return df
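`fit_outlier_bounds` and `apply_outlier_bounds` are meant to be used as a pair: the percentile bounds are learned on the training split only and then reused to clip any other split, which is how `load_lstm_predictions` calls them. A toy illustration with the two functions copied from the diff; the data frames are synthetic:

```python
import pandas as pd


def fit_outlier_bounds(df, columns, lower_pct=1, upper_pct=99):
    bounds = {}
    for col in columns:
        if col in df.columns and pd.api.types.is_numeric_dtype(df[col]):
            bounds[col] = (
                float(df[col].quantile(lower_pct / 100)),
                float(df[col].quantile(upper_pct / 100)),
            )
    return bounds


def apply_outlier_bounds(df, bounds):
    df = df.copy()
    for col, (lower, upper) in bounds.items():
        if col in df.columns:
            df[col] = df[col].clip(lower, upper)
    return df


# Synthetic training column holding 0.0, 1.0, ..., 100.0.
train = pd.DataFrame({'月均加班时长': [float(v) for v in range(101)]})
# Synthetic test rows with values far outside the training range.
test = pd.DataFrame({'月均加班时长': [-5.0, 50.0, 250.0]})

bounds = fit_outlier_bounds(train, ['月均加班时长'])
clipped = apply_outlier_bounds(test, bounds)
print(bounds, clipped['月均加班时长'].tolist())
```

Here the learned bounds are roughly 1 and 99, so the two extreme test rows are pulled back to those limits while the in-range row is untouched. Fitting on the training split alone keeps test-set values from influencing the bounds.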
 def engineer_features(df):
    df = df.copy()
    df['加班通勤压力指数'] = (
        df['月均加班时长'] * 0.45
        + df['通勤时长分钟'] * 0.35
        + df['是否夜班岗位'] * 12
        + df['前一工作日是否加班'] * 6
    ) / 10
    df['家庭负担指数'] = (
        df['子女数量'] * 1.2
        + df['是否独生子女家庭负担'] * 1.5
        + (df['婚姻状态'] == '已婚').astype(int) * 0.6
    )
    df['健康风险指数'] = (
        df['是否慢性病史'] * 2
        + df['年度体检异常标记'] * 1.2
        + (df['BMI'] >= 28).astype(int) * 1.1
        + df['是否吸烟'] * 0.8
        + df['是否饮酒'] * 0.4
        + (df['近30天睡眠时长均值'] < 6.5).astype(int) * 1.2
    )
    df['岗位稳定性指数'] = (
        df['司龄年数'] * 0.3
        + (df['绩效等级'] == 'A').astype(int) * 1.2
        + (df['绩效等级'] == 'B').astype(int) * 0.8
        - df['近12月违纪次数'] * 0.7
    )
    df['节假日风险标记'] = (
        (df['是否节假日前后'] == 1) | (df['请假类型'].isin(['事假', '年假', '调休']))
    ).astype(int)
    df['排班压力标记'] = (
        (df['班次类型'].isin(['两班倒', '三班倒'])) | (df['是否夜班岗位'] == 1)
    ).astype(int)
    df['缺勤历史强度'] = df['近90天缺勤次数'] * 1.5 + df['近180天请假总时长'] / 12
    df['生活规律指数'] = (
        df['近30天睡眠时长均值'] * 0.6
        + df['每周运动频次'] * 0.7
        - df['是否吸烟'] * 1.1
        - df['是否饮酒'] * 0.5
    )
    df['管理负荷指数'] = df['团队人数'] * 0.4 + df['直属上级管理跨度'] * 0.25
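    # include_lowest=True keeps values exactly on the left edge (e.g. tenure 0, age 18) out of NaN.
    df['工龄分层'] = pd.cut(df['司龄年数'], bins=[0, 2, 5, 10, 40], labels=['1', '2', '3', '4'], include_lowest=True)
    df['年龄分层'] = pd.cut(df['年龄'], bins=[18, 25, 32, 40, 60], labels=['1', '2', '3', '4'], include_lowest=True)
    df['通勤分层'] = pd.cut(df['通勤时长分钟'], bins=[0, 25, 45, 70, 180], labels=['1', '2', '3', '4'], include_lowest=True)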
    df['加班分层'] = pd.cut(df['月均加班时长'], bins=[-1, 10, 25, 45, 120], labels=['1', '2', '3', '4'])
    return df
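The tier columns at the end of `engineer_features` rely on `pd.cut`, whose intervals are right-closed by default, so a value that sits exactly on the leftmost edge falls outside every bin unless `include_lowest=True` is passed. A short sketch using the tenure bin edges from the diff; the `tenure` values are invented to hit the edges:

```python
import pandas as pd

tenure = pd.Series([0.0, 0.2, 2.0, 40.0])
edges = [0, 2, 5, 10, 40]
labels = ['1', '2', '3', '4']

# Default behaviour: the first interval is (0, 2], so 0.0 is left unbinned (NaN).
open_left = pd.cut(tenure, bins=edges, labels=labels)

# With include_lowest=True the first interval also contains its left edge.
closed_left = pd.cut(tenure, bins=edges, labels=labels, include_lowest=True)

print(open_left.isna().tolist(), closed_left.astype(str).tolist())
```

This matters mainly for prediction requests, where a user-supplied tenure of 0 or an age of 18 would otherwise produce a missing tier that the label encoding step later maps to a fallback code.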
 def fit_label_encoders(df, ordinal_columns=None):
    ordinal_columns = ordinal_columns or ORDINAL_COLUMNS
    df = df.copy()
    encoders = {}
    object_columns = df.select_dtypes(include=['object', 'category']).columns.tolist()
    encode_columns = sorted(set(object_columns + [col for col in ordinal_columns if col in df.columns]))
    for col in encode_columns:
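            # Note: LabelEncoder assigns codes by sorted string order, not by semantic rank,
            # so the columns in ORDINAL_COLUMNS are not guaranteed to be ordinally encoded.
            encoder = LabelEncoder()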
        df[col] = encoder.fit_transform(df[col].astype(str))
        encoders[col] = encoder
    return df, encoders
 def apply_label_encoders(df, encoders):
    df = df.copy()
    for col, encoder in encoders.items():
        if col not in df.columns:
            continue
        value_map = {cls: idx for idx, cls in enumerate(encoder.classes_)}
        df[col] = df[col].astype(str).map(lambda value: value_map.get(value, 0))
    return df
 def extract_xy(df):
    y = df[TARGET_COLUMN].values if TARGET_COLUMN in df.columns else None
    X_df = df.drop(columns=[TARGET_COLUMN]) if TARGET_COLUMN in df.columns else df.copy()
    return X_df, y
 def build_prediction_dataframe(data):
    feature_row = {
        '企业编号': 'PREDICT_COMPANY',
        '所属行业': data.get('industry', DEFAULT_PREDICTION_INPUT['industry']),
        '企业规模': data.get('company_size', DEFAULT_PREDICTION_INPUT['company_size']),
        '所在城市等级': data.get('city_tier', DEFAULT_PREDICTION_INPUT['city_tier']),
        '用工类型': data.get('employment_type', DEFAULT_PREDICTION_INPUT['employment_type']),
        '部门条线': data.get('department_line', '研发'),
        '岗位序列': data.get('job_family', DEFAULT_PREDICTION_INPUT['job_family']),
        '岗位级别': data.get('job_level', DEFAULT_PREDICTION_INPUT['job_level']),
        '员工编号': 'PREDICT_EMPLOYEE',
        '性别': data.get('gender', '男'),
        '年龄': data.get('age', DEFAULT_PREDICTION_INPUT['age']),
        '司龄年数': data.get('tenure_years', DEFAULT_PREDICTION_INPUT['tenure_years']),
        '最高学历': data.get('education_level', DEFAULT_PREDICTION_INPUT['education_level']),
        '婚姻状态': data.get('marital_status', DEFAULT_PREDICTION_INPUT['marital_status']),
        '是否本地户籍': data.get('local_hukou_flag', DEFAULT_PREDICTION_INPUT['local_hukou_flag']),
        '子女数量': data.get('children_count', DEFAULT_PREDICTION_INPUT['children_count']),
        '是否独生子女家庭负担': data.get(
            'single_child_burden_flag',
            DEFAULT_PREDICTION_INPUT['single_child_burden_flag'],
        ),
        '居住类型': data.get('housing_type', '租房'),
        '班次类型': data.get('shift_type', DEFAULT_PREDICTION_INPUT['shift_type']),
        '是否夜班岗位': data.get('is_night_shift', DEFAULT_PREDICTION_INPUT['is_night_shift']),
        '月均加班时长': data.get(
            'monthly_overtime_hours',
            DEFAULT_PREDICTION_INPUT['monthly_overtime_hours'],
        ),
        '近30天出勤天数': data.get(
            'attendance_days_30d',
            DEFAULT_PREDICTION_INPUT['attendance_days_30d'],
        ),
        '近90天缺勤次数': data.get('absence_count_90d', DEFAULT_PREDICTION_INPUT['absence_count_90d']),
        '近180天请假总时长': data.get('leave_hours_180d', DEFAULT_PREDICTION_INPUT['leave_hours_180d']),
        '通勤时长分钟': data.get('commute_minutes', DEFAULT_PREDICTION_INPUT['commute_minutes']),
        '通勤距离公里': data.get('commute_km', DEFAULT_PREDICTION_INPUT['commute_km']),
        '是否跨城通勤': data.get(
            'cross_city_commute',
            DEFAULT_PREDICTION_INPUT['cross_city_commute'],
        ),
        '绩效等级': data.get('performance_level', DEFAULT_PREDICTION_INPUT['performance_level']),
        '近12月违纪次数': data.get(
            'disciplinary_count_12m',
            DEFAULT_PREDICTION_INPUT['disciplinary_count_12m'],
        ),
        '团队人数': data.get('team_size', DEFAULT_PREDICTION_INPUT['team_size']),
        '直属上级管理跨度': data.get('manager_span', DEFAULT_PREDICTION_INPUT['manager_span']),
        'BMI': data.get('bmi', DEFAULT_PREDICTION_INPUT['bmi']),
        '是否慢性病史': data.get(
            'chronic_disease_flag',
            DEFAULT_PREDICTION_INPUT['chronic_disease_flag'],
        ),
        '年度体检异常标记': data.get(
            'annual_check_abnormal_flag',
            DEFAULT_PREDICTION_INPUT['annual_check_abnormal_flag'],
        ),
        '近30天睡眠时长均值': data.get('sleep_hours', DEFAULT_PREDICTION_INPUT['sleep_hours']),
        '每周运动频次': data.get(
            'exercise_frequency',
            DEFAULT_PREDICTION_INPUT['exercise_frequency'],
        ),
        '是否吸烟': data.get('smoking_flag', DEFAULT_PREDICTION_INPUT['smoking_flag']),
        '是否饮酒': data.get('drinking_flag', DEFAULT_PREDICTION_INPUT['drinking_flag']),
        '心理压力等级': data.get('stress_level', DEFAULT_PREDICTION_INPUT['stress_level']),
        '是否长期久坐岗位': data.get(
            'sedentary_job_flag',
            DEFAULT_PREDICTION_INPUT['sedentary_job_flag'],
        ),
        '缺勤月份': data.get('absence_month', DEFAULT_PREDICTION_INPUT['absence_month']),
        '星期几': data.get('weekday', DEFAULT_PREDICTION_INPUT['weekday']),
        '是否节假日前后': data.get('near_holiday_flag', DEFAULT_PREDICTION_INPUT['near_holiday_flag']),
        '季节': _season_from_month(data.get('absence_month', DEFAULT_PREDICTION_INPUT['absence_month'])),
        '请假申请渠道': data.get('leave_channel', DEFAULT_PREDICTION_INPUT['leave_channel']),
        '请假类型': data.get('leave_type', DEFAULT_PREDICTION_INPUT['leave_type']),
        '请假原因大类': data.get(
            'leave_reason_category',
            DEFAULT_PREDICTION_INPUT['leave_reason_category'],
        ),
        '是否提供医院证明': data.get(
            'medical_certificate_flag',
            DEFAULT_PREDICTION_INPUT['medical_certificate_flag'],
        ),
        '是否临时请假': data.get('urgent_leave_flag', DEFAULT_PREDICTION_INPUT['urgent_leave_flag']),
        '是否连续缺勤': data.get(
            'continuous_absence_flag',
            DEFAULT_PREDICTION_INPUT['continuous_absence_flag'],
        ),
        '前一工作日是否加班': data.get(
            'previous_day_overtime_flag',
            DEFAULT_PREDICTION_INPUT['previous_day_overtime_flag'],
        ),
    }
    return pd.DataFrame([feature_row])
 def _season_from_month(month):
    month = int(month)
    if month in [12, 1, 2]:
        return 1
    if month in [3, 4, 5]:
        return 2
    if month in [6, 7, 8]:
        return 3
    return 4
 def align_feature_frame(df, feature_names):
    aligned = df.copy()
    for feature in feature_names:
        if feature not in aligned.columns:
            aligned[feature] = 0
    return aligned[feature_names]
 def to_float_array(df):
    return df.values.astype(float)
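The two helpers above are easy to check in isolation. The following is a minimal sketch, assuming only pandas; the function bodies mirror `_season_from_month` and `align_feature_frame` from the diff, while the sample column names (`bmi`, `team_size`, `extra_col`) are hypothetical and chosen only for illustration.

```python
import pandas as pd


def season_from_month(month):
    # Mirrors _season_from_month: 1 for Dec-Feb, 2 for Mar-May,
    # 3 for Jun-Aug, 4 for the remaining months.
    month = int(month)
    if month in (12, 1, 2):
        return 1
    if month in (3, 4, 5):
        return 2
    if month in (6, 7, 8):
        return 3
    return 4


def align_feature_frame(df, feature_names):
    # Mirrors align_feature_frame: add any missing column as 0,
    # then return the columns in the order the model was trained on.
    aligned = df.copy()
    for feature in feature_names:
        if feature not in aligned.columns:
            aligned[feature] = 0
    return aligned[feature_names]


# Hypothetical single-row request: one expected column is missing,
# and one column is unknown to the model.
row = pd.DataFrame([{"bmi": 23.5, "extra_col": 1}])
aligned = align_feature_frame(row, ["team_size", "bmi"])
```

Missing columns are zero-filled rather than rejected, so a request that omits a feature still produces a prediction; whether 0 is a sensible neutral value depends on how that feature was scaled during training.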
--- a/backend/core/preprocessing.py
+++ b/backend/core/preprocessing.py
@@ -1,10 +1,11 @@
-import pandas as pd
-import numpy as np
-from sklearn.preprocessing import StandardScaler
-import joblib
 import os
+import joblib
+import pandas as pd
+from sklearn.preprocessing import StandardScaler
 import config
 from core.generate_dataset import ensure_dataset
 class DataPreprocessor:
@@ -12,67 +13,57 @@ class DataPreprocessor:
        self.scaler = StandardScaler()
        self.is_fitted = False
        self.feature_names = None
-        
+
    def load_raw_data(self):
        ensure_dataset()
        df = pd.read_csv(config.RAW_DATA_PATH, sep=config.CSV_SEPARATOR)
        df.columns = df.columns.str.strip()
        return df
-    
+
    def clean_data(self, df):
        df = df.copy()
        df = df.drop_duplicates()
-        
+
        for col in df.columns:
-            if df[col].isnull().sum() > 0:
-                if df[col].dtype in ['int64', 'float64']:
-                    df[col].fillna(df[col].median(), inplace=True)
-                else:
-                    df[col].fillna(df[col].mode()[0], inplace=True)
-        
+            if df[col].isnull().sum() == 0:
+                continue
+            if pd.api.types.is_numeric_dtype(df[col]):
+                df[col] = df[col].fillna(df[col].median())
+            else:
+                df[col] = df[col].fillna(df[col].mode()[0])
        return df
-    
+
    def fit_transform(self, df):
        df = self.clean_data(df)
-        
-        if 'Absenteeism time in hours' in df.columns:
-            y = df['Absenteeism time in hours'].values
-            feature_df = df.drop(columns=['Absenteeism time in hours'])
+        if config.TARGET_COLUMN in df.columns:
+            y = df[config.TARGET_COLUMN].values
+            feature_df = df.drop(columns=[config.TARGET_COLUMN])
        else:
            y = None
            feature_df = df
-        
+
        self.feature_names = list(feature_df.columns)
-        
-        X = feature_df.values
-        X = self.scaler.fit_transform(X)
+        X = self.scaler.fit_transform(feature_df.values)
        self.is_fitted = True
        return X, y
-    
+
    def transform(self, df):
        if not self.is_fitted:
            raise ValueError("Preprocessor has not been fitted yet.")
-        
+
        df = self.clean_data(df)
-        
-        if 'Absenteeism time in hours' in df.columns:
-            feature_df = df.drop(columns=['Absenteeism time in hours'])
+        if config.TARGET_COLUMN in df.columns:
+            feature_df = df.drop(columns=[config.TARGET_COLUMN])
         else:
             feature_df = df
-        
-        X = feature_df.values
-        X = self.scaler.transform(X)
-        return X
+        return self.scaler.transform(feature_df.values)
+
    def save_preprocessor(self):
        os.makedirs(config.MODELS_DIR, exist_ok=True)
        joblib.dump(self.scaler, config.SCALER_PATH)
        joblib.dump(self.feature_names, os.path.join(config.MODELS_DIR, 'feature_names.pkl'))
-    
+
    def load_preprocessor(self):
        self.scaler = joblib.load(config.SCALER_PATH)
        feature_names_path = os.path.join(config.MODELS_DIR, 'feature_names.pkl')
@@ -84,22 +75,18 @@ class DataPreprocessor:
 def get_clean_data():
    preprocessor = DataPreprocessor()
    df = preprocessor.load_raw_data()
-    df = preprocessor.clean_data(df)
-    return df
+    return preprocessor.clean_data(df)
 def save_clean_data():
    preprocessor = DataPreprocessor()
    df = preprocessor.load_raw_data()
    df = preprocessor.clean_data(df)
    os.makedirs(config.PROCESSED_DATA_DIR, exist_ok=True)
    df.to_csv(config.CLEAN_DATA_PATH, index=False, sep=',')
    return df
 if __name__ == '__main__':
-    df = save_clean_data()
-    print(f"Clean data saved. Shape: {df.shape}")
-    print(df.head())
+    data = save_clean_data()
+    print(f"Clean data saved. Shape: {data.shape}")
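The `clean_data` change in this file replaces column-level `fillna(..., inplace=True)` with plain reassignment. Recent pandas releases warn about the in-place form on a selected column because it is a chained assignment. The sketch below restates the new imputation loop as a standalone function; `fill_missing` and the sample columns (`hours`, `shift`) are hypothetical names used only for illustration.

```python
import pandas as pd


def fill_missing(df):
    # Work on a copy so the caller's frame is left untouched.
    df = df.copy()
    for col in df.columns:
        if df[col].isnull().sum() == 0:
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric columns: impute with the median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Other columns: impute with the most frequent value.
            df[col] = df[col].fillna(df[col].mode()[0])
    return df


sample = pd.DataFrame({
    "hours": [2.0, None, 8.0],
    "shift": ["day", None, "day"],
})
cleaned = fill_missing(sample)
```

Using `pd.api.types.is_numeric_dtype` instead of comparing against `'int64'` and `'float64'` also covers other numeric dtypes such as `int32` or `float32`.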
--- a/backend/core/train_model.py
+++ b/backend/core/train_model.py
@@ -1,123 +1,95 @@
-import sys
 import os
+import sys
+import time
+import inspect
+from datetime import datetime
+import joblib
+import numpy as np
+from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor, RandomForestRegressor
+from sklearn.feature_selection import SelectKBest, f_regression
+from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
+from sklearn.model_selection import RandomizedSearchCV, train_test_split
+from sklearn.preprocessing import RobustScaler
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-import pandas as pd
-import numpy as np
-import time
-from sklearn.ensemble import (
-    RandomForestRegressor,
-    GradientBoostingRegressor,
-    ExtraTreesRegressor,
-    StackingRegressor
-)
-from sklearn.linear_model import Ridge
-from sklearn.model_selection import train_test_split, RandomizedSearchCV
-from sklearn.preprocessing import RobustScaler, LabelEncoder
-from sklearn.feature_selection import SelectKBest, f_regression
-from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
-import xgboost as xgb
-import lightgbm as lgb
-import joblib
-import warnings
-warnings.filterwarnings('ignore')
 import config
+from core.deep_learning_model import is_available as deep_learning_available
+from core.deep_learning_model import train_lstm_mlp
+from core.model_features import (
+    NUMERICAL_OUTLIER_COLUMNS,
+    ORDINAL_COLUMNS,
+    TARGET_COLUMN,
+    align_feature_frame,
+    apply_label_encoders,
+    apply_outlier_bounds,
+    engineer_features,
+    extract_xy,
+    fit_label_encoders,
+    fit_outlier_bounds,
+    make_target_bins,
+    normalize_columns,
+    prepare_modeling_dataframe,
+    to_float_array,
+)
 from core.preprocessing import get_clean_data
+try:
+    import lightgbm as lgb
+except ImportError:
+    lgb = None
+try:
+    import xgboost as xgb
+except ImportError:
+    xgb = None
+def patch_lightgbm_sklearn_compatibility():
+    if lgb is None:
+        return
+    try:
+        from sklearn.utils.validation import check_X_y
+    except Exception:
+        return
+    params = inspect.signature(check_X_y).parameters
+    if 'force_all_finite' in params:
+        return
+    def wrapped_check_X_y(*args, force_all_finite=None, **kwargs):
+        if (
+            force_all_finite is not None
+            and 'ensure_all_finite' in params
+            and 'ensure_all_finite' not in kwargs
+        ):
+            kwargs['ensure_all_finite'] = force_all_finite
+        return check_X_y(*args, **kwargs)
+    try:
+        import lightgbm.compat as lgb_compat
+        import lightgbm.sklearn as lgb_sklearn
+        lgb_compat._LGBMCheckXY = wrapped_check_X_y
+        lgb_sklearn._LGBMCheckXY = wrapped_check_X_y
+    except Exception:
+        pass
+patch_lightgbm_sklearn_compatibility()
 def print_training_log(model_name, start_time, best_score, best_params, n_iter, cv_folds):
    elapsed = time.time() - start_time
-    print(f"  {'─'*50}")
-    print(f"  Model: {model_name}")
-    print(f"  Time: {elapsed:.1f}s")
-    print(f"  Best CV R2: {best_score:.4f}")
-    print(f"  Best params:")
-    for k, v in best_params.items():
-        print(f"    - {k}: {v}")
-    print(f"  Iterations: {n_iter}, CV folds: {cv_folds}")
-    print(f"  {'─'*50}")
+    print(f'  {"-" * 50}')
+    print(f'  Model: {model_name}')
+    print(f'  Time: {elapsed:.1f}s')
+    print(f'  Best CV R2: {best_score:.4f}')
+    for key, value in best_params.items():
+        print(f'    - {key}: {value}')
+    print(f'  Iterations: {n_iter}, CV folds: {cv_folds}')
-class DataAugmenter:
-    def __init__(self, noise_level=0.02, n_augment=2):
-        self.noise_level = noise_level
-        self.n_augment = n_augment
-    def augment(self, df, target_col='Absenteeism time in hours'):
-        print(f"\nData Augmentation...")
-        print(f"  Original size: {len(df)}")
-        augmented_dfs = [df]
-        numerical_cols = df.select_dtypes(include=[np.number]).columns.tolist()
-        if target_col in numerical_cols:
-            numerical_cols.remove(target_col)
-        for i in range(self.n_augment):
-            df_aug = df.copy()
-            for col in numerical_cols:
-                if col in df_aug.columns:
-                    std_val = df_aug[col].std()
-                    if std_val > 0:
-                        noise = np.random.normal(0, self.noise_level * std_val, len(df_aug))
-                        df_aug[col] = df_aug[col] + noise
-            augmented_dfs.append(df_aug)
-        df_result = pd.concat(augmented_dfs, ignore_index=True)
-        print(f"  Augmented size: {len(df_result)}")
-        return df_result
-    def smote_regression(self, df, target_col='Absenteeism time in hours'):
-        df = df.copy()
-        y = df[target_col].values
-        bins = [0, 1, 4, 8, 100]
-        labels = ['zero', 'low', 'medium', 'high']
-        df['_target_bin'] = pd.cut(y, bins=bins, labels=labels, include_lowest=True)
-        bin_counts = df['_target_bin'].value_counts()
-        max_count = bin_counts.max()
-        numerical_cols = df.select_dtypes(include=[np.number]).columns.tolist()
-        if target_col in numerical_cols:
-            numerical_cols.remove(target_col)
-        if '_target_bin' in numerical_cols:
-            numerical_cols.remove('_target_bin')
-        augmented_rows = []
-        for bin_label in labels:
-            bin_df = df[df['_target_bin'] == bin_label].drop(columns=['_target_bin'])
-            bin_size = len(bin_df)
-            if bin_size < max_count and bin_size > 0:
-                n_samples_to_add = max_count - bin_size
-                for _ in range(n_samples_to_add):
-                    idx = np.random.choice(bin_df.index)
-                    sample = bin_df.loc[idx].copy()
-                    for col in numerical_cols:
-                        if col in sample.index:
-                            std_val = bin_df[col].std()
-                            if std_val > 0:
-                                noise = np.random.normal(0, 0.02 * std_val)
-                                sample[col] = sample[col] + noise
-                    augmented_rows.append(sample)
-        if augmented_rows:
-            df_aug = pd.DataFrame(augmented_rows)
-            df_result = pd.concat([df.drop(columns=['_target_bin']), df_aug], ignore_index=True)
-        else:
-            df_result = df.drop(columns=['_target_bin'])
-        print(f"  After SMOTE-like augmentation: {len(df_result)}")
-        return df_result
 class OptimizedModelTrainer:
@@ -128,461 +100,267 @@ class OptimizedModelTrainer:
        self.selected_features = None
        self.label_encoders = {}
        self.model_metrics = {}
-        self.augmenter = DataAugmenter(noise_level=0.02, n_augment=2)
-        
+        self.training_metadata = {}
+        self.feature_selector = None
+        self.outlier_bounds = {}
+        self.feature_k = 22
+        self.target_transform = 'log1p'
+        self.enabled_models = ['random_forest', 'gradient_boosting', 'extra_trees', 'lightgbm', 'xgboost']
+        if deep_learning_available():
+            self.enabled_models.append('lstm_mlp')
+        self.raw_train_df = None
+        self.raw_test_df = None
    def analyze_data(self, df):
-        print("\n" + "="*60)
-        print("Data Analysis")
-        print("="*60)
-        
-        y = df['Absenteeism time in hours']
-        
-        print(f"\nTarget variable statistics:")
-        print(f"  Min: {y.min()}")
-        print(f"  Max: {y.max()}")
-        print(f"  Mean: {y.mean():.2f}")
-        print(f"  Median: {y.median():.2f}")
-        print(f"  Std: {y.std():.2f}")
-        print(f"  Skewness: {y.skew():.2f}")
-        print(f"\nTarget distribution:")
-        print(f"  Zero values: {(y == 0).sum()} ({(y == 0).sum() / len(y) * 100:.1f}%)")
-        print(f"  1-8 hours: {((y > 0) & (y <= 8)).sum()} ({((y > 0) & (y <= 8)).sum() / len(y) * 100:.1f}%)")
-        print(f"  >8 hours: {(y > 8).sum()} ({(y > 8).sum() / len(y) * 100:.1f}%)")
+        y = df[TARGET_COLUMN]
+        print('\nData Analysis')
+        print(f'  Samples: {len(df)}')
+        print(f'  Mean: {y.mean():.2f}, Median: {y.median():.2f}, Std: {y.std():.2f}')
+        print(f'  High risk ratio (>8h): {(y > 8).mean() * 100:.1f}%')
+
        return y
-    def clip_outliers(self, df, columns, lower_pct=1, upper_pct=99):
-        df_clean = df.copy()
-        for col in columns:
-            if col in df_clean.columns and df_clean[col].dtype in ['int64', 'float64']:
-                if col == 'Absenteeism time in hours':
-                    continue
-                lower = df_clean[col].quantile(lower_pct / 100)
-                upper = df_clean[col].quantile(upper_pct / 100)
-                df_clean[col] = df_clean[col].clip(lower, upper)
-        return df_clean
-    def feature_engineering(self, df):
-        df = df.copy()
-        df['workload_per_age'] = df['Work load Average/day'] / (df['Age'] + 1)
-        df['expense_per_distance'] = df['Transportation expense'] / (df['Distance from Residence to Work'] + 1)
-        df['age_service_ratio'] = df['Age'] / (df['Service time'] + 1)
-        df['has_children'] = (df['Son'] > 0).astype(int)
-        df['has_pet'] = (df['Pet'] > 0).astype(int)
-        df['family_responsibility'] = df['Son'] + df['Pet']
-        df['health_risk'] = ((df['Social drinker'] == 1) | (df['Social smoker'] == 1) | (df['Body mass index'] > 30)).astype(int)
-        df['lifestyle_risk'] = df['Social drinker'].astype(int) + df['Social smoker'].astype(int)
-        df['age_group'] = pd.cut(df['Age'], bins=[0, 30, 40, 50, 100], labels=[1, 2, 3, 4])
-        df['service_group'] = pd.cut(df['Service time'], bins=[0, 5, 10, 20, 100], labels=[1, 2, 3, 4])
-        df['bmi_category'] = pd.cut(df['Body mass index'], bins=[0, 18.5, 25, 30, 100], labels=[1, 2, 3, 4])
-        df['workload_category'] = pd.cut(df['Work load Average/day'], bins=[0, 200, 250, 300, 500], labels=[1, 2, 3, 4])
-        df['commute_category'] = pd.cut(df['Distance from Residence to Work'], bins=[0, 10, 20, 50, 100], labels=[1, 2, 3, 4])
-        df['seasonal_risk'] = df['Seasons'].apply(lambda x: 1 if x in [1, 3] else 0)
-        df['weekday_risk'] = df['Day of the week'].apply(lambda x: 1 if x in [2, 6] else 0)
-        df['hit_target_ratio'] = df['Hit target'] / 100
-        df['experience_level'] = pd.cut(df['Service time'], bins=[0, 5, 10, 15, 100], labels=[1, 2, 3, 4])
-        df['age_workload_interaction'] = df['Age'] * df['Work load Average/day'] / 10000
-        df['service_bmi_interaction'] = df['Service time'] * df['Body mass index'] / 100
-        return df
    def select_features(self, X, y, k=20):
        print("\nFeature Selection...")
        selector = SelectKBest(score_func=f_regression, k=min(k, X.shape[1]))
        selector.fit(X, y)
-        
-        scores = selector.scores_
-        feature_scores = list(zip(self.feature_names, scores))
-        feature_scores.sort(key=lambda x: x[1], reverse=True)
-        print(f"\nTop {min(k, len(feature_scores))} features by F-score:")
-        for i, (name, score) in enumerate(feature_scores[:min(k, len(feature_scores))]):
-            cn = config.FEATURE_NAME_CN.get(name, name)
-            print(f"  {i+1}. {cn}: {score:.2f}")
-        selected_mask = selector.get_support()
-        self.selected_features = [f for f, s in zip(self.feature_names, selected_mask) if s]
+        self.feature_selector = selector
+        mask = selector.get_support()
+        self.selected_features = [name for name, keep in zip(self.feature_names, mask) if keep]
        return selector.transform(X)
-    
+
+    def transform_target(self, y):
+        return np.log1p(np.clip(y, a_min=0, a_max=None)) if self.target_transform == 'log1p' else y
+    def inverse_transform_target(self, y_pred):
+        return np.expm1(y_pred) if self.target_transform == 'log1p' else y_pred
+    def transform_features(self, X_df):
+        X_df = align_feature_frame(X_df, self.feature_names)
+        X = self.scaler.transform(to_float_array(X_df))
+        return self.feature_selector.transform(X) if self.feature_selector else X
    def prepare_data(self):
-        df = get_clean_data()
-        df.columns = [col.strip() for col in df.columns]
-        
-        df = df.drop(columns=['ID'])
-        
-        cols_to_drop = ['Weight', 'Height', 'Reason for absence']
-        for col in cols_to_drop:
-            if col in df.columns:
-                df = df.drop(columns=[col])
-        print("  Removed features: Weight, Height, Reason for absence (data leakage risk)")
-        self.analyze_data(df)
-        print("\n" + "="*60)
-        print("Data Preprocessing")
-        print("="*60)
-        numerical_cols = ['Age', 'Service time', 'Work load Average/day',
-                         'Transportation expense', 'Distance from Residence to Work',
-                         'Hit target', 'Body mass index']
-        df = self.clip_outliers(df, numerical_cols)
-        print("  Outliers clipped (1st-99th percentile)")
-        print("\n" + "="*60)
-        print("Data Augmentation")
-        print("="*60)
-        df = self.augmenter.smote_regression(df)
-        df = self.augmenter.augment(df)
-        print("\n" + "="*60)
-        print("Feature Engineering")
-        print("="*60)
-        df = self.feature_engineering(df)
-        y = df['Absenteeism time in hours'].values
-        X_df = df.drop(columns=['Absenteeism time in hours'])
-        ordinal_cols = ['Month of absence', 'Day of the week', 'Seasons',
-                       'Disciplinary failure', 'Education', 'Social drinker',
-                       'Social smoker', 'age_group', 'service_group',
-                       'bmi_category', 'workload_category', 'commute_category',
-                       'experience_level']
-        for col in ordinal_cols:
-            if col in X_df.columns:
-                le = LabelEncoder()
-                X_df[col] = le.fit_transform(X_df[col].astype(str))
-                self.label_encoders[col] = le
-        self.feature_names = list(X_df.columns)
-        X = X_df.values.astype(float)
-        X = self.scaler.fit_transform(X)
-        X = self.select_features(X, y, k=20)
-        print(f"\nFinal feature count: {X.shape[1]}")
-        X_train, X_test, y_train, y_test = train_test_split(
-            X, y, test_size=0.2, random_state=42
+        raw_df = normalize_columns(get_clean_data())
+        self.analyze_data(prepare_modeling_dataframe(raw_df.copy()))
+
+        target_bins = make_target_bins(raw_df[TARGET_COLUMN].values)
+        raw_train_df, raw_test_df = train_test_split(
+            raw_df,
+            test_size=config.TEST_SIZE,
+            random_state=config.RANDOM_STATE,
+            stratify=target_bins,
        )
-        
+        self.raw_train_df = raw_train_df.reset_index(drop=True)
+        self.raw_test_df = raw_test_df.reset_index(drop=True)
+        train_df = prepare_modeling_dataframe(self.raw_train_df)
+        test_df = prepare_modeling_dataframe(self.raw_test_df)
+        self.outlier_bounds = fit_outlier_bounds(train_df, NUMERICAL_OUTLIER_COLUMNS)
+        train_df = apply_outlier_bounds(train_df, self.outlier_bounds)
+        test_df = apply_outlier_bounds(test_df, self.outlier_bounds)
+        train_df = engineer_features(train_df)
+        test_df = engineer_features(test_df)
+        X_train_df, y_train = extract_xy(train_df)
+        X_test_df, y_test = extract_xy(test_df)
+        X_train_df, self.label_encoders = fit_label_encoders(X_train_df, ORDINAL_COLUMNS)
+        X_test_df = apply_label_encoders(X_test_df, self.label_encoders)
+        self.feature_names = list(X_train_df.columns)
+        X_test_df = align_feature_frame(X_test_df, self.feature_names)
+        X_train = self.scaler.fit_transform(to_float_array(X_train_df))
+        X_test = self.scaler.transform(to_float_array(X_test_df))
+        transformed_target = self.transform_target(y_train)
+        X_train = self.select_features(X_train, transformed_target, k=self.feature_k)
+        X_test = self.transform_features(X_test_df)
+        self.training_metadata = {
+            'train_samples': int(len(train_df)),
+            'test_samples': int(len(test_df)),
+            'feature_count_before_selection': int(len(self.feature_names)),
+            'feature_count_after_selection': int(X_train.shape[1]),
+            'training_date': datetime.now().strftime('%Y-%m-%d'),
+            'target_transform': self.target_transform,
+            'available_models': [],
+            'deep_learning_available': False,
+        }
        return X_train, X_test, y_train, y_test
-    
+
+    def _run_search(self, name, estimator, params, X_train, y_train, n_iter=12):
+        start_time = time.time()
+        search = RandomizedSearchCV(
+            estimator,
+            param_distributions=params,
+            n_iter=n_iter,
+            cv=4,
+            scoring='r2',
+            n_jobs=-1,
+            random_state=config.RANDOM_STATE,
+        )
+        search.fit(X_train, y_train)
+        self.models[name] = search.best_estimator_
+        print_training_log(name, start_time, search.best_score_, search.best_params_, n_iter, 4)
    def train_random_forest(self, X_train, y_train):
-        print("\n" + "="*60)
-        print("Training Random Forest")
-        print("="*60)
-        
-        start_time = time.time()
-        rf = RandomForestRegressor(random_state=42, n_jobs=-1)
-        
-        param_distributions = {
-            'n_estimators': [200, 300, 400],
-            'max_depth': [10, 15, 20, 25],
-            'min_samples_split': [2, 5, 10],
-            'min_samples_leaf': [1, 2, 4],
-            'max_features': ['sqrt', 0.7]
-        }
-        print(f"  Searching {20*5} parameter combinations...")
-        random_search = RandomizedSearchCV(
-            rf, param_distributions, n_iter=20, cv=5,
-            scoring='r2', n_jobs=-1, random_state=42
+        self._run_search(
+            'random_forest',
+            RandomForestRegressor(random_state=config.RANDOM_STATE, n_jobs=-1),
+            {
+                'n_estimators': [200, 300, 400],
+                'max_depth': [10, 14, 18, None],
+                'min_samples_split': [2, 4, 8],
+                'min_samples_leaf': [1, 2, 3],
+                'max_features': ['sqrt', 0.7],
+            },
+            X_train,
+            y_train,
        )
-        random_search.fit(X_train, y_train)
-        self.models['random_forest'] = random_search.best_estimator_
-        print_training_log("Random Forest", start_time, random_search.best_score_,
-                          random_search.best_params_, 20, 5)
-        return random_search.best_estimator_
-    def train_xgboost(self, X_train, y_train):
-        print("\n" + "="*60)
-        print("Training XGBoost")
-        print("="*60)
-        start_time = time.time()
-        xgb_model = xgb.XGBRegressor(random_state=42, n_jobs=-1)
-        param_distributions = {
-            'n_estimators': [200, 300, 400],
-            'max_depth': [5, 7, 9],
-            'learning_rate': [0.05, 0.1],
-            'subsample': [0.7, 0.8],
-            'colsample_bytree': [0.7, 0.8],
-            'min_child_weight': [1, 3],
-            'reg_alpha': [0, 0.1],
-            'reg_lambda': [1, 1.5]
-        }
-        print(f"  Searching {20*5} parameter combinations...")
-        random_search = RandomizedSearchCV(
-            xgb_model, param_distributions, n_iter=20, cv=5,
-            scoring='r2', n_jobs=-1, random_state=42
-        )
-        random_search.fit(X_train, y_train)
-        self.models['xgboost'] = random_search.best_estimator_
-        print_training_log("XGBoost", start_time, random_search.best_score_,
-                          random_search.best_params_, 20, 5)
-        return random_search.best_estimator_
-    def train_lightgbm(self, X_train, y_train):
-        print("\n" + "="*60)
-        print("Training LightGBM")
-        print("="*60)
-        start_time = time.time()
-        lgb_model = lgb.LGBMRegressor(random_state=42, n_jobs=-1, verbose=-1)
-        param_distributions = {
-            'n_estimators': [200, 300, 400],
-            'max_depth': [7, 9, 11, -1],
-            'learning_rate': [0.05, 0.1],
-            'subsample': [0.7, 0.8],
-            'colsample_bytree': [0.7, 0.8],
-            'min_child_samples': [5, 10, 20],
-            'reg_alpha': [0, 0.1],
-            'reg_lambda': [1, 1.5],
-            'num_leaves': [31, 50, 70]
-        }
-        print(f"  Searching {20*5} parameter combinations...")
-        random_search = RandomizedSearchCV(
-            lgb_model, param_distributions, n_iter=20, cv=5,
-            scoring='r2', n_jobs=-1, random_state=42
-        )
-        random_search.fit(X_train, y_train)
-        self.models['lightgbm'] = random_search.best_estimator_
-        print_training_log("LightGBM", start_time, random_search.best_score_,
-                          random_search.best_params_, 20, 5)
-        return random_search.best_estimator_
+
    def train_gradient_boosting(self, X_train, y_train):
-        print("\n" + "="*60)
-        print("Training Gradient Boosting")
-        print("="*60)
-        
-        start_time = time.time()
-        gb = GradientBoostingRegressor(random_state=42)
-        
-        param_distributions = {
-            'n_estimators': [200, 300],
-            'max_depth': [5, 7, 9],
-            'learning_rate': [0.05, 0.1],
-            'subsample': [0.7, 0.8],
-            'min_samples_split': [2, 5],
-            'min_samples_leaf': [1, 2]
-        }
-        print(f"  Searching {15*5} parameter combinations...")
-        random_search = RandomizedSearchCV(
-            gb, param_distributions, n_iter=15, cv=5,
-            scoring='r2', n_jobs=-1, random_state=42
+        self._run_search(
+            'gradient_boosting',
+            GradientBoostingRegressor(random_state=config.RANDOM_STATE),
+            {
+                'n_estimators': [160, 220, 300],
+                'max_depth': [3, 4, 5],
+                'learning_rate': [0.03, 0.05, 0.08],
+                'subsample': [0.7, 0.85, 1.0],
+                'min_samples_split': [2, 4, 6],
+                'min_samples_leaf': [1, 2, 3],
+            },
+            X_train,
+            y_train,
        )
-        random_search.fit(X_train, y_train)
-        self.models['gradient_boosting'] = random_search.best_estimator_
-        print_training_log("Gradient Boosting", start_time, random_search.best_score_,
-                          random_search.best_params_, 15, 5)
-        return random_search.best_estimator_
+
    def train_extra_trees(self, X_train, y_train):
-        print("\n" + "="*60)
-        print("Training Extra Trees")
-        print("="*60)
-        
-        start_time = time.time()
-        et = ExtraTreesRegressor(random_state=42, n_jobs=-1)
-        
-        param_distributions = {
-            'n_estimators': [200, 300, 400],
-            'max_depth': [10, 15, 20],
-            'min_samples_split': [2, 5, 10],
-            'min_samples_leaf': [1, 2, 4],
-            'max_features': ['sqrt', 0.7]
-        }
-        print(f"  Searching {20*5} parameter combinations...")
-        random_search = RandomizedSearchCV(
-            et, param_distributions, n_iter=20, cv=5,
-            scoring='r2', n_jobs=-1, random_state=42
+        self._run_search(
+            'extra_trees',
+            ExtraTreesRegressor(random_state=config.RANDOM_STATE, n_jobs=-1),
+            {
+                'n_estimators': [220, 320, 420],
+                'max_depth': [10, 15, 20, None],
+                'min_samples_split': [2, 4, 8],
+                'min_samples_leaf': [1, 2, 3],
+                'max_features': ['sqrt', 0.7],
+            },
+            X_train,
+            y_train,
        )
-        random_search.fit(X_train, y_train)
-        
-        self.models['extra_trees'] = random_search.best_estimator_
-        print_training_log("Extra Trees", start_time, random_search.best_score_,
-                          random_search.best_params_, 20, 5)
-        
-        return random_search.best_estimator_
-    
-    def train_stacking(self, X_train, y_train):
-        print("\n" + "="*60)
-        print("Training Stacking Ensemble")
-        print("="*60)
-        
-        start_time = time.time()
-        base_estimators = []
-        
-        if 'random_forest' in self.models:
-            base_estimators.append(('rf', self.models['random_forest']))
-        if 'xgboost' in self.models:
-            base_estimators.append(('xgb', self.models['xgboost']))
-        if 'lightgbm' in self.models:
-            base_estimators.append(('lgb', self.models['lightgbm']))
-        if 'gradient_boosting' in self.models:
-            base_estimators.append(('gb', self.models['gradient_boosting']))
-        
-        if len(base_estimators) < 2:
-            print("  Not enough base models for stacking")
-            return None
-        
-        print(f"  Base estimators: {[name for name, _ in base_estimators]}")
-        print(f"  Meta learner: Ridge")
-        print(f"  CV folds: 5")
-        
-        stacking = StackingRegressor(
-            estimators=base_estimators,
-            final_estimator=Ridge(alpha=1.0),
-            cv=5,
-            n_jobs=-1
+
+    def train_lightgbm(self, X_train, y_train):
+        if lgb is None:
+            return
+        try:
+            self._run_search(
+                'lightgbm',
+                lgb.LGBMRegressor(random_state=config.RANDOM_STATE, n_jobs=-1, verbose=-1),
+                {
+                    'n_estimators': [180, 260, 340],
+                    'max_depth': [7, 9, -1],
+                    'learning_rate': [0.03, 0.05, 0.08],
+                    'subsample': [0.7, 0.85, 1.0],
+                    'colsample_bytree': [0.7, 0.85, 1.0],
+                    'num_leaves': [31, 50, 70],
+                },
+                X_train,
+                y_train,
+            )
+        except Exception as exc:
+            print(f'  {"-" * 50}')
+            print('  Model: lightgbm')
+            print(f'  Skipped: {exc}')
+
+    def train_xgboost(self, X_train, y_train):
+        if xgb is None:
+            return
+        self._run_search(
+            'xgboost',
+            xgb.XGBRegressor(random_state=config.RANDOM_STATE, n_jobs=-1),
+            {
+                'n_estimators': [180, 260, 340],
+                'max_depth': [4, 6, 8],
+                'learning_rate': [0.03, 0.05, 0.08],
+                'subsample': [0.7, 0.85, 1.0],
+                'colsample_bytree': [0.7, 0.85, 1.0],
+                'min_child_weight': [1, 3, 5],
+            },
+            X_train,
+            y_train,
        )
-        stacking.fit(X_train, y_train)
+
-        self.models['stacking'] = stacking
-        elapsed = time.time() - start_time
-        print(f"  {'─'*50}")
-        print(f"  Stacking ensemble created in {elapsed:.1f}s")
-        print(f"  {'─'*50}")
-        return stacking
    def evaluate_model(self, model, X_test, y_test):
-        y_pred = model.predict(X_test)
+        y_pred = self.inverse_transform_target(model.predict(X_test))
-        
+        y_pred = np.clip(y_pred, a_min=0, a_max=None)
        r2 = r2_score(y_test, y_pred)
        mse = mean_squared_error(y_test, y_pred)
        rmse = np.sqrt(mse)
        mae = mean_absolute_error(y_test, y_pred)
        return {
-            'r2': round(r2, 4),
+            'r2': round(r2_score(y_test, y_pred), 4),
            'mse': round(mse, 4),
-            'rmse': round(rmse, 4),
+            'rmse': round(np.sqrt(mse), 4),
-            'mae': round(mae, 4)
+            'mae': round(mean_absolute_error(y_test, y_pred), 4),
        }
-    
+
    def save_models(self):
        os.makedirs(config.MODELS_DIR, exist_ok=True)
        for name, model in self.models.items():
-            if model is not None:
+            joblib.dump(model, os.path.join(config.MODELS_DIR, f'{name}_model.pkl'))
-                model_path = os.path.join(config.MODELS_DIR, f'{name}_model.pkl')
+        self.training_metadata['available_models'] = list(self.model_metrics.keys())
-                joblib.dump(model, model_path)
-                print(f"  {name} saved")
        joblib.dump(self.scaler, config.SCALER_PATH)
        joblib.dump(self.feature_names, os.path.join(config.MODELS_DIR, 'feature_names.pkl'))
        joblib.dump(self.selected_features, os.path.join(config.MODELS_DIR, 'selected_features.pkl'))
        joblib.dump(self.label_encoders, os.path.join(config.MODELS_DIR, 'label_encoders.pkl'))
        joblib.dump(self.model_metrics, os.path.join(config.MODELS_DIR, 'model_metrics.pkl'))
-        print("  Scaler and feature info saved")
+        joblib.dump(self.training_metadata, os.path.join(config.MODELS_DIR, 'training_metadata.pkl'))
-    
+
    def train_all(self):
-        total_start = time.time()
+        print('\nOptimized Model Training Started')
        print("\n" + "="*60)
        print("Optimized Model Training Started")
        print("="*60)
        print(f"Start time: {time.strftime('%Y-%m-%d %H:%M:%S')}")
        X_train, X_test, y_train, y_test = self.prepare_data()
-        
+        y_train_transformed = self.transform_target(y_train)
-        print(f"\nTrain size: {len(X_train)}, Test size: {len(X_test)}")
+
-        
+        if 'random_forest' in self.enabled_models:
-        print("\n" + "="*60)
+            self.train_random_forest(X_train, y_train_transformed)
-        print("Training Models with Hyperparameter Optimization")
+        if 'gradient_boosting' in self.enabled_models:
-        print("="*60)
+            self.train_gradient_boosting(X_train, y_train_transformed)
-        
+        if 'extra_trees' in self.enabled_models:
-        self.train_random_forest(X_train, y_train)
+            self.train_extra_trees(X_train, y_train_transformed)
-        self.train_extra_trees(X_train, y_train)
+        if 'lightgbm' in self.enabled_models:
-        self.train_xgboost(X_train, y_train)
+            self.train_lightgbm(X_train, y_train_transformed)
-        self.train_lightgbm(X_train, y_train)
+        if 'xgboost' in self.enabled_models:
-        self.train_gradient_boosting(X_train, y_train)
+            self.train_xgboost(X_train, y_train_transformed)
-        self.train_stacking(X_train, y_train)
+
        print("\n" + "="*60)
        print("Evaluating Models on Test Set")
        print("="*60)
        best_r2 = -float('inf')
        best_model = None
        for name, model in self.models.items():
-            if model is not None:
+            metrics = self.evaluate_model(model, X_test, y_test)
-                metrics = self.evaluate_model(model, X_test, y_test)
+            self.model_metrics[name] = metrics
-                self.model_metrics[name] = metrics
+            print(f'  {name:20s} R2={metrics["r2"]:.4f} RMSE={metrics["rmse"]:.4f} MAE={metrics["mae"]:.4f}')
-                
+
-                status = "Good" if metrics['r2'] > 0.5 else ("OK" if metrics['r2'] > 0.3 else "Poor")
+        if 'lstm_mlp' in self.enabled_models and self.raw_train_df is not None and self.raw_test_df is not None:
-                status_icon = "✓" if status == "Good" else ("△" if status == "OK" else "✗")
+            deep_model_path = os.path.join(config.MODELS_DIR, 'lstm_mlp_model.pt')
-                print(f"  {status_icon} {name:20s} - R2: {metrics['r2']:.4f}, RMSE: {metrics['rmse']:.4f}, MAE: {metrics['mae']:.4f}")
+            deep_result = train_lstm_mlp(
-                
+                self.raw_train_df,
-                if metrics['r2'] > best_r2:
+                self.raw_test_df,
-                    best_r2 = metrics['r2']
+                deep_model_path,
-                    best_model = name
+                target_transform=self.target_transform,
-        
+            )
-        print(f"\n  ★ Best Model: {best_model} (R2 = {best_r2:.4f})")
+            if deep_result:
-        
+                self.model_metrics['lstm_mlp'] = deep_result['metrics']
-        print("\n" + "="*60)
+                self.training_metadata['deep_learning_available'] = True
-        print("Saving Models")
+                self.training_metadata.update(deep_result['metadata'])
-        print("="*60)
+                print(
+                    f'  {"lstm_mlp":20s} R2={deep_result["metrics"]["r2"]:.4f} '
+                    f'RMSE={deep_result["metrics"]["rmse"]:.4f} MAE={deep_result["metrics"]["mae"]:.4f}'
+                )
        self.save_models()
        return self.model_metrics
 def train_and_save_models():
-    total_start = time.time()
+    start = time.time()
    trainer = OptimizedModelTrainer()
    metrics = trainer.train_all()
-    total_elapsed = time.time() - total_start
+    print(f'\nTraining Complete in {time.time() - start:.1f}s')
-    
+    for idx, (name, metric) in enumerate(sorted(metrics.items(), key=lambda item: item[1]['r2'], reverse=True), start=1):
-    print("\n" + "="*60)
+        print(f'{idx}. {name} - R2={metric["r2"]:.4f}')
-    print("Training Complete!")
-    print("="*60)
-    print(f"Total training time: {total_elapsed:.1f}s ({total_elapsed/60:.1f} min)")
-    print(f"End time: {time.strftime('%Y-%m-%d %H:%M:%S')}")
-    print("\n" + "-"*60)
-    print("Final Model Ranking (by R2)")
-    print("-"*60)
-    sorted_metrics = sorted(metrics.items(), key=lambda x: x[1]['r2'], reverse=True)
-    for i, (name, m) in enumerate(sorted_metrics, 1):
-        medal = "🥇" if i == 1 else ("🥈" if i == 2 else ("🥉" if i == 3 else "  "))
-        print(f"  {medal} {i}. {name:20s} - R2: {m['r2']:.4f}, RMSE: {m['rmse']:.4f}")
    return metrics
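Review note: the new `train_all` fits every model on `transform_target(y_train)`, while `evaluate_model` maps predictions back with `inverse_transform_target`, clips them at zero, and scores against the untransformed `y_test`. The diff does not show which transform is used, so the sketch below assumes a `log1p`/`expm1` pair purely for illustration. It uses only the standard library in place of NumPy and scikit-learn, and `regression_metrics` is a hypothetical helper, not a function from the repository.

```python
import math


def transform_target(values):
    """Compress a right-skewed, non-negative target (log1p is an assumed choice)."""
    return [math.log1p(v) for v in values]


def inverse_transform_target(values):
    """Undo transform_target, then clip at zero because absence hours cannot be negative."""
    return [max(0.0, math.expm1(v)) for v in values]


def regression_metrics(y_true, y_pred):
    """R2, MSE, RMSE and MAE on the original scale, rounded to four places."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {
        'r2': round(1 - ss_res / ss_tot, 4),
        'mse': round(mse, 4),
        'rmse': round(math.sqrt(mse), 4),
        'mae': round(mae, 4),
    }


# Toy targets in hours; the transformed values stand in for model.predict(X_test).
y_test = [0.0, 2.0, 8.0, 40.0]
model_output = transform_target(y_test)
model_output[0] = -0.2  # a slightly negative prediction in transformed space

y_pred = inverse_transform_target(model_output)
metrics = regression_metrics(y_test, y_pred)
```

Scoring on the original scale keeps R2, RMSE and MAE comparable across models regardless of the transform each one was trained with, which is presumably why `target_transform` is also passed to `train_lstm_mlp`.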
--- a/backend/data/raw/Absenteeism_at_work.csv
+++ b/backend/data/raw/Absenteeism_at_work.csv
@@ -1,741 +0,0 @@
 ID;Reason for absence;Month of absence;Day of the week;Seasons;Transportation expense;Distance from Residence to Work;Service time;Age;Work load Average/day ;Hit target;Disciplinary failure;Education;Son;Social drinker;Social smoker;Pet;Weight;Height;Body mass index;Absenteeism time in hours
 11;26;7;3;1;289;36;13;33;239.554;97;0;1;2;1;0;1;90;172;30;4
 36;0;7;3;1;118;13;18;50;239.554;97;1;1;1;1;0;0;98;178;31;0
 3;23;7;4;1;179;51;18;38;239.554;97;0;1;0;1;0;0;89;170;31;2
 7;7;7;5;1;279;5;14;39;239.554;97;0;1;2;1;1;0;68;168;24;4
 11;23;7;5;1;289;36;13;33;239.554;97;0;1;2;1;0;1;90;172;30;2
 3;23;7;6;1;179;51;18;38;239.554;97;0;1;0;1;0;0;89;170;31;2
 10;22;7;6;1;361;52;3;28;239.554;97;0;1;1;1;0;4;80;172;27;8
 20;23;7;6;1;260;50;11;36;239.554;97;0;1;4;1;0;0;65;168;23;4
 14;19;7;2;1;155;12;14;34;239.554;97;0;1;2;1;0;0;95;196;25;40
 1;22;7;2;1;235;11;14;37;239.554;97;0;3;1;0;0;1;88;172;29;8
 20;1;7;2;1;260;50;11;36;239.554;97;0;1;4;1;0;0;65;168;23;8
 20;1;7;3;1;260;50;11;36;239.554;97;0;1;4;1;0;0;65;168;23;8
 20;11;7;4;1;260;50;11;36;239.554;97;0;1;4;1;0;0;65;168;23;8
 3;11;7;4;1;179;51;18;38;239.554;97;0;1;0;1;0;0;89;170;31;1
 3;23;7;4;1;179;51;18;38;239.554;97;0;1;0;1;0;0;89;170;31;4
 24;14;7;6;1;246;25;16;41;239.554;97;0;1;0;1;0;0;67;170;23;8
 3;23;7;6;1;179;51;18;38;239.554;97;0;1;0;1;0;0;89;170;31;2
 3;21;7;2;1;179;51;18;38;239.554;97;0;1;0;1;0;0;89;170;31;8
 6;11;7;5;1;189;29;13;33;239.554;97;0;1;2;0;0;2;69;167;25;8
 33;23;8;4;1;248;25;14;47;205.917;92;0;1;2;0;0;1;86;165;32;2
 18;10;8;4;1;330;16;4;28;205.917;92;0;2;0;0;0;0;84;182;25;8
 3;11;8;2;1;179;51;18;38;205.917;92;0;1;0;1;0;0;89;170;31;1
 10;13;8;2;1;361;52;3;28;205.917;92;0;1;1;1;0;4;80;172;27;40
 20;28;8;6;1;260;50;11;36;205.917;92;0;1;4;1;0;0;65;168;23;4
 11;18;8;2;1;289;36;13;33;205.917;92;0;1;2;1;0;1;90;172;30;8
 10;25;8;2;1;361;52;3;28;205.917;92;0;1;1;1;0;4;80;172;27;7
 11;23;8;3;1;289;36;13;33;205.917;92;0;1;2;1;0;1;90;172;30;1
 30;28;8;4;1;157;27;6;29;205.917;92;0;1;0;1;1;0;75;185;22;4
 11;18;8;4;1;289;36;13;33;205.917;92;0;1;2;1;0;1;90;172;30;8
 3;23;8;6;1;179;51;18;38;205.917;92;0;1;0;1;0;0;89;170;31;2
 3;18;8;2;1;179;51;18;38;205.917;92;0;1;0;1;0;0;89;170;31;8
 2;18;8;5;1;235;29;12;48;205.917;92;0;1;1;0;1;5;88;163;33;8
 1;23;8;5;1;235;11;14;37;205.917;92;0;3;1;0;0;1;88;172;29;4
 2;18;8;2;1;235;29;12;48;205.917;92;0;1;1;0;1;5;88;163;33;8
 3;23;8;2;1;179;51;18;38;205.917;92;0;1;0;1;0;0;89;170;31;2
 10;23;8;2;1;361;52;3;28;205.917;92;0;1;1;1;0;4;80;172;27;1
 11;24;8;3;1;289;36;13;33;205.917;92;0;1;2;1;0;1;90;172;30;8
 19;11;8;5;1;291;50;12;32;205.917;92;0;1;0;1;0;0;65;169;23;4
 2;28;8;6;1;235;29;12;48;205.917;92;0;1;1;0;1;5;88;163;33;8
 20;23;8;6;1;260;50;11;36;205.917;92;0;1;4;1;0;0;65;168;23;4
 27;23;9;3;1;184;42;7;27;241.476;92;0;1;0;0;0;0;58;167;21;2
 34;23;9;2;1;118;10;10;37;241.476;92;0;1;0;0;0;0;83;172;28;4
 3;23;9;3;1;179;51;18;38;241.476;92;0;1;0;1;0;0;89;170;31;4
 5;19;9;3;1;235;20;13;43;241.476;92;0;1;1;1;0;0;106;167;38;8
 14;23;9;4;1;155;12;14;34;241.476;92;0;1;2;1;0;0;95;196;25;2
 34;23;9;2;1;118;10;10;37;241.476;92;0;1;0;0;0;0;83;172;28;3
 3;23;9;3;1;179;51;18;38;241.476;92;0;1;0;1;0;0;89;170;31;3
 15;23;9;5;1;291;31;12;40;241.476;92;0;1;1;1;0;1;73;171;25;4
 20;22;9;6;1;260;50;11;36;241.476;92;0;1;4;1;0;0;65;168;23;8
 15;14;9;2;4;291;31;12;40;241.476;92;0;1;1;1;0;1;73;171;25;32
 20;0;9;2;4;260;50;11;36;241.476;92;1;1;4;1;0;0;65;168;23;0
 29;0;9;2;4;225;26;9;28;241.476;92;1;1;1;0;0;2;69;169;24;0
 28;23;9;3;4;225;26;9;28;241.476;92;0;1;1;0;0;2;69;169;24;2
 34;23;9;3;4;118;10;10;37;241.476;92;0;1;0;0;0;0;83;172;28;2
 11;0;9;3;4;289;36;13;33;241.476;92;1;1;2;1;0;1;90;172;30;0
 36;0;9;3;4;118;13;18;50;241.476;92;1;1;1;1;0;0;98;178;31;0
 28;18;9;4;4;225;26;9;28;241.476;92;0;1;1;0;0;2;69;169;24;3
 3;23;9;4;4;179;51;18;38;241.476;92;0;1;0;1;0;0;89;170;31;3
 13;0;9;4;4;369;17;12;31;241.476;92;1;1;3;1;0;0;70;169;25;0
 33;23;9;6;4;248;25;14;47;241.476;92;0;1;2;0;0;1;86;165;32;1
 3;23;9;6;4;179;51;18;38;241.476;92;0;1;0;1;0;0;89;170;31;3
 20;23;9;6;4;260;50;11;36;241.476;92;0;1;4;1;0;0;65;168;23;4
 3;23;10;3;4;179;51;18;38;253.465;93;0;1;0;1;0;0;89;170;31;3
 34;23;10;3;4;118;10;10;37;253.465;93;0;1;0;0;0;0;83;172;28;3
 36;0;10;4;4;118;13;18;50;253.465;93;1;1;1;1;0;0;98;178;31;0
 22;23;10;5;4;179;26;9;30;253.465;93;0;3;0;0;0;0;56;171;19;1
 3;23;10;6;4;179;51;18;38;253.465;93;0;1;0;1;0;0;89;170;31;3
 28;23;10;6;4;225;26;9;28;253.465;93;0;1;1;0;0;2;69;169;24;3
 34;23;10;3;4;118;10;10;37;253.465;93;0;1;0;0;0;0;83;172;28;3
 28;23;10;4;4;225;26;9;28;253.465;93;0;1;1;0;0;2;69;169;24;2
 33;23;10;4;4;248;25;14;47;253.465;93;0;1;2;0;0;1;86;165;32;2
 15;23;10;5;4;291;31;12;40;253.465;93;0;1;1;1;0;1;73;171;25;5
 3;23;10;4;4;179;51;18;38;253.465;93;0;1;0;1;0;0;89;170;31;8
 28;23;10;4;4;225;26;9;28;253.465;93;0;1;1;0;0;2;69;169;24;3
 20;19;10;5;4;260;50;11;36;253.465;93;0;1;4;1;0;0;65;168;23;16
 15;14;10;3;4;291;31;12;40;253.465;93;0;1;1;1;0;1;73;171;25;8
 28;28;10;3;4;225;26;9;28;253.465;93;0;1;1;0;0;2;69;169;24;2
 11;26;10;4;4;289;36;13;33;253.465;93;0;1;2;1;0;1;90;172;30;8
 10;23;10;6;4;361;52;3;28;253.465;93;0;1;1;1;0;4;80;172;27;1
 20;28;10;6;4;260;50;11;36;253.465;93;0;1;4;1;0;0;65;168;23;3
 3;23;11;5;4;179;51;18;38;306.345;93;0;1;0;1;0;0;89;170;31;1
 28;23;11;4;4;225;26;9;28;306.345;93;0;1;1;0;0;2;69;169;24;1
 3;13;11;5;4;179;51;18;38;306.345;93;0;1;0;1;0;0;89;170;31;8
 17;21;11;5;4;179;22;17;40;306.345;93;0;2;2;0;1;0;63;170;22;8
 15;23;11;5;4;291;31;12;40;306.345;93;0;1;1;1;0;1;73;171;25;5
 14;10;11;2;4;155;12;14;34;306.345;93;0;1;2;1;0;0;95;196;25;32
 6;22;11;2;4;189;29;13;33;306.345;93;0;1;2;0;0;2;69;167;25;8
 15;14;11;2;4;291;31;12;40;306.345;93;0;1;1;1;0;1;73;171;25;40
 28;23;11;4;4;225;26;9;28;306.345;93;0;1;1;0;0;2;69;169;24;1
 14;6;11;6;4;155;12;14;34;306.345;93;0;1;2;1;0;0;95;196;25;8
 28;23;11;4;4;225;26;9;28;306.345;93;0;1;1;0;0;2;69;169;24;3
 17;21;11;4;4;179;22;17;40;306.345;93;0;2;2;0;1;0;63;170;22;8
 28;13;11;6;4;225;26;9;28;306.345;93;0;1;1;0;0;2;69;169;24;3
 20;28;11;6;4;260;50;11;36;306.345;93;0;1;4;1;0;0;65;168;23;4
 33;28;11;2;4;248;25;14;47;306.345;93;0;1;2;0;0;1;86;165;32;1
 28;28;11;3;4;225;26;9;28;306.345;93;0;1;1;0;0;2;69;169;24;3
 11;7;11;4;4;289;36;13;33;306.345;93;0;1;2;1;0;1;90;172;30;24
 15;23;11;5;4;291;31;12;40;306.345;93;0;1;1;1;0;1;73;171;25;3
 33;23;12;3;4;248;25;14;47;261.306;97;0;1;2;0;0;1;86;165;32;1
 34;19;12;3;4;118;10;10;37;261.306;97;0;1;0;0;0;0;83;172;28;64
 36;23;12;4;4;118;13;18;50;261.306;97;0;1;1;1;0;0;98;178;31;2
 1;26;12;4;4;235;11;14;37;261.306;97;0;3;1;0;0;1;88;172;29;8
 28;23;12;5;4;225;26;9;28;261.306;97;0;1;1;0;0;2;69;169;24;2
 20;26;12;6;4;260;50;11;36;261.306;97;0;1;4;1;0;0;65;168;23;8
 34;19;12;3;4;118;10;10;37;261.306;97;0;1;0;0;0;0;83;172;28;56
 10;22;12;4;4;361;52;3;28;261.306;97;0;1;1;1;0;4;80;172;27;8
 28;28;12;5;4;225;26;9;28;261.306;97;0;1;1;0;0;2;69;169;24;3
 20;28;12;6;4;260;50;11;36;261.306;97;0;1;4;1;0;0;65;168;23;3
 28;23;12;3;4;225;26;9;28;261.306;97;0;1;1;0;0;2;69;169;24;2
 10;22;12;4;4;361;52;3;28;261.306;97;0;1;1;1;0;4;80;172;27;8
 34;27;12;6;4;118;10;10;37;261.306;97;0;1;0;0;0;0;83;172;28;2
 24;19;12;6;2;246;25;16;41;261.306;97;0;1;0;1;0;0;67;170;23;8
 28;23;12;6;2;225;26;9;28;261.306;97;0;1;1;0;0;2;69;169;24;2
 28;23;1;4;2;225;26;9;28;308.593;95;0;1;1;0;0;2;69;169;24;1
 34;19;1;2;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;1
 34;27;1;3;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;1
 14;18;1;3;2;155;12;14;34;308.593;95;0;1;2;1;0;0;95;196;25;8
 28;27;1;4;2;225;26;9;28;308.593;95;0;1;1;0;0;2;69;169;24;2
 27;23;1;5;2;184;42;7;27;308.593;95;0;1;0;0;0;0;58;167;21;2
 28;28;1;5;2;225;26;9;28;308.593;95;0;1;1;0;0;2;69;169;24;2
 28;27;1;6;2;225;26;9;28;308.593;95;0;1;1;0;0;2;69;169;24;1
 34;27;1;2;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 28;27;1;3;2;225;26;9;28;308.593;95;0;1;1;0;0;2;69;169;24;2
 34;27;1;3;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 34;27;1;4;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 34;27;1;5;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 34;27;1;6;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 34;27;1;2;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 34;27;1;3;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 22;18;1;3;2;179;26;9;30;308.593;95;0;3;0;0;0;0;56;171;19;8
 11;18;1;3;2;289;36;13;33;308.593;95;0;1;2;1;0;1;90;172;30;8
 34;27;1;4;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 27;23;1;5;2;184;42;7;27;308.593;95;0;1;0;0;0;0;58;167;21;2
 34;27;1;5;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;2
 34;27;1;2;2;118;10;10;37;308.593;95;0;1;0;0;0;0;83;172;28;0
 28;23;1;3;2;225;26;9;28;308.593;95;0;1;1;0;0;2;69;169;24;1
 11;22;1;5;2;289;36;13;33;308.593;95;0;1;2;1;0;1;90;172;30;3
 27;23;2;6;2;184;42;7;27;302.585;99;0;1;0;0;0;0;58;167;21;1
 24;1;2;4;2;246;25;16;41;302.585;99;0;1;0;1;0;0;67;170;23;8
 3;11;2;4;2;179;51;18;38;302.585;99;0;1;0;1;0;0;89;170;31;8
 14;28;2;5;2;155;12;14;34;302.585;99;0;1;2;1;0;0;95;196;25;2
 6;23;2;5;2;189;29;13;33;302.585;99;0;1;2;0;0;2;69;167;25;8
 20;28;2;6;2;260;50;11;36;302.585;99;0;1;4;1;0;0;65;168;23;2
 11;22;2;6;2;289;36;13;33;302.585;99;0;1;2;1;0;1;90;172;30;8
 31;11;2;2;2;388;15;9;50;302.585;99;0;1;0;0;0;0;76;178;24;8
 31;1;2;3;2;388;15;9;50;302.585;99;0;1;0;0;0;0;76;178;24;8
 28;28;2;2;2;225;26;9;28;302.585;99;0;1;1;0;0;2;69;169;24;2
 28;23;2;3;2;225;26;9;28;302.585;99;0;1;1;0;0;2;69;169;24;2
 22;23;2;3;2;179;26;9;30;302.585;99;0;3;0;0;0;0;56;171;19;1
 27;23;2;3;2;184;42;7;27;302.585;99;0;1;0;0;0;0;58;167;21;8
 28;25;2;5;2;225;26;9;28;302.585;99;0;1;1;0;0;2;69;169;24;3
 18;18;2;2;2;330;16;4;28;302.585;99;0;2;0;0;0;0;84;182;25;8
 18;23;2;3;2;330;16;4;28;302.585;99;0;2;0;0;0;0;84;182;25;1
 28;23;2;4;2;225;26;9;28;302.585;99;0;1;1;0;0;2;69;169;24;1
 6;19;2;5;2;189;29;13;33;302.585;99;0;1;2;0;0;2;69;167;25;8
 19;28;3;3;2;291;50;12;32;343.253;95;0;1;0;1;0;0;65;169;23;2
 20;19;3;3;2;260;50;11;36;343.253;95;0;1;4;1;0;0;65;168;23;8
 30;19;3;3;2;157;27;6;29;343.253;95;0;1;0;1;1;0;75;185;22;3
 17;17;3;3;2;179;22;17;40;343.253;95;0;2;2;0;1;0;63;170;22;8
 15;22;3;4;2;291;31;12;40;343.253;95;0;1;1;1;0;1;73;171;25;8
 20;13;3;4;2;260;50;11;36;343.253;95;0;1;4;1;0;0;65;168;23;8
 22;13;3;5;2;179;26;9;30;343.253;95;0;3;0;0;0;0;56;171;19;8
 33;14;3;6;2;248;25;14;47;343.253;95;0;1;2;0;0;1;86;165;32;3
 20;13;3;6;2;260;50;11;36;343.253;95;0;1;4;1;0;0;65;168;23;40
 17;11;3;2;2;179;22;17;40;343.253;95;0;2;2;0;1;0;63;170;22;40
 14;1;3;2;2;155;12;14;34;343.253;95;0;1;2;1;0;0;95;196;25;16
 20;26;3;2;2;260;50;11;36;343.253;95;0;1;4;1;0;0;65;168;23;16
 14;13;3;3;2;155;12;14;34;343.253;95;0;1;2;1;0;0;95;196;25;8
 11;6;3;5;2;289;36;13;33;343.253;95;0;1;2;1;0;1;90;172;30;8
 17;8;3;5;2;179;22;17;40;343.253;95;0;2;2;0;1;0;63;170;22;8
 20;28;3;6;2;260;50;11;36;343.253;95;0;1;4;1;0;0;65;168;23;4
 28;23;3;6;2;225;26;9;28;343.253;95;0;1;1;0;0;2;69;169;24;1
 7;14;3;2;2;279;5;14;39;343.253;95;0;1;2;1;1;0;68;168;24;8
 3;13;3;3;2;179;51;18;38;343.253;95;0;1;0;1;0;0;89;170;31;24
 28;23;3;4;2;225;26;9;28;343.253;95;0;1;1;0;0;2;69;169;24;2
 28;11;3;2;3;225;26;9;28;343.253;95;0;1;1;0;0;2;69;169;24;8
 22;13;3;2;3;179;26;9;30;343.253;95;0;3;0;0;0;0;56;171;19;1
 28;11;3;3;3;225;26;9;28;343.253;95;0;1;1;0;0;2;69;169;24;8
 28;11;3;4;3;225;26;9;28;343.253;95;0;1;1;0;0;2;69;169;24;16
 3;13;3;4;3;179;51;18;38;343.253;95;0;1;0;1;0;0;89;170;31;3
 7;14;3;5;3;279;5;14;39;343.253;95;0;1;2;1;1;0;68;168;24;16
 28;28;3;6;3;225;26;9;28;343.253;95;0;1;1;0;0;2;69;169;24;2
 33;14;3;6;3;248;25;14;47;343.253;95;0;1;2;0;0;1;86;165;32;3
 28;28;3;2;3;225;26;9;28;343.253;95;0;1;1;0;0;2;69;169;24;1
 15;28;4;4;3;291;31;12;40;326.452;96;0;1;1;1;0;1;73;171;25;1
 28;23;4;4;3;225;26;9;28;326.452;96;0;1;1;0;0;2;69;169;24;1
 14;28;4;3;3;155;12;14;34;326.452;96;0;1;2;1;0;0;95;196;25;1
 24;13;4;4;3;246;25;16;41;326.452;96;0;1;0;1;0;0;67;170;23;24
 14;23;4;5;3;155;12;14;34;326.452;96;0;1;2;1;0;0;95;196;25;1
 28;28;4;6;3;225;26;9;28;326.452;96;0;1;1;0;0;2;69;169;24;2
 20;28;4;6;3;260;50;11;36;326.452;96;0;1;4;1;0;0;65;168;23;4
 3;13;4;4;3;179;51;18;38;326.452;96;0;1;0;1;0;0;89;170;31;24
 36;23;4;5;3;118;13;18;50;326.452;96;0;1;1;1;0;0;98;178;31;1
 15;23;4;6;3;291;31;12;40;326.452;96;0;1;1;1;0;1;73;171;25;3
 24;14;4;6;3;246;25;16;41;326.452;96;0;1;0;1;0;0;67;170;23;8
 15;28;4;6;3;291;31;12;40;326.452;96;0;1;1;1;0;1;73;171;25;1
 33;28;4;6;3;248;25;14;47;326.452;96;0;1;2;0;0;1;86;165;32;8
 20;19;4;6;3;260;50;11;36;326.452;96;0;1;4;1;0;0;65;168;23;56
 11;19;4;3;3;289;36;13;33;326.452;96;0;1;2;1;0;1;90;172;30;8
 14;12;4;4;3;155;12;14;34;326.452;96;0;1;2;1;0;0;95;196;25;24
 23;19;4;4;3;378;49;11;36;326.452;96;0;1;2;0;1;4;65;174;21;8
 11;13;4;5;3;289;36;13;33;326.452;96;0;1;2;1;0;1;90;172;30;16
 1;7;4;6;3;235;11;14;37;326.452;96;0;3;1;0;0;1;88;172;29;3
 2;0;4;2;3;235;29;12;48;326.452;96;1;1;1;0;1;5;88;163;33;0
 11;13;5;4;3;289;36;13;33;378.884;92;0;1;2;1;0;1;90;172;30;8
 14;28;5;5;3;155;12;14;34;378.884;92;0;1;2;1;0;0;95;196;25;2
 14;28;5;2;3;155;12;14;34;378.884;92;0;1;2;1;0;0;95;196;25;1
 3;18;5;3;3;179;51;18;38;378.884;92;0;1;0;1;0;0;89;170;31;8
 28;19;5;3;3;225;26;9;28;378.884;92;0;1;1;0;0;2;69;169;24;8
 27;7;5;4;3;184;42;7;27;378.884;92;0;1;0;0;0;0;58;167;21;4
 14;28;5;2;3;155;12;14;34;378.884;92;0;1;2;1;0;0;95;196;25;2
 3;12;5;3;3;179;51;18;38;378.884;92;0;1;0;1;0;0;89;170;31;1
 11;13;5;4;3;289;36;13;33;378.884;92;0;1;2;1;0;1;90;172;30;24
 7;0;5;4;3;279;5;14;39;378.884;92;1;1;2;1;1;0;68;168;24;0
 18;0;5;4;3;330;16;4;28;378.884;92;1;2;0;0;0;0;84;182;25;0
 23;0;5;4;3;378;49;11;36;378.884;92;1;1;2;0;1;4;65;174;21;0
 31;0;5;4;3;388;15;9;50;378.884;92;1;1;0;0;0;0;76;178;24;0
 3;11;5;3;3;179;51;18;38;378.884;92;0;1;0;1;0;0;89;170;31;1
 36;13;5;4;3;118;13;18;50;378.884;92;0;1;1;1;0;0;98;178;31;24
 10;22;5;6;3;361;52;3;28;378.884;92;0;1;1;1;0;4;80;172;27;8
 24;19;6;2;3;246;25;16;41;377.550;94;0;1;0;1;0;0;67;170;23;8
 10;22;6;2;3;361;52;3;28;377.550;94;0;1;1;1;0;4;80;172;27;8
 24;10;6;3;3;246;25;16;41;377.550;94;0;1;0;1;0;0;67;170;23;24
 15;23;6;5;3;291;31;12;40;377.550;94;0;1;1;1;0;1;73;171;25;4
 24;10;6;6;3;246;25;16;41;377.550;94;0;1;0;1;0;0;67;170;23;8
 3;11;6;2;3;179;51;18;38;377.550;94;0;1;0;1;0;0;89;170;31;8
 14;23;6;2;3;155;12;14;34;377.550;94;0;1;2;1;0;0;95;196;25;4
 24;10;6;2;3;246;25;16;41;377.550;94;0;1;0;1;0;0;67;170;23;8
 36;13;6;4;3;118;13;18;50;377.550;94;0;1;1;1;0;0;98;178;31;8
 1;13;6;6;3;235;11;14;37;377.550;94;0;3;1;0;0;1;88;172;29;16
 36;23;6;3;3;118;13;18;50;377.550;94;0;1;1;1;0;0;98;178;31;1
 36;13;6;4;3;118;13;18;50;377.550;94;0;1;1;1;0;0;98;178;31;80
 23;22;6;5;3;378;49;11;36;377.550;94;0;1;2;0;1;4;65;174;21;8
 3;11;6;6;3;179;51;18;38;377.550;94;0;1;0;1;0;0;89;170;31;2
 32;28;6;2;1;289;48;29;49;377.550;94;0;1;0;0;0;2;108;172;36;2
 28;28;6;5;1;225;26;9;28;377.550;94;0;1;1;0;0;2;69;169;24;2
 14;19;7;3;1;155;12;14;34;275.312;98;0;1;2;1;0;0;95;196;25;16
 36;1;7;4;1;118;13;18;50;275.312;98;0;1;1;1;0;0;98;178;31;8
 34;5;7;6;1;118;10;10;37;275.312;98;0;1;0;0;0;0;83;172;28;8
 34;26;7;6;1;118;10;10;37;275.312;98;0;1;0;0;0;0;83;172;28;4
 18;26;7;3;1;330;16;4;28;275.312;98;0;2;0;0;0;0;84;182;25;8
 22;18;7;5;1;179;26;9;30;275.312;98;0;3;0;0;0;0;56;171;19;8
 14;25;7;6;1;155;12;14;34;275.312;98;0;1;2;1;0;0;95;196;25;2
 18;1;7;2;1;330;16;4;28;275.312;98;0;2;0;0;0;0;84;182;25;8
 18;1;7;3;1;330;16;4;28;275.312;98;0;2;0;0;0;0;84;182;25;8
 30;25;7;2;1;157;27;6;29;275.312;98;0;1;0;1;1;0;75;185;22;3
 10;22;7;3;1;361;52;3;28;275.312;98;0;1;1;1;0;4;80;172;27;8
 11;26;7;4;1;289;36;13;33;275.312;98;0;1;2;1;0;1;90;172;30;8
 3;26;7;5;1;179;51;18;38;275.312;98;0;1;0;1;0;0;89;170;31;8
 11;19;7;2;1;289;36;13;33;275.312;98;0;1;2;1;0;1;90;172;30;32
 11;19;7;5;1;289;36;13;33;275.312;98;0;1;2;1;0;1;90;172;30;8
 20;0;7;5;1;260;50;11;36;275.312;98;1;1;4;1;0;0;65;168;23;0
 11;19;8;6;1;289;36;13;33;265.615;94;0;1;2;1;0;1;90;172;30;8
 30;19;8;6;1;157;27;6;29;265.615;94;0;1;0;1;1;0;75;185;22;3
 11;23;8;2;1;289;36;13;33;265.615;94;0;1;2;1;0;1;90;172;30;1
 9;18;8;3;1;228;14;16;58;265.615;94;0;1;2;0;0;1;65;172;22;8
 26;13;8;5;1;300;26;13;43;265.615;94;0;1;2;1;1;1;77;175;25;1
 26;14;8;5;1;300;26;13;43;265.615;94;0;1;2;1;1;1;77;175;25;2
 20;28;8;6;1;260;50;11;36;265.615;94;0;1;4;1;0;0;65;168;23;4
 11;23;8;3;1;289;36;13;33;265.615;94;0;1;2;1;0;1;90;172;30;4
 33;23;8;4;1;248;25;14;47;265.615;94;0;1;2;0;0;1;86;165;32;1
 21;11;8;5;1;268;11;8;33;265.615;94;0;2;0;0;0;0;79;178;25;8
 22;23;8;5;1;179;26;9;30;265.615;94;0;3;0;0;0;0;56;171;19;1
 36;13;8;5;1;118;13;18;50;265.615;94;0;1;1;1;0;0;98;178;31;3
 33;25;8;2;1;248;25;14;47;265.615;94;0;1;2;0;0;1;86;165;32;2
 1;23;8;3;1;235;11;14;37;265.615;94;0;3;1;0;0;1;88;172;29;1
 36;23;8;5;1;118;13;18;50;265.615;94;0;1;1;1;0;0;98;178;31;1
 1;19;8;5;1;235;11;14;37;265.615;94;0;3;1;0;0;1;88;172;29;8
 10;8;8;3;1;361;52;3;28;265.615;94;0;1;1;1;0;4;80;172;27;8
 27;6;8;4;1;184;42;7;27;265.615;94;0;1;0;0;0;0;58;167;21;8
 3;11;9;2;1;179;51;18;38;294.217;81;0;1;0;1;0;0;89;170;31;8
 3;23;9;6;1;179;51;18;38;294.217;81;0;1;0;1;0;0;89;170;31;3
 11;19;9;4;1;289;36;13;33;294.217;81;0;1;2;1;0;1;90;172;30;24
 5;0;9;5;1;235;20;13;43;294.217;81;1;1;1;1;0;0;106;167;38;0
 24;9;9;2;1;246;25;16;41;294.217;81;0;1;0;1;0;0;67;170;23;16
 15;28;9;3;1;291;31;12;40;294.217;81;0;1;1;1;0;1;73;171;25;3
 8;0;9;3;1;231;35;14;39;294.217;81;1;1;2;1;0;2;100;170;35;0
 19;0;9;3;1;291;50;12;32;294.217;81;1;1;0;1;0;0;65;169;23;0
 3;13;9;4;1;179;51;18;38;294.217;81;0;1;0;1;0;0;89;170;31;8
 24;9;9;4;1;246;25;16;41;294.217;81;0;1;0;1;0;0;67;170;23;32
 3;23;9;5;1;179;51;18;38;294.217;81;0;1;0;1;0;0;89;170;31;1
 15;28;9;6;1;291;31;12;40;294.217;81;0;1;1;1;0;1;73;171;25;4
 20;28;9;6;1;260;50;11;36;294.217;81;0;1;4;1;0;0;65;168;23;4
 5;26;9;4;4;235;20;13;43;294.217;81;0;1;1;1;0;0;106;167;38;8
 36;28;9;5;4;118;13;18;50;294.217;81;0;1;1;1;0;0;98;178;31;1
 5;0;9;5;4;235;20;13;43;294.217;81;1;1;1;1;0;0;106;167;38;0
 15;28;9;6;4;291;31;12;40;294.217;81;0;1;1;1;0;1;73;171;25;3
 15;7;9;2;4;291;31;12;40;294.217;81;0;1;1;1;0;1;73;171;25;40
 3;13;9;2;4;179;51;18;38;294.217;81;0;1;0;1;0;0;89;170;31;8
 11;24;10;2;4;289;36;13;33;265.017;88;0;1;2;1;0;1;90;172;30;8
 1;26;10;2;4;235;11;14;37;265.017;88;0;3;1;0;0;1;88;172;29;4
 11;26;10;2;4;289;36;13;33;265.017;88;0;1;2;1;0;1;90;172;30;8
 11;22;10;6;4;289;36;13;33;265.017;88;0;1;2;1;0;1;90;172;30;8
 36;0;10;6;4;118;13;18;50;265.017;88;1;1;1;1;0;0;98;178;31;0
 33;0;10;6;4;248;25;14;47;265.017;88;1;1;2;0;0;1;86;165;32;0
 22;1;10;2;4;179;26;9;30;265.017;88;0;3;0;0;0;0;56;171;19;8
 34;7;10;2;4;118;10;10;37;265.017;88;0;1;0;0;0;0;83;172;28;3
 13;22;10;2;4;369;17;12;31;265.017;88;0;1;3;1;0;0;70;169;25;8
 3;28;10;4;4;179;51;18;38;265.017;88;0;1;0;1;0;0;89;170;31;1
 22;1;10;4;4;179;26;9;30;265.017;88;0;3;0;0;0;0;56;171;19;64
 5;0;10;4;4;235;20;13;43;265.017;88;1;1;1;1;0;0;106;167;38;0
 11;19;10;5;4;289;36;13;33;265.017;88;0;1;2;1;0;1;90;172;30;16
 20;28;10;6;4;260;50;11;36;265.017;88;0;1;4;1;0;0;65;168;23;3
 5;0;10;6;4;235;20;13;43;265.017;88;1;1;1;1;0;0;106;167;38;0
 5;23;10;2;4;235;20;13;43;265.017;88;0;1;1;1;0;0;106;167;38;2
 5;23;10;2;4;235;20;13;43;265.017;88;0;1;1;1;0;0;106;167;38;2
 36;28;10;3;4;118;13;18;50;265.017;88;0;1;1;1;0;0;98;178;31;1
 15;28;10;3;4;291;31;12;40;265.017;88;0;1;1;1;0;1;73;171;25;4
 22;23;10;5;4;179;26;9;30;265.017;88;0;3;0;0;0;0;56;171;19;16
 36;28;10;5;4;118;13;18;50;265.017;88;0;1;1;1;0;0;98;178;31;1
 10;10;10;2;4;361;52;3;28;265.017;88;0;1;1;1;0;4;80;172;27;8
 20;0;10;3;4;260;50;11;36;265.017;88;1;1;4;1;0;0;65;168;23;0
 15;0;10;3;4;291;31;12;40;265.017;88;1;1;1;1;0;1;73;171;25;0
 30;0;10;3;4;157;27;6;29;265.017;88;1;1;0;1;1;0;75;185;22;0
 22;1;10;4;4;179;26;9;30;265.017;88;0;3;0;0;0;0;56;171;19;5
 22;7;10;4;4;179;26;9;30;265.017;88;0;3;0;0;0;0;56;171;19;5
 36;23;10;5;4;118;13;18;50;265.017;88;0;1;1;1;0;0;98;178;31;1
 34;11;11;2;4;118;10;10;37;284.031;97;0;1;0;0;0;0;83;172;28;8
 33;23;11;2;4;248;25;14;47;284.031;97;0;1;2;0;0;1;86;165;32;2
 3;6;11;3;4;179;51;18;38;284.031;97;0;1;0;1;0;0;89;170;31;8
 20;28;11;6;4;260;50;11;36;284.031;97;0;1;4;1;0;0;65;168;23;3
 15;23;11;2;4;291;31;12;40;284.031;97;0;1;1;1;0;1;73;171;25;1
 23;1;11;2;4;378;49;11;36;284.031;97;0;1;2;0;1;4;65;174;21;8
 14;11;11;2;4;155;12;14;34;284.031;97;0;1;2;1;0;0;95;196;25;120
 5;26;11;2;4;235;20;13;43;284.031;97;0;1;1;1;0;0;106;167;38;8
 18;0;11;3;4;330;16;4;28;284.031;97;1;2;0;0;0;0;84;182;25;0
 1;18;11;4;4;235;11;14;37;284.031;97;0;3;1;0;0;1;88;172;29;1
 34;11;11;4;4;118;10;10;37;284.031;97;0;1;0;0;0;0;83;172;28;3
 1;25;11;5;4;235;11;14;37;284.031;97;0;3;1;0;0;1;88;172;29;2
 3;28;11;5;4;179;51;18;38;284.031;97;0;1;0;1;0;0;89;170;31;3
 24;13;11;6;4;246;25;16;41;284.031;97;0;1;0;1;0;0;67;170;23;8
 15;12;11;6;4;291;31;12;40;284.031;97;0;1;1;1;0;1;73;171;25;4
 24;13;11;2;4;246;25;16;41;284.031;97;0;1;0;1;0;0;67;170;23;8
 3;28;11;3;4;179;51;18;38;284.031;97;0;1;0;1;0;0;89;170;31;1
 20;10;11;4;4;260;50;11;36;284.031;97;0;1;4;1;0;0;65;168;23;8
 20;15;11;6;4;260;50;11;36;284.031;97;0;1;4;1;0;0;65;168;23;8
 23;0;11;6;4;378;49;11;36;284.031;97;1;1;2;0;1;4;65;174;21;0
 7;0;11;3;4;279;5;14;39;284.031;97;1;1;2;1;1;0;68;168;24;0
 3;23;11;5;4;179;51;18;38;284.031;97;0;1;0;1;0;0;89;170;31;1
 28;12;12;2;4;225;26;9;28;236.629;93;0;1;1;0;0;2;69;169;24;3
 3;28;12;2;4;179;51;18;38;236.629;93;0;1;0;1;0;0;89;170;31;2
 3;28;12;2;4;179;51;18;38;236.629;93;0;1;0;1;0;0;89;170;31;1
 1;23;12;2;4;235;11;14;37;236.629;93;0;3;1;0;0;1;88;172;29;3
 36;28;12;3;4;118;13;18;50;236.629;93;0;1;1;1;0;0;98;178;31;1
 20;28;12;6;4;260;50;11;36;236.629;93;0;1;4;1;0;0;65;168;23;4
 24;4;12;5;4;246;25;16;41;236.629;93;0;1;0;1;0;0;67;170;23;8
 3;28;12;5;4;179;51;18;38;236.629;93;0;1;0;1;0;0;89;170;31;1
 3;28;12;6;4;179;51;18;38;236.629;93;0;1;0;1;0;0;89;170;31;1
 22;23;12;3;4;179;26;9;30;236.629;93;0;3;0;0;0;0;56;171;19;1
 34;25;12;3;4;118;10;10;37;236.629;93;0;1;0;0;0;0;83;172;28;8
 1;25;12;5;4;235;11;14;37;236.629;93;0;3;1;0;0;1;88;172;29;2
 3;28;12;6;4;179;51;18;38;236.629;93;0;1;0;1;0;0;89;170;31;1
 5;13;12;3;2;235;20;13;43;236.629;93;0;1;1;1;0;0;106;167;38;8
 1;14;12;3;2;235;11;14;37;236.629;93;0;3;1;0;0;1;88;172;29;4
 20;26;12;4;2;260;50;11;36;236.629;93;0;1;4;1;0;0;65;168;23;8
 30;28;12;2;2;157;27;6;29;236.629;93;0;1;0;1;1;0;75;185;22;2
 3;28;12;2;2;179;51;18;38;236.629;93;0;1;0;1;0;0;89;170;31;3
 11;19;12;2;2;289;36;13;33;236.629;93;0;1;2;1;0;1;90;172;30;8
 28;23;1;4;2;225;26;9;28;330.061;100;0;1;1;0;0;2;69;169;24;5
 34;19;1;2;2;118;10;10;37;330.061;100;0;1;0;0;0;0;83;172;28;32
 14;23;1;2;2;155;12;14;34;330.061;100;0;1;2;1;0;0;95;196;25;2
 1;13;1;3;2;235;11;14;37;330.061;100;0;3;1;0;0;1;88;172;29;1
 14;23;1;3;2;155;12;14;34;330.061;100;0;1;2;1;0;0;95;196;25;4
 11;26;1;2;2;289;36;13;33;330.061;100;0;1;2;1;0;1;90;172;30;8
 15;3;1;4;2;291;31;12;40;330.061;100;0;1;1;1;0;1;73;171;25;8
 5;26;1;2;2;235;20;13;43;330.061;100;0;1;1;1;0;0;106;167;38;8
 36;26;1;2;2;118;13;18;50;330.061;100;0;1;1;1;0;0;98;178;31;4
 3;28;1;4;2;179;51;18;38;330.061;100;0;1;0;1;0;0;89;170;31;1
 3;28;1;6;2;179;51;18;38;330.061;100;0;1;0;1;0;0;89;170;31;1
 34;28;2;3;2;118;10;10;37;251.818;96;0;1;0;0;0;0;83;172;28;2
 3;27;2;4;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 28;7;2;4;2;225;26;9;28;251.818;96;0;1;1;0;0;2;69;169;24;1
 11;22;2;6;2;289;36;13;33;251.818;96;0;1;2;1;0;1;90;172;30;3
 20;28;2;6;2;260;50;11;36;251.818;96;0;1;4;1;0;0;65;168;23;3
 3;23;2;6;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 3;27;2;2;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;2
 3;27;2;4;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 3;10;2;5;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;8
 24;26;2;5;2;246;25;16;41;251.818;96;0;1;0;1;0;0;67;170;23;8
 3;27;2;6;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 6;22;2;2;2;189;29;13;33;251.818;96;0;1;2;0;0;2;69;167;25;8
 3;27;2;2;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 24;23;2;3;2;246;25;16;41;251.818;96;0;1;0;1;0;0;67;170;23;2
 15;23;2;3;2;291;31;12;40;251.818;96;0;1;1;1;0;1;73;171;25;2
 30;11;2;4;2;157;27;6;29;251.818;96;0;1;0;1;1;0;75;185;22;16
 3;27;2;4;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 3;27;2;6;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 24;10;2;6;2;246;25;16;41;251.818;96;0;1;0;1;0;0;67;170;23;24
 3;27;2;4;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 3;27;2;6;2;179;51;18;38;251.818;96;0;1;0;1;0;0;89;170;31;3
 34;18;3;3;2;118;10;10;37;244.387;98;0;1;0;0;0;0;83;172;28;8
 24;19;3;4;2;246;25;16;41;244.387;98;0;1;0;1;0;0;67;170;23;16
 24;28;3;6;2;246;25;16;41;244.387;98;0;1;0;1;0;0;67;170;23;2
 20;28;3;6;2;260;50;11;36;244.387;98;0;1;4;1;0;0;65;168;23;4
 3;28;3;2;2;179;51;18;38;244.387;98;0;1;0;1;0;0;89;170;31;2
 1;22;3;2;2;235;11;14;37;244.387;98;0;3;1;0;0;1;88;172;29;8
 17;22;3;3;2;179;22;17;40;244.387;98;0;2;2;0;1;0;63;170;22;8
 23;22;3;3;2;378;49;11;36;244.387;98;0;1;2;0;1;4;65;174;21;8
 3;28;3;2;2;179;51;18;38;244.387;98;0;1;0;1;0;0;89;170;31;16
 10;22;3;4;2;361;52;3;28;244.387;98;0;1;1;1;0;4;80;172;27;8
 13;0;3;4;2;369;17;12;31;244.387;98;1;1;3;1;0;0;70;169;25;0
 1;21;3;5;2;235;11;14;37;244.387;98;0;3;1;0;0;1;88;172;29;8
 36;23;3;6;3;118;13;18;50;244.387;98;0;1;1;1;0;0;98;178;31;2
 36;14;3;3;3;118;13;18;50;244.387;98;0;1;1;1;0;0;98;178;31;3
 36;13;3;4;3;118;13;18;50;244.387;98;0;1;1;1;0;0;98;178;31;8
 1;0;3;5;3;235;11;14;37;244.387;98;1;3;1;0;0;1;88;172;29;0
 24;0;3;5;3;246;25;16;41;244.387;98;1;1;0;1;0;0;67;170;23;0
 36;0;3;5;3;118;13;18;50;244.387;98;1;1;1;1;0;0;98;178;31;0
 3;28;3;6;3;179;51;18;38;244.387;98;0;1;0;1;0;0;89;170;31;8
 11;22;3;6;3;289;36;13;33;244.387;98;0;1;2;1;0;1;90;172;30;8
 20;19;3;2;3;260;50;11;36;244.387;98;0;1;4;1;0;0;65;168;23;8
 24;28;3;3;3;246;25;16;41;244.387;98;0;1;0;1;0;0;67;170;23;2
 3;28;4;4;3;179;51;18;38;239.409;98;0;1;0;1;0;0;89;170;31;4
 20;28;4;6;3;260;50;11;36;239.409;98;0;1;4;1;0;0;65;168;23;3
 18;26;4;6;3;330;16;4;28;239.409;98;0;2;0;0;0;0;84;182;25;4
 13;22;4;2;3;369;17;12;31;239.409;98;0;1;3;1;0;0;70;169;25;4
 33;26;4;2;3;248;25;14;47;239.409;98;0;1;2;0;0;1;86;165;32;4
 18;23;4;4;3;330;16;4;28;239.409;98;0;2;0;0;0;0;84;182;25;8
 3;28;4;4;3;179;51;18;38;239.409;98;0;1;0;1;0;0;89;170;31;8
 36;23;4;2;3;118;13;18;50;239.409;98;0;1;1;1;0;0;98;178;31;1
 36;13;4;4;3;118;13;18;50;239.409;98;0;1;1;1;0;0;98;178;31;120
 26;28;4;6;3;300;26;13;43;239.409;98;0;1;2;1;1;1;77;175;25;8
 20;28;4;6;3;260;50;11;36;239.409;98;0;1;4;1;0;0;65;168;23;4
 3;28;4;2;3;179;51;18;38;239.409;98;0;1;0;1;0;0;89;170;31;4
 34;11;4;4;3;118;10;10;37;239.409;98;0;1;0;0;0;0;83;172;28;2
 5;13;5;2;3;235;20;13;43;246.074;99;0;1;1;1;0;0;106;167;38;16
 33;23;5;4;3;248;25;14;47;246.074;99;0;1;2;0;0;1;86;165;32;2
 13;10;5;2;3;369;17;12;31;246.074;99;0;1;3;1;0;0;70;169;25;8
 22;23;5;4;3;179;26;9;30;246.074;99;0;3;0;0;0;0;56;171;19;3
 3;28;5;4;3;179;51;18;38;246.074;99;0;1;0;1;0;0;89;170;31;4
 10;23;5;5;3;361;52;3;28;246.074;99;0;1;1;1;0;4;80;172;27;1
 20;28;5;6;3;260;50;11;36;246.074;99;0;1;4;1;0;0;65;168;23;3
 17;11;5;2;3;179;22;17;40;246.074;99;0;2;2;0;1;0;63;170;22;2
 17;8;5;2;3;179;22;17;40;246.074;99;0;2;2;0;1;0;63;170;22;3
 9;18;5;4;3;228;14;16;58;246.074;99;0;1;2;0;0;1;65;172;22;8
 28;25;5;4;3;225;26;9;28;246.074;99;0;1;1;0;0;2;69;169;24;3
 18;13;5;6;3;330;16;4;28;246.074;99;0;2;0;0;0;0;84;182;25;8
 22;25;5;2;3;179;26;9;30;246.074;99;0;3;0;0;0;0;56;171;19;2
 34;28;5;2;3;118;10;10;37;246.074;99;0;1;0;0;0;0;83;172;28;1
 1;1;5;2;3;235;11;14;37;246.074;99;0;3;1;0;0;1;88;172;29;8
 22;23;5;4;3;179;26;9;30;246.074;99;0;3;0;0;0;0;56;171;19;3
 34;23;6;2;3;118;10;10;37;253.957;95;0;1;0;0;0;0;83;172;28;3
 3;28;6;2;3;179;51;18;38;253.957;95;0;1;0;1;0;0;89;170;31;3
 34;28;6;3;3;118;10;10;37;253.957;95;0;1;0;0;0;0;83;172;28;2
 28;23;6;5;3;225;26;9;28;253.957;95;0;1;1;0;0;2;69;169;24;4
 20;28;6;6;3;260;50;11;36;253.957;95;0;1;4;1;0;0;65;168;23;4
 3;0;6;6;3;179;51;18;38;253.957;95;1;1;0;1;0;0;89;170;31;0
 15;13;6;2;3;291;31;12;40;253.957;95;0;1;1;1;0;1;73;171;25;40
 3;28;6;2;3;179;51;18;38;253.957;95;0;1;0;1;0;0;89;170;31;24
 24;28;6;3;3;246;25;16;41;253.957;95;0;1;0;1;0;0;67;170;23;3
 3;28;6;2;3;179;51;18;38;253.957;95;0;1;0;1;0;0;89;170;31;4
 5;26;6;3;3;235;20;13;43;253.957;95;0;1;1;1;0;0;106;167;38;8
 3;28;6;2;1;179;51;18;38;253.957;95;0;1;0;1;0;0;89;170;31;2
 28;23;6;4;1;225;26;9;28;253.957;95;0;1;1;0;0;2;69;169;24;2
 36;23;6;4;1;118;13;18;50;253.957;95;0;1;1;1;0;0;98;178;31;2
 3;5;6;4;1;179;51;18;38;253.957;95;0;1;0;1;0;0;89;170;31;8
 22;21;6;4;1;179;26;9;30;253.957;95;0;3;0;0;0;0;56;171;19;2
 24;28;6;6;1;246;25;16;41;253.957;95;0;1;0;1;0;0;67;170;23;2
 18;11;6;3;1;330;16;4;28;253.957;95;0;2;0;0;0;0;84;182;25;1
 1;13;6;3;1;235;11;14;37;253.957;95;0;3;1;0;0;1;88;172;29;8
 22;23;7;5;1;179;26;9;30;230.290;92;0;3;0;0;0;0;56;171;19;2
 28;25;7;5;1;225;26;9;28;230.290;92;0;1;1;0;0;2;69;169;24;4
 20;13;7;6;1;260;50;11;36;230.290;92;0;1;4;1;0;0;65;168;23;8
 21;7;7;2;1;268;11;8;33;230.290;92;0;2;0;0;0;0;79;178;25;8
 18;25;7;6;1;330;16;4;28;230.290;92;0;2;0;0;0;0;84;182;25;8
 34;26;7;6;1;118;10;10;37;230.290;92;0;1;0;0;0;0;83;172;28;8
 20;26;7;2;1;260;50;11;36;230.290;92;0;1;4;1;0;0;65;168;23;4
 34;28;7;3;1;118;10;10;37;230.290;92;0;1;0;0;0;0;83;172;28;8
 26;15;7;2;1;300;26;13;43;230.290;92;0;1;2;1;1;1;77;175;25;8
 2;23;7;2;1;235;29;12;48;230.290;92;0;1;1;0;1;5;88;163;33;1
 24;28;7;3;1;246;25;16;41;230.290;92;0;1;0;1;0;0;67;170;23;2
 28;9;7;3;1;225;26;9;28;230.290;92;0;1;1;0;0;2;69;169;24;112
 3;28;7;3;1;179;51;18;38;230.290;92;0;1;0;1;0;0;89;170;31;1
 36;23;7;6;1;118;13;18;50;230.290;92;0;1;1;1;0;0;98;178;31;1
 10;22;7;6;1;361;52;3;28;230.290;92;0;1;1;1;0;4;80;172;27;8
 11;22;7;2;1;289;36;13;33;230.290;92;0;1;2;1;0;1;90;172;30;8
 5;26;7;2;1;235;20;13;43;230.290;92;0;1;1;1;0;0;106;167;38;8
 24;28;7;3;1;246;25;16;41;230.290;92;0;1;0;1;0;0;67;170;23;2
 15;28;7;5;1;291;31;12;40;230.290;92;0;1;1;1;0;1;73;171;25;1
 7;23;7;5;1;279;5;14;39;230.290;92;0;1;2;1;1;0;68;168;24;2
 3;25;8;5;1;179;51;18;38;249.797;93;0;1;0;1;0;0;89;170;31;4
 17;25;8;2;1;179;22;17;40;249.797;93;0;2;2;0;1;0;63;170;22;1
 24;28;8;3;1;246;25;16;41;249.797;93;0;1;0;1;0;0;67;170;23;4
 34;28;8;3;1;118;10;10;37;249.797;93;0;1;0;0;0;0;83;172;28;4
 11;26;8;3;1;289;36;13;33;249.797;93;0;1;2;1;0;1;90;172;30;8
 5;26;8;3;1;235;20;13;43;249.797;93;0;1;1;1;0;0;106;167;38;8
 15;28;8;5;1;291;31;12;40;249.797;93;0;1;1;1;0;1;73;171;25;4
 3;25;8;2;1;179;51;18;38;249.797;93;0;1;0;1;0;0;89;170;31;4
 17;25;8;3;1;179;22;17;40;249.797;93;0;2;2;0;1;0;63;170;22;8
 18;23;8;5;1;330;16;4;28;249.797;93;0;2;0;0;0;0;84;182;25;16
 1;23;8;3;1;235;11;14;37;249.797;93;0;3;1;0;0;1;88;172;29;4
 24;28;8;3;1;246;25;16;41;249.797;93;0;1;0;1;0;0;67;170;23;1
 34;28;8;3;1;118;10;10;37;249.797;93;0;1;0;0;0;0;83;172;28;5
 15;28;8;5;1;291;31;12;40;249.797;93;0;1;1;1;0;1;73;171;25;2
 20;28;8;2;1;260;50;11;36;249.797;93;0;1;4;1;0;0;65;168;23;3
 24;28;9;3;1;246;25;16;41;261.756;87;0;1;0;1;0;0;67;170;23;1
 24;28;9;3;1;246;25;16;41;261.756;87;0;1;0;1;0;0;67;170;23;1
 34;28;9;3;1;118;10;10;37;261.756;87;0;1;0;0;0;0;83;172;28;3
 14;23;9;3;1;155;12;14;34;261.756;87;0;1;2;1;0;0;95;196;25;2
 15;28;9;5;1;291;31;12;40;261.756;87;0;1;1;1;0;1;73;171;25;2
 22;23;9;6;1;179;26;9;30;261.756;87;0;3;0;0;0;0;56;171;19;8
 33;23;9;6;1;248;25;14;47;261.756;87;0;1;2;0;0;1;86;165;32;1
 3;23;9;2;1;179;51;18;38;261.756;87;0;1;0;1;0;0;89;170;31;4
 28;23;9;4;1;225;26;9;28;261.756;87;0;1;1;0;0;2;69;169;24;1
 22;23;9;2;1;179;26;9;30;261.756;87;0;3;0;0;0;0;56;171;19;2
 13;23;9;3;4;369;17;12;31;261.756;87;0;1;3;1;0;0;70;169;25;8
 10;22;9;3;4;361;52;3;28;261.756;87;0;1;1;1;0;4;80;172;27;8
 32;4;10;5;4;289;48;29;49;284.853;91;0;1;0;0;0;2;108;172;36;1
 25;11;10;5;4;235;16;8;32;284.853;91;0;3;0;0;0;0;75;178;25;3
 24;26;10;6;4;246;25;16;41;284.853;91;0;1;0;1;0;0;67;170;23;8
 32;14;10;4;4;289;48;29;49;284.853;91;0;1;0;0;0;2;108;172;36;3
 15;28;10;4;4;291;31;12;40;284.853;91;0;1;1;1;0;1;73;171;25;2
 34;23;10;3;4;118;10;10;37;284.853;91;0;1;0;0;0;0;83;172;28;2
 32;23;10;5;4;289;48;29;49;284.853;91;0;1;0;0;0;2;108;172;36;2
 15;23;10;6;4;291;31;12;40;284.853;91;0;1;1;1;0;1;73;171;25;1
 28;23;10;3;4;225;26;9;28;284.853;91;0;1;1;0;0;2;69;169;24;2
 13;23;10;3;4;369;17;12;31;284.853;91;0;1;3;1;0;0;70;169;25;8
 13;23;10;3;4;369;17;12;31;284.853;91;0;1;3;1;0;0;70;169;25;3
 28;23;10;3;4;225;26;9;28;284.853;91;0;1;1;0;0;2;69;169;24;4
 13;26;10;3;4;369;17;12;31;284.853;91;0;1;3;1;0;0;70;169;25;8
 3;28;10;4;4;179;51;18;38;284.853;91;0;1;0;1;0;0;89;170;31;3
 9;1;10;4;4;228;14;16;58;284.853;91;0;1;2;0;0;1;65;172;22;1
 15;23;10;4;4;291;31;12;40;284.853;91;0;1;1;1;0;1;73;171;25;1
 13;10;10;5;4;369;17;12;31;284.853;91;0;1;3;1;0;0;70;169;25;8
 28;13;10;5;4;225;26;9;28;284.853;91;0;1;1;0;0;2;69;169;24;1
 13;10;10;6;4;369;17;12;31;284.853;91;0;1;3;1;0;0;70;169;25;8
 28;10;10;6;4;225;26;9;28;284.853;91;0;1;1;0;0;2;69;169;24;3
 6;23;10;2;4;189;29;13;33;284.853;91;0;1;2;0;0;2;69;167;25;8
 25;6;10;2;4;235;16;8;32;284.853;91;0;3;0;0;0;0;75;178;25;8
 33;10;10;2;4;248;25;14;47;284.853;91;0;1;2;0;0;1;86;165;32;8
 28;0;10;2;4;225;26;9;28;284.853;91;1;1;1;0;0;2;69;169;24;0
 28;13;10;3;4;225;26;9;28;284.853;91;0;1;1;0;0;2;69;169;24;3
 3;21;11;3;4;179;51;18;38;268.519;93;0;1;0;1;0;0;89;170;31;1
 34;28;11;4;4;118;10;10;37;268.519;93;0;1;0;0;0;0;83;172;28;3
 18;2;11;4;4;330;16;4;28;268.519;93;0;2;0;0;0;0;84;182;25;24
 3;28;11;6;4;179;51;18;38;268.519;93;0;1;0;1;0;0;89;170;31;1
 34;9;11;3;4;118;10;10;37;268.519;93;0;1;0;0;0;0;83;172;28;8
 11;24;11;4;4;289;36;13;33;268.519;93;0;1;2;1;0;1;90;172;30;8
 25;1;11;6;4;235;16;8;32;268.519;93;0;3;0;0;0;0;75;178;25;8
 28;23;11;6;4;225;26;9;28;268.519;93;0;1;1;0;0;2;69;169;24;4
 10;22;11;3;4;361;52;3;28;268.519;93;0;1;1;1;0;4;80;172;27;8
 15;28;11;4;4;291;31;12;40;268.519;93;0;1;1;1;0;1;73;171;25;2
 34;13;11;5;4;118;10;10;37;268.519;93;0;1;0;0;0;0;83;172;28;2
 28;14;11;5;4;225;26;9;28;268.519;93;0;1;1;0;0;2;69;169;24;3
 3;28;11;2;4;179;51;18;38;268.519;93;0;1;0;1;0;0;89;170;31;1
 34;23;11;2;4;118;10;10;37;268.519;93;0;1;0;0;0;0;83;172;28;8
 34;8;11;3;4;118;10;10;37;268.519;93;0;1;0;0;0;0;83;172;28;8
 28;23;11;3;4;225;26;9;28;268.519;93;0;1;1;0;0;2;69;169;24;2
 15;0;11;3;4;291;31;12;40;268.519;93;1;1;1;1;0;1;73;171;25;0
 11;0;11;4;4;289;36;13;33;268.519;93;1;1;2;1;0;1;90;172;30;0
 33;14;11;5;4;248;25;14;47;268.519;93;0;1;2;0;0;1;86;165;32;4
 5;0;11;5;4;235;20;13;43;268.519;93;1;1;1;1;0;0;106;167;38;0
 28;23;11;6;4;225;26;9;28;268.519;93;0;1;1;0;0;2;69;169;24;2
 13;26;11;6;4;369;17;12;31;268.519;93;0;1;3;1;0;0;70;169;25;8
 10;28;11;2;4;361;52;3;28;268.519;93;0;1;1;1;0;4;80;172;27;2
 3;13;12;3;4;179;51;18;38;280.549;98;0;1;0;1;0;0;89;170;31;32
 15;28;12;4;4;291;31;12;40;280.549;98;0;1;1;1;0;1;73;171;25;1
 28;23;12;4;4;225;26;9;28;280.549;98;0;1;1;0;0;2;69;169;24;3
 22;13;12;6;4;179;26;9;30;280.549;98;0;3;0;0;0;0;56;171;19;1
 28;23;12;6;4;225;26;9;28;280.549;98;0;1;1;0;0;2;69;169;24;3
 28;23;12;4;4;225;26;9;28;280.549;98;0;1;1;0;0;2;69;169;24;3
 10;14;12;5;4;361;52;3;28;280.549;98;0;1;1;1;0;4;80;172;27;4
 17;18;12;6;4;179;22;17;40;280.549;98;0;2;2;0;1;0;63;170;22;2
 5;26;12;6;4;235;20;13;43;280.549;98;0;1;1;1;0;0;106;167;38;8
 12;18;12;2;4;233;51;1;31;280.549;98;0;2;1;1;0;8;68;178;21;8
 22;13;12;3;4;179;26;9;30;280.549;98;0;3;0;0;0;0;56;171;19;16
 28;23;12;3;4;225;26;9;28;280.549;98;0;1;1;0;0;2;69;169;24;2
 28;23;12;5;4;225;26;9;28;280.549;98;0;1;1;0;0;2;69;169;24;3
 28;23;12;2;4;225;26;9;28;280.549;98;0;1;1;0;0;2;69;169;24;2
 14;18;12;3;2;155;12;14;34;280.549;98;0;1;2;1;0;0;95;196;25;80
 22;12;1;2;2;179;26;9;30;313.532;96;0;3;0;0;0;0;56;171;19;24
 22;12;1;5;2;179;26;9;30;313.532;96;0;3;0;0;0;0;56;171;19;16
 17;25;1;5;2;179;22;17;40;313.532;96;0;2;2;0;1;0;63;170;22;2
 17;25;1;6;2;179;22;17;40;313.532;96;0;2;2;0;1;0;63;170;22;2
 22;13;1;2;2;179;26;9;30;313.532;96;0;3;0;0;0;0;56;171;19;3
 17;25;1;4;2;179;22;17;40;313.532;96;0;2;2;0;1;0;63;170;22;2
 32;10;1;5;2;289;48;29;49;313.532;96;0;1;0;0;0;2;108;172;36;8
 17;18;1;6;2;179;22;17;40;313.532;96;0;2;2;0;1;0;63;170;22;3
 22;27;1;2;2;179;26;9;30;313.532;96;0;3;0;0;0;0;56;171;19;2
 14;18;1;3;2;155;12;14;34;313.532;96;0;1;2;1;0;0;95;196;25;8
 22;27;1;4;2;179;26;9;30;313.532;96;0;3;0;0;0;0;56;171;19;2
 3;27;1;4;2;179;51;18;38;313.532;96;0;1;0;1;0;0;89;170;31;3
 11;13;1;4;2;289;36;13;33;313.532;96;0;1;2;1;0;1;90;172;30;8
 3;27;1;5;2;179;51;18;38;313.532;96;0;1;0;1;0;0;89;170;31;3
 3;27;1;6;2;179;51;18;38;313.532;96;0;1;0;1;0;0;89;170;31;2
 3;13;2;3;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;8
 28;23;2;3;2;225;26;9;28;264.249;97;0;1;1;0;0;2;69;169;24;3
 33;1;2;4;2;248;25;14;47;264.249;97;0;1;2;0;0;1;86;165;32;8
 3;27;2;4;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 28;28;2;5;2;225;26;9;28;264.249;97;0;1;1;0;0;2;69;169;24;3
 3;27;2;5;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 22;27;2;5;2;179;26;9;30;264.249;97;0;3;0;0;0;0;56;171;19;2
 29;28;2;6;2;225;15;15;41;264.249;97;0;4;2;1;0;2;94;182;28;2
 3;27;2;6;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 12;19;2;2;2;233;51;1;31;264.249;97;0;2;1;1;0;8;68;178;21;2
 3;27;2;2;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 28;7;2;3;2;225;26;9;28;264.249;97;0;1;1;0;0;2;69;169;24;8
 3;27;2;4;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;3
 3;27;2;5;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;3
 28;25;2;5;2;225;26;9;28;264.249;97;0;1;1;0;0;2;69;169;24;3
 22;13;2;5;2;179;26;9;30;264.249;97;0;3;0;0;0;0;56;171;19;2
 17;23;2;6;2;179;22;17;40;264.249;97;0;2;2;0;1;0;63;170;22;2
 3;27;2;6;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;3
 12;12;2;4;2;233;51;1;31;264.249;97;0;2;1;1;0;8;68;178;21;3
 22;27;2;4;2;179;26;9;30;264.249;97;0;3;0;0;0;0;56;171;19;2
 3;27;2;4;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 3;13;2;5;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;8
 3;27;2;6;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 14;25;2;2;2;155;12;14;34;264.249;97;0;1;2;1;0;0;95;196;25;5
 25;25;2;2;2;235;16;8;32;264.249;97;0;3;0;0;0;0;75;178;25;3
 3;27;2;2;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 28;7;2;2;2;225;26;9;28;264.249;97;0;1;1;0;0;2;69;169;24;2
 3;27;2;3;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 33;23;2;3;2;248;25;14;47;264.249;97;0;1;2;0;0;1;86;165;32;2
 28;25;2;3;2;225;26;9;28;264.249;97;0;1;1;0;0;2;69;169;24;2
 3;27;2;4;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 3;27;2;5;2;179;51;18;38;264.249;97;0;1;0;1;0;0;89;170;31;2
 25;25;2;6;2;235;16;8;32;264.249;97;0;3;0;0;0;0;75;178;25;2
 3;27;3;2;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;2
 33;23;3;2;2;248;25;14;47;222.196;99;0;1;2;0;0;1;86;165;32;2
 9;25;3;3;2;228;14;16;58;222.196;99;0;1;2;0;0;1;65;172;22;3
 33;25;3;3;2;248;25;14;47;222.196;99;0;1;2;0;0;1;86;165;32;3
 9;12;3;3;2;228;14;16;58;222.196;99;0;1;2;0;0;1;65;172;22;112
 3;27;3;4;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;2
 28;27;3;5;2;225;26;9;28;222.196;99;0;1;1;0;0;2;69;169;24;2
 3;27;3;5;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;3
 28;25;3;5;2;225;26;9;28;222.196;99;0;1;1;0;0;2;69;169;24;2
 22;27;3;6;2;179;26;9;30;222.196;99;0;3;0;0;0;0;56;171;19;3
 25;25;3;2;2;235;16;8;32;222.196;99;0;3;0;0;0;0;75;178;25;3
 10;19;3;2;2;361;52;3;28;222.196;99;0;1;1;1;0;4;80;172;27;8
 3;13;3;3;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;8
 3;27;3;4;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;2
 3;27;3;5;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;3
 22;27;3;6;2;179;26;9;30;222.196;99;0;3;0;0;0;0;56;171;19;2
 3;10;3;2;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;4
 33;13;3;2;2;248;25;14;47;222.196;99;0;1;2;0;0;1;86;165;32;2
 3;27;3;2;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;3
 28;7;3;2;2;225;26;9;28;222.196;99;0;1;1;0;0;2;69;169;24;8
 3;27;3;3;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;2
 11;23;3;4;2;289;36;13;33;222.196;99;0;1;2;1;0;1;90;172;30;8
 9;25;3;4;2;228;14;16;58;222.196;99;0;1;2;0;0;1;65;172;22;2
 3;27;3;4;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;2
 33;23;3;5;2;248;25;14;47;222.196;99;0;1;2;0;0;1;86;165;32;3
 3;27;3;5;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;3
 22;23;3;6;2;179;26;9;30;222.196;99;0;3;0;0;0;0;56;171;19;2
 3;27;3;6;2;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;3
 3;27;3;3;3;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;3
 16;23;3;4;3;118;15;24;46;222.196;99;0;1;2;1;1;0;75;175;25;8
 14;13;3;4;3;155;12;14;34;222.196;99;0;1;2;1;0;0;95;196;25;24
 3;27;3;4;3;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;3
 3;27;3;5;3;179;51;18;38;222.196;99;0;1;0;1;0;0;89;170;31;3
 22;13;3;2;3;179;26;9;30;222.196;99;0;3;0;0;0;0;56;171;19;2
 11;19;3;2;3;289;36;13;33;222.196;99;0;1;2;1;0;1;90;172;30;104
 13;22;3;4;3;369;17;12;31;222.196;99;0;1;3;1;0;0;70;169;25;8
 28;13;4;2;3;225;26;9;28;246.288;91;0;1;1;0;0;2;69;169;24;8
 34;10;4;2;3;118;10;10;37;246.288;91;0;1;0;0;0;0;83;172;28;8
 10;19;4;3;3;361;52;3;28;246.288;91;0;1;1;1;0;4;80;172;27;8
 33;19;4;4;3;248;25;14;47;246.288;91;0;1;2;0;0;1;86;165;32;8
 6;13;4;5;3;189;29;13;33;246.288;91;0;1;2;0;0;2;69;167;25;8
 22;27;4;6;3;179;26;9;30;246.288;91;0;3;0;0;0;0;56;171;19;2
 13;7;4;2;3;369;17;12;31;246.288;91;0;1;3;1;0;0;70;169;25;24
 17;16;4;3;3;179;22;17;40;246.288;91;0;2;2;0;1;0;63;170;22;2
 36;23;4;3;3;118;13;18;50;246.288;91;0;1;1;1;0;0;98;178;31;3
 10;23;4;3;3;361;52;3;28;246.288;91;0;1;1;1;0;4;80;172;27;2
 34;10;4;4;3;118;10;10;37;246.288;91;0;1;0;0;0;0;83;172;28;2
 1;22;4;6;3;235;11;14;37;246.288;91;0;3;1;0;0;1;88;172;29;8
 22;27;4;6;3;179;26;9;30;246.288;91;0;3;0;0;0;0;56;171;19;2
 28;19;4;2;3;225;26;9;28;246.288;91;0;1;1;0;0;2;69;169;24;8
 25;16;4;3;3;235;16;8;32;246.288;91;0;3;0;0;0;0;75;178;25;3
 22;27;4;6;3;179;26;9;30;246.288;91;0;3;0;0;0;0;56;171;19;2
 14;28;4;3;3;155;12;14;34;246.288;91;0;1;2;1;0;0;95;196;25;4
 28;19;4;5;3;225;26;9;28;246.288;91;0;1;1;0;0;2;69;169;24;8
 36;14;4;5;3;118;13;18;50;246.288;91;0;1;1;1;0;0;98;178;31;2
 22;27;4;6;3;179;26;9;30;246.288;91;0;3;0;0;0;0;56;171;19;2
 1;22;5;2;3;235;11;14;37;237.656;99;0;3;1;0;0;1;88;172;29;8
 29;19;5;4;3;225;15;15;41;237.656;99;0;4;2;1;0;2;94;182;28;3
 25;28;5;4;3;235;16;8;32;237.656;99;0;3;0;0;0;0;75;178;25;2
 34;8;5;4;3;118;10;10;37;237.656;99;0;1;0;0;0;0;83;172;28;3
 5;26;5;4;3;235;20;13;43;237.656;99;0;1;1;1;0;0;106;167;38;8
 22;13;5;5;3;179;26;9;30;237.656;99;0;3;0;0;0;0;56;171;19;1
 15;28;5;5;3;291;31;12;40;237.656;99;0;1;1;1;0;1;73;171;25;2
 29;14;5;5;3;225;15;15;41;237.656;99;0;4;2;1;0;2;94;182;28;8
 26;19;5;6;3;300;26;13;43;237.656;99;0;1;2;1;1;1;77;175;25;64
 29;22;5;6;3;225;15;15;41;237.656;99;0;4;2;1;0;2;94;182;28;8
 22;27;5;6;3;179;26;9;30;237.656;99;0;3;0;0;0;0;56;171;19;2
 36;23;5;2;3;118;13;18;50;237.656;99;0;1;1;1;0;0;98;178;31;2
 36;5;5;3;3;118;13;18;50;237.656;99;0;1;1;1;0;0;98;178;31;3
 34;28;5;3;3;118;10;10;37;237.656;99;0;1;0;0;0;0;83;172;28;1
 36;0;5;3;3;118;13;18;50;237.656;99;1;1;1;1;0;0;98;178;31;0
 22;27;5;4;3;179;26;9;30;237.656;99;0;3;0;0;0;0;56;171;19;2
 23;0;5;4;3;378;49;11;36;237.656;99;1;1;2;0;1;4;65;174;21;0
 17;16;5;6;3;179;22;17;40;237.656;99;0;2;2;0;1;0;63;170;22;1
 14;10;5;2;3;155;12;14;34;237.656;99;0;1;2;1;0;0;95;196;25;48
 25;10;5;2;3;235;16;8;32;237.656;99;0;3;0;0;0;0;75;178;25;8
 15;22;5;4;3;291;31;12;40;237.656;99;0;1;1;1;0;1;73;171;25;8
 17;10;5;4;3;179;22;17;40;237.656;99;0;2;2;0;1;0;63;170;22;8
 28;6;5;4;3;225;26;9;28;237.656;99;0;1;1;0;0;2;69;169;24;3
 18;10;5;5;3;330;16;4;28;237.656;99;0;2;0;0;0;0;84;182;25;8
 25;23;5;5;3;235;16;8;32;237.656;99;0;3;0;0;0;0;75;178;25;2
 15;28;5;5;3;291;31;12;40;237.656;99;0;1;1;1;0;1;73;171;25;2
 22;27;5;6;3;179;26;9;30;237.656;99;0;3;0;0;0;0;56;171;19;2
 10;7;5;2;3;361;52;3;28;237.656;99;0;1;1;1;0;4;80;172;27;8
 14;23;5;4;3;155;12;14;34;237.656;99;0;1;2;1;0;0;95;196;25;2
 17;25;5;6;3;179;22;17;40;237.656;99;0;2;2;0;1;0;63;170;22;8
 14;10;5;6;3;155;12;14;34;237.656;99;0;1;2;1;0;0;95;196;25;8
 28;11;5;2;3;225;26;9;28;237.656;99;0;1;1;0;0;2;69;169;24;1
 16;7;6;4;3;118;15;24;46;275.089;96;0;1;2;1;1;0;75;175;25;8
 22;27;6;4;3;179;26;9;30;275.089;96;0;3;0;0;0;0;56;171;19;3
 34;26;6;6;3;118;10;10;37;275.089;96;0;1;0;0;0;0;83;172;28;8
 34;10;6;4;3;118;10;10;37;275.089;96;0;1;0;0;0;0;83;172;28;8
 23;22;6;5;3;378;49;11;36;275.089;96;0;1;2;0;1;4;65;174;21;8
 36;19;6;5;3;118;13;18;50;275.089;96;0;1;1;1;0;0;98;178;31;24
 12;19;6;6;3;233;51;1;31;275.089;96;0;2;1;1;0;8;68;178;21;8
 22;27;6;6;3;179;26;9;30;275.089;96;0;3;0;0;0;0;56;171;19;2
 2;0;6;2;3;235;29;12;48;275.089;96;1;1;1;0;1;5;88;163;33;0
 21;0;6;2;3;268;11;8;33;275.089;96;1;2;0;0;0;0;79;178;25;0
 36;19;6;5;3;118;13;18;50;275.089;96;0;1;1;1;0;0;98;178;31;3
 22;13;6;5;3;179;26;9;30;275.089;96;0;3;0;0;0;0;56;171;19;2
 15;28;6;5;3;291;31;12;40;275.089;96;0;1;1;1;0;1;73;171;25;2
 22;13;6;2;1;179;26;9;30;275.089;96;0;3;0;0;0;0;56;171;19;3
 34;25;6;2;1;118;10;10;37;275.089;96;0;1;0;0;0;0;83;172;28;3
 12;22;6;5;1;233;51;1;31;275.089;96;0;2;1;1;0;8;68;178;21;8
 34;8;6;6;1;118;10;10;37;275.089;96;0;1;0;0;0;0;83;172;28;2
 34;10;6;4;1;118;10;10;37;275.089;96;0;1;0;0;0;0;83;172;28;3
 12;22;6;4;1;233;51;1;31;275.089;96;0;2;1;1;0;8;68;178;21;3
 5;26;7;4;1;235;20;13;43;264.604;93;0;1;1;1;0;0;106;167;38;4
 12;19;7;6;1;233;51;1;31;264.604;93;0;2;1;1;0;8;68;178;21;2
 9;6;7;2;1;228;14;16;58;264.604;93;0;1;2;0;0;1;65;172;22;8
 34;28;7;2;1;118;10;10;37;264.604;93;0;1;0;0;0;0;83;172;28;4
 9;6;7;3;1;228;14;16;58;264.604;93;0;1;2;0;0;1;65;172;22;120
 6;22;7;3;1;189;29;13;33;264.604;93;0;1;2;0;0;2;69;167;25;16
 34;23;7;4;1;118;10;10;37;264.604;93;0;1;0;0;0;0;83;172;28;2
 10;22;7;4;1;361;52;3;28;264.604;93;0;1;1;1;0;4;80;172;27;8
 28;22;7;4;1;225;26;9;28;264.604;93;0;1;1;0;0;2;69;169;24;8
 13;13;7;2;1;369;17;12;31;264.604;93;0;1;3;1;0;0;70;169;25;80
 11;14;7;3;1;289;36;13;33;264.604;93;0;1;2;1;0;1;90;172;30;8
 1;11;7;3;1;235;11;14;37;264.604;93;0;3;1;0;0;1;88;172;29;4
 4;0;0;3;1;118;14;13;40;271.219;95;0;1;1;1;0;8;98;170;34;0
 8;0;0;4;2;231;35;14;39;271.219;95;0;1;2;1;0;2;100;170;35;0
 35;0;0;6;3;179;45;14;53;271.219;95;0;1;1;0;0;1;77;175;25;0
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -10,6 +10,7 @@ numpy==1.24.3
 scikit-learn==1.3.0
 xgboost==1.7.6
 lightgbm==4.1.0
 torch==2.6.0
 joblib==1.3.1
 # Utilities
--- a/backend/services/analysis_service.py
+++ b/backend/services/analysis_service.py
@@ -1,6 +1,6 @@
 import os
 import joblib
 import numpy as np
 import config
 from core.feature_mining import get_correlation_for_heatmap, group_comparison
@@ -10,109 +10,95 @@ class AnalysisService:
    def __init__(self):
        self.models = {}
        self.feature_names = None
-    
+        self.selected_features = None
+        self.training_metadata = {}
    def _ensure_models_loaded(self):
-        if not self.models:
+        if self.models:
-            model_files = {
+            return
-                'random_forest': 'random_forest_model.pkl',
+        metadata_path = os.path.join(config.MODELS_DIR, 'training_metadata.pkl')
-                'xgboost': 'xgboost_model.pkl',
+        if os.path.exists(metadata_path):
-                'lightgbm': 'lightgbm_model.pkl',
+            self.training_metadata = joblib.load(metadata_path)
-            }
+        model_files = {
-            
+            'random_forest': 'random_forest_model.pkl',
-            for name, filename in model_files.items():
+            'xgboost': 'xgboost_model.pkl',
-                model_path = os.path.join(config.MODELS_DIR, filename)
+            'lightgbm': 'lightgbm_model.pkl',
-                if os.path.exists(model_path):
+            'gradient_boosting': 'gradient_boosting_model.pkl',
-                    try:
+        }
-                        self.models[name] = joblib.load(model_path)
+        allowed_models = self.training_metadata.get('available_models')
-                    except Exception as e:
+        if allowed_models:
-                        print(f"Failed to load {name}: {e}")
+            model_files = {k: v for k, v in model_files.items() if k in allowed_models}
-            
+        for name, filename in model_files.items():
-            feature_names_path = os.path.join(config.MODELS_DIR, 'feature_names.pkl')
+            path = os.path.join(config.MODELS_DIR, filename)
-            if os.path.exists(feature_names_path):
+            if os.path.exists(path):
-                self.feature_names = joblib.load(feature_names_path)
+                try:
-    
+                    self.models[name] = joblib.load(path)
+                except Exception as exc:
+                    print(f'Failed to load model {name}: {exc}')
+        for filename, attr in [('feature_names.pkl', 'feature_names'), ('selected_features.pkl', 'selected_features')]:
+            path = os.path.join(config.MODELS_DIR, filename)
+            if os.path.exists(path):
+                try:
+                    setattr(self, attr, joblib.load(path))
+                except Exception as exc:
+                    print(f'Failed to load artifact {filename}: {exc}')
    def get_feature_importance(self, model_type='random_forest'):
        self._ensure_models_loaded()
        if model_type not in self.models:
-            if self.models:
+            model_type = next(iter(self.models), 'default')
-                model_type = list(self.models.keys())[0]
+        if model_type == 'default':
-            else:
+            return self._get_default_importance()
-                return self._get_default_importance()
        model = self.models[model_type]
-        
+        if not hasattr(model, 'feature_importances_'):
-        try:
+            return self._get_default_importance()
-            if hasattr(model, 'feature_importances_'):
+
-                importances = model.feature_importances_
+        importances = model.feature_importances_
-            else:
+        feature_names = self.selected_features or self.feature_names or []
-                return self._get_default_importance()
+        if len(feature_names) != len(importances):
-            
+            feature_names = [f'feature_{idx}' for idx in range(len(importances))]
-            feature_names = self.feature_names or [f'feature_{i}' for i in range(len(importances))]
+        ranked = sorted(zip(feature_names, importances), key=lambda item: item[1], reverse=True)[:15]
-            
+        return {
-            if len(feature_names) != len(importances):
+            'model_type': model_type,
-                feature_names = [f'feature_{i}' for i in range(len(importances))]
+            'features': [
-            
+                {
-            feature_importance = list(zip(feature_names, importances))
-            feature_importance.sort(key=lambda x: x[1], reverse=True)
-            features = []
-            for i, (name, imp) in enumerate(feature_importance[:15]):
-                features.append({
                    'name': name,
                    'name_cn': config.FEATURE_NAME_CN.get(name, name),
-                    'importance': round(float(imp), 4),
+                    'importance': round(float(importance), 4),
-                    'rank': i + 1
+                    'rank': idx + 1,
-                })
+                }
-            
+                for idx, (name, importance) in enumerate(ranked)
-            return {
+            ],
-                'model_type': model_type,
+        }
-                'features': features
+
-            }
-        except Exception as e:
-            print(f"Error getting feature importance: {e}")
-            return self._get_default_importance()
    def _get_default_importance(self):
-        default_features = [
+        defaults = [
-            ('Reason for absence', 0.25),
+            ('加班通勤压力指数', 0.24),
-            ('Transportation expense', 0.12),
+            ('健康风险指数', 0.18),
-            ('Distance from Residence to Work', 0.10),
+            ('请假类型', 0.12),
-            ('Service time', 0.08),
+            ('通勤时长分钟', 0.1),
-            ('Age', 0.07),
+            ('月均加班时长', 0.08),
-            ('Work load Average/day', 0.06),
+            ('近90天缺勤次数', 0.07),
-            ('Body mass index', 0.05),
+            ('心理压力等级', 0.06),
-            ('Social drinker', 0.04),
+            ('家庭负担指数', 0.05),
-            ('Hit target', 0.03),
-            ('Son', 0.03),
-            ('Pet', 0.02),
-            ('Education', 0.02),
-            ('Social smoker', 0.01)
        ]
-        features = []
-        for i, (name, imp) in enumerate(default_features):
-            features.append({
-                'name': name,
-                'name_cn': config.FEATURE_NAME_CN.get(name, name),
-                'importance': imp,
-                'rank': i + 1
-            })
        return {
            'model_type': 'default',
-            'features': features
+            'features': [
+                {
+                    'name': name,
+                    'name_cn': config.FEATURE_NAME_CN.get(name, name),
+                    'importance': importance,
+                    'rank': idx + 1,
+                }
+                for idx, (name, importance) in enumerate(defaults)
+            ],
        }
-    
+
    def get_correlation(self):
        return get_correlation_for_heatmap()
-    
+
    def get_group_comparison(self, dimension):
        valid_dimensions = ['drinker', 'smoker', 'education', 'children', 'pet']
        if dimension not in valid_dimensions:
            raise ValueError(f"Invalid dimension: {dimension}. Must be one of {valid_dimensions}")
        return group_comparison(dimension)
--- a/backend/services/cluster_service.py
+++ b/backend/services/cluster_service.py
@@ -11,7 +11,7 @@ class ClusterService:
    def get_cluster_profile(self, n_clusters=3):
        return self.analyzer.get_cluster_profile(n_clusters)
-    def get_scatter_data(self, n_clusters=3, x_axis='Age', y_axis='Absenteeism time in hours'):
+    def get_scatter_data(self, n_clusters=3, x_axis='月均加班时长', y_axis='缺勤时长（小时）'):
        return self.analyzer.get_scatter_data(n_clusters, x_axis, y_axis)
--- a/backend/services/data_service.py
+++ b/backend/services/data_service.py
@@ -1,6 +1,3 @@
 import pandas as pd
 import numpy as np
 import config
 from core.preprocessing import get_clean_data
@@ -8,154 +5,103 @@ from core.preprocessing import get_clean_data
 class DataService:
    def __init__(self):
        self._df = None
-    
+
    @property
    def df(self):
        if self._df is None:
            self._df = get_clean_data()
        return self._df
-    
+
    def get_basic_stats(self):
        df = self.df
        total_records = len(df)
-        total_employees = df['ID'].nunique()
+        total_employees = df[config.EMPLOYEE_ID_COLUMN].nunique()
-        total_absent_hours = df['Absenteeism time in hours'].sum()
+        avg_absent_hours = round(df[config.TARGET_COLUMN].mean(), 2)
-        avg_absent_hours = round(df['Absenteeism time in hours'].mean(), 2)
+        max_absent_hours = round(float(df[config.TARGET_COLUMN].max()), 1)
-        max_absent_hours = int(df['Absenteeism time in hours'].max())
+        min_absent_hours = round(float(df[config.TARGET_COLUMN].min()), 1)
-        min_absent_hours = int(df['Absenteeism time in hours'].min())
+        high_risk_count = len(df[df[config.TARGET_COLUMN] > 8])
-        high_risk_count = len(df[df['Absenteeism time in hours'] > 8])
-        high_risk_ratio = round(high_risk_count / total_records, 4)
        return {
            'total_records': total_records,
            'total_employees': total_employees,
-            'total_absent_hours': int(total_absent_hours),
            'avg_absent_hours': avg_absent_hours,
            'max_absent_hours': max_absent_hours,
            'min_absent_hours': min_absent_hours,
-            'high_risk_ratio': high_risk_ratio
+            'high_risk_ratio': round(high_risk_count / total_records, 4),
+            'industries_covered': int(df['所属行业'].nunique()),
        }
-    
+
    def get_monthly_trend(self):
        df = self.df
-        
+        monthly = df.groupby('缺勤月份').agg({config.TARGET_COLUMN: ['sum', 'mean', 'count']}).reset_index()
-        monthly = df.groupby('Month of absence').agg({
-            'Absenteeism time in hours': ['sum', 'mean', 'count']
-        }).reset_index()
        monthly.columns = ['month', 'total_hours', 'avg_hours', 'record_count']
-        
+        result = {'months': [], 'total_hours': [], 'avg_hours': [], 'record_counts': []}
-        months = ['1月', '2月', '3月', '4月', '5月', '6月', 
+        for month in range(1, 13):
-                  '7月', '8月', '9月', '10月', '11月', '12月']
+            row = monthly[monthly['month'] == month]
-        
+            result['months'].append(f'{month}月')
-        result = {
+            if len(row):
-            'months': months,
+                result['total_hours'].append(round(float(row['total_hours'].values[0]), 1))
-            'total_hours': [],
-            'avg_hours': [],
-            'record_counts': []
-        }
-        for i in range(1, 13):
-            row = monthly[monthly['month'] == i]
-            if len(row) > 0:
-                result['total_hours'].append(int(row['total_hours'].values[0]))
                result['avg_hours'].append(round(float(row['avg_hours'].values[0]), 2))
                result['record_counts'].append(int(row['record_count'].values[0]))
            else:
                result['total_hours'].append(0)
                result['avg_hours'].append(0)
                result['record_counts'].append(0)
        return result
-    
+
    def get_weekday_distribution(self):
        df = self.df
-        
+        weekday = df.groupby('星期几').agg({config.TARGET_COLUMN: ['sum', 'mean', 'count']}).reset_index()
-        weekday = df.groupby('Day of the week').agg({
-            'Absenteeism time in hours': ['sum', 'mean', 'count']
-        }).reset_index()
        weekday.columns = ['weekday', 'total_hours', 'avg_hours', 'record_count']
-        
+        result = {'weekdays': [], 'weekday_codes': [], 'total_hours': [], 'avg_hours': [], 'record_counts': []}
-        result = {
+        for code in range(1, 8):
-            'weekdays': [],
-            'weekday_codes': [],
-            'total_hours': [],
-            'avg_hours': [],
-            'record_counts': []
-        }
-        for code in [2, 3, 4, 5, 6]:
            row = weekday[weekday['weekday'] == code]
            result['weekdays'].append(config.WEEKDAY_NAMES.get(code, str(code)))
            result['weekday_codes'].append(code)
-            if len(row) > 0:
+            if len(row):
-                result['total_hours'].append(int(row['total_hours'].values[0]))
+                result['total_hours'].append(round(float(row['total_hours'].values[0]), 1))
                result['avg_hours'].append(round(float(row['avg_hours'].values[0]), 2))
                result['record_counts'].append(int(row['record_count'].values[0]))
            else:
                result['total_hours'].append(0)
                result['avg_hours'].append(0)
                result['record_counts'].append(0)
        return result
-    
+
    def get_reason_distribution(self):
        df = self.df
-        
+        reason = df.groupby('请假原因大类').agg({config.TARGET_COLUMN: 'count'}).reset_index()
-        reason = df.groupby('Reason for absence').agg({
+        reason.columns = ['name', 'count']
-            'Absenteeism time in hours': 'count'
-        }).reset_index()
-        reason.columns = ['code', 'count']
        reason = reason.sort_values('count', ascending=False)
        total = reason['count'].sum()
-        
+        return {
-        result = {
+            'reasons': [
-            'reasons': []
+                {
+                    'name': row['name'],
+                    'count': int(row['count']),
+                    'percentage': round(float(row['count']) / total * 100, 1),
+                }
+                for _, row in reason.iterrows()
+            ]
+        }
-        
+
-        for _, row in reason.iterrows():
-            code = int(row['code'])
-            result['reasons'].append({
-                'code': code,
-                'name': config.REASON_NAMES.get(code, f'原因{code}'),
-                'count': int(row['count']),
-                'percentage': round(row['count'] / total * 100, 1)
-            })
-        return result
    def get_season_distribution(self):
        df = self.df
-        
+        season = df.groupby('季节').agg({config.TARGET_COLUMN: ['sum', 'mean', 'count']}).reset_index()
-        season = df.groupby('Seasons').agg({
-            'Absenteeism time in hours': ['sum', 'mean', 'count']
-        }).reset_index()
        season.columns = ['season', 'total_hours', 'avg_hours', 'record_count']
        total_records = season['record_count'].sum()
-        
+        result = {'seasons': []}
-        result = {
-            'seasons': []
-        }
        for code in [1, 2, 3, 4]:
            row = season[season['season'] == code]
-            if len(row) > 0:
+            if not len(row):
-                result['seasons'].append({
+                continue
-                    'code': int(code),
+            result['seasons'].append({
-                    'name': config.SEASON_NAMES.get(code, f'季节{code}'),
+                'code': code,
-                    'total_hours': int(row['total_hours'].values[0]),
+                'name': config.SEASON_NAMES.get(code, f'季节{code}'),
-                    'avg_hours': round(float(row['avg_hours'].values[0]), 2),
+                'total_hours': round(float(row['total_hours'].values[0]), 1),
-                    'record_count': int(row['record_count'].values[0]),
+                'avg_hours': round(float(row['avg_hours'].values[0]), 2),
-                    'percentage': round(row['record_count'].values[0] / total_records * 100, 1)
+                'record_count': int(row['record_count'].values[0]),
-                })
+                'percentage': round(float(row['record_count'].values[0]) / total_records * 100, 1),
-        
+            })
        return result
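
Reviewer note on `get_monthly_trend`: the rewritten loop emits one entry for every month from 1 to 12 and falls back to zero when a month has no records, so the chart axis stays fixed even for sparse data. The following is a minimal standard-library sketch of that zero-fill idea, not the service code itself. It assumes records arrive as `(month, hours)` pairs instead of the project's DataFrame; `monthly_trend` and `sample` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def monthly_trend(records):
    """Aggregate (month, hours) pairs into 12 fixed slots, zero-filling empty months."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, hours in records:
        totals[month] += hours
        counts[month] += 1

    result = {'months': [], 'total_hours': [], 'avg_hours': [], 'record_counts': []}
    for month in range(1, 13):
        n = counts.get(month, 0)
        total = totals.get(month, 0.0)
        result['months'].append(f'{month}月')
        result['total_hours'].append(round(total, 1))
        # A month without records reports 0 instead of dividing by zero.
        result['avg_hours'].append(round(total / n, 2) if n else 0)
        result['record_counts'].append(n)
    return result


# Hypothetical sample: two absences in January, one in March.
sample = [(1, 4.0), (1, 6.5), (3, 8.0)]
trend = monthly_trend(sample)
```

Every list in `trend` has twelve entries, so a front-end chart can index them by month without checking for gaps.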
--- a/backend/services/predict_service.py
+++ b/backend/services/predict_service.py
@@ -1,41 +1,31 @@
 import os
-import numpy as np
+
 import joblib
 import numpy as np
 import config
 from core.deep_learning_model import load_lstm_mlp_bundle, predict_lstm_mlp
 from core.model_features import (
    align_feature_frame,
    apply_label_encoders,
    build_prediction_dataframe,
    engineer_features,
    to_float_array,
 )
 MODEL_INFO = {
-    'random_forest': {
+    'random_forest': {'name': 'random_forest', 'name_cn': '随机森林', 'description': '稳健的树模型集成'},
-        'name': 'random_forest',
+    'xgboost': {'name': 'xgboost', 'name_cn': 'XGBoost', 'description': '梯度提升树模型'},
-        'name_cn': '随机森林',
+    'lightgbm': {'name': 'lightgbm', 'name_cn': 'LightGBM', 'description': '轻量级梯度提升树'},
-        'description': '基于决策树的集成学习算法'
+    'gradient_boosting': {'name': 'gradient_boosting', 'name_cn': 'GBDT', 'description': '梯度提升决策树'},
+    'extra_trees': {'name': 'extra_trees', 'name_cn': '极端随机树', 'description': '高随机性的树模型'},
+    'stacking': {'name': 'stacking', 'name_cn': 'Stacking集成', 'description': '多模型融合'},
+    'lstm_mlp': {
+        'name': 'lstm_mlp',
+        'name_cn': '时序注意力融合网络',
+        'description': 'Transformer时序编码 + 静态特征门控融合的深度学习模型',
+    },
-    'xgboost': {
-        'name': 'xgboost',
-        'name_cn': 'XGBoost',
-        'description': '高效的梯度提升算法'
-    },
-    'lightgbm': {
-        'name': 'lightgbm',
-        'name_cn': 'LightGBM',
-        'description': '微软轻量级梯度提升框架'
-    },
-    'gradient_boosting': {
-        'name': 'gradient_boosting',
-        'name_cn': 'GBDT',
-        'description': '梯度提升决策树'
-    },
-    'extra_trees': {
-        'name': 'extra_trees',
-        'name_cn': '极端随机树',
-        'description': '随机森林的变体，随机性更强'
-    },
-    'stacking': {
-        'name': 'stacking',
-        'name_cn': 'Stacking集成',
-        'description': '多层堆叠集成学习'
-    }
 }
@@ -47,326 +37,184 @@ class PredictService:
        self.selected_features = None
        self.label_encoders = {}
        self.model_metrics = {}
        self.training_metadata = {}
        self.default_model = 'random_forest'
-    
+
    def _ensure_models_loaded(self):
        if not self.models:
            self.load_models()
-    
+
    def load_models(self):
        metadata_path = os.path.join(config.MODELS_DIR, 'training_metadata.pkl')
        if os.path.exists(metadata_path):
            self.training_metadata = joblib.load(metadata_path)
        model_files = {
            'random_forest': 'random_forest_model.pkl',
            'xgboost': 'xgboost_model.pkl',
            'lightgbm': 'lightgbm_model.pkl',
            'gradient_boosting': 'gradient_boosting_model.pkl',
            'extra_trees': 'extra_trees_model.pkl',
-            'stacking': 'stacking_model.pkl'
+            'stacking': 'stacking_model.pkl',
+            'lstm_mlp': 'lstm_mlp_model.pt',
        }
-        
+        allowed_models = self.training_metadata.get('available_models')
+        if allowed_models:
+            model_files = {k: v for k, v in model_files.items() if k in allowed_models}
        for name, filename in model_files.items():
-            model_path = os.path.join(config.MODELS_DIR, filename)
+            path = os.path.join(config.MODELS_DIR, filename)
-            if os.path.exists(model_path):
+            if os.path.exists(path):
                try:
-                    self.models[name] = joblib.load(model_path)
+                    if name == 'lstm_mlp':
-                    print(f"Loaded {name} model")
+                        bundle = load_lstm_mlp_bundle(path)
-                except Exception as e:
+                        if bundle is not None:
-                    print(f"Failed to load {name}: {e}")
+                            self.models[name] = bundle
-        
+                    else:
+                        self.models[name] = joblib.load(path)
+                except Exception as exc:
+                    print(f'Failed to load model {name}: {exc}')
        if os.path.exists(config.SCALER_PATH):
            self.scaler = joblib.load(config.SCALER_PATH)
-        
+        for filename, attr in [
-        feature_names_path = os.path.join(config.MODELS_DIR, 'feature_names.pkl')
+            ('feature_names.pkl', 'feature_names'),
-        if os.path.exists(feature_names_path):
+            ('selected_features.pkl', 'selected_features'),
-            self.feature_names = joblib.load(feature_names_path)
+            ('label_encoders.pkl', 'label_encoders'),
-        
+            ('model_metrics.pkl', 'model_metrics'),
-        selected_features_path = os.path.join(config.MODELS_DIR, 'selected_features.pkl')
+        ]:
-        if os.path.exists(selected_features_path):
+            path = os.path.join(config.MODELS_DIR, filename)
-            self.selected_features = joblib.load(selected_features_path)
+            if os.path.exists(path):
-        
+                try:
-        label_encoders_path = os.path.join(config.MODELS_DIR, 'label_encoders.pkl')
+                    setattr(self, attr, joblib.load(path))
-        if os.path.exists(label_encoders_path):
+                except Exception as exc:
-            self.label_encoders = joblib.load(label_encoders_path)
+                    print(f'Failed to load artifact {filename}: {exc}')
-        
+
-        metrics_path = os.path.join(config.MODELS_DIR, 'model_metrics.pkl')
+        valid_metrics = {key: value for key, value in self.model_metrics.items() if key in self.models}
-        if os.path.exists(metrics_path):
+        if valid_metrics:
-            self.model_metrics = joblib.load(metrics_path)
+            self.default_model = max(valid_metrics.items(), key=lambda item: item[1]['r2'])[0]
-        
+
-        if self.model_metrics:
-            valid_metrics = {k: v for k, v in self.model_metrics.items() if k in self.models}
-            if valid_metrics:
-                best_model = max(valid_metrics.items(), key=lambda x: x[1]['r2'])
-                self.default_model = best_model[0]
    def get_available_models(self):
        self._ensure_models_loaded()
        models = []
        for name in self.models.keys():
-            info = MODEL_INFO.get(name, {
+            info = MODEL_INFO.get(name, {'name': name, 'name_cn': name, 'description': ''}).copy()
-                'name': name,
-                'name_cn': name,
-                'description': ''
-            }).copy()
            info['is_available'] = True
-            info['is_default'] = (name == self.default_model)
+            info['is_default'] = name == self.default_model
-            
+            info['metrics'] = self.model_metrics.get(name, {'r2': 0, 'rmse': 0, 'mae': 0})
-            if name in self.model_metrics:
-                info['metrics'] = self.model_metrics[name]
-            else:
-                info['metrics'] = {'r2': 0, 'rmse': 0, 'mae': 0}
            models.append(info)
-        
+        models.sort(key=lambda item: item['metrics']['r2'], reverse=True)
-        models.sort(key=lambda x: x['metrics']['r2'], reverse=True)
        return models
-    
+
    def predict_single(self, data, model_type=None):
        self._ensure_models_loaded()
-        
+        model_type = model_type or self.default_model
-        if model_type is None:
-            model_type = self.default_model
        if model_type not in self.models:
-            available = list(self.models.keys())
+            fallback = next(iter(self.models), None)
-            if available:
+            if fallback is None:
-                model_type = available[0]
-            else:
                return self._get_default_prediction(data)
-        
+            model_type = fallback
        model = self.models[model_type]
        if self.scaler is None or self.feature_names is None:
            return self._get_default_prediction(data)
-        
+
        features = self._prepare_features(data)
        try:
-            predicted_hours = model.predict([features])[0]
+            if model_type == 'lstm_mlp':
-            predicted_hours = max(0, float(predicted_hours))
+                current_df = build_prediction_dataframe(data)
-        except Exception as e:
+                predicted_hours = predict_lstm_mlp(self.models[model_type], current_df)
-            print(f"Prediction error: {e}")
+            else:
+                predicted_hours = self.models[model_type].predict([features])[0]
+                predicted_hours = self._inverse_transform_prediction(predicted_hours)
+            predicted_hours = max(0.5, float(predicted_hours))
+        except Exception:
            return self._get_default_prediction(data)
-        
+
        risk_level, risk_label = self._get_risk_level(predicted_hours)
-        
+        confidence = max(0.5, self.model_metrics.get(model_type, {}).get('r2', 0.82))
-        confidence = 0.85
-        if model_type in self.model_metrics:
-            confidence = max(0.5, self.model_metrics[model_type].get('r2', 0.85))
        return {
            'predicted_hours': round(predicted_hours, 2),
            'risk_level': risk_level,
            'risk_label': risk_label,
            'confidence': round(confidence, 2),
            'model_used': model_type,
-            'model_name_cn': MODEL_INFO.get(model_type, {}).get('name_cn', model_type)
+            'model_name_cn': MODEL_INFO.get(model_type, {}).get('name_cn', model_type),
        }
-    
+
    def predict_compare(self, data):
        self._ensure_models_loaded()
        results = []
        for name in self.models.keys():
-            try:
+            result = self.predict_single(data, name)
-                result = self.predict_single(data, name)
+            result['model'] = name
-                result['model'] = name
+            result['model_name_cn'] = MODEL_INFO.get(name, {}).get('name_cn', name)
-                result['model_name_cn'] = MODEL_INFO.get(name, {}).get('name_cn', name)
+            result['r2'] = self.model_metrics.get(name, {}).get('r2', 0)
-                
+            results.append(result)
-                if name in self.model_metrics:
+        results.sort(key=lambda item: item.get('r2', 0), reverse=True)
-                    result['r2'] = self.model_metrics[name]['r2']
-                else:
-                    result['r2'] = 0
-                results.append(result)
-            except Exception as e:
-                print(f"Compare error for {name}: {e}")
-        results.sort(key=lambda x: x.get('r2', 0), reverse=True)
        if results:
            results[0]['recommended'] = True
        return results
-    
+
    def _prepare_features(self, data):
-        feature_map = {
+        X_df = build_prediction_dataframe(data)
-            'Reason for absence': data.get('reason_for_absence', 23),
+        X_df = engineer_features(X_df)
-            'Month of absence': data.get('month_of_absence', 7),
+        X_df = apply_label_encoders(X_df, self.label_encoders)
-            'Day of the week': data.get('day_of_week', 3),
+        X_df = align_feature_frame(X_df, self.feature_names)
-            'Seasons': data.get('seasons', 1),
+        features = self.scaler.transform(to_float_array(X_df))[0]
-            'Transportation expense': data.get('transportation_expense', 200),
-            'Distance from Residence to Work': data.get('distance', 20),
-            'Service time': data.get('service_time', 5),
-            'Age': data.get('age', 30),
-            'Work load Average/day': data.get('work_load', 250),
-            'Hit target': data.get('hit_target', 95),
-            'Disciplinary failure': data.get('disciplinary_failure', 0),
-            'Education': data.get('education', 1),
-            'Son': data.get('son', 0),
-            'Social drinker': data.get('social_drinker', 0),
-            'Social smoker': data.get('social_smoker', 0),
-            'Pet': data.get('pet', 0),
-            'Body mass index': data.get('bmi', 25)
-        }
-        age = feature_map['Age']
-        service_time = feature_map['Service time']
-        work_load = feature_map['Work load Average/day']
-        distance = feature_map['Distance from Residence to Work']
-        expense = feature_map['Transportation expense']
-        bmi = feature_map['Body mass index']
-        son = feature_map['Son']
-        pet = feature_map['Pet']
-        social_drinker = feature_map['Social drinker']
-        social_smoker = feature_map['Social smoker']
-        hit_target = feature_map['Hit target']
-        seasons = feature_map['Seasons']
-        day_of_week = feature_map['Day of the week']
-        derived_features = {
-            'workload_per_age': work_load / (age + 1),
-            'expense_per_distance': expense / (distance + 1),
-            'age_service_ratio': age / (service_time + 1),
-            'has_children': 1 if son > 0 else 0,
-            'has_pet': 1 if pet > 0 else 0,
-            'family_responsibility': son + pet,
-            'health_risk': 1 if (social_drinker == 1 or social_smoker == 1 or bmi > 30) else 0,
-            'lifestyle_risk': int(social_drinker) + int(social_smoker),
-            'age_group': 1 if age <= 30 else (2 if age <= 40 else (3 if age <= 50 else 4)),
-            'service_group': 1 if service_time <= 5 else (2 if service_time <= 10 else (3 if service_time <= 20 else 4)),
-            'bmi_category': 1 if bmi <= 18.5 else (2 if bmi <= 25 else (3 if bmi <= 30 else 4)),
-            'workload_category': 1 if work_load <= 200 else (2 if work_load <= 250 else (3 if work_load <= 300 else 4)),
-            'commute_category': 1 if distance <= 10 else (2 if distance <= 20 else (3 if distance <= 50 else 4)),
-            'seasonal_risk': 1 if seasons in [1, 3] else 0,
-            'weekday_risk': 1 if day_of_week in [2, 6] else 0,
-            'hit_target_ratio': hit_target / 100,
-            'experience_level': 1 if service_time <= 5 else (2 if service_time <= 10 else (3 if service_time <= 15 else 4)),
-            'age_workload_interaction': age * work_load / 10000,
-            'service_bmi_interaction': service_time * bmi / 100
-        }
-        all_features = {**feature_map, **derived_features}
-        features = []
-        for fname in self.feature_names:
-            if fname in all_features:
-                val = all_features[fname]
-                if fname in self.label_encoders:
-                    try:
-                        val = self.label_encoders[fname].transform([str(val)])[0]
-                    except:
-                        val = 0
-                features.append(float(val))
-            else:
-                features.append(0.0)
-        features = np.array(features).reshape(1, -1)
-        features = self.scaler.transform(features)[0]
        if self.selected_features:
-            selected_indices = []
+            selected_indices = [self.feature_names.index(name) for name in self.selected_features if name in self.feature_names]
-            for sf in self.selected_features:
-                if sf in self.feature_names:
-                    selected_indices.append(self.feature_names.index(sf))
            if selected_indices:
                features = features[selected_indices]
        return features
-    
+
+    def _inverse_transform_prediction(self, prediction):
+        if self.training_metadata.get('target_transform') == 'log1p':
+            return float(np.expm1(prediction))
+        return float(prediction)
    def _get_risk_level(self, hours):
        if hours < 4:
            return 'low', '低风险'
-        elif hours <= 8:
+        if hours <= 8:
            return 'medium', '中风险'
-        else:
+        return 'high', '高风险'
-            return 'high', '高风险'
+
    def _get_default_prediction(self, data):
-        base_hours = 5.0
+        base_hours = 3.8
-        
+        base_hours += min(float(data.get('monthly_overtime_hours', 24)) / 20, 3.0)
-        expense = data.get('transportation_expense', 200)
+        base_hours += min(float(data.get('commute_minutes', 40)) / 50, 2.0)
-        if expense > 300:
+        base_hours += 1.6 if int(data.get('is_night_shift', 0)) == 1 else 0
-            base_hours += 1.0
+        base_hours += 1.8 if int(data.get('chronic_disease_flag', 0)) == 1 else 0
-        elif expense < 150:
+        base_hours += 0.9 if int(data.get('near_holiday_flag', 0)) == 1 else 0
+        base_hours += 0.8 if int(data.get('medical_certificate_flag', 0)) == 1 else 0
+        base_hours += 0.5 * int(data.get('children_count', 0))
+        if data.get('leave_type') in ['病假', '工伤假', '婚假', '丧假']:
+            base_hours += 2.5
+        if data.get('stress_level') == '高':
+            base_hours += 0.9
+        if data.get('performance_level') == 'A':
+            base_hours -= 0.5
-        distance = data.get('distance', 20)
-        if distance > 40:
-            base_hours += 1.5
-        elif distance > 25:
-            base_hours += 0.8
-        service_time = data.get('service_time', 5)
-        if service_time < 3:
-            base_hours += 0.5
-        elif service_time > 15:
-            base_hours -= 0.5
-        age = data.get('age', 30)
-        if age > 50:
-            base_hours += 0.5
-        elif age < 25:
-            base_hours += 0.3
-        work_load = data.get('work_load', 250)
-        if work_load > 300:
-            base_hours += 1.5
-        elif work_load > 260:
-            base_hours += 0.5
-        bmi = data.get('bmi', 25)
-        if bmi > 30:
-            base_hours += 0.8
-        elif bmi < 20:
-            base_hours += 0.3
-        if data.get('social_drinker', 0) == 1:
-            base_hours += 0.8
-        if data.get('social_smoker', 0) == 1:
-            base_hours += 0.5
-        son = data.get('son', 0)
-        if son > 0:
-            base_hours += 0.3 * son
-        pet = data.get('pet', 0)
-        if pet > 0:
-            base_hours -= 0.1 * pet
-        hit_target = data.get('hit_target', 95)
-        if hit_target < 90:
-            base_hours += 0.5
        base_hours = max(0.5, base_hours)
        risk_level, risk_label = self._get_risk_level(base_hours)
        return {
-            'predicted_hours': round(base_hours, 2),
+            'predicted_hours': round(max(0.5, base_hours), 2),
            'risk_level': risk_level,
            'risk_label': risk_label,
-            'confidence': 0.75,
+            'confidence': 0.72,
            'model_used': 'default',
-            'model_name_cn': '默认规则'
+            'model_name_cn': '默认规则',
        }
-    
+
    def get_model_info(self):
        self._ensure_models_loaded()
-        models = self.get_available_models()
        return {
-            'models': models,
+            'models': self.get_available_models(),
            'training_info': {
-                'train_samples': 2884,
+                'train_samples': self.training_metadata.get('train_samples', 0),
-                'test_samples': 722,
+                'test_samples': self.training_metadata.get('test_samples', 0),
-                'feature_count': len(self.feature_names) if self.feature_names else 20,
+                'feature_count': self.training_metadata.get('feature_count_after_selection', 0),
-                'training_date': '2026-03-08'
+                'training_date': self.training_metadata.get('training_date', ''),
-            }
+                'sequence_window_size': self.training_metadata.get('sequence_window_size', 0),
+                'deep_learning_available': self.training_metadata.get('deep_learning_available', False),
+            },
        }
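
Reviewer note on `_inverse_transform_prediction` and `_get_risk_level`: when the training metadata records `target_transform == 'log1p'`, the tree models predict in log space and the service maps the value back with `expm1` before clamping it to at least 0.5 hours and bucketing it into a risk level. The sketch below reproduces that path with the standard library only (`math.expm1` standing in for `np.expm1`). The function names and the sample value are hypothetical and chosen for illustration; the thresholds mirror the ones in the diff above.

```python
import math


def inverse_transform(prediction, target_transform=None):
    """Undo a log1p target transform; pass other predictions through unchanged."""
    if target_transform == 'log1p':
        return math.expm1(prediction)
    return float(prediction)


def risk_level(hours):
    """Bucket hours with the diff's thresholds: under 4 low, 4 to 8 medium, over 8 high."""
    if hours < 4:
        return 'low'
    if hours <= 8:
        return 'medium'
    return 'high'


# Hypothetical model output: a 10-hour absence expressed in log1p space.
raw = math.log1p(10.0)
hours = max(0.5, inverse_transform(raw, 'log1p'))
level = risk_level(hours)
```

Note that exactly 8 hours still counts as medium risk, which matches the `> 8` comparison used for `high_risk_count` in the analysis service.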
--- a/docs/0.md
+++ b/docs/0.md
@@ -1,83 +0,0 @@
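 Since your thesis title is **"Research on the Analysis and Prediction of Factors Influencing Employee Absenteeism Based on Multi-Dimensional Feature Mining"**, your front end should not be an attendance entry system (for example, one with a "clock in" button). It should be a **data analysis and visualization dashboard**.
 The core task of the front end is to **present the results produced by the algorithms in clear, attractive charts and to provide an interactive "prediction window".**
 Below is a plan for the **front-end feature modules (4-5 pages)**; each page maps directly to your title and algorithms:
 ---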
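 ### Page 1: Data Overview and Global Statistics
 **Purpose:** Let readers grasp the overall state of the dataset at a glance.
 *   **Key indicator cards (KPI):**
     *   Total number of samples (e.g. 740)
     *   Average absence duration
     *   Share of high-risk employees
     *   Most common absence reason (e.g. dental consultation)
 *   **Charts:**
     *   **Pie chart of absence reasons:** shows the proportion of each ICD disease code (or medical consultation, blood donation, and so on).
     *   **Line chart of the monthly absence trend:** the x-axis is months 1-12 and the y-axis is total absence hours, showing which month has the most leave (is it summer?).
     *   **Heat map of absences by weekday:** Monday to Friday; the darkest day has the most absences.
 ---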
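 ### Page 2: Influencing Factor Analysis (maps to "influencing factor analysis" in the title)
 **Purpose:** Present the core algorithm results (feature importance, correlation) and answer "why are employees absent".
 *   **Core chart 1: bar chart of the feature importance ranking**
     *   **Content:** the x-axis is the feature (distance, BMI, drinking, service time...) and the y-axis is the importance score.
     *   **Design:** sorted in descending order, with the highest (e.g. Reason for absence or Service time) at the top or leftmost.
     *   **Interaction:** hovering shows the exact score.
 *   **Core chart 2: correlation heat map**
     *   **Content:** shows the matrix of correlation coefficients between the fields.
     *   **Highlight:** emphasize the cell where "drinking" meets "absence hours", or where "commute distance" meets "absence hours"; the darker the color, the stronger the association.
 *   **Group comparison:**
     *   **Bar chart:** average absence hours of drinkers vs. non-drinkers.
     *   **Bar chart:** absence hours of employees with higher vs. lower education levels.
 ---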
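 ### Page 3: Absence Prediction Model (maps to "prediction research" in the title)
 **Purpose:** Provide an interactive window that demonstrates how your XGBoost / random forest model works.
 *   **Left: parameter input form**
     *   Design a form that lists the key fields of the dataset for the user to fill in:
         *   *ID*: any value (e.g. 36)
         *   *Reason for absence*: drop-down (1-28, or grouped as "illness / personal matters")
         *   *Month*: drop-down (1-12)
         *   *Day of week*: drop-down (Monday to Friday)
         *   *Transportation expense*: slider or input box (e.g. 200)
         *   *Distance from Residence to Work*: input box (e.g. 15)
         *   *Service time*: input box (e.g. 10 years)
         *   *Age*: input box (e.g. 35)
         *   *Work load Average/day*: input box (e.g. 250000)
         *   *Hit target*: input box (e.g. 90%)
         *   *Disciplinary failure*: radio button (yes/no)
         *   *Education*: drop-down (high school / bachelor's / master's...)
         *   *Son*: numeric input (0, 1, 2...)
         *   *Social drinker*: radio button (yes/no)
         *   *Social smoker*: radio button (yes/no)
         *   *Pet*: numeric input
         *   *Body mass index*: input box (e.g. 25)
     *   **Bottom button:** **"Start prediction"**
 *   **Right: prediction result**
     *   **Result figure:** the predicted absence duration (e.g. predicted result: 8 hours).
     *   **Risk level:**
         *   < 4 hours: green label (low risk)
         *   4-8 hours: yellow label (medium risk)
         *   > 8 hours: red label (high risk, alarm-bell icon)
     *   **Model confidence:** shows the accuracy of the current model (e.g. 85% Accuracy).
 ---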
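 ### Page 4: Employee Profiles and Clustering (an extension of "multi-dimensional feature mining")
 **Purpose:** Present the group characteristics uncovered by the K-Means clustering algorithm.
 *   **Radar chart:**
     *   Draw 3-4 polygons representing 3-4 types of employee (e.g. model employees, high-pressure, lifestyle-related).
     *   Axes: [age, service time, workload, BMI, absence tendency].
     *   Differences between groups should be visible at a glance (e.g. the "workload" axis is especially long for the high-pressure type).
 *   **Scatter plot:**
     *   x-axis: age; y-axis: absence hours. Points are colored by cluster (red, blue, green).
 ---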
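 ### Recommended Tech Stack (low implementation effort, good results)
 To produce attractive charts in a short time, the following combination is recommended:
 1.  **Front-end framework:** **Vue.js** (Vue 3) or **React**. Vue is recommended; it is very widely used for graduation projects in China and its documentation is easy to search.
 2.  **UI component library:** **Element Plus** (with Vue) or **Ant Design**.
     *   Its form, card, and button components can be used directly, so you do not need to write your own CSS.
 3.  **Chart library:** **ECharts** (open-sourced by Baidu).
     *   **Key strength:** its bar, pie, radar, and heat-map charts look polished and support animation, which suits a thesis defense demo.
 4.  **Back-end API:** Python **Flask** or **FastAPI**.
     *   Write a few simple API endpoints (`/api/predict`, `/api/feature_importance`) that the front end calls to fetch data.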
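 ### Demo Script for the Defense
 1.  **Open page 1:** "As you can see, this is an overview of more than 700 records. We found that Friday has the highest absence rate..."
 2.  **Open page 2:** "Using the random forest algorithm, we computed the weight of each factor and found that 'commute distance' and 'workload' are the two main drivers of absence..."
 3.  **Open page 3:** "To validate the model's practicality, I built this prediction module. Suppose we have a 35-year-old employee who lives far away and drinks often; the system predicts more than 8 hours of absence, which is high risk..."
 4.  **Open page 4:** "Finally, through cluster analysis, we divided employees into three groups. The red group is the 'high-pressure, high-workload' group, and HR should pay special attention to it..."
 With this flow, your front end is not only attractive but also tightly aligned with the title, which is a definite plus.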
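
Reviewer note on the correlation heat map described for page 2 of the removed `docs/0.md`: each cell of that chart is a pairwise correlation coefficient between two fields. The sketch below shows one way such a matrix could be computed with the standard library, assuming Pearson correlation (the coefficient named in the requirements specification). `pearson`, `correlation_matrix`, and the sample columns are hypothetical illustrations and do not come from the project's code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # A constant column has no defined correlation; report 0.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict {row: {col: r}} suitable for a heat-map series."""
    names = list(columns)
    return {a: {b: round(pearson(columns[a], columns[b]), 3) for b in names} for a in names}


# Hypothetical columns: hours grow with distance and shrink with tenure.
columns = {
    'distance': [5, 10, 20, 40],
    'tenure': [8, 6, 4, 2],
    'hours': [2, 4, 8, 16],
}
matrix = correlation_matrix(columns)
```

The matrix is symmetric with ones on the diagonal, so a front end only needs to flatten it into `[row, column, value]` triples for the heat-map series.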
--- a/docs/00_需求规格说明书.md
+++ b/docs/00_需求规格说明书.md
@@ -1,609 +1,115 @@
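 # Requirements Specification
-## Employee Absenteeism Analysis and Prediction System Based on Multi-Dimensional Feature Mining
+## 1. Project Overview
-**Document version**: V1.0  
+### 1.1 Project Name
 **Date written**: March 2026  
 **Author**: 张硕
---
+Employee Absence Event Analysis and Prediction System for Chinese Enterprises
 ## 1. Introduction
 ### 1.1 Purpose
 This document specifies the functional and non-functional requirements of the "Employee Absenteeism Analysis and Prediction System Based on Multi-Dimensional Feature Mining" and serves as the basis for the system's design, development, testing, and acceptance. Its intended readers include:
 - Project supervisor
 - System developers
 - Testers
 - Project review experts
 ### 1.2 Project Background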
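-#### 1.2.1 Source of the Topic
+In enterprise human resource management, employee absence affects not only scheduling and production efficiency but also team stability, operating costs, and service quality. Traditional approaches often rely on manual statistics and experience-based judgment, which makes it hard to identify risks in time. To improve an enterprise's insight into absence behavior, this project designs and implements an absence analysis and prediction system for enterprise management scenarios. Through data statistics, feature analysis, risk prediction, and employee profiling, it gives managers decision support.
-This topic is an undergraduate graduation project of the School of Software, Henan Agricultural University.
+### 1.3 Project Goals
-#### 1.2.2 Project Background
+- Provide global statistics and visualization of absence data
 - Analyze the key factors that affect absence duration
 - Provide risk prediction for a single absence
 - Build employee group profiles from clustering results
 - Deliver a complete graduation project system with supporting thesis materials
-As enterprise digital transformation deepens, human resource management is shifting from experience-driven to data-driven. Employee absence is an important factor in operational efficiency and carries rich multi-dimensional information. Traditional absence management relies mainly on manual statistics and experience-based judgment and does not explore the complex relationships among multi-dimensional features.
+## 2. User Roles and Usage Scenarios
-Based on the UCI Absenteeism dataset, this system uses machine learning algorithms to analyze employee attendance data in depth, mine the multi-dimensional features that affect absence, and build an absence prediction model, giving enterprise human resource management scientific and objective decision support.
+### 2.1 User Roles
-#### 1.2.3 Terminology
+- Enterprise HR managers
 - Department heads
 - University thesis defense reviewers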
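-| Term | Definition |
+### 2.2 Usage Scenarios
 |------|------|
 | UCI | University of California, Irvine; host of a well-known machine learning dataset repository |
 | ICD | International Classification of Diseases, the international disease classification codes |
 | Absence | An employee not attending work during scheduled working time |
 | Feature mining | The process of extracting valuable feature information from raw data |
 | K-Means | A classic unsupervised clustering algorithm |
 | Random forest | An ensemble learning algorithm based on decision trees |
 | XGBoost | An efficient gradient boosting algorithm |
---
+- View overall absence trends and distribution
-
+- Analyze how different factors affect absence duration
-## 2. Project Overview
+- Enter key factors and predict the risk of a single absence
-
+- View employee group profiles and the differences between typical groups
-### 2.1 Project Goals
+- Demonstrate the system interface and analysis results at the graduation project defense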
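 The core goal of this project is to design and implement a complete employee absence analysis and prediction system. The specific goals are:
 1. **Data overview**: provide intuitive statistics and visualizations so the enterprise can quickly understand overall attendance
 2. **Factor analysis**: identify the key factors behind absence and answer "why are employees absent"
 3. **Risk prediction**: build a prediction model to accurately identify employee absence risk and give early warning
 4. **Employee profiling**: group employees with a clustering algorithm to support fine-grained management
 ### 2.2 Functional Overview
 The system has four core functional modules:
 | Module ID | Module Name | Summary |
 |----------|----------|----------|
 | F01 | Data Overview and Global Statistics | Basic statistical indicators, time-dimension trends, absence reason distribution |
 | F02 | Multi-Dimensional Feature Mining and Influencing Factor Analysis | Feature importance ranking, correlation analysis, group comparison |
 | F03 | Employee Absence Risk Prediction | Single prediction, risk level assessment, model performance display |
 | F04 | Employee Profiles and Group Clustering | K-Means clustering, group radar chart, scatter plot |
 ### 2.3 User Characteristics
 The system's target users are mainly:
 | User Type | Description | Main Usage Scenarios |
 |----------|------|--------------|
 | HR managers | Staff of the enterprise's human resources department | Review attendance statistics, identify high-risk employees, set management policy |
 | Department supervisors | Heads of business departments | Understand attendance in their own department, optimize work arrangements |
 | Data analysts | The enterprise's data analysis staff | Analyze attendance data in depth, uncover underlying patterns |
 ### 2.4 Runtime Environment
 #### 2.4.1 Hardware
 | Item | Minimum | Recommended |
 |------|----------|----------|
 | CPU | Dual-core 2.0 GHz | Quad-core 2.5 GHz or higher |
 | Memory | 4 GB | 8 GB or more |
 | Disk | 10 GB free space | 20 GB or more |
 | Network | 10 Mbps | 100 Mbps or higher |
 #### 2.4.2 Software
 | Item | Requirement |
 |------|------|
 | Operating system | Windows 10/11, Linux, macOS |
 | Browser | Chrome 90+, Firefox 88+, Edge 90+ |
 | Python version | 3.8 or later |
 | Node.js version | 16.0 or later |
 ---
 ## 3. Functional Requirements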
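-### 3.1 F01 Data Overview and Global Statistics
+### 3.1 Data Overview Module
-#### 3.1.1 F01-01 Basic Statistical Indicators
+- Show key indicators such as total absence events, employees covered, average absence duration, and share of high-risk events
 - Show the monthly absence trend
 - Show the distribution by weekday
 - Show the distribution by leave reason category
 - Show the distribution by season
-**Description**: The system loads the dataset automatically, then computes and displays key statistical indicators.
+### 3.2 Influencing Factor Analysis Module
-**Input**: None (loaded automatically)
+- Show the feature importance ranking
 - Show a correlation heat map of key variables
 - Provide multi-dimensional group comparison
 - Support viewing differences in average absence by industry, shift, job family, marital status, chronic disease history, and other dimensions
-**Output**:
+### 3.3 Absence Prediction Module
-| Indicator | Description |
+- Predict absence duration from core business factors entered by the user
-|----------|------|
+- Display the risk level and confidence
-| Total samples | Total number of records in the dataset |
+- Compare results across multiple models
-| Total employees | Number of distinct employees |
+- Automatically select the better-performing model
 | Total absence hours | Sum of absence hours over all records |
 | Average absence duration | Mean absence hours per record |
 | Maximum absence duration | Longest single absence, in hours |
 | Minimum absence duration | Shortest single absence, in hours |
 | Share of high-risk employees | Proportion of employees whose absence exceeds 8 hours |
-**Business rules**:
+### 3.4 Employee Profile Module
 - High-risk definition: a single absence longer than 8 hours
 - Statistics are computed in real time and are not cached
-**UI**: Displayed as KPI cards, one card per indicator.
+- Support K-Means cluster analysis
-
+- Show the group radar chart
---
+- Show the cluster scatter plot
-
+- Show each group's headcount, share, and a text description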
 #### 3.1.2 F01-02 月度缺勤趋势分析
 **功能描述**：以折线图形式展示全年（1-12月）的缺勤变化趋势。
 **输入**：无
 **输出**：
 | 字段 | 说明 |
 |------|------|
 | month | 月份（1-12） |
 | total_hours | 该月缺勤总时长 |
 | avg_hours | 该月平均缺勤时长 |
 | record_count | 该月记录数 |
 **界面展示**：
 - 图表类型：折线图
 - 横轴：月份（1-12月）
 - 纵轴：缺勤时长（小时）
 - 支持鼠标悬停显示具体数值
 ---
 #### 3.1.3 F01-03 星期分布分析
 **功能描述**：分析周一至周五的缺勤分布情况。
 **输入**：无
 **输出**：
 | 字段 | 说明 |
 |------|------|
 | weekday | 星期（周一至周五） |
 | total_hours | 该星期缺勤总时长 |
 | avg_hours | 该星期平均缺勤时长 |
 | record_count | 该星期记录数 |
 **界面展示**：
 - 图表类型：柱状图或热力图
 - 横轴：星期（周一至周五）
 - 纵轴：缺勤时长或记录数
 ---
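 #### 3.1.4 F01-04 Absence Reason Distribution Analysis
 **Description**: Shows the share of each absence reason.
 **Input**: None
 **Output**:
 | Field | Description |
 |------|------|
 | reason_code | Absence reason code (0-28) |
 | reason_name | Absence reason name |
 | count | Number of occurrences of the reason |
 | percentage | Share, as a percentage |
 **Absence reason categories**:
 | Code range | Category | Description |
 |----------|------|------|
 | 1-21 | ICD disease | International Classification of Diseases codes |
 | 22 | Medical follow-up | Patient follow-up |
 | 23 | Medical consultation | Outpatient consultation |
 | 24 | Blood donation | Voluntary blood donation |
 | 25 | Laboratory examination | Medical tests |
 | 26 | Unjustified absence | Absence without approval |
 | 27 | Physiotherapy | Physical therapy |
 | 28 | Dental consultation | Dental visit |
 | 0 | Unknown | Reason not recorded |
 **UI presentation**:
 - Chart type: pie chart
 - Shows the share of each reason
 - Clicking a slice shows its details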
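The category table can be turned into a small lookup. The sketch below groups reason codes by category and computes `count` and `percentage` for the pie chart; grouping by category rather than by individual code, and the names `reason_category` and `reason_distribution`, are assumptions made for illustration.

```python
from collections import Counter

# Codes outside 1-21 map to a single named category each (from the table).
SPECIAL_REASONS = {
    0: "Unknown",
    22: "Medical follow-up",
    23: "Medical consultation",
    24: "Blood donation",
    25: "Laboratory examination",
    26: "Unjustified absence",
    27: "Physiotherapy",
    28: "Dental consultation",
}


def reason_category(code):
    """Map a reason code (0-28) to its category name."""
    if 1 <= code <= 21:
        return "ICD disease"
    if code in SPECIAL_REASONS:
        return SPECIAL_REASONS[code]
    raise ValueError(f"unknown reason code: {code}")


def reason_distribution(codes):
    """Return pie-chart rows sorted by descending count."""
    counts = Counter(reason_category(code) for code in codes)
    total = sum(counts.values())
    return [
        {
            "reason_name": name,
            "count": count,
            "percentage": round(100 * count / total, 2),
        }
        for name, count in counts.most_common()
    ]
```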
 ---
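 ### 3.2 F02 Multi-Dimensional Feature Mining and Influencing Factor Analysis
 #### 3.2.1 F02-01 Feature Importance Ranking
 **Description**: Uses the trained random forest model to compute how much each feature contributes to absence.
 **Input**: None
 **Output**:
 | Field | Description |
 |------|------|
 | feature_name | Feature name |
 | importance_score | Importance score (0-1) |
 | rank | Rank |
 **Features analyzed**:
 | Feature | Description | Feature type |
 |----------|----------|----------|
 | Reason for absence | Absence reason | Categorical |
 | Month of absence | Month of absence | Categorical |
 | Day of the week | Weekday | Categorical |
 | Seasons | Season | Categorical |
 | Transportation expense | Commuting cost | Numeric |
 | Distance from Residence to Work | Commuting distance | Numeric |
 | Service time | Years of service | Numeric |
 | Age | Age | Numeric |
 | Work load Average/day | Average daily workload | Numeric |
 | Hit target | Target achievement rate | Numeric |
 | Disciplinary failure | Disciplinary record | Binary |
 | Education | Education level | Categorical |
 | Son | Number of children | Numeric |
 | Social drinker | Drinking habit | Binary |
 | Social smoker | Smoking habit | Binary |
 | Pet | Number of pets | Numeric |
 | Body mass index | BMI | Numeric |
 **UI presentation**:
 - Chart type: horizontal bar chart
 - Sorted by importance score in descending order
 - Hovering shows the exact score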
 ---
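 #### 3.2.2 F02-02 Correlation Heatmap Analysis
 **Description**: Computes the Pearson correlation coefficient between features and displays the result as a heatmap.
 **Input**: None
 **Output**: Correlation coefficient matrix (n×n)
 **Relationships of particular interest**:
 - Correlation between lifestyle habits (drinking, smoking) and absence duration
 - Correlation between commuting distance and absence duration
 - Correlation between workload and absence duration
 **UI presentation**:
 - Chart type: heatmap
 - Color range: -1 (negative correlation, blue) to +1 (positive correlation, red)
 - Hovering shows the exact correlation coefficient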
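For reference, the Pearson coefficient behind the heatmap can be sketched in plain Python as below. This is an illustration only: the column-dictionary input and the function names are assumptions, and returning `0.0` for a constant column is a convention chosen for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; 0.0 is this sketch's choice.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build an n x n matrix as nested dicts from {column_name: values}."""
    names = list(columns)
    return {
        a: {b: round(pearson(columns[a], columns[b]), 4) for b in names}
        for a in names
    }
```

The nested-dict result maps directly onto heatmap cells, with values in the -1 to +1 range used for the color scale.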
 ---
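 #### 3.2.3 F02-03 Group Comparison Analysis
 **Description**: Groups records by a chosen dimension and compares the average absence duration of each group.
 **Supported comparison dimensions**:
 | Dimension | Groups |
 |------|------|
 | Drinking habit | Drinkers vs non-drinkers |
 | Smoking habit | Smokers vs non-smokers |
 | Education | High school vs bachelor's vs postgraduate and above |
 | Has children | With children vs without children |
 | Has pets | With pets vs without pets |
 **Output**:
 | Field | Description |
 |------|------|
 | group_name | Group name |
 | avg_hours | Average absence duration |
 | count | Number of records |
 **UI presentation**:
 - Chart type: grouped bar chart
 - Supports switching between comparison dimensions
 - Shows the percentage difference between groups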
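A minimal sketch of the group comparison, assuming records arrive as `(group_name, hours)` pairs. The document does not define how the percentage difference is calculated, so the relative-to-baseline formula in `difference_percent` is an assumption, as are both function names.

```python
def compare_groups(records):
    """Return one row per group with its average absence hours and record count."""
    sums = {}
    for name, hours in records:
        total, count = sums.get(name, (0.0, 0))
        sums[name] = (total + hours, count + 1)
    return [
        {"group_name": name, "avg_hours": round(total / count, 2), "count": count}
        for name, (total, count) in sums.items()
    ]


def difference_percent(baseline_avg, other_avg):
    """Difference of other_avg relative to baseline_avg, as a percentage."""
    if baseline_avg == 0:
        raise ValueError("baseline average must be non-zero")
    return round((other_avg - baseline_avg) / baseline_avg * 100, 2)
```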
 ---
 ### 3.3 F03 Employee Absence Risk Prediction
 #### 3.3.1 F03-01 Single Absence Prediction
 **Description**: Accepts employee attributes entered by the user and calls the prediction model to return the predicted absence duration.
 **Input parameters**:
 | Parameter | Type | Range | Required |
 |--------|------|----------|------|
 | reason_for_absence | int | 0-28 | Yes |
 | month_of_absence | int | 1-12 | Yes |
 | day_of_week | int | 2-6 | Yes |
 | seasons | int | 1-4 | Yes |
 | transportation_expense | int | 100-400 | Yes |
 | distance | int | 1-60 | Yes |
 | service_time | int | 1-30 | Yes |
 | age | int | 18-60 | Yes |
 | work_load | float | 200-350 | Yes |
 | hit_target | int | 80-100 | Yes |
 | disciplinary_failure | int | 0-1 | Yes |
 | education | int | 1-4 | Yes |
 | son | int | 0-5 | Yes |
 | social_drinker | int | 0-1 | Yes |
 | social_smoker | int | 0-1 | Yes |
 | pet | int | 0-10 | Yes |
 | bmi | float | 18-40 | Yes |
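The parameter table lends itself to table-driven validation. The sketch below copies the types and ranges from the table into a dictionary; the function name `validate_input`, the error message wording, and the choice to collect all errors rather than stop at the first are assumptions for illustration.

```python
# (type, minimum, maximum) per parameter, copied from the input parameter table.
PARAM_RULES = {
    "reason_for_absence": (int, 0, 28),
    "month_of_absence": (int, 1, 12),
    "day_of_week": (int, 2, 6),
    "seasons": (int, 1, 4),
    "transportation_expense": (int, 100, 400),
    "distance": (int, 1, 60),
    "service_time": (int, 1, 30),
    "age": (int, 18, 60),
    "work_load": (float, 200, 350),
    "hit_target": (int, 80, 100),
    "disciplinary_failure": (int, 0, 1),
    "education": (int, 1, 4),
    "son": (int, 0, 5),
    "social_drinker": (int, 0, 1),
    "social_smoker": (int, 0, 1),
    "pet": (int, 0, 10),
    "bmi": (float, 18, 40),
}


def validate_input(payload):
    """Return a list of error strings; an empty list means the payload is valid."""
    errors = []
    for name, (kind, low, high) in PARAM_RULES.items():
        if name not in payload:
            errors.append(f"{name}: missing required field")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: must be a number")
            continue
        if kind is int and not float(value).is_integer():
            errors.append(f"{name}: must be an integer")
            continue
        if not low <= value <= high:
            errors.append(f"{name}: {value} is outside [{low}, {high}]")
    return errors
```

Returning every error at once lets the form highlight all invalid fields in a single round trip.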
 **Output**:
 | Field | Description |
 |------|------|
 | predicted_hours | Predicted absence duration (hours) |
 | risk_level | Risk level (low/medium/high) |
 | confidence | Model confidence |
 **Risk level rules**:
 | Predicted duration | Risk level | Color |
 |----------|----------|----------|
 | < 4 hours | Low risk (low) | Green |
 | 4-8 hours (inclusive) | Medium risk (medium) | Yellow |
 | > 8 hours | High risk (high) | Red |
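The risk level thresholds can be expressed as a short function. The name `risk_level` is an assumption; the boundaries follow the rules as stated, with exactly 4 and exactly 8 hours both counted as medium risk.

```python
def risk_level(predicted_hours):
    """Map a predicted absence duration in hours to low/medium/high."""
    if predicted_hours < 0:
        raise ValueError("predicted hours cannot be negative")
    if predicted_hours < 4:
        return "low"
    if predicted_hours <= 8:
        return "medium"
    return "high"
```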
 **UI presentation**:
 - Left: parameter input form
 - Right: prediction result display
 - Bottom: "Start prediction" button
 ---
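 #### 3.3.2 F03-02 Risk Level Assessment
 **Description**: Automatically assesses and displays the risk level based on the prediction result.
 **Business rules**:
 - The risk level is computed automatically from the predicted duration
 - High-risk employees are flagged for special attention
 - Risk levels can be filtered and summarized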
 ---
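 #### 3.3.3 F03-03 Model Performance Display
 **Description**: Displays the performance metrics of the current prediction model.
 **Output metrics**:
 | Metric | Description | Target |
 |----------|------|--------|
 | R² | Coefficient of determination | ≥ 0.80 |
 | MSE | Mean squared error | - |
 | RMSE | Root mean squared error | - |
 | MAE | Mean absolute error | - |
 | Training samples | Number of samples used to train the model | - |
 **UI presentation**:
 - Each metric is shown as a card
 - Includes a note on the model type (random forest/XGBoost)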
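For readers who want the metric definitions spelled out, the following plain-Python sketch computes R², MSE, RMSE, and MAE from true and predicted values. The function name `regression_metrics` is an assumption, and returning `0.0` for R² when the true values are constant is a convention chosen for this sketch.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R^2, MSE, RMSE, and MAE for a regression model."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    # Residual sum of squares and total sum of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "r2": 1 - ss_res / ss_tot if ss_tot else 0.0,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }
```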
 ---
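 ### 3.4 F04 Employee Profiles and Group Clustering
 #### 3.4.1 F04-01 K-Means Clustering Results
 **Description**: Clusters employees using the K-Means algorithm.
 **Input parameters** (optional):
 | Parameter | Type | Default | Description |
 |--------|------|--------|------|
 | n_clusters | int | 3 | Number of clusters |
 **Output**:
 | Field | Description |
 |------|------|
 | cluster_id | Cluster ID |
 | cluster_name | Cluster name (auto-generated or manually labeled) |
 | member_count | Number of employees in the cluster |
 | center_point | Coordinates of the cluster center |
 **Clustering feature dimensions**:
 - Age
 - Years of service
 - Workload
 - BMI
 - Absence tendency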
 ---
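 #### 3.4.2 F04-02 Employee Group Radar Chart
 **Description**: Shows the feature profile of each cluster as a radar chart.
 **Output**:
 - Feature values of each cluster across multiple dimensions (after normalization)
 **UI presentation**:
 - Chart type: radar chart
 - Each cluster is drawn in a different color
 - Dimensions: age, years of service, workload, BMI, absence tendency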
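The document does not say which normalization the radar chart uses. The sketch below assumes per-dimension min-max scaling to the range 0 to 1 across clusters, which keeps dimensions with different units comparable on one chart; the function name and the `0.5` fallback for a dimension with identical values are illustrative choices.

```python
def normalize_cluster_profiles(profiles):
    """Min-max scale each dimension across clusters.

    `profiles` maps cluster name -> {dimension: mean value}; the result has
    the same shape with every value in [0, 1].
    """
    if not profiles:
        return {}
    dimensions = list(next(iter(profiles.values())))
    result = {name: {} for name in profiles}
    for dim in dimensions:
        values = [profile[dim] for profile in profiles.values()]
        low, high = min(values), max(values)
        span = high - low
        for name, profile in profiles.items():
            # When all clusters share the same value, place it mid-scale.
            result[name][dim] = (profile[dim] - low) / span if span else 0.5
    return result
```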
 ---
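 #### 3.4.3 F04-03 Cluster Scatter Plot
 **Description**: Shows how employees are distributed in the clustering space as a scatter plot.
 **Output**:
 | Field | Description |
 |------|------|
 | employee_id | Employee ID |
 | x | X coordinate (age, or the first principal component after PCA) |
 | y | Y coordinate (absence duration, or the second principal component after PCA) |
 | cluster_id | ID of the cluster the employee belongs to |
 **UI presentation**:
 - Chart type: scatter plot
 - Clusters are distinguished by color
 - Hovering shows employee details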
 ---
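 ## 4. Non-Functional Requirements
 ### 4.1 Performance Requirements
-| Metric | Requirement |
+- The initial page load time should be kept within a reasonable range
-|------|------|
+- The prediction endpoint should return results within a few seconds
-| Page load time | First-screen load time ≤ 3 seconds |
+- Regular chart endpoints should respond quickly
 | Endpoint response time | Regular query endpoints ≤ 500 ms |
 | Prediction endpoint response time | ≤ 1 second |
 | Concurrent users | Supports 10 concurrent users |
 | Data processing capacity | Supports processing more than 10,000 records |
-### 4.2 Security Requirements
+### 4.2 Ease-of-Use Requirements
-| Requirement | Description |
+- Clear interface layout, suitable for classroom and thesis-defense demonstrations
-|--------|------|
+- Clear entry points to key features, with few complex operations
-| Data security | Data files are stored securely to prevent unauthorized access |
+- Charts and statistics must be intuitive to interpret
 | API security | API endpoints have basic access control |
 | Input validation | Both front end and back end validate user input |
 | Error handling | Sensitive error details are not exposed to the front end |
-### 4.3 Usability Requirements
+### 4.3 Maintainability Requirements
-| Requirement | Description |
+- Clear division between front-end and back-end modules
-|--------|------|
+- Data generation, training, serving, and presentation logic are independent of one another
-| User-friendly interface | The interface is clean and simple, and operation is intuitive |
+- Documentation and code are kept consistent
 | Responsive design | Supports access from different screen sizes |
 | Error messages | Provides clear error messages and guidance |
 | Help information | Key features provide usage hints |
 | Accessibility | Supports access from mainstream browsers |
-### 4.4 Compatibility Requirements
+### 4.4 Security Requirements
-| Type | Requirement |
+- The system does not use real, sensitive enterprise data
-|------|------|
+- The dataset contains no sensitive information such as ID card numbers, phone numbers, or detailed home addresses
 | Browser compatibility | Chrome 90+, Firefox 88+, Edge 90+, Safari 14+ |
 | Operating system | Windows 10/11, macOS 10.15+, mainstream Linux distributions |
 | Screen resolution | Supports 1366×768 and above |
-### 4.5 Maintainability Requirements
+## 5. Business Rules
-| Requirement | Description |
+- Each record represents one employee absence event
-|--------|------|
+- The prediction target is absence duration (hours)
-| Coding standards | Follow Python PEP8 and the Vue style guide |
+- Risk level rules:
-| Comments and documentation | Key code is commented |
+  - Low risk: absence duration under 4 hours
-| Modular design | High cohesion and low coupling, easy to maintain and extend |
+  - Medium risk: absence duration from 4 to 8 hours
-| Version control | Use Git for version management |
+  - High risk: absence duration over 8 hours
---
+## 6. Constraints
-## 5. Use Case Diagram and Use Case Descriptions
+- The system uses a decoupled front-end/back-end architecture
 - The back end is based on Flask
 - The front end is based on Vue 3 and Element Plus
 - Training data is enterprise-scenario data generated within the project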
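-### 5.1 Use Case Diagram
+## 7. Acceptance Criteria
-```
+- The system starts normally and all pages are accessible
-                    +------------------------------------------+
+- Every module's endpoints return correct data
-                    |       员工缺勤分析与预测系统              |
+- The prediction module returns the absence duration and risk level
-                    |                                          |
+- The influencing factor analysis and clustering pages render their charts correctly
-                    |  +------------------+                    |
+- The documentation is sufficient to support graduation project submission and the thesis defense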
                    |  | F01 数据概览     |                    |
                    |  +------------------+                    |
                    |  | - 基础统计       |                    |
                    |  | - 月度趋势       |                    |
                    |  | - 星期分布       |                    |
                    |  | - 原因分布       |                    |
                    |  +------------------+                    |
                    |                                          |
                    |  +------------------+                    |
                    |  | F02 影响因素分析 |                    |
                    |  +------------------+                    |
   +--------+       |  | - 特征重要性     |       +--------+   |
   |        |------>|  | - 相关性分析     |<------|        |   |
   |  用户  |       |  | - 群体对比       |       |  用户  |   |
   |        |<------|  +------------------+       |        |   |
   +--------+       |                          +--------+   |
                    |  +------------------+     |        |   |
                    |  | F03 缺勤预测     |<----|  用户  |   |
                    |  +------------------+     |        |   |
                    |  | - 单次预测       |     +--------+   |
                    |  | - 风险评估       |                    |
                    |  | - 模型性能       |                    |
                    |  +------------------+                    |
                    |                                          |
                    |  +------------------+                    |
                    |  | F04 员工画像     |                    |
                    |  +------------------+                    |
                    |  | - 聚类结果       |                    |
                    |  | - 群体雷达图     |                    |
                    |  | - 散点图         |                    |
                    |  +------------------+                    |
                    |                                          |
                    +------------------------------------------+
 ```
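 ### 5.2 Detailed Use Case Descriptions
 #### UC01 View Data Overview
 | Item | Description |
 |------|------|
 | Use case name | View data overview |
 | Actor | User |
 | Precondition | The user has opened the system |
 | Main flow | 1. The system loads the dataset<br>2. Compute the basic statistical indicators<br>3. Display the KPI cards<br>4. Render the monthly trend chart<br>5. Render the weekday distribution chart<br>6. Render the reason distribution pie chart |
 | Postcondition | The data overview page is fully displayed |
 | Exception flow | An error message is shown when data loading fails |
 #### UC02 Analyze Influencing Factors
 | Item | Description |
 |------|------|
 | Use case name | Analyze influencing factors |
 | Actor | User |
 | Precondition | The prediction model has been trained |
 | Main flow | 1. Load the trained model<br>2. Extract feature importance<br>3. Compute the correlation coefficient matrix<br>4. Display the feature importance bar chart<br>5. Display the correlation heatmap<br>6. Support switching the group comparison dimension |
 | Postcondition | The influencing factor analysis results are fully displayed |
 #### UC03 Predict Absence
 | Item | Description |
 |------|------|
 | Use case name | Predict absence |
 | Actor | User |
 | Precondition | The prediction model has been trained |
 | Main flow | 1. The user fills in the employee attribute form<br>2. The user clicks the "Start prediction" button<br>3. The system calls the prediction model<br>4. The prediction result is returned<br>5. The risk level is displayed |
 | Postcondition | The prediction result is fully displayed |
 | Exception flow | An error is shown when input parameters are invalid |
 #### UC04 View Employee Profiles
 | Item | Description |
 |------|------|
 | Use case name | View employee profiles |
 | Actor | User |
 | Precondition | The clustering model has been trained |
 | Main flow | 1. Run K-Means clustering<br>2. Compute the cluster centers<br>3. Display the clustering results<br>4. Render the group radar chart<br>5. Render the scatter plot |
 | Postcondition | The employee profiles are fully displayed |
 ---
 ## 6. Appendix
 ### 6.1 References
 1. UCI Machine Learning Repository. Absenteeism at work Data Set
 2. Project proposal document
 3. Project architecture design document
 ### 6.2 Document Revision History
 | Version | Date | Author | Changes |
 |------|------|--------|----------|
 | V1.0 | 2026-03 | 张硕 | Initial version |
 ---
 **End of document**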
--- a/docs/01_系统架构设计.md
+++ b/docs/01_系统架构设计.md
@@ -1,613 +1,149 @@
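-# System Architecture Design Document
+# System Architecture Design
-## Employee Absence Analysis and Prediction System Based on Multi-Dimensional Feature Mining
+## 1. Overall Architecture
-**Document version**: V1.0  
+The system uses a decoupled front-end/back-end architecture:
 **Date written**: March 2026  
 **Author**: 张硕
---
+- Front end: Vue 3 + Vue Router + Element Plus + ECharts
 - Back end: Flask + Pandas + Scikit-learn + PyTorch + Joblib
 - Data layer: CSV data files + model files
-## 1. Overview
+The overall architecture is divided into four layers:
-### 1.1 Design Goals
+1. Presentation layer: page display, form interaction, and chart visualization
 2. API layer: HTTP route dispatch and request/response handling
 3. Business layer: data statistics, feature analysis, prediction, and clustering logic
 4. Data and model layer: raw data, cleaned data, model files, and training metadata
-This architecture design aims to achieve the following goals:
+## 2. Front-End Architecture Design
-1. **High availability**: the system is stable and reliable and can provide service continuously
+### 2.1 Module Breakdown
 2. **Extensibility**: easy to extend with new features and upgraded algorithms
 3. **Maintainability**: clear code structure that is easy to understand and maintain
 4. **High performance**: fast responses to front-end requests and a smooth user experience
-### 1.2 Design Principles
+- `Dashboard.vue`: data overview page
 - `FactorAnalysis.vue`: influencing factor analysis page
 - `Prediction.vue`: absence prediction page
 - `Clustering.vue`: employee profile page
-| Principle | Description |
+### 2.2 Core Responsibilities
 |------|------|
 | Layered design | Front end and back end are separated; the back end uses a three-layer architecture |
 | Modularity | Feature modules are independent, with high cohesion and low coupling |
 | Single responsibility | Each module is responsible for exactly one function |
 | Open/closed principle | Open for extension, closed for modification |
 | Interface segregation | Interfaces are kept minimal and free of redundancy |
---
+- Page layout and navigation
-
+- Chart rendering
-## 2. System Architecture
+- Form input and result display
-
+- API calls and state management
 ### 2.1 Overall Architecture Diagram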
 ```
 ┌─────────────────────────────────────────────────────────────────────┐
 │                           用户层 (User Layer)                        │
 │                     浏览器 (Chrome/Firefox/Edge)                     │
 └─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
 ┌─────────────────────────────────────────────────────────────────────┐
 │                         前端层 (Frontend Layer)                      │
 │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐   │
 │  │  Dashboard  │ │ FactorAnalysis│ │ Prediction │ │ Clustering  │   │
 │  │   数据概览   │ │   影响因素   │ │   缺勤预测   │ │   员工画像   │   │
 │  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘   │
 │  ┌─────────────────────────────────────────────────────────────┐   │
 │  │              公共组件 (ChartComponent, ResultCard)            │   │
 │  └─────────────────────────────────────────────────────────────┘   │
 │  ┌─────────────────────────────────────────────────────────────┐   │
 │  │      Vue 3 + Element Plus + ECharts + Axios + Vue Router     │   │
 │  └─────────────────────────────────────────────────────────────┘   │
 └─────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ HTTP/REST API
                                    ▼
 ┌─────────────────────────────────────────────────────────────────────┐
 │                         后端层 (Backend Layer)                       │
 │  ┌─────────────────────────────────────────────────────────────┐   │
 │  │                      API Layer (api/)                         │   │
 │  │    overview_routes  │  analysis_routes  │  predict_routes    │   │
 │  │                          cluster_routes                        │   │
 │  └─────────────────────────────────────────────────────────────┘   │
 │                                    │                                 │
 │  ┌─────────────────────────────────────────────────────────────┐   │
 │  │                   Service Layer (services/)                   │   │
 │  │  data_service  │  analysis_service  │  predict_service       │   │
 │  │                        cluster_service                         │   │
 │  └─────────────────────────────────────────────────────────────┘   │
 │                                    │                                 │
 │  ┌─────────────────────────────────────────────────────────────┐   │
 │  │                     Core Layer (core/)                        │   │
 │  │   preprocessing  │  feature_mining  │  train_model           │   │
 │  │                          clustering                            │   │
 │  └─────────────────────────────────────────────────────────────┘   │
 │  ┌─────────────────────────────────────────────────────────────┐   │
 │  │              Flask + scikit-learn + XGBoost + pandas          │   │
 │  └─────────────────────────────────────────────────────────────┘   │
 └─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
 ┌─────────────────────────────────────────────────────────────────────┐
 │                         数据层 (Data Layer)                          │
 │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │
 │  │     data/raw/    │  │  data/processed/ │  │     models/      │  │
 │  │   原始CSV数据     │  │   处理后数据      │  │   模型文件.pkl   │  │
 │  └──────────────────┘  └──────────────────┘  └──────────────────┘  │
 └─────────────────────────────────────────────────────────────────────┘
 ```
 ### 2.2 技术架构
 ```
 ┌────────────────────────────────────────────────────────────────┐
 │                        技术栈总览                               │
 ├────────────────────────────────────────────────────────────────┤
 │                                                                │
 │  前端技术栈                     后端技术栈                      │
 │  ┌──────────────────┐          ┌──────────────────┐            │
 │  │ Vue 3            │          │ Python 3.8+      │            │
 │  │ Element Plus     │          │ Flask            │            │
 │  │ ECharts 5        │  ◄─────► │ scikit-learn     │            │
 │  │ Axios            │   HTTP   │ XGBoost          │            │
 │  │ Vue Router       │   REST   │ pandas           │            │
 │  │ Vite             │          │ numpy            │            │
 │  └──────────────────┘          │ joblib           │            │
 │                                └──────────────────┘            │
 │                                                                │
 │  算法技术                       数据存储                        │
 │  ┌──────────────────┐          ┌──────────────────┐            │
 │  │ 随机森林 (RF)     │          │ CSV文件          │            │
 │  │ XGBoost          │          │ PKL模型文件      │            │
 │  │ K-Means          │          │ JSON响应         │            │
 │  │ StandardScaler   │          │                  │            │
 │  │ OneHotEncoder    │          │                  │            │
 │  └──────────────────┘          └──────────────────┘            │
 │                                                                │
 └────────────────────────────────────────────────────────────────┘
 ```
 ### 2.3 部署架构
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │                        单机部署架构                              │
 │                                                                 │
 │  ┌─────────────────────────────────────────────────────────┐   │
 │  │                      服务器                              │   │
 │  │                                                         │   │
 │  │   ┌─────────────────┐      ┌─────────────────┐         │   │
 │  │   │   Flask Server  │      │   Vite Dev      │         │   │
 │  │   │   Port: 5000    │      │   Port: 5173    │         │   │
 │  │   │                 │      │                 │         │   │
 │  │   │   - REST API    │      │   - Vue App     │         │   │
 │  │   │   - ML Models   │      │   - Static      │         │   │
 │  │   │   - Data Files  │      │                 │         │   │
 │  │   └─────────────────┘      └─────────────────┘         │   │
 │  │                                                         │   │
 │  │   ┌─────────────────────────────────────────────────┐  │   │
 │  │   │              文件系统                            │  │   │
 │  │   │   /backend/data/    - 数据文件                   │  │   │
 │  │   │   /backend/models/  - 模型文件                   │  │   │
 │  │   │   /frontend/dist/   - 前端构建产物               │  │   │
 │  │   └─────────────────────────────────────────────────┘  │   │
 │  │                                                         │   │
 │  └─────────────────────────────────────────────────────────┘   │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ---
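 ## 3. Back-End Architecture Design
-### 3.1 Layered Design
+### 3.1 Routing Layer
-The back end uses a classic three-layer architecture in which each layer has a clearly defined responsibility:
+- `overview_routes.py`
 - `analysis_routes.py`
 - `predict_routes.py`
 - `cluster_routes.py`
-| Layer | Directory | Responsibility | Dependencies |
+### 3.2 Service Layer
 |------|------|------|----------|
 | API layer | api/ | Receives HTTP requests, validates parameters, calls the service layer, and returns responses | Depends on the service layer |
 | Service layer | services/ | Handles business logic and coordinates the core and data layers | Depends on the core layer |
 | Core layer | core/ | Implements core algorithms, data processing, and model training | None |
-### 3.2 Module Breakdown
+- `data_service.py`: overview statistics
 - `analysis_service.py`: feature importance and group comparison
 - `predict_service.py`: model loading and prediction output
 - `cluster_service.py`: organizing clustering results
-```
+### 3.3 Core Algorithm Layer
 - `generate_dataset.py`: generates the enterprise absence event dataset
 - `preprocessing.py`: data cleaning and preprocessing
 - `model_features.py`: feature construction and prediction input mapping
 - `train_model.py`: model training and evaluation
 - `deep_learning_model.py`: LSTM+MLP deep learning training and inference
 - `feature_mining.py`: correlation analysis and group comparison
 - `clustering.py`: K-Means clustering analysis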
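 ## 4. Data Flow Design
 ### 4.1 Training Flow
 1. Generate the enterprise absence event dataset
 2. Load and clean the data
 3. Build derived features
 4. Apply label encoding and feature selection
 5. Train multiple models and evaluate their performance
 6. Save the models, feature information, and training metadata
 The deep learning path uses:
 - `LSTM` to process the time-window sequence formed by an employee's most recent absence events
 - `MLP` to process the employee's static attribute features
 - A fusion layer that outputs the absence-duration regression result
 ### 4.2 Prediction Flow
 1. The front end submits the core prediction fields
 2. The back end builds a complete prediction sample
 3. Missing fields are filled with default values automatically
 4. Feature engineering and encoding are applied
 5. The model is loaded and run to produce a prediction
 6. The absence duration, risk level, model name, and confidence are returned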
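Steps 2 and 3 of the prediction flow (building a complete sample and filling in defaults) can be sketched as a dictionary merge. The field names and default values in `DEFAULT_FIELDS` are hypothetical placeholders; the real defaults live in the project code and are not given in this document.

```python
# Hypothetical defaults for fields the form does not ask for.
DEFAULT_FIELDS = {"seasons": 1, "pet": 0, "social_smoker": 0}


def build_sample(user_input, defaults=None):
    """Merge user-supplied fields over defaults to get a complete sample.

    Fields the user left empty (None) fall back to the default value.
    """
    sample = dict(DEFAULT_FIELDS if defaults is None else defaults)
    sample.update({k: v for k, v in user_input.items() if v is not None})
    return sample
```

Copying the defaults before updating keeps the shared dictionary unchanged between requests.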
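 ### 4.3 Analysis Flow
 1. Read the cleaned data
 2. Compute statistical indicators or correlations
 3. Organize the results into the structure the front-end charts require
 4. Return JSON data for the front end to display
 ## 5. File Organization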
 ```text
 backend/
-├── app.py                 # 应用入口，Flask实例配置
+  api/
-├── config.py              # 配置文件（路径、参数等）
+  core/
-├── requirements.txt       # Python依赖清单
+  data/
-│
+  models/
-├── api/                   # API接口层
+  services/
-│   ├── __init__.py
+  app.py
-│   ├── overview_routes.py     # 数据概览接口
+
-│   ├── analysis_routes.py     # 影响因素分析接口
+frontend/
-│   ├── predict_routes.py      # 预测接口
+  src/
-│   └── cluster_routes.py      # 聚类接口
+    api/
-│
+    router/
-├── services/              # 业务逻辑层
+    styles/
-│   ├── __init__.py
+    views/
-│   ├── data_service.py        # 数据服务
+    App.vue
-│   ├── analysis_service.py    # 分析服务
+    main.js
 │   ├── predict_service.py     # 预测服务
 │   └── cluster_service.py     # 聚类服务
 │
 ├── core/                  # 核心算法层
 │   ├── __init__.py
 │   ├── preprocessing.py       # 数据预处理
 │   ├── feature_mining.py      # 特征挖掘
 │   ├── train_model.py         # 模型训练
 │   └── clustering.py          # 聚类分析
 │
 ├── data/                  # 数据存储
 │   ├── raw/                   # 原始数据
 │   │   └── Absenteeism_at_work.csv
 │   └── processed/             # 处理后数据
 │       └── clean_data.csv
 │
 ├── models/                # 模型存储
 │   ├── rf_model.pkl           # 随机森林模型
 │   ├── xgb_model.pkl          # XGBoost模型
 │   ├── kmeans_model.pkl       # K-Means模型
 │   ├── scaler.pkl             # 标准化器
 │   └── encoder.pkl            # 编码器
 │
 └── utils/                 # 工具函数
    ├── __init__.py
    └── common.py              # 通用工具函数
 ```
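-### 3.3 Module Responsibilities in Detail
+## 6. Technology Selection Notes
-#### 3.3.1 API Layer (api/)
+### 6.1 Flask
-| File | Responsibility | Main endpoints |
+- Lightweight and well suited to an undergraduate graduation project
-|------|------|----------|
+- Clear routing layer that makes it easy to split endpoints
 | overview_routes.py | Data overview endpoints | /api/overview/stats, /api/overview/trend |
 | analysis_routes.py | Influencing factor analysis endpoints | /api/analysis/importance, /api/analysis/correlation |
 | predict_routes.py | Absence prediction endpoints | /api/predict/single, /api/predict/model-info |
 | cluster_routes.py | Clustering analysis endpoints | /api/cluster/result, /api/cluster/profile |
-#### 3.3.2 Service Layer (services/)
+### 6.2 Vue 3
-| File | Responsibility | Core methods |
+- Efficient component-based development
-|------|------|----------|
+- Works well with Element Plus and ECharts
 | data_service.py | Data loading and basic statistics | get_raw_data(), get_statistics() |
 | analysis_service.py | Feature analysis business logic | get_importance(), get_correlation() |
 | predict_service.py | Prediction business logic | predict_single(), load_model() |
 | cluster_service.py | Clustering business logic | get_clusters(), get_profile() |
-#### 3.3.3 Core Layer (core/)
+### 6.3 Scikit-learn
-| File | Responsibility | Core classes/methods |
+- Well suited to traditional machine learning modeling
-|------|------|-------------|
+- Provides mature algorithms such as random forest, GBDT, and Extra Trees
 | preprocessing.py | Data preprocessing | DataPreprocessor class |
 | feature_mining.py | Feature mining | calculate_importance(), calculate_correlation() |
 | train_model.py | Model training | train_rf(), train_xgboost() |
 | clustering.py | Clustering analysis | KMeansAnalyzer class |
---
+### 6.4 PyTorch
-## 4. Front-End Architecture Design
+- Used to implement the LSTM+MLP deep learning model
 - Supports fused modeling of sequential features and static features
 - Makes it easy to add deep learning comparison experiments to the thesis
-### 4.1 Component-Based Design
+## 7. Deployment
-```
+- Local front-end development server: Vite
-frontend/src/
+- Local back-end service: Flask development server
-├── components/              # 公共组件
+- Model files and data files are both stored in the local project directory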
 │   ├── ChartComponent.vue       # ECharts图表封装组件
 │   ├── ResultCard.vue           # 预测结果展示卡片
 │   ├── KPICard.vue              # KPI指标卡片
 │   └── LoadingSpinner.vue       # 加载动画组件
 │
 ├── views/                   # 页面组件
 │   ├── Dashboard.vue            # 数据概览页
 │   ├── FactorAnalysis.vue       # 影响因素分析页
 │   ├── Prediction.vue           # 缺勤预测页
 │   └── Clustering.vue           # 员工画像页
 │
 ├── api/                     # API调用
 │   ├── request.js               # Axios封装
 │   ├── overview.js              # 概览API
 │   ├── analysis.js              # 分析API
 │   ├── predict.js               # 预测API
 │   └── cluster.js               # 聚类API
 │
 ├── router/                  # 路由配置
 │   └── index.js
 │
 ├── assets/                  # 静态资源
 │   └── styles/
 │       └── main.css
 │
 ├── App.vue                  # 根组件
 └── main.js                  # 入口文件
 ```
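-### 4.2 State Management
+## 8. Architecture Characteristics
-Because this project's state is fairly simple, Vuex/Pinia is not introduced; state is managed as follows:
+- Clear structure that is easy to explain during the thesis defense
-
+- Clear separation of front-end and back-end responsibilities
- **Component-local state**: Vue 3 `ref`/`reactive`
+- Supports quickly demonstrating charts and prediction results
- **Cross-component communication**: props and emit
+- Can later be extended to a database or a more complex model architecture
- **API state**: managed centrally in the API layer
+- Supports experimental comparison of traditional machine learning models and deep learning models
 ### 4.3 Route Design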
 ```javascript
 const routes = [
  {
    path: '/',
    redirect: '/dashboard'
  },
  {
    path: '/dashboard',
    name: 'Dashboard',
    component: () => import('@/views/Dashboard.vue'),
    meta: { title: '数据概览' }
  },
  {
    path: '/analysis',
    name: 'FactorAnalysis',
    component: () => import('@/views/FactorAnalysis.vue'),
    meta: { title: '影响因素分析' }
  },
  {
    path: '/prediction',
    name: 'Prediction',
    component: () => import('@/views/Prediction.vue'),
    meta: { title: '缺勤预测' }
  },
  {
    path: '/clustering',
    name: 'Clustering',
    component: () => import('@/views/Clustering.vue'),
    meta: { title: '员工画像' }
  }
 ]
 ```
 ---
 ## 5. 算法架构设计
 ### 5.1 数据预处理流程
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │                      数据预处理流程                              │
 │                                                                 │
 │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
 │  │  原始CSV数据  │───►│  数据清洗     │───►│  特征分离     │      │
 │  │              │    │  (缺失值处理) │    │              │      │
 │  └──────────────┘    └──────────────┘    └──────────────┘      │
 │                                                │                │
 │                                                ▼                │
 │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
 │  │  处理后数据   │◄───│  特征合并     │◄───│  特征编码     │      │
 │  │  clean_data  │    │              │    │  + 标准化    │      │
 │  └──────────────┘    └──────────────┘    └──────────────┘      │
 │                                                                 │
 │  特征处理方式：                                                  │
 │  ┌────────────────────────────────────────────────────────┐    │
 │  │ 类别型特征 → OneHotEncoder                              │    │
 │  │   - Reason for absence                                  │    │
 │  │   - Month, Day, Seasons                                 │    │
 │  │   - Education, Disciplinary failure                     │    │
 │  │   - Social drinker, Social smoker                       │    │
 │  ├────────────────────────────────────────────────────────┤    │
 │  │ 数值型特征 → StandardScaler                             │    │
 │  │   - Transportation expense                              │    │
 │  │   - Distance, Service time, Age                         │    │
 │  │   - Work load, Hit target                               │    │
 │  │   - Son, Pet, BMI                                       │    │
 │  └────────────────────────────────────────────────────────┘    │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ### 5.2 特征挖掘流程
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │                      特征挖掘流程                                │
 │                                                                 │
 │  ┌──────────────────────────────────────────────────────────┐  │
 │  │                    特征重要性计算                         │  │
 │  │                                                          │  │
 │  │   训练数据 ──► 随机森林模型 ──► feature_importances_      │  │
 │  │                                   │                      │  │
 │  │                                   ▼                      │  │
 │  │                          特征重要性排序结果               │  │
 │  │                                                          │  │
 │  └──────────────────────────────────────────────────────────┘  │
 │                                                                 │
 │  ┌──────────────────────────────────────────────────────────┐  │
 │  │                    相关性分析                             │  │
 │  │                                                          │  │
 │  │   数据矩阵 ──► pandas.DataFrame.corr() ──► 相关系数矩阵   │  │
 │  │                                                    │     │  │
 │  │                                                    ▼     │  │
 │  │                                           热力图数据      │  │
 │  │                                                          │  │
 │  └──────────────────────────────────────────────────────────┘  │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ### 5.3 预测模型流程
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │                      预测模型流程                                │
 │                                                                 │
 │  训练阶段：                                                      │
 │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
 │  │  处理后数据   │───►│  划分数据集   │───►│  模型训练     │      │
 │  │              │    │  Train/Test  │    │  RF + XGBoost │      │
 │  └──────────────┘    └──────────────┘    └──────────────┘      │
 │                                                │                │
 │                                                ▼                │
 │                         ┌──────────────────────────────────┐   │
 │                         │  模型评估                         │   │
 │                         │  - R² (决定系数)                  │   │
 │                         │  - MSE (均方误差)                 │   │
 │                         │  - RMSE (均方根误差)              │   │
 │                         └──────────────────────────────────┘   │
 │                                                │                │
 │                                                ▼                │
 │                         ┌──────────────────────────────────┐   │
 │                         │  保存模型 (.pkl文件)              │   │
 │                         └──────────────────────────────────┘   │
 │                                                                 │
 │  预测阶段：                                                      │
 │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
 │  │  用户输入     │───►│  特征预处理   │───►│  加载模型     │      │
 │  │  (表单数据)   │    │  (编码+标准化)│    │  预测推理     │      │
 │  └──────────────┘    └──────────────┘    └──────────────┘      │
 │                                                │                │
 │                                                ▼                │
 │                         ┌──────────────────────────────────┐   │
 │                         │  返回预测结果                     │   │
 │                         │  - 预测时长                       │   │
 │                         │  - 风险等级                       │   │
 │                         │  - 置信度                         │   │
 │                         └──────────────────────────────────┘   │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ### 5.4 聚类分析流程
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │                      聚类分析流程                                │
 │                                                                 │
 │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
 │  │  处理后数据   │───►│  特征选择     │───►│  K-Means     │      │
 │  │              │    │  (关键维度)   │    │  聚类        │      │
 │  └──────────────┘    └──────────────┘    └──────────────┘      │
 │                                                │                │
 │                                                ▼                │
 │  ┌──────────────────────────────────────────────────────────┐  │
 │  │                     聚类结果                              │  │
 │  │                                                          │  │
 │  │   ┌─────────────────┐      ┌─────────────────┐          │  │
 │  │   │  聚类标签        │      │  聚类中心        │          │  │
 │  │   │  (每条记录所属簇) │      │  (每个簇的中心点) │          │  │
 │  │   └─────────────────┘      └─────────────────┘          │  │
 │  │                                                          │  │
 │  └──────────────────────────────────────────────────────────┘  │
 │                                                │                │
 │                                                ▼                │
 │  ┌──────────────────────────────────────────────────────────┐  │
 │  │                     可视化输出                            │  │
 │  │                                                          │  │
 │  │   - 雷达图：展示各聚类群体的特征分布                       │  │
 │  │   - 散点图：展示员工在聚类空间的分布                       │  │
 │  │   - 统计表：各聚类的成员数量、特征均值                     │  │
 │  │                                                          │  │
 │  └──────────────────────────────────────────────────────────┘  │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ---
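 ## 6. Technology Selection
 ### 6.1 Back-End Technology Stack
 | Technology | Version | Purpose | Rationale |
 |------|------|------|----------|
 | Python | 3.8+ | Development language | Rich data science and machine learning libraries |
 | Flask | 2.x | Web framework | Lightweight, easy to learn, suited to small and medium projects |
 | scikit-learn | 1.x | Machine learning | Provides a complete machine learning toolchain |
 | XGBoost | 1.x | Gradient boosting | High performance, suited to prediction on structured data |
 | pandas | 1.x | Data processing | Powerful data analysis and manipulation |
 | numpy | 1.x | Numerical computing | Efficient array operations |
 | joblib | 1.x | Model persistence | Efficient model serialization |
 ### 6.2 Front-End Technology Stack
 | Technology | Version | Purpose | Rationale |
 |------|------|------|----------|
 | Vue | 3.x | Front-end framework | Composition API, good performance, mature ecosystem |
 | Element Plus | 2.x | UI component library | Rich components and thorough documentation, suited to admin dashboards |
 | ECharts | 5.x | Charting library | Powerful, many chart types, widely used in China |
 | Axios | 1.x | HTTP client | Promise support and full-featured interceptors |
 | Vue Router | 4.x | Routing | The official Vue routing solution |
 | Vite | 4.x | Build tool | Good developer experience and fast builds |
 ### 6.3 Algorithm Selection
 | Algorithm | Purpose | Rationale |
 |------|------|----------|
 | Random forest | Feature importance and prediction | Highly interpretable and outputs feature importance |
 | XGBoost | Prediction model | Strong performance, suited to regression tasks |
 | K-Means | Employee clustering | Simple and efficient, suited to unsupervised clustering |
 | StandardScaler | Numeric standardization | Removes the effect of differing scales and improves model performance |
 | OneHotEncoder | Categorical encoding | Standard method for handling categorical features |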
 ---
 ## 7. 附录
 ### 7.1 目录结构完整版
 ```
 Absenteeism_Analysis_System/
 │
 ├── backend/                          # 后端项目
 │   ├── app.py                        # 应用入口
 │   ├── config.py                     # 配置文件
 │   ├── requirements.txt              # 依赖清单
 │   │
 │   ├── api/                          # API接口层
 │   │   ├── __init__.py
 │   │   ├── overview_routes.py
 │   │   ├── analysis_routes.py
 │   │   ├── predict_routes.py
 │   │   └── cluster_routes.py
 │   │
 │   ├── services/                     # 业务逻辑层
 │   │   ├── __init__.py
 │   │   ├── data_service.py
 │   │   ├── analysis_service.py
 │   │   ├── predict_service.py
 │   │   └── cluster_service.py
 │   │
 │   ├── core/                         # 核心算法层
 │   │   ├── __init__.py
 │   │   ├── preprocessing.py
 │   │   ├── feature_mining.py
 │   │   ├── train_model.py
 │   │   └── clustering.py
 │   │
 │   ├── data/                         # 数据目录
 │   │   ├── raw/
 │   │   │   └── Absenteeism_at_work.csv
 │   │   └── processed/
 │   │       └── clean_data.csv
 │   │
 │   ├── models/                       # 模型目录
 │   │   ├── rf_model.pkl
 │   │   ├── xgb_model.pkl
 │   │   ├── kmeans_model.pkl
 │   │   ├── scaler.pkl
 │   │   └── encoder.pkl
 │   │
 │   └── utils/                        # 工具函数
 │       ├── __init__.py
 │       └── common.py
 │
 ├── frontend/                         # 前端项目
 │   ├── public/
 │   ├── src/
 │   │   ├── api/
 │   │   │   ├── request.js
 │   │   │   ├── overview.js
 │   │   │   ├── analysis.js
 │   │   │   ├── predict.js
 │   │   │   └── cluster.js
 │   │   ├── assets/
 │   │   │   └── styles/
 │   │   │       └── main.css
 │   │   ├── components/
 │   │   │   ├── ChartComponent.vue
 │   │   │   ├── ResultCard.vue
 │   │   │   ├── KPICard.vue
 │   │   │   └── LoadingSpinner.vue
 │   │   ├── router/
 │   │   │   └── index.js
 │   │   ├── views/
 │   │   │   ├── Dashboard.vue
 │   │   │   ├── FactorAnalysis.vue
 │   │   │   ├── Prediction.vue
 │   │   │   └── Clustering.vue
 │   │   ├── App.vue
 │   │   └── main.js
 │   ├── index.html
 │   ├── package.json
 │   ├── pnpm-lock.yaml
 │   └── vite.config.js
 │
 ├── docs/                             # 文档目录
 │   ├── 00_需求规格说明书.md
 │   ├── 01_系统架构设计.md
 │   ├── 02_接口设计文档.md
 │   ├── 03_数据设计文档.md
 │   ├── 04_UI原型设计.md
 │   └── ...
 │
 ├── data/                             # 原始数据（项目根目录）
 │   └── Absenteeism_at_work.csv
 │
 └── README.md                         # 项目说明
 ```
 ### 7.2 文档修改历史
 | 版本 | 日期 | 修改人 | 修改内容 |
 |------|------|--------|----------|
 | V1.0 | 2026-03 | 张硕 | 初始版本 |
 ---
 **文档结束**
--- a/docs/02_接口设计文档.md
+++ b/docs/02_接口设计文档.md
--- a/docs/03_数据设计文档.md
+++ b/docs/03_数据设计文档.md
@@ -1,426 +1,171 @@
 # 数据设计文档
-## 基于多维特征挖掘的员工缺勤分析与预测系统
+## 1. 数据集说明
-
+
-**文档版本**：V1.0  
+本系统数据集为中国企业员工缺勤事件数据集。每条记录表示一次员工缺勤事件，预测目标为缺勤时长（小时）。
-**编写日期**：2026年3月  
+
-**编写人**：张硕
+数据文件：
-
+
---
+- `backend/data/raw/china_enterprise_absence_events.csv`
-
+
-## 1. 数据集概述
+## 2. 数据粒度
-
+
-### 1.1 数据来源
+- 记录粒度：单次缺勤事件
-
+- 员工粒度：同一员工可对应多条缺勤记录
-| 项目 | 内容 |
+- 企业粒度：多个企业组成整体样本池
-|------|------|
+
-| 数据集名称 | Absenteeism at work |
+## 3. 字段分类
-| 数据来源 | UCI Machine Learning Repository |
+
-| 原始提供方 | 巴西某快递公司 (2007-2010年) |
+### 3.1 企业与组织字段
-| 数据提供者 | Andrea Martiniano, Ricardo Pinto Ferreira, Renato Jose Sassi |
+
-| 所属机构 | Universidade Nove de Julho, Brazil |
+- 企业编号
-
+- 所属行业
-### 1.2 数据规模
+- 企业规模
-
+- 所在城市等级
-| 项目 | 数值 |
+- 用工类型
-|------|------|
+- 部门条线
-| 记录总数 | 740条 |
+- 岗位序列
-| 特征数量 | 21个字段 |
+- 岗位级别
-| 员工数量 | 36人 |
+
-| 时间跨度 | 2007年7月 - 2010年7月 |
+### 3.2 员工基础字段
-
+
-### 1.3 数据质量
+- 员工编号
-
+- 性别
-| 检查项 | 结果 | 说明 |
+- 年龄
-|--------|------|------|
+- 司龄年数
-| 缺失值 | 无 | 数据完整无缺失 |
+- 最高学历
-| 重复记录 | 无 | 无重复数据 |
+- 婚姻状态
-| 异常值 | 需检查 | 部分字段可能存在异常值 |
+- 是否本地户籍
-| 数据一致性 | 良好 | 字段格式一致 |
+- 子女数量
-
+- 是否独生子女家庭负担
---
+- 居住类型
-
+
-## 2. 字段说明
+### 3.3 工作负荷字段
-
+
-### 2.1 字段完整列表
+- 班次类型
-
+- 是否夜班岗位
-| 序号 | 字段名 | 中文名称 | 数据类型 | 取值范围 | 说明 |
+- 月均加班时长
-|------|--------|----------|----------|----------|------|
+- 近30天出勤天数
-| 1 | ID | 员工标识 | int | 1-36 | 唯一标识员工 |
+- 近90天缺勤次数
-| 2 | Reason for absence | 缺勤原因 | int | 0-28 | ICD代码或非疾病原因 |
+- 近180天请假总时长
-| 3 | Month of absence | 缺勤月份 | int | 1-12 | 月份 |
+- 通勤时长分钟
-| 4 | Day of the week | 星期几 | int | 2-6 | 2=周一, 6=周五 |
+- 通勤距离公里
-| 5 | Seasons | 季节 | int | 1-4 | 1=夏, 4=春 |
+- 是否跨城通勤
-| 6 | Transportation expense | 交通费用 | int | 118-388 | 月交通费用（雷亚尔） |
+- 绩效等级
-| 7 | Distance from Residence to Work | 通勤距离 | int | 5-52 | 公里数 |
+- 近12月违纪次数
-| 8 | Service time | 工龄 | int | 1-29 | 年数 |
+- 团队人数
-| 9 | Age | 年龄 | int | 27-58 | 周岁 |
+- 直属上级管理跨度
-| 10 | Work load Average/day | 日均工作负荷 | float | 205-350 | 目标达成量/天 |
+
-| 11 | Hit target | 达标率 | int | 81-100 | 百分比 |
+### 3.4 健康与生活方式字段
-| 12 | Disciplinary failure | 违纪记录 | int | 0-1 | 0=否, 1=是 |
+
-| 13 | Education | 学历 | int | 1-4 | 1=高中, 4=博士 |
+- BMI
-| 14 | Son | 子女数量 | int | 0-4 | 子女人数 |
+- 是否慢性病史
-| 15 | Social drinker | 饮酒习惯 | int | 0-1 | 0=否, 1=是 |
+- 年度体检异常标记
-| 16 | Social smoker | 吸烟习惯 | int | 0-1 | 0=否, 1=是 |
+- 近30天睡眠时长均值
-| 17 | Pet | 宠物数量 | int | 0-8 | 宠物数量 |
+- 每周运动频次
-| 18 | Weight | 体重 | int | 56-108 | 公斤 |
+- 是否吸烟
-| 19 | Height | 身高 | int | 163-196 | 厘米 |
+- 是否饮酒
-| 20 | Body mass index | BMI指数 | float | 19-38 | 体重/身高² |
+- 心理压力等级
-| 21 | Absenteeism time in hours | 缺勤时长 | int | 0-120 | 目标变量（小时） |
+- 是否长期久坐岗位
-
+
-### 2.2 特征分类
+### 3.5 缺勤事件字段
-
+
-#### 2.2.1 类别型特征
+- 缺勤月份
-
+- 星期几
-| 字段名 | 类别数 | 类别说明 |
+- 是否节假日前后
-|--------|--------|----------|
+- 季节
-| Reason for absence | 29 | 0-28，ICD疾病代码或非疾病原因 |
+- 事件日期
-| Month of absence | 12 | 1-12月 |
+- 事件日期索引
-| Day of the week | 5 | 周一至周五 |
+- 事件序号
-| Seasons | 4 | 夏秋冬春 |
+- 员工历史事件数
-| Disciplinary failure | 2 | 是/否 |
+- 请假申请渠道
-| Education | 4 | 高中/本科/研究生/博士 |
+- 请假类型
-| Social drinker | 2 | 是/否 |
+- 请假原因大类
-| Social smoker | 2 | 是/否 |
+- 是否提供医院证明
-
+- 是否临时请假
-#### 2.2.2 数值型特征
+- 是否连续缺勤
-
+- 前一工作日是否加班
-| 字段名 | 类型 | 范围 | 均值 | 标准差 |
+- 缺勤时长（小时）
-|--------|------|------|------|--------|
+
-| Transportation expense | 连续 | 118-388 | 221.3 | 69.1 |
+## 4. 目标变量设计
-| Distance from Residence to Work | 连续 | 5-52 | 29.6 | 14.8 |
+
-| Service time | 连续 | 1-29 | 12.0 | 5.7 |
+目标变量：
-| Age | 连续 | 27-58 | 36.9 | 6.5 |
+
-| Work load Average/day | 连续 | 205-350 | 270.7 | 37.1 |
+- `缺勤时长（小时）`
-| Hit target | 连续 | 81-100 | 94.6 | 4.0 |
+
-| Son | 离散 | 0-4 | 1.0 | 1.1 |
+风险等级映射：
-| Pet | 离散 | 0-8 | 0.8 | 1.5 |
+
-| Weight | 连续 | 56-108 | 79.0 | 12.4 |
+- 小于 4 小时：低风险
-| Height | 连续 | 163-196 | 172.9 | 6.0 |
+- 4 至 8 小时：中风险
-| Body mass index | 连续 | 19-38 | 26.7 | 4.3 |
+- 大于 8 小时：高风险
-| Absenteeism time in hours | 连续 | 0-120 | 6.9 | 13.3 |
+
-
+## 5. 特征工程设计
-### 2.3 缺勤原因详细说明
+
-
+系统在原始字段基础上构建以下衍生特征：
-#### 2.3.1 ICD疾病分类（代码1-21）
+
-
+- 加班通勤压力指数
-| 代码 | ICD分类 | 疾病类型 |
+- 家庭负担指数
-|------|---------|----------|
+- 健康风险指数
-| 1 | I | 传染病和寄生虫病 |
+- 岗位稳定性指数
-| 2 | II | 肿瘤 |
+- 节假日风险标记
-| 3 | III | 血液及造血器官疾病 |
+- 排班压力标记
-| 4 | IV | 内分泌、营养和代谢疾病 |
+- 缺勤历史强度
-| 5 | V | 精神和行为障碍 |
+- 生活规律指数
-| 6 | VI | 神经系统疾病 |
+- 管理负荷指数
-| 7 | VII | 眼及其附属器疾病 |
+- 工龄分层
-| 8 | VIII | 耳及乳突疾病 |
+- 年龄分层
-| 9 | IX | 循环系统疾病 |
+- 通勤分层
-| 10 | X | 呼吸系统疾病 |
+- 加班分层
-| 11 | XI | 消化系统疾病 |
+
-| 12 | XII | 皮肤和皮下组织疾病 |
+## 6. 数据生成逻辑
-| 13 | XIII | 肌肉骨骼系统和结缔组织疾病 |
+
-| 14 | XIV | 泌尿生殖系统疾病 |
+### 6.1 生成原则
-| 15 | XV | 妊娠、分娩和产褥期 |
+
-| 16 | XVI | 围产期疾病 |
+- 结合中国企业实际管理场景设计字段
-| 17 | XVII | 先天性畸形 |
+- 保证类别分布与数值范围具有合理性
-| 18 | XVIII | 症状、体征异常发现 |
+- 让关键特征和目标变量之间保持稳定、可学习关系
-| 19 | XIX | 损伤、中毒 |
+
-| 20 | XX | 外部原因导致的发病和死亡 |
+### 6.2 影响关系示例
-| 21 | XXI | 影响健康状态的因素 |
+
-
+- 请假类型对缺勤时长有显著影响
-#### 2.3.2 非疾病原因（代码22-28）
+- 医院证明通常对应更高缺勤时长
-
+- 夜班、长通勤和高加班会提升缺勤风险
-| 代码 | 名称 | 说明 |
+- 慢性病史和健康异常会提升缺勤时长
-|------|------|------|
+- 年假和调休通常对应较短缺勤时长
-| 22 | 医疗随访 | 患者定期随访复查 |
+
-| 23 | 医疗咨询 | 门诊就医咨询 |
+### 6.3 时序样本构造
-| 24 | 献血 | 无偿献血活动 |
+
-| 25 | 实验室检查 | 医学检验检查 |
+为支持 LSTM+MLP 深度学习模型，数据集在事件层面额外补充了时序字段：
-| 26 | 无故缺勤 | 未经批准的缺勤 |
+
-| 27 | 理疗 | 物理治疗康复 |
+- `事件日期`：缺勤事件发生日期
-| 28 | 牙科咨询 | 口腔科就诊 |
+- `事件日期索引`：便于排序和窗口切片的数值型时间索引
-
+- `事件序号`：同一员工内部的事件顺序
-#### 2.3.3 特殊值
+- `员工历史事件数`：该员工在数据集中对应的事件总数
-
+
-| 代码 | 说明 |
+深度学习样本构造规则如下：
-|------|------|
+
-| 0 | 未知原因（数据中存在） |
+- 以员工为单位按 `事件日期索引` 和 `事件序号` 排序
-
+- 取最近 `5` 次缺勤事件作为时间窗口输入
-### 2.4 季节编码说明
+- 序列不足时使用前向零填充
-
+- 当前事件作为窗口最后一个时间步
-| 代码 | 季节 | 月份范围（巴西） |
+- 静态特征单独输入 MLP 分支，与 LSTM 输出融合后进行回归预测
-|------|------|------------------|
+
-| 1 | 夏季 | 12月-2月 |
+## 7. 数据质量要求
-| 2 | 秋季 | 3月-5月 |
+
-| 3 | 冬季 | 6月-8月 |
+- 无大量缺失值
-| 4 | 春季 | 9月-11月 |
+- 类别字段取值可控
-
+- 数值字段范围合理
-### 2.5 学历编码说明
+- 高风险比例处于可接受范围
-
+- 关键变量与目标方向关系合理
-| 代码 | 学历 | 说明 |
+
-|------|------|------|
+## 8. 当前数据集统计
-| 1 | 高中 | 高中及以下学历 |
+
-| 2 | 本科 | 大学本科学历 |
+- 样本量：12000
-| 3 | 研究生 | 硕士研究生 |
+- 员工覆盖数：2575
-| 4 | 博士 | 博士研究生 |
+- 企业覆盖数：180
-
+- 行业数：7
---
+- 字段总数：52
-
+
-## 3. 数据预处理
+详细统计可参考：
-
+
-### 3.1 数据清洗
+- [中国企业缺勤模拟数据集说明.md](D:/VScodeProject/forsetsystem/中国企业缺勤模拟数据集说明.md)
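The sample-construction rules in section 6.3 (sort each employee's events, take the most recent 5, zero-pad at the front, keep the current event as the last time step) can be sketched as follows. This is a minimal illustration under stated assumptions: the keys `employee_id`, `date_index` and `seq` are hypothetical English stand-ins for 员工编号, 事件日期索引 and 事件序号, and `overtime_hours` is an invented feature column.

```python
from collections import defaultdict

WINDOW = 5  # window length stated in the document


def build_windows(records, feature_keys, window=WINDOW):
    """Return one (employee_id, window) pair per event.

    Each window has exactly `window` time steps; the current event is the
    last step and missing history is zero-padded at the front.
    """
    by_employee = defaultdict(list)
    for rec in records:
        by_employee[rec["employee_id"]].append(rec)

    samples = []
    for employee_id, events in by_employee.items():
        # Sort by date index first, then by the per-employee sequence number.
        events.sort(key=lambda r: (r["date_index"], r["seq"]))
        vectors = [[float(e[k]) for k in feature_keys] for e in events]
        for i in range(len(vectors)):
            recent = vectors[max(0, i + 1 - window): i + 1]
            pad = [[0.0] * len(feature_keys) for _ in range(window - len(recent))]
            samples.append((employee_id, pad + recent))
    return samples


# Two events for one employee, deliberately given out of order.
records = [
    {"employee_id": "E1", "date_index": 12, "seq": 2, "overtime_hours": 4},
    {"employee_id": "E1", "date_index": 3, "seq": 1, "overtime_hours": 8},
]
samples = build_windows(records, ["overtime_hours"])
```

The static features would be passed to the MLP branch separately; this sketch covers only the sequence input.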
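 #### 3.1.1 Missing-Value Handling
 The dataset itself contains no missing values, but preprocessing should still verify this:
 ```
 Checks:
 1. Count the missing values in each field
 2. If any are found, fill numeric fields with the median and categorical fields with the mode
 ```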
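The two checks above can be sketched without third-party libraries. This is an illustrative example, not the project's preprocessing code: missing values are assumed to be represented as `None`, and the helper names `count_missing` and `fill_missing` are hypothetical.

```python
from statistics import median, mode


def count_missing(rows):
    """Step 1: count missing (None) values per field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (value is None)
    return counts


def fill_missing(rows, numeric_fields, categorical_fields):
    """Step 2: fill numeric fields with the median, categorical with the mode."""
    fill = {}
    for f in numeric_fields:
        fill[f] = median(r[f] for r in rows if r[f] is not None)
    for f in categorical_fields:
        fill[f] = mode(r[f] for r in rows if r[f] is not None)
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Toy rows, invented for illustration.
rows = [
    {"Age": 30, "Education": 1},
    {"Age": None, "Education": 1},
    {"Age": 40, "Education": None},
    {"Age": 50, "Education": 2},
]
missing = count_missing(rows)
filled = fill_missing(rows, ["Age"], ["Education"])
```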
 #### 3.1.2 Outlier Handling
 | Field | Outlier criterion | Handling |
 |------|----------------|----------|
 | Absenteeism time in hours | > 24 hours (more than one day) | Keep, but flag |
 | Work load Average/day | < 100 or > 500 | Inspect, then decide whether to keep or drop |
 | Age | < 18 or > 65 | Verify data validity |
 #### 3.1.3 Data Type Conversion
 | Field | Original type | Converted type | Notes |
 |------|----------|------------|------|
 | ID | int | int | Unchanged |
 | Reason for absence | int | category | Convert to categorical |
 | Month of absence | int | category | Convert to categorical |
 | Day of the week | int | category | Convert to categorical |
 | Seasons | int | category | Convert to categorical |
 | Education | int | category | Convert to categorical |
 | Disciplinary failure | int | category | Convert to categorical |
 | Social drinker | int | category | Convert to categorical |
 | Social smoker | int | category | Convert to categorical |
 ### 3.2 Feature Encoding
 #### 3.2.1 One-Hot Encoding
 The following categorical features are one-hot encoded:
 | Field | Encoded feature count | Notes |
 |------|--------------|------|
 | Reason for absence | 29 | One binary feature per reason |
 | Month of absence | 12 | One binary feature per month |
 | Day of the week | 5 | One binary feature per weekday |
 | Seasons | 4 | One binary feature per season |
 | Education | 4 | One binary feature per education level |
 | Disciplinary failure | 2 | Two features (yes/no) |
 | Social drinker | 2 | Two features (yes/no) |
 | Social smoker | 2 | Two features (yes/no) |
 **Encoding example**:
 ```
 Raw value: Reason for absence = 23
 Encoded:
  Reason_0: 0
  Reason_1: 0
  ...
  Reason_23: 1
  ...
  Reason_28: 0
 ```
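The encoding example above can be reproduced with a few lines of plain Python. This sketch only illustrates the idea; the project lists `OneHotEncoder` from scikit-learn for the real pipeline, and the helper name `one_hot` is an assumption.

```python
def one_hot(value, categories, prefix):
    """Map one categorical value to a dict of 0/1 indicator features."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"{prefix}_{c}": int(c == value) for c in categories}


# Reason for absence takes codes 0-28, i.e. 29 indicator features.
encoded = one_hot(23, range(29), "Reason")
```

Exactly one indicator is set to 1 and the remaining 28 are 0, matching the table's feature count of 29 for this field.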
 #### 3.2.2 Standardization (StandardScaler)
 The following numeric features are standardized to zero mean and unit standard deviation:
 | Field | Standardization formula |
 |------|------------|
 | Transportation expense | (x - μ) / σ |
 | Distance from Residence to Work | (x - μ) / σ |
 | Service time | (x - μ) / σ |
 | Age | (x - μ) / σ |
 | Work load Average/day | (x - μ) / σ |
 | Hit target | (x - μ) / σ |
 | Son | (x - μ) / σ |
 | Pet | (x - μ) / σ |
 | Weight | (x - μ) / σ |
 | Height | (x - μ) / σ |
 | Body mass index | (x - μ) / σ |
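The formula (x - μ) / σ can be checked by hand with the standard library. The sketch below uses the population standard deviation, which is what scikit-learn's `StandardScaler` uses; the function name `standardize` and the sample values are illustrative assumptions.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores (x - mu) / sigma using the population std deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]


distances = [20, 30, 40]  # invented commute distances in km
z = standardize(distances)
```

In a train/test setting, μ and σ would normally be estimated on the training set only and then reused for the test set, so that test data does not leak into the scaling.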
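 ### 3.3 Feature Engineering
 #### 3.3.1 Derived Features
 The following derived features may be considered:
 | Derived feature | Computation | Notes |
 |----------|----------|------|
 | has_children | Son > 0 | Has children (binary) |
 | has_pet | Pet > 0 | Has pets (binary) |
 | age_group | Age binning | Young / middle-aged / senior |
 | service_category | Service-time binning | New / veteran employee |
 | bmi_category | BMI binning | Normal / overweight / obese |
 | workload_level | Workload level | Low / medium / high |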
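Three of the derived features in the table can be sketched as follows. The `Son > 0` and `Pet > 0` rules come from the table; the BMI cut-offs of 25 and 30 are an assumption (the common WHO thresholds), since the document does not state the bin edges, and the function name `derive_features` is hypothetical.

```python
def derive_features(row):
    """Compute has_children, has_pet and bmi_category for one record."""
    bmi = row["Body mass index"]
    # Assumed cut-offs: < 25 normal, 25 to < 30 overweight, >= 30 obese.
    if bmi < 25:
        bmi_category = "normal"
    elif bmi < 30:
        bmi_category = "overweight"
    else:
        bmi_category = "obese"
    return {
        "has_children": int(row["Son"] > 0),
        "has_pet": int(row["Pet"] > 0),
        "bmi_category": bmi_category,
    }


# Invented record for illustration.
derived = derive_features({"Son": 2, "Pet": 0, "Body mass index": 27.5})
```

The remaining binned features (`age_group`, `service_category`, `workload_level`) would follow the same pattern once their bin edges are fixed.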
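 #### 3.3.2 Feature Selection
 Based on the feature importance analysis, the features most valuable for prediction are selected:
 | Priority | Feature | Basis for selection |
 |--------|------|----------|
 | High | Reason for absence | Clear business meaning, direct impact |
 | High | Transportation expense | High feature importance |
 | High | Distance from Residence to Work | High feature importance |
 | High | Service time | High feature importance |
 | High | Age | High feature importance |
 | Medium | Work load Average/day | Some impact |
 | Medium | Body mass index | Some impact |
 | Medium | Social drinker | Clear differences between groups |
 | Low | Pet | Minor impact |
 | Low | Height | Information already captured by BMI |
 ### 3.4 Data Splitting
 #### 3.4.1 Train/Test Split
 | Dataset | Share | Records | Purpose |
 |--------|------|--------|------|
 | Training set | 80% | 592 | Model training |
 | Test set | 20% | 148 | Model evaluation |
 #### 3.4.2 Splitting Method
 - Use stratified sampling so that each absence reason has the same proportion in the training and test sets
 - Fix the random seed (random_state=42) so that results are reproducible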
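A stratified 80/20 split with a fixed seed can be sketched in plain Python as below. One practical caveat: a reason code that occurs only once cannot be split proportionally (scikit-learn's `train_test_split` raises an error in that case when `stratify` is set), so this sketch keeps such singletons in the training set. That policy, the function name `stratified_split`, and the toy rows are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_ratio=0.2, seed=42):
    """Split rows so each value of `key` keeps roughly the same share in both sets."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    train, test = [], []
    for value in sorted(groups):
        members = groups[value]
        rng.shuffle(members)
        # Singleton strata cannot be split; keep them in the training set.
        n_test = round(len(members) * test_ratio) if len(members) > 1 else 0
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: two reasons with 10 rows each and one reason with a single row.
rows = [{"id": i, "reason": 23 if i < 10 else 28} for i in range(20)]
rows.append({"id": 20, "reason": 5})
train, test = stratified_split(rows, key="reason")
```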
 ---
 ## 4. Data Storage
 ### 4.1 Directory Structure
 ```
 backend/data/
 ├── raw/                                    # Raw data
 │   └── Absenteeism_at_work.csv            # Original UCI dataset
 │
 ├── processed/                              # Processed data
 │   ├── clean_data.csv                     # Cleaned data
 │   ├── encoded_data.csv                   # Encoded data
 │   ├── train_data.csv                     # Training data
 │   └── test_data.csv                      # Test data
 │
 └── analysis/                               # Analysis results
    ├── statistics.json                    # Summary statistics
    ├── correlation.json                   # Correlation matrix
    └── feature_importance.json            # Feature importances
 ```
 ### 4.2 Model Storage
 ```
 backend/models/
 ├── rf_model.pkl                           # Random forest model
 ├── xgb_model.pkl                          # XGBoost model
 ├── kmeans_model.pkl                       # K-Means model
 ├── scaler.pkl                             # StandardScaler object
 ├── encoder.pkl                            # OneHotEncoder object
 └── model_info.json                        # Model metadata
 ```
 ### 4.3 Data File Formats
 #### 4.3.1 CSV Format
 ```
 Delimiter: semicolon (;)
 Encoding: UTF-8
 Header: the first row contains the field names
 ```
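Reading a file in this format only requires setting the delimiter explicitly. The sketch below parses an in-memory sample so that it runs on its own; the two data rows are made up for illustration, and only three of the 21 columns are shown.

```python
import csv
import io

# Made-up sample in the documented format: semicolon-delimited, header row first.
sample = (
    "ID;Reason for absence;Absenteeism time in hours\n"
    "11;26;4\n"
    "36;0;0\n"
)

# For a real file, io.StringIO(sample) would be replaced by
# open(path, encoding="utf-8", newline="").
reader = csv.DictReader(io.StringIO(sample), delimiter=";")
records = [{name: int(value) for name, value in row.items()} for row in reader]
```

Converting every value with `int` works for this sample only; columns such as `Work load Average/day` are listed as float in the field table and would need `float` instead.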
 #### 4.3.2 JSON Format
 ```json
 {
  "created_at": "2026-03-01T10:00:00",
  "version": "1.0",
  "data": {
    // payload specific to each file
  }
 }
 ```
 ---
 ## 5. Data Dictionary
 ### 5.1 Raw Data Dictionary
 | Field | Data type | Nullable | Default | Description |
 |--------|----------|----------|--------|------|
 | ID | INTEGER | NOT NULL | - | Unique employee identifier |
 | Reason for absence | INTEGER | NOT NULL | - | Absence reason code |
 | Month of absence | INTEGER | NOT NULL | - | Month (1-12) |
 | Day of the week | INTEGER | NOT NULL | - | Weekday (2-6) |
 | Seasons | INTEGER | NOT NULL | - | Season (1-4) |
 | Transportation expense | INTEGER | NOT NULL | - | Transportation expense |
 | Distance from Residence to Work | INTEGER | NOT NULL | - | Commute distance (km) |
 | Service time | INTEGER | NOT NULL | - | Service time (years) |
 | Age | INTEGER | NOT NULL | - | Age |
 | Work load Average/day | REAL | NOT NULL | - | Average daily workload |
 | Hit target | INTEGER | NOT NULL | - | Target achievement rate (%) |
 | Disciplinary failure | INTEGER | NOT NULL | 0 | Disciplinary record (0/1) |
 | Education | INTEGER | NOT NULL | - | Education level (1-4) |
 | Son | INTEGER | NOT NULL | 0 | Number of children |
 | Social drinker | INTEGER | NOT NULL | 0 | Drinks alcohol (0/1) |
 | Social smoker | INTEGER | NOT NULL | 0 | Smokes (0/1) |
 | Pet | INTEGER | NOT NULL | 0 | Number of pets |
 | Weight | INTEGER | NOT NULL | - | Weight (kg) |
 | Height | INTEGER | NOT NULL | - | Height (cm) |
 | Body mass index | REAL | NOT NULL | - | BMI |
 | Absenteeism time in hours | INTEGER | NOT NULL | - | Absence duration (target variable) |
 ---
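 ## 6. Appendix
 ### 6.1 Summary Statistics
 ```
 Dataset overview:
 - Records: 740
 - Features: 21
 - Employees: 36
 - Total absence time: 5028 hours
 - Mean absence time: 6.9 hours
 Top 5 absence reasons:
 1. Medical consultation (23): 149 records (20.1%)
 2. Dental consultation (28): 112 records (15.1%)
 3. Physiotherapy (27): 94 records (12.7%)
 4. Medical follow-up (22): 74 records (10.0%)
 5. Diseases of the digestive system (11): 59 records (8.0%)
 Education distribution (by record):
 - High school: 633 records (85.5%)
 - Bachelor's: 79 records (10.7%)
 - Postgraduate and above: 28 records (3.8%)
 Lifestyle (by record):
 - Drinkers: 340 records (45.9%)
 - Smokers: 90 records (12.2%)
 ```
 ### 6.2 Revision History
 | Version | Date | Author | Changes |
 |------|------|--------|----------|
 | V1.0 | 2026-03 | 张硕 | Initial version |
 ---
 **End of document**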
--- a/docs/04_UI原型设计.md
+++ b/docs/04_UI原型设计.md
@@ -1,787 +1,109 @@
-# UI原型设计文档
+# UI原型设计
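 ## Employee Absenteeism Analysis and Prediction System Based on Multi-Dimensional Feature Mining
 **Document version**: V1.0  
 **Date written**: March 2026  
 **Author**: 张硕
 ---
 ## 1. Design Principles
 ### 1.1 Visual Style
 | Design element | Specification |
 |----------|----------|
 | Primary color | Element Plus default blue (#409EFF) |
 | Secondary colors | Success green (#67C23A), warning yellow (#E6A23C), danger red (#F56C6C) |
 | Background color | Light gray (#F5F7FA) |
 | Font | System default (Chinese: Microsoft YaHei / PingFang SC) |
 | Font size | Headings 16px, body 14px, secondary text 12px |
 | Border radius | 4px |
 | Shadow | Subtle shadow to add depth |
 ### 1.2 Interaction Principles
 | Principle | Description |
 |------|------|
 | Consistency | The same function uses the same interaction pattern everywhere |
 | Feedback | Every action is followed by clear visual feedback |
 | Error tolerance | Undo and error messages are provided |
 | Learnability | A clean, intuitive interface keeps the learning cost low |
 | Efficiency | Fewer steps per task, higher productivity |
 ### 1.3 Responsive Design
 | Screen width | Adaptation |
 |----------|----------|
 | ≥1920px | Large-screen display, enlarged charts |
 | 1366-1920px | Standard display, default layout |
 | <1366px | Compact layout, charts resize automatically |
 ---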
 ## 2. 整体布局
 ### 2.1 页面框架
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │  Header (顶部导航栏)                                             │
 │  ┌───────────────────────────────────────────────────────────┐  │
 │  │  Logo  │  数据概览  │  影响因素  │  缺勤预测  │  员工画像  │  │
 │  └───────────────────────────────────────────────────────────┘  │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │                                                                 │
 │                      Main Content                               │
 │                      (主内容区域)                                │
 │                                                                 │
 │                                                                 │
 ├─────────────────────────────────────────────────────────────────┤
 │  Footer (底部信息栏 - 可选)                                      │
 │  ┌───────────────────────────────────────────────────────────┐  │
 │  │  © 2026 基于多维特征挖掘的员工缺勤分析与预测系统              │  │
 │  └───────────────────────────────────────────────────────────┘  │
 └─────────────────────────────────────────────────────────────────┘
 ```
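 ### 2.2 Navigation
 **Top navigation menu**:
 | Menu item | Icon | Route | Description |
 |--------|------|------|------|
 | Data overview (数据概览) | 📊 | /dashboard | Home page with overall statistics |
 | Influencing factors (影响因素) | 🔍 | /analysis | Feature importance analysis |
 | Absence prediction (缺勤预测) | 🎯 | /prediction | Entry point for prediction |
 | Employee profiles (员工画像) | 👥 | /clustering | Clustering results |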
 ---
 ## 3. 页面一：数据概览 (Dashboard)
 ### 3.1 页面布局
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │  数据概览                                                        │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐            │
 │  │ KPI卡片1 │  │ KPI卡片2 │  │ KPI卡片3 │  │ KPI卡片4 │            │
 │  │ 总记录数 │  │ 员工总数 │  │平均缺勤  │  │高风险占比│            │
 │  │  740    │  │   36    │  │  6.9h   │  │  15%    │            │
 │  └─────────┘  └─────────┘  └─────────┘  └─────────┘            │
 │                                                                 │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌─────────────────────────┐  ┌─────────────────────────┐      │
 │  │                         │  │                         │      │
 │  │   月度缺勤趋势折线图     │  │   星期分布柱状图         │      │
 │  │                         │  │                         │      │
 │  │   (ECharts Line Chart)  │  │   (ECharts Bar Chart)   │      │
 │  │                         │  │                         │      │
 │  └─────────────────────────┘  └─────────────────────────┘      │
 │                                                                 │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌─────────────────────────┐  ┌─────────────────────────┐      │
 │  │                         │  │                         │      │
 │  │   缺勤原因分布饼图       │  │   季节分布饼图           │      │
 │  │                         │  │                         │      │
 │  │   (ECharts Pie Chart)   │  │   (ECharts Pie Chart)   │      │
 │  │                         │  │                         │      │
 │  └─────────────────────────┘  └─────────────────────────┘      │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ### 3.2 组件说明
 #### 3.2.1 KPI卡片组件
 ```
 ┌─────────────────────────────────────┐
 │  ┌───────┐                          │
 │  │ 图标  │   总记录数               │
 │  │  📊   │                          │
 │  └───────┘                          │
 │                                     │
 │        740                          │
 │        条                           │
 │                                     │
 │  较上月 ↑ 5%                        │
 └─────────────────────────────────────┘
 ```
 **Component props**:
 | Prop | Type | Description |
 |------|------|------|
 | title | string | Metric name |
 | value | number/string | Metric value |
 | unit | string | Unit |
 | icon | string | Icon |
 | trend | string | Trend (optional) |
 | trendType | string | Trend direction (up/down) |
 #### 3.2.2 Monthly Trend Line Chart
 **Key ECharts options**:
 ```javascript
 {
   title: { text: '月度缺勤趋势' },
   xAxis: {
     type: 'category',
     data: ['1月', '2月', /* ... */ '12月']
   },
   yAxis: {
     type: 'value',
     name: '缺勤时长(小时)'
   },
   series: [{
     type: 'line',
     smooth: true,
     data: [80, 65, 90, /* ... */]
   }],
   tooltip: {
     trigger: 'axis'
   }
 }
 ```
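 #### 3.2.3 Absence Reason Pie Chart
 **Key ECharts options**:
 ```javascript
 {
   title: { text: '缺勤原因分布' },
   series: [{
     type: 'pie',
     radius: ['40%', '70%'],  // donut chart
     data: [
       { value: 149, name: '医疗咨询' },
       { value: 112, name: '牙科咨询' },
       // ...
     ]
   }],
   legend: {
     orient: 'vertical',
     right: 10
   }
 }
 ```
 ### 3.3 Interaction Flow
 1. When the user opens the page, the statistics load automatically
 2. The KPI cards appear one after another (an animation may be added)
 3. Charts load asynchronously and show a loading animation meanwhile
 4. Hovering over a chart shows details
 5. Clicking a chart region drills down into details (optional)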
 ---
 ## 4. 页面二：影响因素分析 (FactorAnalysis)
 ### 4.1 页面布局
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │  影响因素分析                                                    │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌───────────────────────────────────────────────────────────┐  │
 │  │                                                           │  │
 │  │              特征重要性排序条形图                          │  │
 │  │              (水平柱状图，降序排列)                        │  │
 │  │                                                           │  │
 │  │   通勤距离   ████████████████████████  0.35               │  │
 │  │   交通费用   ███████████████████       0.28               │  │
 │  │   工龄       ██████████████            0.21               │  │
 │  │   年龄       ████████████              0.18               │  │
 │  │   工作负荷   ████████                  0.12               │  │
 │  │   ...                                                     │  │
 │  │                                                           │  │
 │  └───────────────────────────────────────────────────────────┘  │
 │                                                                 │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌─────────────────────────┐  ┌─────────────────────────┐      │
 │  │                         │  │                         │      │
 │  │   相关性热力图           │  │   群体对比分析           │      │
 │  │                         │  │                         │      │
 │  │   (Heatmap)             │  │   ┌───────────────────┐ │      │
 │  │                         │  │   │ 对比维度: [下拉框] │ │      │
 │  │   显示特征间相关系数     │  │   └───────────────────┘ │      │
 │  │                         │  │                         │      │
 │  │                         │  │   (分组柱状图)          │      │
 │  └─────────────────────────┘  └─────────────────────────┘      │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ### 4.2 组件说明
 #### 4.2.1 特征重要性条形图
 ```
 特征重要性排序
 ┌────────────────────────────────────────────────────────┐
 │                                                        │
 │  Reason for absence   ████████████████████████████    │ 0.35
 │  Transportation exp   ████████████████████            │ 0.28
 │  Distance             █████████████████               │ 0.24
 │  Service time         ██████████████                  │ 0.21
 │  Age                  ████████████                    │ 0.18
 │  Work load            ██████████                      │ 0.15
 │  BMI                  ████████                        │ 0.12
 │  Social drinker       ██████                          │ 0.09
 │  Hit target           ████                            │ 0.06
 │  Son                  ███                             │ 0.05
 │  Pet                  ██                              │ 0.03
 │  Education            ██                              │ 0.03
 │  Social smoker        █                               │ 0.01
 │                                                        │
 └────────────────────────────────────────────────────────┘
 ```
 **Key ECharts options**:
 ```javascript
 {
   title: { text: '特征重要性排序' },
   grid: { left: '20%' },  // leave room for the labels
   xAxis: {
     type: 'value',
     name: '重要性得分'
   },
   yAxis: {
     type: 'category',
     inverse: true,  // category axes start at the bottom; invert so the top-ranked feature is drawn first
     data: ['Reason for absence', 'Transportation', /* ... */]
   },
   series: [{
     type: 'bar',
     data: [0.35, 0.28, /* ... */],
     itemStyle: {
       color: '#409EFF'
     }
   }]
 }
 ```
 #### 4.2.2 相关性热力图
 ```
 ┌─────────────────────────────────────────────────────────┐
 │                      相关性热力图                        │
 │                                                         │
 │         Age   SrvT  Dist  Load  BMI   AbsH             │
 │       ┌─────────────────────────────────────┐          │
 │  Age  │ 1.0  0.67  0.12  0.08  0.15  0.05  │          │
 │       │ ■■■  ■■□  □□□  □□□  □□□  □□□  │          │
 │  SrvT │ 0.67 1.0   0.10  0.05  0.12  0.08  │          │
 │       │ ■■□  ■■■  □□□  □□□  □□□  □□□  │          │
 │  Dist │ 0.12 0.10  1.0   0.03  0.05  0.18  │          │
 │       │ □□□  □□□  ■■■  □□□  □□□  □□□  │          │
 │  ...  │ ...                                      │          │
 │       └─────────────────────────────────────┘          │
 │                                                         │
 │  图例: -1 (蓝色) ←→ 0 (白色) ←→ +1 (红色)              │
 └─────────────────────────────────────────────────────────┘
 ```
 **ECharts配置要点**：
 ```javascript
 {
  title: { text: '相关性热力图' },
  tooltip: {
    formatter: function(params) {
      return `${params.name}: ${params.value[2].toFixed(2)}`;
    }
  },
  xAxis: { type: 'category', data: featureNames },
  yAxis: { type: 'category', data: featureNames },
  visualMap: {
    min: -1,
    max: 1,
    calculable: true,
    inRange: {
      color: ['#313695', '#ffffff', '#a50026']
    }
  },
  series: [{
    type: 'heatmap',
    data: correlationData
  }]
 }
 ```
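The `correlationData` array consumed by the heatmap series is a list of `[xIndex, yIndex, value]` triples, which is why the tooltip formatter reads `params.value[2]`. A backend could build it roughly as follows. This is a dependency-free sketch with invented sample columns; the names `pearson` and `heatmap_data` are assumptions, and the project itself may compute the matrix with pandas.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column would make the denominator zero; not handled here.
    return cov / (norm_x * norm_y)


def heatmap_data(columns):
    """Return (feature names, [[i, j, r], ...]) for an ECharts heatmap."""
    names = list(columns)
    data = [
        [i, j, round(pearson(columns[a], columns[b]), 2)]
        for i, a in enumerate(names)
        for j, b in enumerate(names)
    ]
    return names, data


# Invented columns chosen so the correlations are exactly +1 and -1.
columns = {
    "Age": [30, 40, 50, 60],
    "Service time": [5, 10, 15, 20],
    "Absence hours": [8, 6, 4, 2],
}
feature_names, correlation_data = heatmap_data(columns)
```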
 #### 4.2.3 群体对比选择器
 ```
 ┌───────────────────────────────────────────────────────────┐
 │  群体对比分析                                              │
 │                                                           │
 │  选择对比维度:  [  饮酒习惯  ▼  ]                          │
 │                                                           │
 │  ┌─────────────────────────────────────────────────────┐  │
 │  │                                                     │  │
 │  │  平均缺勤时长（小时）                                │  │
 │  │                                                     │  │
 │  │  不饮酒  ████████████████           1.2h           │  │
 │  │  饮酒    ██████████████████████████ 2.1h           │  │
 │  │                                                     │  │
 │  │  差异: 饮酒者比不饮酒者高 75%                        │  │
 │  │                                                     │  │
 │  └─────────────────────────────────────────────────────┘  │
 │                                                           │
 └───────────────────────────────────────────────────────────┘
 ```
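 **Comparison dimensions**:
 | Option | Groups |
 |------|------|
 | Drinking | Drinker / non-drinker |
 | Smoking | Smoker / non-smoker |
 | Education | High school / bachelor's / postgraduate+ |
 | Children | With children / without children |
 | Pets | With pets / without pets |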
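The figures shown in the group comparison mock-up (a mean per group plus a relative difference, for example 2.1 h versus 1.2 h giving 75%) can be computed as sketched below. The function names, the `drinker` and `hours` keys, and the sample rows are illustrative assumptions chosen to reproduce the mock-up's numbers.

```python
def group_means(rows, group_key, value_key):
    """Mean of value_key for each distinct value of group_key."""
    totals, counts = {}, {}
    for row in rows:
        g = row[group_key]
        totals[g] = totals.get(g, 0.0) + row[value_key]
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


def relative_difference(base, other):
    """How much higher `other` is than `base`, in percent."""
    return (other - base) / base * 100


# Invented rows: non-drinkers average 1.2 h, drinkers 2.1 h.
rows = [
    {"drinker": 0, "hours": 1.0},
    {"drinker": 0, "hours": 1.4},
    {"drinker": 1, "hours": 2.0},
    {"drinker": 1, "hours": 2.2},
]
means = group_means(rows, "drinker", "hours")
diff = relative_difference(means[0], means[1])
```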
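 ### 4.3 Interaction Flow
 1. On page load, the feature importance data is fetched automatically
 2. The feature importance bar chart is rendered
 3. The correlation matrix is loaded in parallel and the heatmap is rendered
 4. When the user selects a comparison dimension, the group comparison chart updates
 5. All charts show details on hover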
 ---
 ## 5. 页面三：缺勤预测 (Prediction)
 ### 5.1 页面布局
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │  缺勤预测                                                        │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌───────────────────────────┐  ┌───────────────────────────┐  │
 │  │                           │  │                           │  │
 │  │   参数输入表单             │  │   预测结果展示            │  │
 │  │                           │  │                           │  │
 │  │   缺勤原因: [下拉选择]     │  │   ┌───────────────────┐  │  │
 │  │   缺勤月份: [下拉选择]     │  │   │                   │  │  │
 │  │   星期几:   [下拉选择]     │  │   │   预测结果        │  │  │
 │  │   季节:     [下拉选择]     │  │   │                   │  │  │
 │  │                           │  │   │     5.2 小时      │  │  │
 │  │   交通费用: [输入框]       │  │   │                   │  │  │
 │  │   通勤距离: [输入框]       │  │   │   ● 中风险        │  │  │
 │  │   工龄:     [输入框]       │  │   │                   │  │  │
 │  │   年龄:     [输入框]       │  │   └───────────────────┘  │  │
 │  │                           │  │                           │  │
 │  │   日均工作负荷: [输入框]   │  │   ┌───────────────────┐  │  │
 │  │   达标率:     [输入框]     │  │   │   模型信息        │  │  │
 │  │   违纪记录:   [是/否]      │  │   │   R²: 0.82        │  │  │
 │  │   学历:       [下拉选择]   │  │   │   MSE: 15.5       │  │  │
 │  │   子女数量:   [输入框]     │  │   │   置信度: 85%     │  │  │
 │  │   饮酒习惯:   [是/否]      │  │   └───────────────────┘  │  │
 │  │   吸烟习惯:   [是/否]      │  │                           │  │
 │  │   宠物数量:   [输入框]     │  │                           │  │
 │  │   BMI指数:    [输入框]     │  │                           │  │
 │  │                           │  │                           │  │
 │  │   [ 开始预测 ]             │  │                           │  │
 │  │                           │  │                           │  │
 │  └───────────────────────────┘  └───────────────────────────┘  │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
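 ### 5.2 Component Details
 #### 5.2.1 Parameter Input Form
 **Form field design**:
 | Field | Component | Options / range | Default |
 |------|----------|-----------|--------|
 | Reason for absence | el-select | 0-28 | 23 |
 | Month of absence | el-select | 1-12 | Current month |
 | Day of the week | el-select | Monday-Friday | Monday |
 | Season | el-select | Summer / autumn / winter / spring | Current season |
 | Transportation expense | el-input-number | 100-400 | 200 |
 | Commute distance | el-input-number | 1-60 | 20 |
 | Service time | el-input-number | 1-30 | 5 |
 | Age | el-input-number | 18-60 | 30 |
 | Average daily workload | el-input-number | 200-350 | 250 |
 | Hit target | el-input-number | 80-100 | 95 |
 | Disciplinary failure | el-radio-group | Yes / No | No |
 | Education | el-select | High school / bachelor's / master's / doctorate | Bachelor's |
 | Number of children | el-input-number | 0-5 | 0 |
 | Social drinker | el-radio-group | Yes / No | No |
 | Social smoker | el-radio-group | Yes / No | No |
 | Number of pets | el-input-number | 0-10 | 0 |
 | BMI | el-input-number | 18-40 | 25 |
 **Form validation rules**:
 | Field | Validation rule |
 |------|----------|
 | Reason for absence | Required |
 | Month of absence | Required, range 1-12 |
 | Transportation expense | Required, range 100-400 |
 | Commute distance | Required, range 1-60 |
 | Age | Required, range 18-60 |
 | BMI | Required, range 18-40 |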
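Front-end validation is usually mirrored on the server, since requests can bypass the form. The rules in the table could be enforced in the backend along these lines; the payload keys (`reason`, `month`, and so on) and the function name `validate` are hypothetical, as the API field names are not given in this section.

```python
# Field -> (min, max) bounds taken from the validation table; None means "required only".
RULES = {
    "reason": None,
    "month": (1, 12),
    "transportation_expense": (100, 400),
    "distance": (1, 60),
    "age": (18, 60),
    "bmi": (18, 40),
}


def validate(payload):
    """Return a dict of field -> error message; empty when the payload is valid."""
    errors = {}
    for field, bounds in RULES.items():
        value = payload.get(field)
        if value is None:
            errors[field] = "required"
        elif bounds is not None and not bounds[0] <= value <= bounds[1]:
            errors[field] = f"must be between {bounds[0]} and {bounds[1]}"
    return errors


# Example request body with an out-of-range month and a missing BMI.
errors = validate({
    "reason": 23,
    "month": 13,
    "transportation_expense": 200,
    "distance": 20,
    "age": 30,
})
```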
 #### 5.2.2 预测结果卡片
 ```
 ┌─────────────────────────────────────┐
 │                                     │
 │          预测结果                   │
 │                                     │
 │           5.2                       │
 │          小时                       │
 │                                     │
 │   ┌─────────────────────────────┐  │
 │   │    ● 中风险 (黄色)          │  │
 │   │    缺勤时长: 4-8小时        │  │
 │   └─────────────────────────────┘  │
 │                                     │
 │   模型置信度: 85%                   │
 │   使用模型: 随机森林                │
 │                                     │
 └─────────────────────────────────────┘
 ```
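 **Risk level display**:
 | Level | Color | Icon | Description |
 |------|------|------|------|
 | Low risk | Green (#67C23A) | ✓ | Absence duration < 4 hours |
 | Medium risk | Yellow (#E6A23C) | ⚠ | Absence duration 4-8 hours |
 | High risk | Red (#F56C6C) | ✕ | Absence duration > 8 hours |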
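The thresholds in the table translate into a small mapping function. In this sketch both boundary values, 4 and 8 hours, fall into the medium band, which is how the "4-8 hours" row reads; the function name `risk_level` and the returned labels are assumptions.

```python
def risk_level(hours):
    """Map a predicted absence duration in hours to a risk label."""
    if hours < 4:
        return "low"
    if hours <= 8:
        return "medium"
    return "high"


level = risk_level(5.2)  # the value shown in the result card mock-up
```

The front end can then pick the color and icon from the label, so the thresholds live in one place.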
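 ### 5.3 Interaction Flow
 1. The page loads and shows an empty form
 2. The user fills in the form fields
 3. The user clicks the "开始预测" (Start prediction) button
 4. The front end validates the form data
 5. The request is sent to the backend API
 6. A loading animation is shown
 7. The prediction result is received
 8. The result card is rendered (with an animation)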
 ---
 ## 6. 页面四：员工画像 (Clustering)
 ### 6.1 页面布局
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │  员工画像                                                        │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  聚类数量: [ 3 ▼ ]    [ 重新聚类 ]                               │
 │                                                                 │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌───────────────────────────────────────────────────────────┐  │
 │  │                                                           │  │
 │  │              员工群体雷达图                                │  │
 │  │                                                           │  │
 │  │                        年龄                               │  │
 │  │                         ▲                                 │  │
 │  │                        /│\                                │  │
 │  │                       / │ \                               │  │
 │  │             工龄 ◄──────┼──────► 工作负荷                 │  │
 │  │                     \   │   /                             │  │
 │  │                      \  │  /                              │  │
 │  │                       \ │ /                               │  │
 │  │                 缺勤倾向 ▼ BMI                            │  │
 │  │                                                           │  │
 │  │   图例: ─── 模范型  ─── 压力型  ─── 生活习惯型             │  │
 │  │                                                           │  │
 │  └───────────────────────────────────────────────────────────┘  │
 │                                                                 │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌─────────────────────────┐  ┌─────────────────────────┐      │
 │  │                         │  │                         │      │
 │  │   聚类结果统计           │  │   聚类散点图             │      │
 │  │                         │  │                         │      │
 │  │   模范型: 120人 (33%)   │  │     ●                    │      │
 │  │   压力型: 100人 (28%)   │  │   ●     ●   ○            │      │
 │  │   生活习惯型: 140人(39%)│  │     ●  ○      ●          │      │
 │  │                         │  │       ○     ●            │      │
 │  │   点击查看详细建议...    │  │                         │      │
 │  │                         │  │   ● 模范型 ○ 压力型      │      │
 │  └─────────────────────────┘  └─────────────────────────┘      │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ### 6.2 组件说明
 #### 6.2.1 员工群体雷达图
 ```
                    年龄
                      ▲
                     /|\
                    / | \
                   /  |  \
                  /   |   \
                 /    |    \
        工龄 ◄───────┼───────► 工作负荷
                 \    |    /
                  \   |   /
                   \  |  /
                    \ | /
                     \|/
                      ▼
             缺勤倾向     BMI
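 Cluster features (normalized; columns in order: age, tenure, workload, BMI, absence tendency):
 ─────────────────────────────────────────
 模范型 (exemplary, green):        0.75  0.90  0.60  0.55  0.20
 压力型 (stressed, orange):        0.35  0.20  0.85  0.45  0.70
 生活习惯型 (lifestyle, red):      0.55  0.50  0.65  0.80  0.45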
 ```
 **ECharts配置要点**：
 ```javascript
 {
  title: { text: '员工群体画像' },
  legend: { data: ['模范型', '压力型', '生活习惯型'] },
  radar: {
    indicator: [
      { name: '年龄', max: 1 },
      { name: '工龄', max: 1 },
      { name: '工作负荷', max: 1 },
      { name: 'BMI', max: 1 },
      { name: '缺勤倾向', max: 1 }
    ]
  },
  series: [{
    type: 'radar',
    data: [
      { value: [0.75, 0.90, 0.60, 0.55, 0.20], name: '模范型' },
      { value: [0.35, 0.20, 0.85, 0.45, 0.70], name: '压力型' },
      { value: [0.55, 0.50, 0.65, 0.80, 0.45], name: '生活习惯型' }
    ]
  }]
 }
 ```
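Every radar indicator above uses `max: 1`, so each cluster center has to be scaled into the range 0 to 1 before it is sent to the chart. The document does not say how the project does this. The sketch below assumes simple min-max scaling against per-feature bounds; `scale_center` and the bounds in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def scale_center(
    center: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
) -> List[float]:
    """Min-max scale one cluster center into [0, 1], feature by feature.

    `bounds` holds an assumed (low, high) range for each feature.
    Values outside the range are clipped; a degenerate range maps to 0.0.
    """
    if len(center) != len(bounds):
        raise ValueError("center and bounds must have the same length")
    scaled = []
    for value, (low, high) in zip(center, bounds):
        if high <= low:
            scaled.append(0.0)
            continue
        ratio = (value - low) / (high - low)
        scaled.append(round(min(1.0, max(0.0, ratio)), 2))
    return scaled
```

For example, with an assumed age range of 20 to 60 and a tenure range of 0 to 10 years, `scale_center([30, 5], [(20, 60), (0, 10)])` gives `[0.25, 0.5]`.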
 #### 6.2.2 聚类结果统计
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │  聚类结果统计                                                │
 ├─────────────────────────────────────────────────────────────┤
 │                                                             │
 │  ┌─────────────────────────────────────────────────────┐   │
 │  │ 模范型员工                           120人 (33.3%)  │   │
 │  │ ████████████████████████████████                    │   │
 │  │ 特点: 工龄长、工作稳定、缺勤率低                     │   │
 │  │ 建议: 保持现有管理方式，可作为榜样员工               │   │
 │  └─────────────────────────────────────────────────────┘   │
 │                                                             │
 │  ┌─────────────────────────────────────────────────────┐   │
 │  │ 压力型员工                           100人 (27.8%)  │   │
 │  │ ████████████████████████                            │   │
 │  │ 特点: 年轻、工龄短、工作负荷大、缺勤较多             │   │
 │  │ 建议: 关注工作压力，适当减少加班                     │   │
 │  └─────────────────────────────────────────────────────┘   │
 │                                                             │
 │  ┌─────────────────────────────────────────────────────┐   │
 │  │ 生活习惯型员工                       140人 (38.9%)  │   │
 │  │ ████████████████████████████████████                │   │
 │  │ 特点: BMI偏高、有饮酒习惯、中等缺勤率                │   │
 │  │ 建议: 关注员工健康，组织体检和健康活动               │   │
 │  └─────────────────────────────────────────────────────┘   │
 │                                                             │
 └─────────────────────────────────────────────────────────────┘
 ```
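The head counts and percentages in the statistics card can be derived directly from the per-employee cluster labels. The following is a minimal sketch; the function name `summarize_clusters` and the output field names are assumptions and not the project's actual API.

```python
from collections import Counter
from typing import Dict, List, Sequence, Union


def summarize_clusters(labels: Sequence[str]) -> List[Dict[str, Union[str, int, float]]]:
    """Count the employees in each cluster and compute each cluster's share.

    Clusters are returned largest first; percentages have one decimal place.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return [
        {"name": name, "count": count, "percent": round(100 * count / total, 1)}
        for name, count in counts.most_common()
    ]
```

Applied to the 360 employees in the mock-up (120, 100 and 140 per cluster), this yields 33.3%, 27.8% and 38.9%, matching the card.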
 #### 6.2.3 聚类散点图
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │                      聚类散点图                              │
 │                                                             │
 │  缺勤                                                        │
 │  时长                                                        │
 │    ▲                                                        │
 │ 40 │                           ○                            │
 │    │                    ○  ○                                │
 │ 30 │              ○              ○                         │
 │    │         ○  ●     ○  ○                                 │
 │ 20 │    ●  ●     ○        ○                                │
 │    │  ●  ●  ●     ○  ○                                     │
 │ 10 │●  ●     ○  ○     ○                                    │
 │    │●  ●  ○  ○        ○                                    │
 │  0 │●  ●  ○  ○  ○                                          │
 │    └─────────────────────────────────────────────────────►  │
 │     20    30    40    50    60    年龄                      │
 │                                                             │
 │  ● 模范型  ○ 压力型  ◐ 生活习惯型                           │
 └─────────────────────────────────────────────────────────────┘
 ```
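 ### 6.3 Interaction flow
 1. The page loads with 3 clusters by default
 2. The radar chart and scatter plot are rendered
 3. The user can adjust the number of clusters (2-5)
 4. Clicking the "重新聚类" (Re-cluster) button refreshes the results
 5. Clicking a cluster shows its details and recommendations
 6. Hovering over the scatter plot shows employee details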
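The re-clustering step could look roughly like the sketch below. It is a plain-Python K-Means written for illustration only; a real implementation would more likely use scikit-learn's `KMeans`. The function name `recluster`, its parameters and the fixed seed are assumptions. The only detail taken from the text above is the allowed range of 2 to 5 clusters.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: Sequence[Point]) -> Point:
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def recluster(points: Sequence[Point], k: int = 3,
              max_iter: int = 100, seed: int = 0) -> List[int]:
    """Assign each point a cluster index in range(k) using K-Means."""
    if not 2 <= k <= 5:
        raise ValueError("k must be between 2 and 5")
    if len(points) < k:
        raise ValueError("need at least k points")
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centers = rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        new_labels = [
            min(range(k), key=lambda c: _squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = _mean(members)
    return labels
```

Rejecting values outside 2 to 5 on the server mirrors the limit of the dropdown, so a hand-crafted request cannot ask for an unsupported cluster count.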
 ---
 ## 7. Shared components
 ### 7.1 ChartComponent.vue
 **Purpose**: wraps ECharts charts and manages the chart lifecycle in one place
 **Props**:
 | Prop | Type | Default | Description |
 |------|------|--------|------|
 | option | Object | {} | ECharts option object |
 | loading | Boolean | false | Whether the chart is loading |
 | height | String | '400px' | Chart height |
 | width | String | '100%' | Chart width |
 **Usage example**:
 ```vue
 <ChartComponent 
  :option="chartOption" 
  :loading="loading"
  height="300px"
 />
 ```
 ### 7.2 ResultCard.vue
 **Purpose**: displays the prediction result
 **Props**:
 | Prop | Type | Default | Description |
 |------|------|--------|------|
 | predictedHours | Number | 0 | Predicted duration (hours) |
 | riskLevel | String | 'low' | Risk level |
 | confidence | Number | 0 | Confidence |
 ### 7.3 KPICard.vue
 **Purpose**: displays a KPI metric card
 **Props**:
 | Prop | Type | Default | Description |
 |------|------|--------|------|
 | title | String | '' | Metric name |
 | value | String/Number | '' | Metric value |
 | unit | String | '' | Unit |
 | icon | String | '' | Icon class name |
 | color | String | '#409EFF' | Theme color |
 ### 7.4 LoadingSpinner.vue
 **Purpose**: loading animation component
 **Props**:
 | Prop | Type | Default | Description |
 |------|------|--------|------|
 | text | String | '加载中...' | Hint text |
---
+## 1. 设计目标
-## 8. 配色方案
+- 界面适合毕业设计展示和答辩演示
 - 页面层次清楚，重点内容突出
 - 图表、卡片和表单布局统一
 - 兼顾桌面端展示效果与基本响应式适配
-### 8.1 主色调
+## 2. 整体设计风格
-| 用途 | 颜色值 | 说明 |
+- 左侧为固定导航栏
-|------|--------|------|
+- 右侧为主内容区域
-| 主色 | #409EFF | Element Plus主色 |
+- 采用卡片化布局组织统计、图表和预测内容
-| 成功 | #67C23A | 低风险、正向指标 |
+- 通过浅色/深色模式增强视觉表现
-| 警告 | #E6A23C | 中风险、需关注 |
+- 侧边栏支持折叠，提高界面灵活性
 | 危险 | #F56C6C | 高风险、异常 |
 | 信息 | #909399 | 辅助信息 |
-### 8.2 图表配色
+## 3. 页面原型说明
-```javascript
+### 3.1 数据概览页
 const chartColors = [
  '#5470c6',  // 蓝色
  '#91cc75',  // 绿色
  '#fac858',  // 黄色
  '#ee6666',  // 红色
  '#73c0de',  // 浅蓝
  '#3ba272',  // 深绿
  '#fc8452',  // 橙色
  '#9a60b4',  // 紫色
  '#ea7ccc'   // 粉色
 ];
 ```
---
+页面组成：
-## 9. 附录
+- 顶部页面头图
 - 第一行 KPI 卡片
 - 第二行趋势图与星期分布图
 - 第三行原因分布与季节分布图
-### 9.1 页面清单
+展示重点：
-| 页面 | 路由 | 主要图表 | 主要交互 |
+- 让评审快速看到系统能做总体统计与可视化
 |------|------|----------|----------|
 | 数据概览 | /dashboard | 折线图、饼图、柱状图 | 图表悬停、钻取 |
 | 影响因素 | /analysis | 条形图、热力图 | 维度切换 |
 | 缺勤预测 | /prediction | - | 表单提交 |
 | 员工画像 | /clustering | 雷达图、散点图 | 聚类数调整 |
-### 9.2 文档修改历史
+### 3.2 影响因素分析页
-| 版本 | 日期 | 修改人 | 修改内容 |
+页面组成：
 |------|------|--------|----------|
 | V1.0 | 2026-03 | 张硕 | 初始版本 |
---
+- 页面头图
 - 特征重要性横向条形图
 - 相关性热力图
 - 群体对比柱状图
-**文档结束**
+展示重点：
 - 模型可解释性
 - 关键因素主次关系
 ### 3.3 缺勤预测页
 页面组成：
 - 页面头图
 - 左侧多卡片输入区
 - 右侧结果与风险说明整合卡片
 - 模型对比卡片
 展示重点：
 - 界面紧凑、核心输入突出
 - 结果展示直观
 - 风险说明清晰易讲解
 ### 3.4 员工画像页
 页面组成：
 - 页面头图
 - 雷达图卡片
 - 聚类结果表格卡片
 - 散点图卡片
 展示重点：
 - 呈现人群分层能力
 - 用“典型群体”增强答辩展示效果
 ## 4. 导航设计
 - 数据概览
 - 影响因素
 - 缺勤预测
 - 员工画像
 导航要求：
 - 显示图标与文字
 - 支持侧边栏折叠
 - 折叠后保留图标
 ## 5. 视觉元素设计
 - KPI 使用大数字卡片
 - 图表外层统一卡片边框和圆角
 - 表单采用卡片分组，避免长表单垂直堆叠
 - 预测结果使用重点色突出
 - 风险等级使用颜色标签区分
 ## 6. 交互设计
 - 支持模型自动选择
 - 支持查看模型对比
 - 支持深色模式切换
 - 支持侧边栏折叠
 ## 7. 答辩展示建议
 - 先展示数据概览页，说明系统整体能力
 - 再进入影响因素页解释模型逻辑
 - 然后演示缺勤预测页输入与结果
 - 最后展示员工画像页总结系统分析能力
--- a/docs/05_毕业论文摘要与关键词草案.md
+++ b/docs/05_毕业论文摘要与关键词草案.md
@@ -0,0 +1,24 @@
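 # Draft Thesis Abstract and Keywords
 ## Draft Chinese abstract
 With the growing digitalization of enterprise management, the analysis of employee absence behavior has become an important topic in human resource management. Traditional absence management relies on manual statistics, is slow to analyze, and offers little early warning of risk. To address these problems, this thesis designs and implements an analysis and prediction system based on absence events of employees in Chinese enterprises. Built around absence event data, the system consists of four core modules: data overview, influencing-factor analysis, absence risk prediction, and employee group profiling. It provides statistical analysis of absence duration, mining of key factors, multi-model prediction, and display of cluster-based profiles.
 In the implementation, the backend uses the Flask framework to provide API services and combines Pandas, Scikit-learn, and PyTorch for data processing, feature engineering, model training, and prediction; the frontend uses Vue 3, Element Plus, and ECharts to implement an interactive visualization interface. For the graduation project, a dataset of employee absence events reflecting the characteristics of Chinese enterprises was constructed, with key influencing factors such as leave type, hospital certificate, overtime and commuting pressure, and health risk. To strengthen the algorithmic content of the thesis, the system also introduces an LSTM+MLP deep learning model that fuses each employee's historical absence event sequence with static attribute features. Experimental results show that the system handles the absence duration prediction task reasonably well and presents absence trends, influencing factors, and employee group characteristics in an intuitive visual form.
 This work has some reference value for enterprise absence analysis and management decision support, and it provides a basis for later extensions toward employee behavior analysis, attrition early warning, and performance management.
 ## Keywords
 - Employee absence analysis
 - Risk prediction
 - Feature mining
 - Machine learning
 - Deep learning
 - Visualization system
 - Vue
 - Flask
 ## Reference English title
 Employee Absence Analysis and Prediction System Based on Multi-dimensional Feature Mining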
--- a/docs/06_毕业论文目录与章节设计.md
+++ b/docs/06_毕业论文目录与章节设计.md
@@ -0,0 +1,72 @@
 # 毕业论文目录与章节设计
 ## 建议目录
 ### 摘要
 ### Abstract
 ### 第1章 绪论
 - 1.1 研究背景与意义
 - 1.2 国内外研究现状
 - 1.3 研究内容
 - 1.4 论文结构安排
 ### 第2章 相关技术与理论基础
 - 2.1 Flask 后端框架
 - 2.2 Vue 3 前端框架
 - 2.3 ECharts 可视化技术
 - 2.4 机器学习相关算法
 - 2.5 深度学习相关算法
 - 2.6 K-Means 聚类方法
 ### 第3章 系统需求分析
 - 3.1 可行性分析
 - 3.2 功能需求分析
 - 3.3 非功能需求分析
 - 3.4 业务流程分析
 ### 第4章 系统总体设计
 - 4.1 系统架构设计
 - 4.2 模块划分设计
 - 4.3 数据设计
 - 4.4 接口设计
 - 4.5 UI 设计
 ### 第5章 系统详细设计与实现
 - 5.1 数据概览模块实现
 - 5.2 影响因素分析模块实现
 - 5.3 缺勤预测模块实现
 - 5.4 LSTM+MLP 深度学习模型实现
 - 5.5 员工画像模块实现
 - 5.6 前端界面实现
 ### 第6章 系统测试与结果分析
 - 6.1 测试环境
 - 6.2 功能测试
 - 6.3 接口测试
 - 6.4 传统模型与深度学习模型对比
 - 6.5 系统展示效果分析
 ### 第7章 总结与展望
 - 7.1 研究总结
 - 7.2 不足分析
 - 7.3 后续展望
 ### 参考文献
 ### 致谢
 ## 章节写作建议
 - 第1章强调课题意义和系统定位
 - 第3章与第4章突出系统分析与设计能力
 - 第5章重点写实现过程和关键代码逻辑
 - 第6章突出系统已完成的功能效果与模型结果
--- a/docs/07_毕业论文写作提纲.md
+++ b/docs/07_毕业论文写作提纲.md
@@ -0,0 +1,109 @@
 # 毕业论文写作提纲
 ## 第1章 绪论
 写作要点：
 - 说明企业缺勤管理的重要性
 - 说明传统方式存在的问题
 - 引出本系统的设计目标和研究意义
 可写内容：
 - 企业缺勤对生产效率和管理成本的影响
 - 数据驱动管理在企业中的应用趋势
 - 本课题的研究价值和实践意义
 ## 第2章 相关技术与理论基础
 写作要点：
 - 简要介绍本系统使用的主要技术
 - 介绍预测与聚类相关算法原理
 可写内容：
 - Flask 的基本特点
 - Vue 3 的组件化优势
 - Element Plus 和 ECharts 的可视化能力
 - 随机森林、GBDT、Extra Trees 的基本原理
 - LSTM 与 MLP 的基本原理
 - 时序序列建模与多输入融合思想
 - K-Means 聚类思想
 ## 第3章 系统需求分析
 写作要点：
 - 从用户需求和业务目标两方面展开
 - 使用模块化方式描述功能需求
 可写内容：
 - 数据概览需求
 - 影响因素分析需求
 - 缺勤预测需求
 - 员工画像需求
 - 易用性与可维护性要求
 ## 第4章 系统总体设计
 写作要点：
 - 用架构图、模块图、流程图等方式说明设计思路
 - 体现系统工程设计能力
 可写内容：
 - 前后端分离架构
 - 功能模块划分
 - 数据集字段设计
 - 接口交互流程
 - 页面原型说明
 ## 第5章 系统实现
 写作要点：
 - 这是论文主体章节，重点描述“怎么做”
 - 结合关键代码和界面截图说明实现过程
 可写内容：
 - 数据生成与预处理实现
 - 特征工程实现
 - 模型训练与保存实现
 - LSTM+MLP 深度学习训练流程
 - 后端接口实现
 - 前端页面实现
 - 预测页卡片布局与交互实现
 ## 第6章 系统测试与分析
 写作要点：
 - 用表格和截图体现系统已经完成
 - 模型效果与页面效果都要写
 可写内容：
 - 页面访问测试
 - 接口联调测试
 - 预测功能测试
 - 聚类与分析结果测试
 - 模型性能指标分析
 - 传统模型与深度学习模型对比分析
 ## 第7章 总结与展望
 写作要点：
 - 总结本课题完成的主要工作
 - 承认不足并提出改进方向
 可写内容：
 - 已完成的系统功能
 - 论文研究成果
 - 系统存在的限制
 - 后续可扩展方向
--- a/docs/08_答辩汇报提纲.md
+++ b/docs/08_答辩汇报提纲.md
@@ -0,0 +1,62 @@
 # 答辩汇报提纲
 ## 1. 课题背景
 - 企业缺勤管理存在统计分散、分析不及时、预测能力不足的问题
 - 本课题旨在构建一个可视化、可分析、可预测的缺勤管理辅助系统
 ## 2. 课题目标
 - 展示缺勤数据整体分布
 - 分析关键影响因素
 - 实现缺勤风险预测
 - 构建员工群体画像
 ## 3. 系统总体设计
 - 前后端分离架构
 - 前端负责界面与图表
 - 后端负责数据处理、模型预测与聚类分析
 ## 4. 核心功能展示顺序
 ### 4.1 数据概览
 - 展示总量指标
 - 展示月度趋势、星期分布、原因分布、季节分布
 ### 4.2 影响因素分析
 - 展示特征重要性排序
 - 解释为什么请假类型、医院证明、加班通勤压力等因素更重要
 ### 4.3 缺勤预测
 - 输入关键字段
 - 展示预测时长与风险等级
 - 展示模型对比结果
 ### 4.4 员工画像
 - 展示群体雷达图
 - 展示聚类结果与散点图
 ## 5. 技术实现亮点
 - 前后端分离结构清晰
 - 采用多模型训练与比较
 - 引入 LSTM+MLP 深度学习模型，支持时序行为建模
 - 融合特征工程与聚类分析
 - 前端页面采用卡片式可视化布局，适合展示
 ## 6. 项目成果
 - 系统可完成统计、分析、预测、画像四类任务
 - 页面可视化效果完整
 - 项目文档和论文材料配套齐全
 ## 7. 不足与改进方向
 - 可进一步引入真实企业数据
 - 可加入更复杂的深度学习模型
 - 可引入权限管理、报表导出和数据库存储
--- a/docs/09_环境配置与安装说明.md
+++ b/docs/09_环境配置与安装说明.md
@@ -0,0 +1,193 @@
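 # Environment Setup and Installation Guide
 ## 1. Recommended environment
 To ensure that both the traditional machine learning models and the `LSTM+MLP` deep learning model can be trained, a **conda virtual environment** is recommended for managing this project's dependencies.
 Recommended environment:
 - Operating system: Windows 10 / Windows 11
 - Python: 3.11
 - Conda: Anaconda or Miniconda
 - Node.js: 16+
 - pnpm: 8+
 - CUDA: should match the PyTorch GPU wheel version
 ## 2. Create the conda virtual environment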
 ```powershell
 conda create -n forsetenv python=3.11 -y
 conda activate forsetenv
 ```
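 Note:
 - All subsequent Python dependency installation, dataset generation, model training, and backend startup should be done inside the `forsetenv` environment.
 ## 3. Recommended installation order
 Follow the steps below strictly in order:
 1. Create and activate the `conda` virtual environment
 2. Install the GPU build of `PyTorch` separately
 3. Install the remaining backend dependencies
 4. Install the frontend dependencies
 Notes:
 - `backend/requirements.txt` includes `torch==2.6.0`
 - On Windows, running `pip install -r backend/requirements.txt` first may install an unintended build (for example, a CPU-only one)
 - For the deep learning environment, therefore, run the official `cu124` install command first and then add the remaining dependencies
 ## 4. Install the PyTorch GPU build
 The project's hybrid deep learning model requires:
 - `torch >= 2.6`
 Recommended installation method:
 - Use the **official pip cu124 wheels**
 - Avoid letting conda resolve to a `cpu_mkl` build on Windows
 Install with: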
 ```powershell
 pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
 ```
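 ## 5. Install the remaining backend dependencies
 If the GPU build of `torch` was installed in the previous step, install the remaining backend dependencies next:
 ```powershell
 pip install Flask==2.3.3 Flask-CORS==4.0.0 python-dotenv==1.0.0
 pip install pandas==2.0.3 numpy==1.24.3 scikit-learn==1.3.0 joblib==1.3.1
 pip install xgboost==1.7.6 lightgbm==4.1.0
 ```
 If you would still rather use the requirements file, run it after the GPU build of `torch` is installed:
 ```powershell
 pip install -r backend/requirements.txt
 ```
 This step normally leaves the installed `cu124` build untouched; if the build does get overwritten, rerun the GPU install command from the previous section afterwards.
 ## 6. Install the frontend dependencies
 ```powershell
 cd frontend
 pnpm install
 ```
 ## 7. All-in-one example
 The following is a recommended end-to-end installation flow for the `conda` environment: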
 ```powershell
 conda create -n forsetenv python=3.11 -y
 conda activate forsetenv
 pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
 pip install Flask==2.3.3 Flask-CORS==4.0.0 python-dotenv==1.0.0
 pip install pandas==2.0.3 numpy==1.24.3 scikit-learn==1.3.0 joblib==1.3.1
 pip install xgboost==1.7.6 lightgbm==4.1.0
 cd frontend
 pnpm install
 ```
 ## 8. Verify the installation
 ### 8.1 Verify the base dependencies
 ```powershell
 python -c "import pandas,numpy,sklearn,flask;print('base ok')"
 ```
 ### 8.2 Verify the traditional model dependencies
 ```powershell
 python -c "import xgboost,lightgbm;print('ml ok')"
 ```
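The one-line import checks confirm that the packages load, but not that the pinned versions were installed. As a complement, the sketch below compares installed versions with the pins listed in this guide, using the standard-library `importlib.metadata`. The `PINNED` dictionary and the function name `find_mismatches` are illustrative assumptions. `torch` is left out on purpose, because a wheel from the `cu124` index reports a version with a local suffix such as `2.6.0+cu124`, which would not match `2.6.0` in an exact comparison.

```python
from importlib import metadata
from typing import Dict

# Versions copied from the install commands in this guide.
PINNED: Dict[str, str] = {
    "Flask": "2.3.3",
    "Flask-CORS": "4.0.0",
    "pandas": "2.0.3",
    "numpy": "1.24.3",
    "scikit-learn": "1.3.0",
    "xgboost": "1.7.6",
    "lightgbm": "4.1.0",
}


def find_mismatches(pinned: Dict[str, str]) -> Dict[str, str]:
    """Return a message for every package that is missing or off-version."""
    problems: Dict[str, str] = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"missing (expected {wanted})"
            continue
        if installed != wanted:
            problems[name] = f"found {installed} (expected {wanted})"
    return problems
```

Calling `find_mismatches(PINNED)` inside the `forsetenv` environment returns an empty dictionary when everything matches the pins.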
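 ### 8.3 Verify PyTorch GPU support
 ```powershell
 python -c "import torch;print(torch.__version__);print(torch.cuda.is_available())"
 ```
 The command prints the PyTorch version followed by a boolean. If the second line is `True`, the GPU build of PyTorch is working.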
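The same check can be wrapped in a helper that also picks the device to train on. The following is a minimal sketch under the assumption that training code should fall back to the CPU when CUDA is unavailable; the function name `describe_torch` is hypothetical, and the helper returns a result even when `torch` is not installed.

```python
from typing import Any, Dict


def describe_torch() -> Dict[str, Any]:
    """Report whether torch is installed and which device training would use."""
    try:
        import torch  # imported lazily so the helper works without torch
    except ImportError:
        return {"installed": False, "version": None, "cuda": False, "device": "cpu"}
    cuda = bool(torch.cuda.is_available())
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": cuda,
        "device": "cuda" if cuda else "cpu",
    }
```

The `device` value can be passed to `torch.device(...)` when moving a model and its tensors.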
 ## 9. Project startup order
 ### 9.1 Generate the dataset
 ```powershell
 cd backend
 python core/generate_dataset.py
 ```
 ### 9.2 Train the models
 ```powershell
 python core/train_model.py
 ```
 ### 9.3 Start the backend
 ```powershell
 python app.py
 ```
 ### 9.4 Start the frontend
 ```powershell
 cd ..\frontend
 pnpm dev
 ```
 ## 10. Common problems
 ### 10.1 PyTorch was installed as the CPU build
 Causes:
 - The default `pip install torch` was used
 - Or conda resolved to a CPU build on Windows
 Recommendation:
 - Use the official `cu124` install command given in this document
 ### 10.2 The deep learning model cannot be loaded during training
 Checklist:
 - Is the `forsetenv` conda environment active?
 - Was `torch` installed successfully?
 - Does `torch.cuda.is_available()` return `True`?
 ### 10.3 xgboost / lightgbm is missing
 Run:
 ```powershell
 pip install xgboost==1.7.6 lightgbm==4.1.0
 ```
 ### 10.4 How to confirm the conda environment is in use
 Run:
 ```powershell
 conda info --envs
 # In PowerShell, plain "where" is an alias for Where-Object, so call where.exe explicitly
 where.exe python
 ```
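 If the active environment is `forsetenv` and `python` resolves to that environment's directory, the switch succeeded.
 ## 11. Recommendations
 - For the project demo and thesis experiments, always work under `conda activate forsetenv`
 - Prefer a GPU environment when training the deep learning model
 - If only the pages need to be shown, train the traditional models first and add the deep learning results later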
--- a/docs/1.md
+++ b/docs/1.md
@@ -1,111 +0,0 @@
 这是一个典型的**前后端分离**架构的毕设项目结构。为了契合你的题目《基于多维特征挖掘的员工缺勤分析与预测系统设计与实现》，我们将项目分为 `Backend`（Python 后端，负责算法与逻辑）和 `Frontend`（Vue 前端，负责展示与交互）。
 以下是详细的工程目录结构及说明：
 ---
 ### 📁 项目根目录：Absenteeism_Analysis_System/
 ```
 Absenteeism_Analysis_System/
 │
 ├── backend/               # 后端项目目录 (Python/Flask)
 │   ├── app.py             # 程序入口文件 (启动服务)
 │   ├── config.py          # 配置文件 (路径、密钥等)
 │   ├── requirements.txt   # Python依赖库清单 (pandas, scikit-learn, flask等)
 │   │
 │   ├── data/              # 数据存储目录
 │   │   ├── raw/           # 原始数据集
 │   │   │   └── Absenteeism_at_work.csv  # 从UCI下载的原始数据
 │   │   └── processed/     # 处理后的数据集
 │   │       └── clean_data.csv           # 经过清洗、编码后的数据
 │   │
 │   ├── models/            # 模型存储目录
 │   │   ├── rf_model.pkl   # 训练好的随机森林模型文件
 │   │   ├── xgb_model.pkl  # 训练好的XGBoost模型文件
 │   │   └── kmeans_model.pkl # 聚类模型文件
 │   │
 │   ├── core/              # 核心算法模块 (对应论文的“多维特征挖掘”)
 │   │   ├── __init__.py
 │   │   ├── preprocessing.py  # 数据预处理：缺失值填充、独热编码、归一化
 │   │   ├── feature_mining.py # 特征挖掘：相关性分析、特征重要性计算
 │   │   ├── train_model.py    # 模型训练脚本：训练RF/XGBoost并保存模型
 │   │   └── clustering.py     # 聚类分析：K-Means算法实现
 │   │
 │   ├── services/          # 业务逻辑层
 │   │   ├── __init__.py
 │   │   ├── analysis_service.py  # 分析服务：调用特征挖掘模块，返回图表数据
 │   │   ├── predict_service.py   # 预测服务：加载模型，进行推理
 │   │   └── data_service.py      # 数据服务：读取CSV，提供基础统计
 │   │
 │   ├── api/               # API接口层 (路由)
 │   │   ├── __init__.py
 │   │   ├── analysis_routes.py  # 接口：获取特征重要性、相关性等
 │   │   ├── predict_routes.py   # 接口：接收前端表单，返回预测结果
 │   │   └── cluster_routes.py   # 接口：返回聚类结果/员工画像
 │   │
 │   └── utils/             # 工具函数
 │       └── common.py      # 通用工具：JSON封装、CORS处理等
 │
 ├── frontend/              # 前端项目目录
 │   ├── public/            # 静态资源
 │   ├── src/
 │   │   ├── assets/        # 资源文件 (图片、样式)
 │   │   ├── components/    # 公共组件
 │   │   │   ├── ChartComponent.vue  # ECharts图表封装组件
 │   │   │   └── ResultCard.vue      # 预测结果展示卡片
 │   │   │
 │   │   ├── views/         # 页面视图 (对应你的前端设计)
 │   │   │   ├── Dashboard.vue       # 页面一：数据概览与统计
 │   │   │   ├── FactorAnalysis.vue  # 页面二：影响因素分析 (核心)
 │   │   │   ├── Prediction.vue      # 页面三：缺勤预测 (输入表单+结果)
 │   │   │   └── Clustering.vue      # 页面四：员工画像与聚类
 │   │   │
 │   │   ├── router/        # 路由配置
 │   │   │   └── index.js
 │   │   ├── api/           # 前端API调用封装
 │   │   │   └── request.js # 配置axios，连接后端接口
 │   │   ├── App.vue        # 根组件
 │   │   └── main.js        # 入口文件
 │   │
 │   ├── package.json       # 前端依赖 (vue, element-plus, echarts)
 │   └── vite.config.js     # Vue构建配置 (如果用Vite) 或 vue.config.js
 │
 └── README.md              # 项目说明文档
 ```
 ---
 ### 🔧 核心模块功能详解（对应论文）
 #### 1. 后端 `core/` 模块详解
 这是你论文中“算法设计”部分的代码落地：
 *   **`preprocessing.py`**:
    *   实现 `OneHotEncoder` 处理 `Reason for absence` 等类别。
    *   实现 `StandardScaler` 处理 `Transportation expense` 等数值。
    *   实现 `get_clean_data()` 函数，供其他模块调用。
 *   **`feature_mining.py`**:
    *   实现 `calculate_correlation()`: 使用 Pandas 计算相关系数矩阵。
    *   实现 `get_feature_importance()`: 加载随机森林模型，提取 `feature_importances_`。
 *   **`train_model.py`**:
    *   包含 `train_rf()` 和 `train_xgboost()` 函数。
    *   负责划分训练集/测试集，计算 MSE/R2，并保存 `.pkl` 文件。
 *   **`clustering.py`**:
    *   包含 `kmeans_analysis()` 函数。
    *   对员工进行分类，并返回每个簇的中心点数据（用于画雷达图）。
 #### 2. 后端 `api/` 模块详解
 这是前后端交互的桥梁：
 *   **`analysis_routes.py`**:
    *   路由: `/api/analysis/importance`
    *   作用: 调用 `feature_mining.py`，返回格式化后的 JSON (例如 `{'feature': 'Distance', 'score': 0.45}`) 供前端画柱状图。
 *   **`predict_routes.py`**:
    *   路由: `/api/predict/single`
    *   作用: 接收前端 POST 过来的 JSON (表单数据)，调用 `predict_service.py` 加载模型，返回预测的小时数。
 #### 3. 前端 `views/` 模块详解
 这是你论文中“系统实现”部分的展示：
 *   **`FactorAnalysis.vue`**:
    *   引入 `ECharts`。
    *   `mounted()` 生命周期里调用 `/api/analysis/importance`。
    *   渲染：横向柱状图（特征重要性）、热力图（相关性矩阵）。
 *   **`Prediction.vue`**:
    *   左侧：`el-form` 表单，字段包含 `age`, `distance`, `drink` 等。
    *   右侧：`el-card`，初始状态为空，点击“开始预测”后，显示返回的 `hours` 和风险等级（红/黄/绿标签）。
 ---
 ### 💡 为什么要这样设计？
 1.  **分层清晰（MVC模式）**：`core` 专心算算法，`services` 处理业务，`api` 管接口。这在答辩时非常有优势，老师会觉得你代码结构规范，软件工程素养好。
 2.  **算法模块化**：你没有把算法代码直接写在接口文件里，而是单独封装了 `core` 目录。这意味着如果以后你想换算法（比如不用随机森林改用神经网络），只需要改 `core` 里的代码，不需要动接口，这叫**“高内聚低耦合”**。
 3.  **数据隔离**：原始数据 (`raw`) 和 清洗数据 (`processed`) 分开，训练好的模型单独放在 `models`，这样项目管理非常干净，不会一团乱麻。
 按照这个结构去写代码和画图（系统架构图、时序图），你的论文逻辑会非常顺畅！
--- a/docs/10_题目名称_技术路线_预期结果.md
+++ b/docs/10_题目名称_技术路线_预期结果.md
@@ -0,0 +1,23 @@
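 # Title, Main Technical Route or Methods, and Expected Results
 ## 1. Title
 Design and Implementation of an Analysis and Prediction System Based on Absence Events of Employees in Chinese Enterprises
 ## 2. Main technical route or methods adopted in the thesis (design)
 The project targets employee absence management in enterprises and follows a frontend/backend-separated design. The frontend is built with `Vue 3`, `Element Plus`, and `ECharts` and provides interactive views of absence trends, influencing factors, prediction results, and employee profiles. The backend is built with `Flask` and provides the API services for data processing, analysis, model inference, and assembling clustering results.
 For data processing, the project starts from its internally constructed dataset of absence events of employees in Chinese enterprises and uses `Pandas` and `NumPy` for data cleaning, field conversion, statistical analysis, and feature preparation. Feature engineering then covers employee attributes, job information, shift arrangements, health risk, leave reasons, commuting pressure, and overtime, producing structured data suitable for analysis and prediction.
 For the algorithms, the project combines traditional machine learning with deep learning. The traditional models use `scikit-learn`, `XGBoost`, and `LightGBM` for absence duration prediction and model comparison, and key influencing factors are identified through feature importance ranking and correlation analysis. The deep learning part uses `PyTorch` to build an `LSTM+MLP` fusion model that combines each employee's historical absence event sequence with static attribute features, which rounds out the prediction study and adds technical depth to the thesis.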
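Before an LSTM branch can consume "historical absence event sequences", the flat event table has to be turned into a fixed-length history for each event. The document does not describe how the project does this, so the sketch below shows only one plausible way. The event tuple layout, the window length, the zero padding, and the function name `build_history_windows` are all assumptions, and only absence duration is used as a sequence feature.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical event layout: (employee_id, ISO date, absence hours).
Event = Tuple[str, str, float]
Sample = Tuple[str, List[float], float]


def build_history_windows(events: Sequence[Event], window: int = 5) -> List[Sample]:
    """Build one (employee_id, history, target) sample for each event.

    `history` holds the durations of the employee's previous `window`
    events in chronological order, left-padded with 0.0 when fewer exist.
    `target` is the duration of the current event.
    """
    by_employee: Dict[str, List[float]] = defaultdict(list)
    # ISO dates sort correctly as strings, so this orders events in time.
    for employee_id, _date, hours in sorted(events, key=lambda e: (e[0], e[1])):
        by_employee[employee_id].append(hours)

    samples: List[Sample] = []
    for employee_id, series in by_employee.items():
        for i, target in enumerate(series):
            history = series[max(0, i - window):i]
            padded = [0.0] * (window - len(history)) + history
            samples.append((employee_id, padded, target))
    return samples
```

Each `history` list would become one input sequence for the LSTM, while the employee's static attributes go to the MLP branch. Because the target never appears in its own history, the current event's duration does not leak into the input.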
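 For employee profiling, the project uses `K-Means` clustering to group employees by absence behavior and presents the groups with scatter plots, radar charts, and group descriptions, helping the enterprise identify different types of absence-risk groups. Finally, system integration and frontend/backend joint debugging deliver four core functional modules: absence data overview, influencing-factor analysis, single-event absence prediction, and employee profile analysis.
 ## 3. Expected results of the thesis (design)
 The project is expected to deliver an employee absence analysis and prediction system that runs, can be demonstrated, and supports the writing of the thesis. The system should provide absence event statistics, trend analysis, reason distribution analysis, key factor mining, absence duration prediction, risk level assessment, and employee group profiling, meeting the requirements of an undergraduate graduation project for system implementation and demonstration.
 In terms of research results, the project is expected to produce a reasonably complete workflow for employee absence analysis, covering data preprocessing, feature engineering, correlation analysis, feature importance evaluation, predictive modeling, and cluster-based profiling. Given the key business fields as input, the system should output the predicted absence duration, the risk level, and a multi-model comparison, and present the conclusions through charts to support decision-making in enterprise human resource management.
 In terms of thesis deliverables, the project is expected to produce graduation design documents and thesis materials consistent with the implementation, including requirements analysis, system architecture design, interface design, data design, system implementation, experimental analysis, and conclusions and outlook, which can support the later proposal defense, mid-term review, thesis submission, and final defense.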
--- a/docs/2.md
+++ b/docs/2.md
@@ -1,48 +0,0 @@
 基于你的项目架构和题目《基于多维特征挖掘的员工缺勤分析与预测系统设计与实现》，预期实现的功能可以分为四个核心模块。你可以直接把这些内容写到开题报告的“研究内容”或“系统功能需求”章节里。
 ---
 ### 一、 数据概览与全局统计分析功能
 这是系统的“仪表盘”，让用户对整体情况一目了然。
 *   **多维统计展示：**
    *   **功能描述：** 系统自动加载 UCI 考勤数据集，展示基础统计指标（样本总数、缺勤总时长、平均缺勤时长、最大/最小缺勤时长）。
    *   **实现价值：** 帮助管理者快速了解企业整体考勤健康状况。
 *   **时间维度趋势分析：**
    *   **功能描述：** 以折线图形式展示全年（1-12月）的缺勤变化趋势；以柱状图展示周一至周五的缺勤分布；以饼图展示不同季节（春夏秋冬）的缺勤比例。
    *   **实现价值：** 识别出缺勤的高发时间段（例如：发现周五缺勤率最高，或夏季缺勤最多）。
 ### 二、 多维特征挖掘与影响因素分析功能
 这是系统的核心亮点，对应题目中的“多维特征挖掘”，解决“为什么缺勤”的问题。
 *   **特征重要性排序：**
    *   **功能描述：** 利用训练好的随机森林模型，计算并展示各维度特征对缺勤的影响权重。例如：柱状图显示“通勤距离”影响最大，“BMI指数”次之，“宠物数量”影响最小。
    *   **实现价值：** 量化指标，让管理者直观看到哪些是导致缺勤的“罪魁祸首”。
 *   **关联性热力图分析：**
    *   **功能描述：** 计算特征之间的相关系数矩阵，以热力图形式展示。重点突出“生活习惯”（如 Social drinker）与“缺勤时长”之间的强相关关系。
    *   **实现价值：** 挖掘隐性规律，比如发现“爱喝酒的员工”更容易“无故缺勤”，为制定公司制度（如禁止酒后上岗）提供数据支持。
 *   **群体特征对比：**
    *   **功能描述：** 提供分组统计功能，对比不同群体（如：高学历 vs 低学历，有子女 vs 无子女）的平均缺勤时长。
    *   **实现价值：** 细分人群，实现精细化管理。
 ### 三、 员工缺勤风险预测功能
 这是系统的实用工具，对应题目中的“预测”，解决“未来会怎样”的问题。
 *   **单次缺勤时长预测：**
    *   **功能描述：** 提供一个交互式表单，用户输入（或选择）某员工的各项属性（年龄、距离、交通费、BMI、是否饮酒、月份、工作负荷等），系统调用后台预测模型（XGBoost/RF），实时返回预测的缺勤时长（例如：预测结果为 8 小时）。
    *   **实现价值：** 当某个月工作负荷很大或季节变化时，可提前预判该员工的缺勤情况。
 *   **缺勤风险等级评估：**
    *   **功能描述：** 根据预测时长，自动将员工标记为“低风险（绿色）”、“中风险（黄色）”或“高风险（红色）”。
    *   **实现价值：** 快速筛选出需要重点关注的“刺头”员工或困难员工。
 *   **新入职员工评估（扩展）：**
    *   **功能描述：** 针对没有历史数据的新员工，仅凭其入职时的属性信息（如居住地、年龄、体检BMI等），系统给出其潜在缺勤风险的预估。
    *   **实现价值：** 辅助HR在招聘环节进行人员筛选。
 ### 四、 员工画像与群体聚类功能
 这是系统的高级分析功能，展示算法对人群的分类能力。
 *   **K-Means 聚类分析：**
    *   **功能描述：** 系统利用 K-Means 算法自动将所有员工划分为 3-4 个类别（如：模范型、压力型、生活习惯型）。
 *   **员工群体画像（雷达图）：**
    *   **功能描述：** 对每个聚类群体的特征（工龄、负荷、BMI、距离、缺勤倾向）绘制雷达图。
    *   **实现价值：** 
        *   比如识别出“压力型群体”（工龄短、负荷极大、缺勤多），建议HR减少加班；
        *   识别出“生活习惯型群体”（BMI高、爱喝酒），建议HR关注体检。
 ### 五、 系统管理功能
 基础功能，保证系统的可用性。
 *   **数据导入与更新：** 支持上传新的 CSV 考勤文件，系统自动解析并更新数据库。
 *   **模型管理：** 展示当前使用的算法模型（随机森林/XGBoost）以及该模型在测试集上的准确率、均方误差（MSE）等性能指标。
 ---
 ### 💡 总结一句话
 本系统预期实现从**“数据录入”**到**“可视化统计”**，再到**“深度归因分析”**，最后实现**“精准风险预测”**和**“人群画像划分”**的全流程功能，能够为企业提供一套完整的人力资源考勤数据智能分析解决方案。
--- a/docs/3.md
+++ b/docs/3.md
@@ -1,221 +0,0 @@
 Data Set Name:
 Absenteeism at work - Part I
 Abstract:
 The database was created with records of absenteeism at work from July 2007 to July 2010 at a courier company in Brazil.
 Source:
 Creators original owner and donors: Andrea Martiniano (1), Ricardo Pinto Ferreira (2), and Renato Jose Sassi (3).
 E-mail address: 
 andrea.martiniano'@'gmail.com (1) - PhD student;
 log.kasparov'@'gmail.com (2) - PhD student;
 sassi'@'uni9.pro.br (3) - Prof. Doctor.
 Universidade Nove de Julho - Postgraduate Program in Informatics and Knowledge Management.
 Address: Rua Vergueiro, 235/249 Liberdade, Sao Paulo, SP, Brazil. Zip code: 01504-001.
 Website: http://www.uninove.br/curso/informatica-e-gestao-do-conhecimento/
 Data Type: Multivariate   Univariate   Sequential   Time-Series   Text   Domain-Theory  
 Task: Classification   Regression   Clustering   Causal Discovery
 Attribute Type: Categorical   Integer   Real
 Area: Life Sciences Physical Sciences CS / Engineering Social Sciences Business Game Other
 Format Type: Matrix Non-Matrix
 Does your data set contain missing values? Yes No
 Number of Instances (records in your data set): 
 Number of Attributes (fields within each record): 
 *-*-*-*-*-*
 Relevant Information:
 The data set allows for several new combinations of attributes and attribute exclusions, or the modification of the attribute type (categorical, integer, or real) depending on the purpose of the research. The data set (Absenteeism at work - Part I) was used in academic research at the Universidade Nove de Julho - Postgraduate Program in Informatics and Knowledge Management.
 Attribute Information:
 1. Individual identification (ID)
 2. Reason for absence (ICD).
 Absences attested by the International Code of Diseases (ICD) stratified into 21 categories (I to XXI) as follows:
 I Certain infectious and parasitic diseases  
 II Neoplasms  
 III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism  
 IV Endocrine, nutritional and metabolic diseases  
 V Mental and behavioural disorders  
 VI Diseases of the nervous system  
 VII Diseases of the eye and adnexa  
 VIII Diseases of the ear and mastoid process  
 IX Diseases of the circulatory system  
 X Diseases of the respiratory system  
 XI Diseases of the digestive system  
 XII Diseases of the skin and subcutaneous tissue  
 XIII Diseases of the musculoskeletal system and connective tissue  
 XIV Diseases of the genitourinary system  
 XV Pregnancy, childbirth and the puerperium  
 XVI Certain conditions originating in the perinatal period  
 XVII Congenital malformations, deformations and chromosomal abnormalities  
 XVIII Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified  
 XIX Injury, poisoning and certain other consequences of external causes  
 XX External causes of morbidity and mortality  
 XXI Factors influencing health status and contact with health services.
 And 7 categories without (CID) patient follow-up (22), medical consultation (23), blood donation (24), laboratory examination (25), unjustified absence (26), physiotherapy (27), dental consultation (28).
 3. Month of absence
 4. Day of the week (Monday (2), Tuesday (3), Wednesday (4), Thursday (5), Friday (6))
 5. Seasons (summer (1), autumn (2), winter (3), spring (4))
 6. Transportation expense
 7. Distance from Residence to Work (kilometers)
 8. Service time
 9. Age
 10. Work load Average/day 
 11. Hit target
 12. Disciplinary failure (yes=1; no=0)
 13. Education (high school (1), graduate (2), postgraduate (3), master and doctor (4))
 14. Son (number of children)
 15. Social drinker (yes=1; no=0)
 16. Social smoker (yes=1; no=0)
 17. Pet (number of pet)
 18. Weight
 19. Height
 20. Body mass index
 21. Absenteeism time in hours (target)
 .arff header for Weka: 
@relation Absenteeism_at_work
@attribute ID {31.0, 27.0, 19.0, 30.0, 7.0, 20.0, 24.0, 32.0, 3.0, 33.0, 26.0, 29.0, 18.0, 25.0, 17.0, 14.0, 16.0, 23.0, 2.0, 21.0, 36.0, 15.0, 22.0, 5.0, 12.0, 9.0, 6.0, 34.0, 10.0, 28.0, 13.0, 11.0, 1.0, 4.0, 8.0, 35.0}
@attribute Reason_for_absence {17.0, 3.0, 15.0, 4.0, 21.0, 2.0, 9.0, 24.0, 18.0, 1.0, 12.0, 5.0, 16.0, 7.0, 27.0, 25.0, 8.0, 10.0, 26.0, 19.0, 28.0, 6.0, 23.0, 22.0, 13.0, 14.0, 11.0, 0.0}
@attribute Month_of_absence REAL
@attribute Day_of_the_week {5.0, 2.0, 3.0, 4.0, 6.0}
@attribute Seasons {4.0, 1.0, 2.0, 3.0}
@attribute Transportation_expense REAL
@attribute Distance_from_Residence_to_Work REAL
@attribute Service_time INTEGER
@attribute Age INTEGER
@attribute Work_load_Average/day_ REAL
@attribute Hit_target REAL
@attribute Disciplinary_failure {1.0, 0.0}
@attribute Education REAL
@attribute Son REAL
@attribute Social_drinker {1.0, 0.0}
@attribute Social_smoker {1.0, 0.0}
@attribute Pet REAL
@attribute Weight REAL
@attribute Height REAL
@attribute Body_mass_index REAL
@attribute Absenteeism_time_in_hours REAL
 Relevant Papers:
 Martiniano, A., Ferreira, R. P., Sassi, R. J., & Affonso, C. (2012). Application of a neuro fuzzy network in prediction of absenteeism at work. In Information Systems and Technologies (CISTI), 7th Iberian Conference on (pp. 1-4). IEEE.
 Citation Requests / Acknowledgements:
 Martiniano, A., Ferreira, R. P., Sassi, R. J., & Affonso, C. (2012). Application of a neuro fuzzy network in prediction of absenteeism at work. In Information Systems and Technologies (CISTI), 7th Iberian Conference on (pp. 1-4). IEEE.
 Acknowledgements:
 Professor Gary Johns for contributing to the selection of relevant research attributes.
 Professor Emeritus of Management
 Honorary Concordia University Research Chair in Management
 John Molson School of Business
 Concordia University
 Montreal, Quebec, Canada
 Adjunct Professor, OB/HR Division
 Sauder School of Business,
 University of British Columbia
 Vancouver, British Columbia, Canada
 ---------------------------------------------------------------------------
 Attribute Information:
 1. Individual identification (ID)
 2. Reason for absence (ICD).
 Absences attested by the International Code of Diseases (ICD) stratified into 21 categories (I to XXI) as follows:
 I Certain infectious and parasitic diseases  
 II Neoplasms  
 III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism  
 IV Endocrine, nutritional and metabolic diseases  
 V Mental and behavioural disorders  
 VI Diseases of the nervous system  
 VII Diseases of the eye and adnexa  
 VIII Diseases of the ear and mastoid process  
 IX Diseases of the circulatory system  
 X Diseases of the respiratory system  
 XI Diseases of the digestive system  
 XII Diseases of the skin and subcutaneous tissue  
 XIII Diseases of the musculoskeletal system and connective tissue  
 XIV Diseases of the genitourinary system  
 XV Pregnancy, childbirth and the puerperium  
 XVI Certain conditions originating in the perinatal period  
 XVII Congenital malformations, deformations and chromosomal abnormalities  
 XVIII Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified  
 XIX Injury, poisoning and certain other consequences of external causes  
 XX External causes of morbidity and mortality  
 XXI Factors influencing health status and contact with health services.
 And 7 categories without (CID) patient follow-up (22), medical consultation (23), blood donation (24), laboratory examination (25), unjustified absence (26), physiotherapy (27), dental consultation (28).
 3. Month of absence
 4. Day of the week (Monday (2), Tuesday (3), Wednesday (4), Thursday (5), Friday (6))
 5. Seasons
 6. Transportation expense
 7. Distance from Residence to Work (kilometers)
 8. Service time
 9. Age
 10. Work load Average/day 
 11. Hit target
 12. Disciplinary failure (yes=1; no=0)
 13. Education (high school (1), graduate (2), postgraduate (3), master and doctor (4))
 14. Son (number of children)
 15. Social drinker (yes=1; no=0)
 16. Social smoker (yes=1; no=0)
 17. Pet (number of pet)
 18. Weight
 19. Height
 20. Body mass index
 21. Absenteeism time in hours (target)
 .arff header for Weka: 
@relation Absenteeism_at_work
@attribute ID {31.0, 27.0, 19.0, 30.0, 7.0, 20.0, 24.0, 32.0, 3.0, 33.0, 26.0, 29.0, 18.0, 25.0, 17.0, 14.0, 16.0, 23.0, 2.0, 21.0, 36.0, 15.0, 22.0, 5.0, 12.0, 9.0, 6.0, 34.0, 10.0, 28.0, 13.0, 11.0, 1.0, 4.0, 8.0, 35.0}
@attribute Reason_for_absence {17.0, 3.0, 15.0, 4.0, 21.0, 2.0, 9.0, 24.0, 18.0, 1.0, 12.0, 5.0, 16.0, 7.0, 27.0, 25.0, 8.0, 10.0, 26.0, 19.0, 28.0, 6.0, 23.0, 22.0, 13.0, 14.0, 11.0, 0.0}
@attribute Month_of_absence REAL
@attribute Day_of_the_week {5.0, 2.0, 3.0, 4.0, 6.0}
@attribute Seasons {4.0, 1.0, 2.0, 3.0}
@attribute Transportation_expense REAL
@attribute Distance_from_Residence_to_Work REAL
@attribute Service_time INTEGER
@attribute Age INTEGER
@attribute Work_load_Average/day_ REAL
@attribute Hit_target REAL
@attribute Disciplinary_failure {1.0, 0.0}
@attribute Education REAL
@attribute Son REAL
@attribute Drinker {1.0, 0.0}
@attribute Smoker {1.0, 0.0}
@attribute Pet REAL
@attribute Weight REAL
@attribute Height REAL
@attribute Body_mass_index REAL
@attribute Absenteeism_time_in_hours REAL
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,31 @@
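 # Graduation Project Documentation Index
 This directory holds the supporting documents for the graduation project "Chinese Enterprise Employee Absence Event Analysis and Prediction System".
 ## System Documents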
 - [00_需求规格说明书.md](D:/VScodeProject/forsetsystem/docs/00_需求规格说明书.md)
 - [01_系统架构设计.md](D:/VScodeProject/forsetsystem/docs/01_系统架构设计.md)
 - [02_接口设计文档.md](D:/VScodeProject/forsetsystem/docs/02_接口设计文档.md)
 - [03_数据设计文档.md](D:/VScodeProject/forsetsystem/docs/03_数据设计文档.md)
 - [04_UI原型设计.md](D:/VScodeProject/forsetsystem/docs/04_UI原型设计.md)
 ## Thesis Support Documents
 - [05_毕业论文摘要与关键词草案.md](D:/VScodeProject/forsetsystem/docs/05_毕业论文摘要与关键词草案.md)
 - [06_毕业论文目录与章节设计.md](D:/VScodeProject/forsetsystem/docs/06_毕业论文目录与章节设计.md)
 - [07_毕业论文写作提纲.md](D:/VScodeProject/forsetsystem/docs/07_毕业论文写作提纲.md)
 - [08_答辩汇报提纲.md](D:/VScodeProject/forsetsystem/docs/08_答辩汇报提纲.md)
 - [10_题目名称_技术路线_预期结果.md](D:/VScodeProject/forsetsystem/docs/10_题目名称_技术路线_预期结果.md)
 ## Environment Setup Documents
 - [09_环境配置与安装说明.md](D:/VScodeProject/forsetsystem/docs/09_环境配置与安装说明.md)
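 ## Notes
 - The system documents follow the current project implementation and cover absence analysis, risk prediction, and group profiling for Chinese enterprise employees.
 - The thesis documents use the structure common to undergraduate graduation projects, so they can later be expanded into the formal thesis.
 - If system features or fields change later, update the related documents in this directory accordingly.
 - For the deep learning part, a `conda` virtual environment combined with `pip` is recommended for installing the GPU build of PyTorch.
 - Recommended installation order: create the `conda` environment, install the official `cu124` build of PyTorch, then add the remaining backend dependencies.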
--- a/docs/中国企业缺勤模拟数据集说明.md
+++ b/docs/中国企业缺勤模拟数据集说明.md
@@ -0,0 +1,227 @@
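 # Simulated Chinese Enterprise Absence Dataset
 ## 1. Dataset Overview
 - Data file: `backend/data/raw/china_enterprise_absence_events.csv`
 - Positioning: a simulated dataset of employee absence events in Chinese enterprises
 - Source: generated independently within the project by simulation; it has no field mapping to, or business relationship with, the original `UCI Absenteeism` dataset
 - Granularity: each row represents one employee absence event
 - Sample size: `12000` rows
 - Employees covered: `2575`
 - Enterprises covered: `180`
 - Industries covered: `7`
 - Total fields: `52`
 - Prediction target: `缺勤时长（小时）` (absence duration in hours)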
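 ## 2. Target Variable Distribution
 The target column is `缺勤时长（小时）` (absence duration in hours). Its current summary statistics are:
 | Statistic | Value |
 |---|---:|
 | count | 12000.00 |
 | mean | 6.36 |
 | std | 2.26 |
 | min | 0.50 |
 | 25% | 4.70 |
 | 50% | 6.30 |
 | 75% | 7.80 |
 | max | 16.70 |
 Risk tiers:
 - Low risk: `0-4` hours
 - Medium risk: `4-8` hours
 - High risk: `8-12` hours
 - Very high risk: `12+` hours
 Current target distribution:
 - Low risk: about `15.66%`
 - Medium risk: about `63.29%`
 - High risk: about `19.10%`
 - Very high risk: about `1.95%`
 - High risk and above (`>8` hours): about `21.05%`
 The distribution is dominated by medium-risk events, with a minority of high-risk events and few extremely long absences, which suits both regression prediction and risk-tier analysis.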
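The risk tiers above can be sketched as a small binning function. This is a minimal illustration, not the project's implementation: the function names `risk_tier` and `tier_shares` are hypothetical, and the boundary handling (upper bounds inclusive, so exactly 8 hours counts as medium) is an assumption inferred from the `>8` hours wording for "high risk and above".

```python
def risk_tier(hours: float) -> str:
    """Map an absence duration in hours to a risk tier label.

    Assumption: upper bounds are inclusive (4 -> low, 8 -> medium, 12 -> high).
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    if hours <= 4:
        return "low"
    if hours <= 8:
        return "medium"
    if hours <= 12:
        return "high"
    return "very_high"


def tier_shares(durations):
    """Return each tier's share of all events as a fraction between 0 and 1."""
    counts = {"low": 0, "medium": 0, "high": 0, "very_high": 0}
    for hours in durations:
        counts[risk_tier(hours)] += 1
    total = len(durations)
    return {tier: (n / total if total else 0.0) for tier, n in counts.items()}
```

Applied to the full target column, `tier_shares` should yield shares comparable to the percentages listed above, provided the boundary assumption matches the one used to produce them.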
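 ## 3. Field Design Principles
 - Field semantics match real HR, attendance, shift scheduling, and leave management scenarios in Chinese enterprises
 - No sensitive information such as national ID numbers, phone numbers, or detailed home addresses is included
 - Categorical fields mainly use small fixed enumerations, which simplifies frontend form entry and model encoding
 - Numeric fields are kept within reasonable ranges to avoid large numbers of outliers during training
 - Data is generated by rules plus random perturbation, so that key features have a stable, learnable relationship with the target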
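The "rules plus perturbation" principle in the last bullet can be illustrated with a toy generator. This is only a sketch under stated assumptions: the base durations, the night-shift adjustment, the noise scale, and the names `BASE_HOURS` and `simulate_hours` are all hypothetical and are not taken from `backend/core/generate_dataset.py`. The clipping range reuses the minimum and maximum reported for the target column.

```python
import random

# Hypothetical base durations (hours) per leave type; illustrative values only.
BASE_HOURS = {"sick": 7.5, "personal": 6.0, "annual": 4.2}


def simulate_hours(leave_type, night_shift, rng):
    """Rule-driven mean plus Gaussian noise, clipped to a plausible range."""
    mean = BASE_HOURS[leave_type]
    if night_shift:
        mean += 0.5  # assumed rule: night-shift positions run slightly longer
    noisy = mean + rng.gauss(0.0, 1.0)  # random perturbation around the rule value
    # Clip to the min/max reported for the target column (0.5 and 16.7 hours).
    return round(min(max(noisy, 0.5), 16.7), 1)


rng = random.Random(42)  # fixed seed so the sample is reproducible
sample = [simulate_hours("sick", True, rng) for _ in range(1000)]
```

Because the rule fixes the mean and the noise is zero-centred, a model can recover the leave-type and night-shift effects from enough samples, which is the "stable, learnable relationship" the principle refers to.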
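 ## 4. Field List
 ### 4.1 Enterprise and Organization Fields
 | Field | Meaning |
 |---|---|
 | 企业编号 | Unique identifier of the enterprise |
 | 所属行业 | Industry of the enterprise, e.g. manufacturing, internet, logistics and transportation |
 | 企业规模 | Enterprise headcount tier |
 | 所在城市等级 | Tier of the city where the enterprise is located |
 | 用工类型 | Regular employee, dispatched, outsourced, intern, etc. |
 | 部门条线 | Business or functional line the employee belongs to |
 | 岗位序列 | Job category, e.g. management, professional/technical, production operations |
 | 岗位级别 | Job level, e.g. junior, intermediate, senior, supervisor |
 ### 4.2 Employee Basic Fields
 | Field | Meaning |
 |---|---|
 | 员工编号 | Unique identifier of the employee |
 | 性别 | Gender |
 | 年龄 | Age |
 | 司龄年数 | Years of service at the current enterprise |
 | 最高学历 | Highest education level |
 | 婚姻状态 | Unmarried, married, divorced/other |
 | 是否本地户籍 | Whether the employee holds local household registration (hukou) in the enterprise's city |
 | 子女数量 | Number of children |
 | 是否独生子女家庭负担 | Whether the employee carries a relatively high family support burden |
 | 居住类型 | Owned home, rented home, dormitory |
 ### 4.3 Workload and Attendance Environment Fields
 | Field | Meaning |
 |---|---|
 | 班次类型 | Standard day shift, two-shift rotation, three-shift rotation, flexible shift |
 | 是否夜班岗位 | Whether the position involves night shifts |
 | 月均加班时长 | Average monthly overtime hours |
 | 近30天出勤天数 | Days actually attended in the last 30 days |
 | 近90天缺勤次数 | Number of absence events in the last 90 days |
 | 近180天请假总时长 | Total leave duration in the last 180 days |
 | 通勤时长分钟 | Commute time in minutes (one-way or combined) |
 | 通勤距离公里 | Commute distance in kilometers |
 | 是否跨城通勤 | Whether the commute crosses city boundaries |
 | 绩效等级 | Performance grade: A/B/C/D |
 | 近12月违纪次数 | Disciplinary violations in the last 12 months |
 | 团队人数 | Size of the employee's team |
 | 直属上级管理跨度 | Range of headcount managed by the direct supervisor |
 ### 4.4 Health and Lifestyle Fields
 | Field | Meaning |
 |---|---|
 | BMI | Body mass index |
 | 是否慢性病史 | Whether there is a history of chronic disease |
 | 年度体检异常标记 | Whether the annual health check showed abnormalities |
 | 近30天睡眠时长均值 | Average sleep duration over the last 30 days |
 | 每周运动频次 | Exercise sessions per week |
 | 是否吸烟 | Whether the employee smokes |
 | 是否饮酒 | Whether the employee drinks alcohol |
 | 心理压力等级 | Low, medium, high |
 | 是否长期久坐岗位 | Whether the position is long-term sedentary |
 ### 4.5 Absence Event Fields
 | Field | Meaning |
 |---|---|
 | 缺勤月份 | Month in which this absence occurred |
 | 星期几 | Day of the week on which this absence occurred |
 | 是否节假日前后 | Whether it falls in the window just before or after a public holiday |
 | 季节 | Winter, spring, summer, autumn |
 | 请假申请渠道 | System application, submitted by the supervisor on the employee's behalf, ad hoc phone notification |
 | 请假类型 | Sick leave, personal leave, annual leave, compensatory leave, marriage leave, bereavement leave, prenatal check/childcare leave, work injury leave, other |
 | 请假原因大类 | Physical discomfort, family matters, child care, traffic disruption, emergency, occupational fatigue, medical follow-up |
 | 是否提供医院证明 | Whether a hospital certificate was provided |
 | 是否临时请假 | Whether the leave was requested at short notice |
 | 是否连续缺勤 | Whether consecutive absences occurred |
 | 前一工作日是否加班 | Whether the employee worked overtime on the working day before the absence |
 | 缺勤时长（小时） | Duration of this absence event in hours; the prediction target column |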
 ## 5. Numeric Field Range Overview
 | Field | Mean | Min | Max |
 |---|---:|---:|---:|
 | 年龄 | 32.66 | 20.00 | 55.00 |
 | 司龄年数 | 11.74 | 0.20 | 32.00 |
 | 月均加班时长 | 34.84 | 4.10 | 66.10 |
 | 通勤时长分钟 | 41.38 | 8.00 | 109.70 |
 | 通勤距离公里 | 22.74 | 2.80 | 65.00 |
 | BMI | 24.30 | 17.50 | 36.50 |
 | 近30天睡眠时长均值 | 6.78 | 4.50 | 9.00 |
 | 每周运动频次 | 2.15 | 0.00 | 7.00 |
 | 近90天缺勤次数 | 1.33 | 0.00 | 7.00 |
 | 近180天请假总时长 | 22.92 | 0.00 | 65.90 |
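 ## 6. Structural Distribution
 - Night-shift positions: about `30.86%`
 - Events just before or after a public holiday: about `23.43%`
 - Events with a hospital certificate: about `58.49%`
 - Chronic disease history: about `7.92%`
 - Day-of-week distribution is roughly balanced
 - Season distribution is roughly balanced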
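 ## 7. Mean Absence Duration by Industry
 | Industry | Samples | Mean absence duration (hours) |
 |---|---:|---:|
 | Manufacturing (制造业) | 2366 | 6.671 |
 | Logistics and transportation (物流运输) | 1679 | 6.665 |
 | Internet (互联网) | 1434 | 6.374 |
 | Construction engineering (建筑工程) | 1101 | 6.252 |
 | Pharmaceuticals and healthcare (医药健康) | 2274 | 6.208 |
 | Retail chains (零售连锁) | 1820 | 6.197 |
 | Financial services (金融服务) | 1326 | 6.016 |
 Manufacturing and logistics and transportation show higher absence durations overall, while financial services is comparatively low, which is consistent with differences in work intensity and shift patterns across industries.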
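 ## 8. Leave Type vs. Target Variable
 | Leave type | Samples | Mean absence duration (hours) |
 |---|---:|---:|
 | Work injury leave (工伤假) | 258 | 11.092 |
 | Marriage leave (婚假) | 336 | 9.768 |
 | Bereavement leave (丧假) | 238 | 9.437 |
 | Sick leave (病假) | 3574 | 7.638 |
 | Prenatal check/childcare leave (产检育儿假) | 743 | 7.536 |
 | Personal leave (事假) | 2612 | 5.998 |
 | Other (其他) | 1045 | 5.597 |
 | Compensatory leave (调休) | 1708 | 4.252 |
 | Annual leave (年假) | 1486 | 4.240 |
 The means range from about 4.2 hours for annual leave to about 11.1 hours for work injury leave, so leave type clearly separates target values and is one of the model's main signal sources.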
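A table like the one above can be reproduced with a plain group-by mean. The sketch below is an illustration using only the standard library; the helper name `mean_hours_by_group` is hypothetical, and it assumes each row is a dict keyed by the CSV column names (for example as produced by `csv.DictReader`), with `请假类型` and `缺勤时长（小时）` present as documented in the field list.

```python
from collections import defaultdict


def mean_hours_by_group(rows, group_key, target_key="缺勤时长（小时）"):
    """Return (group, mean target) pairs sorted by mean, highest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += float(row[target_key])  # CSV values arrive as strings
        counts[group] += 1
    means = {group: totals[group] / counts[group] for group in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy rows standing in for records read from the CSV file.
rows = [
    {"请假类型": "病假", "缺勤时长（小时）": "8"},
    {"请假类型": "病假", "缺勤时长（小时）": "6"},
    {"请假类型": "年假", "缺勤时长（小时）": "4"},
]
ranking = mean_hours_by_group(rows, "请假类型")
```

The same helper works for any categorical column, such as `所属行业` or `请假原因大类`, by changing `group_key`.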
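 ## 9. Leave Reason Category vs. Target Variable
 | Leave reason category | Samples | Mean absence duration (hours) |
 |---|---:|---:|
 | Medical follow-up (就医复查) | 1503 | 7.073 |
 | Physical discomfort (身体不适) | 3194 | 6.824 |
 | Child care (子女照护) | 611 | 6.485 |
 | Emergency (突发事件) | 1223 | 6.109 |
 | Occupational fatigue (职业疲劳) | 2261 | 6.096 |
 | Family matters (家庭事务) | 2161 | 5.907 |
 | Traffic disruption (交通受阻) | 1047 | 5.689 |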
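 ## 10. Applicable Scenarios
 This dataset is suitable for the following tasks:
 - Regression prediction of employee absence duration
 - Tiered absence risk early warning
 - Feature importance analysis
 - Group comparison by industry, position, and shift
 - Cluster-based employee group profiling
 - Frontend data visualization and business reporting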
 ## 11. Usage
 - Generation script: `backend/core/generate_dataset.py`
 - Training script: `backend/core/train_model.py`
 - Preprocessing entry point: `backend/core/preprocessing.py`
 To regenerate a completely new dataset, delete the old file and run:
 ```powershell
 cd backend
 python core/generate_dataset.py
 ```
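 ## 12. Notes
 This dataset is simulated. It does not correspond to any real enterprise, employee, or business record, and is used only for algorithm training, API integration testing, and frontend display in the graduation project system.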
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -9,11 +9,12 @@
    "preview": "vite preview"
  },
  "dependencies": {
-    "vue": "^3.4.0",
-    "vue-router": "^4.2.5",
-    "element-plus": "^2.4.4",
+    "@element-plus/icons-vue": "^2.3.2",
+    "axios": "^1.6.2",
     "echarts": "^5.4.3",
-    "axios": "^1.6.2"
+    "element-plus": "^2.4.4",
+    "vue": "^3.4.0",
+    "vue-router": "^4.2.5"
  },
  "devDependencies": {
    "@vitejs/plugin-vue": "^4.5.2",
--- a/frontend/pnpm-lock.yaml
+++ b/frontend/pnpm-lock.yaml
@@ -8,6 +8,9 @@ importers:
  .:
    dependencies:
      '@element-plus/icons-vue':
        specifier: ^2.3.2
        version: 2.3.2(vue@3.5.29)
      axios:
        specifier: ^1.6.2
        version: 1.13.6
--- a/frontend/src/App.vue
+++ b/frontend/src/App.vue
@@ -1,74 +1,345 @@
 <template>
-  <el-container class="app-container">
+  <div class="shell" :class="{ 'shell-collapsed': isSidebarCollapsed }">
-    <el-header class="app-header">
+    <aside class="shell-sidebar">
-      <div class="logo">员工缺勤分析与预测系统</div>
+      <div class="brand-block">
-      <el-menu
+        <div class="brand-mark">HR</div>
-        :default-active="activeMenu"
+        <div v-if="!isSidebarCollapsed">
-        mode="horizontal"
+          <div class="brand-title">企业缺勤分析台</div>
-        router
+          <div class="brand-subtitle">Human Resource Insight Console</div>
-        class="nav-menu"
+        </div>
-      >
+      </div>
-        <el-menu-item index="/dashboard">数据概览</el-menu-item>
+
-        <el-menu-item index="/analysis">影响因素</el-menu-item>
+      <div class="sidebar-panel">
-        <el-menu-item index="/prediction">缺勤预测</el-menu-item>
+        <div v-if="!isSidebarCollapsed" class="sidebar-label">导航</div>
-        <el-menu-item index="/clustering">员工画像</el-menu-item>
+        <el-menu :default-active="activeMenu" router class="nav-menu">
-      </el-menu>
+          <el-menu-item index="/dashboard">
-    </el-header>
+            <el-icon class="nav-icon"><Grid /></el-icon>
-    <el-main class="app-main">
+            <span class="nav-label">数据概览</span>
-      <router-view />
+          </el-menu-item>
-    </el-main>
+          <el-menu-item index="/analysis">
-  </el-container>
+            <el-icon class="nav-icon"><DataAnalysis /></el-icon>
            <span class="nav-label">影响因素</span>
          </el-menu-item>
          <el-menu-item index="/prediction">
            <el-icon class="nav-icon"><TrendCharts /></el-icon>
            <span class="nav-label">缺勤预测</span>
          </el-menu-item>
          <el-menu-item index="/clustering">
            <el-icon class="nav-icon"><UserFilled /></el-icon>
            <span class="nav-label">员工画像</span>
          </el-menu-item>
        </el-menu>
      </div>
      <div v-if="!isSidebarCollapsed" class="sidebar-note">
        <div class="sidebar-label">系统摘要</div>
        <p>面向企业管理场景的缺勤趋势、风险预测与群体画像展示。</p>
      </div>
    </aside>
    <main class="shell-main">
      <header class="topbar">
        <div class="topbar-main">
          <el-button class="collapse-btn" circle @click="isSidebarCollapsed = !isSidebarCollapsed">
            {{ isSidebarCollapsed ? '>' : '<' }}
          </el-button>
          <div>
            <div class="topbar-title">{{ currentMeta.title || '企业缺勤分析台' }}</div>
            <div class="topbar-subtitle">{{ currentMeta.subtitle }}</div>
          </div>
        </div>
        <div class="topbar-badges">
          <el-button class="theme-toggle" @click="toggleTheme">
            {{ isDarkMode ? '浅色模式' : '深色模式' }}
          </el-button>
          <span class="topbar-badge">企业健康运营分析</span>
          <span class="topbar-badge topbar-badge-accent">可视化决策界面</span>
        </div>
      </header>
      <section class="main-content">
        <router-view />
      </section>
    </main>
  </div>
 </template>
 <script setup>
-import { computed } from 'vue'
+import { computed, onMounted, ref, watch } from 'vue'
 import { useRoute } from 'vue-router'
 import { DataAnalysis, Grid, TrendCharts, UserFilled } from '@element-plus/icons-vue'
 const route = useRoute()
 const activeMenu = computed(() => route.path)
 const isSidebarCollapsed = ref(false)
 const isDarkMode = ref(false)
 const metaMap = {
  '/dashboard': {
    title: '数据概览',
    subtitle: '从企业缺勤事件的总量、时序与结构分布切入，建立整体认知。'
  },
  '/analysis': {
    title: '影响因素',
    subtitle: '观察模型最关注的驱动因素，辅助解释缺勤风险的来源。'
  },
  '/prediction': {
    title: '缺勤预测',
    subtitle: '围绕最核心的业务信号输入，快速获得缺勤时长与风险等级。'
  },
  '/clustering': {
    title: '员工画像',
    subtitle: '通过聚类划分典型群体，为答辩演示提供更直观的人群视角。'
  }
 }
 const currentMeta = computed(() => metaMap[route.path] || { title: '企业缺勤分析台', subtitle: '' })
 function applyTheme(isDark) {
  const theme = isDark ? 'dark' : 'light'
  document.documentElement.setAttribute('data-theme', theme)
  localStorage.setItem('ui-theme', theme)
 }
 function toggleTheme() {
  isDarkMode.value = !isDarkMode.value
 }
 onMounted(() => {
  const savedTheme = localStorage.getItem('ui-theme')
  isDarkMode.value = savedTheme === 'dark'
  applyTheme(isDarkMode.value)
 })
 watch(isDarkMode, value => {
  applyTheme(value)
 })
 </script>
-<style>
+<style scoped>
-* {
-  margin: 0;
-  padding: 0;
-  box-sizing: border-box;
-}
-body {
-  font-family: 'Microsoft YaHei', 'PingFang SC', sans-serif;
-  background-color: #f5f7fa;
-}
-.app-container {
+.shell {
+  display: grid;
+  grid-template-columns: 280px minmax(0, 1fr);
   min-height: 100vh;
+  transition: grid-template-columns 0.28s ease;
 }
-.app-header {
+.shell.shell-collapsed {
-  background-color: #fff;
+  grid-template-columns: 96px minmax(0, 1fr);
-  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
+}
 .shell-sidebar {
  position: sticky;
  top: 0;
  display: flex;
  flex-direction: column;
  gap: 22px;
  height: 100vh;
  padding: 26px 22px;
  background: var(--sidebar-bg);
  color: var(--sidebar-text);
  transition: padding 0.28s ease;
  border-right: 1px solid var(--sidebar-border);
 }
 .brand-block {
  display: flex;
  align-items: center;
-  padding: 0 20px;
+  gap: 14px;
-  height: 60px !important;
 }
-.logo {
+.brand-mark {
-  font-size: 18px;
+  display: grid;
-  font-weight: bold;
+  place-items: center;
-  color: #409EFF;
+  width: 48px;
-  margin-right: 40px;
+  height: 48px;
-  white-space: nowrap;
+  border-radius: 16px;
  background: linear-gradient(135deg, #fef3c7, #fdba74);
  color: #7c2d12;
  font-weight: 800;
  letter-spacing: 0.08em;
 }
 .brand-title {
  font-size: 20px;
  font-weight: 700;
  color: var(--sidebar-text);
 }
 .brand-subtitle {
  margin-top: 4px;
  font-size: 12px;
  color: var(--sidebar-text-subtle);
 }
 .sidebar-panel,
 .sidebar-note {
  padding: 18px;
  border: 1px solid var(--sidebar-border);
  border-radius: 22px;
  background: var(--sidebar-surface);
  backdrop-filter: blur(14px);
 }
 .sidebar-label {
  margin-bottom: 14px;
  font-size: 12px;
  letter-spacing: 0.18em;
  text-transform: uppercase;
  color: var(--sidebar-text-subtle);
 }
 .nav-menu {
-  border-bottom: none;
+  border: none;
-  flex: 1;
+  background: transparent;
 }
-.app-main {
+:deep(.nav-menu .el-menu-item) {
-  padding: 20px;
+  height: 48px;
-  background-color: #f5f7fa;
+  margin-bottom: 8px;
-  min-height: calc(100vh - 60px);
+  border-radius: 14px;
  color: var(--sidebar-text);
  background: transparent;
  transition: all 0.2s ease;
 }
 .nav-icon {
  display: inline-flex;
  align-items: center;
  justify-content: center;
  width: 18px;
  margin-right: 10px;
  font-size: 17px;
  color: var(--sidebar-text-subtle);
 }
 .nav-label {
  font-size: 14px;
 }
 :deep(.nav-menu .el-menu-item.is-active) {
  color: var(--sidebar-text);
  background: var(--sidebar-menu-active);
 }
 :deep(.nav-menu .el-menu-item:hover) {
  color: var(--sidebar-text);
  background: var(--sidebar-menu-hover);
 }
 .sidebar-note p {
  margin: 0;
  font-size: 13px;
  line-height: 1.7;
  color: var(--sidebar-text-subtle);
 }
 .shell-main {
  min-width: 0;
  padding: 24px;
 }
 .topbar {
  display: flex;
  align-items: flex-start;
  justify-content: space-between;
  gap: 18px;
  margin-bottom: 22px;
 }
 .topbar-main {
  display: flex;
  align-items: flex-start;
  gap: 14px;
 }
 .collapse-btn {
  margin-top: 4px;
  border: 1px solid var(--line-soft);
  background: rgba(255, 255, 255, 0.82);
  color: var(--brand-strong);
 }
 .topbar-title {
  font-size: 30px;
  font-weight: 700;
  color: var(--text-main);
 }
 .topbar-subtitle {
  margin-top: 8px;
  max-width: 760px;
  font-size: 14px;
  line-height: 1.7;
  color: var(--text-subtle);
 }
 .topbar-badges {
  display: flex;
  flex-wrap: wrap;
  justify-content: flex-end;
  align-items: center;
  gap: 10px;
 }
 .theme-toggle {
  border: 1px solid var(--line-soft);
  background: var(--surface);
  color: var(--text-main);
 }
 .topbar-badge {
  padding: 9px 14px;
  border: 1px solid var(--line-soft);
  border-radius: 999px;
  background: var(--surface);
  font-size: 12px;
  color: var(--brand-strong);
 }
 .topbar-badge-accent {
  color: var(--accent);
 }
 .main-content {
  min-width: 0;
 }
 .shell-collapsed .shell-sidebar {
  padding-left: 14px;
  padding-right: 14px;
 }
 .shell-collapsed .brand-block {
  justify-content: center;
 }
 .shell-collapsed .sidebar-panel {
  padding: 14px 10px;
 }
 .shell-collapsed :deep(.nav-menu .el-menu-item) {
  justify-content: center;
  padding: 0;
 }
 .shell-collapsed .nav-icon {
  margin-right: 0;
  width: 20px;
 }
 .shell-collapsed .nav-label {
  display: none;
 }
@media (max-width: 1100px) {
  .shell {
    grid-template-columns: 1fr;
  }
  .shell-sidebar {
    position: static;
    height: auto;
  }
 }
 </style>
--- a/frontend/src/main.js
+++ b/frontend/src/main.js
@@ -1,6 +1,7 @@
 import { createApp } from 'vue'
 import ElementPlus from 'element-plus'
 import 'element-plus/dist/index.css'
 import './styles/theme.css'
 import App from './App.vue'
 import router from './router'
--- a/frontend/src/styles/theme.css
+++ b/frontend/src/styles/theme.css
@@ -0,0 +1,203 @@
 :root {
  --bg-base: #eef3f7;
  --bg-soft: #f8fbfd;
  --surface: rgba(255, 255, 255, 0.88);
  --surface-strong: #ffffff;
  --line-soft: rgba(23, 43, 77, 0.08);
  --line-strong: rgba(23, 43, 77, 0.16);
  --text-main: #18212f;
  --text-subtle: #627086;
  --brand: #0f766e;
  --brand-strong: #115e59;
  --accent: #c2410c;
  --shadow-soft: 0 18px 50px rgba(15, 23, 42, 0.08);
  --shadow-card: 0 12px 30px rgba(15, 23, 42, 0.06);
  --radius-xl: 28px;
  --radius-lg: 22px;
  --radius-md: 16px;
  --hero-text: #f8fffd;
  --hero-text-subtle: rgba(248, 255, 253, 0.84);
  --sidebar-bg: #f3f4f6;
  --sidebar-surface: rgba(255, 255, 255, 0.72);
  --sidebar-border: rgba(15, 23, 42, 0.08);
  --sidebar-text: #1f2937;
  --sidebar-text-subtle: #6b7280;
  --sidebar-menu-hover: rgba(15, 23, 42, 0.05);
  --sidebar-menu-active: rgba(15, 118, 110, 0.12);
 }
 :root[data-theme='dark'] {
  --bg-base: #0f172a;
  --bg-soft: #111827;
  --surface: rgba(17, 24, 39, 0.82);
  --surface-strong: #111827;
  --line-soft: rgba(148, 163, 184, 0.16);
  --line-strong: rgba(148, 163, 184, 0.26);
  --text-main: #e5eef8;
  --text-subtle: #9fb0c7;
  --brand: #34d399;
  --brand-strong: #6ee7b7;
  --accent: #fb923c;
  --shadow-soft: 0 18px 50px rgba(2, 6, 23, 0.4);
  --shadow-card: 0 12px 30px rgba(2, 6, 23, 0.26);
  --hero-text: #f8fafc;
  --hero-text-subtle: rgba(226, 232, 240, 0.84);
  --sidebar-bg: #111827;
  --sidebar-surface: rgba(255, 255, 255, 0.03);
  --sidebar-border: rgba(148, 163, 184, 0.14);
  --sidebar-text: #e5eef8;
  --sidebar-text-subtle: #94a3b8;
  --sidebar-menu-hover: rgba(255, 255, 255, 0.06);
  --sidebar-menu-active: rgba(52, 211, 153, 0.12);
 }
 * {
  box-sizing: border-box;
 }
 html,
 body,
 #app {
  min-height: 100%;
 }
 body {
  margin: 0;
  font-family: "Source Han Sans SC", "PingFang SC", "Microsoft YaHei UI", sans-serif;
  color: var(--text-main);
  background:
    radial-gradient(circle at top left, rgba(15, 118, 110, 0.18), transparent 32%),
    radial-gradient(circle at top right, rgba(194, 65, 12, 0.14), transparent 28%),
    linear-gradient(180deg, var(--bg-soft) 0%, var(--bg-base) 100%);
  transition: background 0.25s ease, color 0.25s ease;
 }
 a {
  color: inherit;
 }
 .page-shell {
  display: flex;
  flex-direction: column;
  gap: 20px;
 }
 .page-hero {
  position: relative;
  overflow: hidden;
  padding: 28px 30px;
  border: 1px solid rgba(255, 255, 255, 0.45);
  border-radius: var(--radius-xl);
  background:
    linear-gradient(135deg, rgba(15, 118, 110, 0.94), rgba(21, 94, 89, 0.92) 52%, rgba(30, 41, 59, 0.92));
  box-shadow: var(--shadow-soft);
  color: var(--hero-text);
 }
 .page-hero::after {
  content: '';
  position: absolute;
  inset: auto -80px -120px auto;
  width: 240px;
  height: 240px;
  border-radius: 50%;
  background: rgba(255, 255, 255, 0.08);
 }
 .page-eyebrow {
  margin-bottom: 10px;
  font-size: 12px;
  letter-spacing: 0.22em;
  text-transform: uppercase;
  opacity: 0.72;
 }
 .page-title {
  margin: 0;
  font-size: 30px;
  line-height: 1.15;
  font-weight: 700;
 }
 .page-description {
  max-width: 720px;
  margin: 12px 0 0;
  font-size: 14px;
  line-height: 1.7;
  color: var(--hero-text-subtle);
 }
 .glass-card.el-card,
 .panel-card.el-card,
 .metric-card.el-card {
  border: 1px solid var(--line-soft);
  border-radius: var(--radius-lg);
  background: var(--surface);
  box-shadow: var(--shadow-card);
  backdrop-filter: blur(14px);
 }
 .glass-card .el-card__body,
 .panel-card .el-card__body,
 .metric-card .el-card__body {
  padding: 22px;
 }
 .section-heading {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 16px;
  margin-bottom: 18px;
 }
 .section-title {
  margin: 0;
  font-size: 18px;
  font-weight: 700;
  color: var(--text-main);
 }
 .section-caption {
  margin: 6px 0 0;
  font-size: 13px;
  color: var(--text-subtle);
 }
 .chart-frame {
  height: 300px;
 }
 .soft-tag {
  display: inline-flex;
  align-items: center;
  gap: 8px;
  padding: 7px 12px;
  border-radius: 999px;
  font-size: 12px;
  color: var(--brand-strong);
  background: rgba(15, 118, 110, 0.1);
 }
 :root[data-theme='dark'] .soft-tag {
  background: rgba(52, 211, 153, 0.14);
 }
 .soft-grid {
  display: grid;
  gap: 18px;
 }
@media (max-width: 960px) {
  .page-hero {
    padding: 22px 20px;
  }
  .page-title {
    font-size: 24px;
  }
  .chart-frame {
    height: 260px;
  }
 }
--- a/frontend/src/views/Clustering.vue
+++ b/frontend/src/views/Clustering.vue
@@ -1,40 +1,58 @@
 <template>
-  <div class="clustering">
+  <div class="page-shell">
-    <el-card>
+    <section class="page-hero cluster-hero">
-      <template #header>
+      <div class="page-eyebrow">Clustering</div>
-        <div style="display: flex; justify-content: space-between; align-items: center">
+      <h1 class="page-title">员工画像与群体切片</h1>
-          <span>员工群体画像</span>
+      <p class="page-description">
-          <el-select v-model="nClusters" @change="loadData" style="width: 120px">
+        将员工划分为不同缺勤画像群体，通过雷达图和散点图形成直观的人群对比展示。
-            <el-option :label="2" :value="2" />
+      </p>
-            <el-option :label="3" :value="3" />
+    </section>
-            <el-option :label="4" :value="4" />
+
-          </el-select>
+    <el-card class="panel-card" shadow="never">
      <div class="section-heading">
        <div>
          <h3 class="section-title">群体雷达画像</h3>
          <p class="section-caption">以年龄、司龄、加班、通勤、BMI 和缺勤水平构建群体轮廓。</p>
        </div>
-      </template>
+        <el-select v-model="nClusters" @change="loadData" class="cluster-select">
-      <div ref="radarChartRef" class="chart"></div>
+          <el-option :label="2" :value="2" />
          <el-option :label="3" :value="3" />
          <el-option :label="4" :value="4" />
        </el-select>
      </div>
      <div ref="radarChartRef" class="chart-frame"></div>
    </el-card>
-    <el-row :gutter="20" style="margin-top: 20px">
+    <el-row :gutter="20">
-      <el-col :span="12">
+      <el-col :xs="24" :xl="11">
-        <el-card>
+        <el-card class="panel-card" shadow="never">
-          <template #header>
+          <div class="section-heading">
-            <span>聚类结果</span>
+            <div>
-          </template>
+              <h3 class="section-title">聚类结果</h3>
-          <el-table :data="clusterData" stripe>
+              <p class="section-caption">便于答辩时逐个介绍群体特征。</p>
-            <el-table-column prop="name" label="群体名称" />
+            </div>
-            <el-table-column prop="member_count" label="人数" />
+            <span class="soft-tag">Profiles</span>
-            <el-table-column prop="percentage" label="占比(%)">
+          </div>
          <el-table :data="clusterData" stripe class="cluster-table">
            <el-table-column prop="name" label="群体名称" min-width="120" />
            <el-table-column prop="member_count" label="人数" width="90" />
            <el-table-column prop="percentage" label="占比(%)" width="90">
              <template #default="{ row }">{{ row.percentage }}%</template>
            </el-table-column>
            <el-table-column prop="description" label="说明" min-width="180" />
          </el-table>
        </el-card>
      </el-col>
-      <el-col :span="12">
+      <el-col :xs="24" :xl="13">
-        <el-card>
+        <el-card class="panel-card" shadow="never">
-          <template #header>
+          <div class="section-heading">
-            <span>聚类散点图</span>
+            <div>
-          </template>
+              <h3 class="section-title">加班与缺勤散点图</h3>
-          <div ref="scatterChartRef" class="chart"></div>
+              <p class="section-caption">展示各聚类在加班强度与缺勤水平上的位置差异。</p>
            </div>
            <span class="soft-tag">Scatter</span>
          </div>
          <div ref="scatterChartRef" class="chart-frame"></div>
        </el-card>
      </el-col>
    </el-row>
@@ -42,7 +60,7 @@
 </template>
 <script setup>
-import { ref, onMounted } from 'vue'
+import { onMounted, ref } from 'vue'
 import * as echarts from 'echarts'
 import request from '@/api/request'
@@ -56,71 +74,59 @@ onMounted(() => {
 })
 async function loadData() {
-  initRadarChart()
+  await Promise.all([initRadarChart(), initScatterChart(), loadClusterResult()])
-  initScatterChart()
-  await loadClusterResult()
 }
 async function initRadarChart() {
  const chart = echarts.init(radarChartRef.value)
-  try {
+  const data = await request.get(`/cluster/profile?n_clusters=${nClusters.value}`)
-    const data = await request.get(`/cluster/profile?n_clusters=${nClusters.value}`)
+  chart.setOption({
-    chart.setOption({
+    tooltip: {},
-      tooltip: {},
+    legend: { top: 6, data: data.clusters.map(item => item.name) },
-      legend: { data: data.clusters.map(c => c.name) },
+    radar: { indicator: data.dimensions.map(name => ({ name, max: 1 })), radius: '62%' },
-      radar: {
+    series: [{ type: 'radar', data: data.clusters.map(item => ({ value: item.values, name: item.name })) }]
-        indicator: data.dimensions.map(d => ({ name: d, max: 1 }))
+  })
-      },
-      series: [{
-        type: 'radar',
-        data: data.clusters.map(c => ({
-          value: c.values,
-          name: c.name
-        }))
-      }]
-    })
-  } catch (e) {
-    console.error(e)
-  }
 }
 async function initScatterChart() {
  const chart = echarts.init(scatterChartRef.value)
-  try {
+  const data = await request.get(`/cluster/scatter?n_clusters=${nClusters.value}`)
-    const data = await request.get(`/cluster/scatter?n_clusters=${nClusters.value}`)
+  const grouped = {}
-    const grouped = {}
+  data.points.forEach(point => {
-    data.points.forEach(p => {
+    if (!grouped[point.cluster_id]) grouped[point.cluster_id] = []
-      if (!grouped[p.cluster_id]) grouped[p.cluster_id] = []
+    grouped[point.cluster_id].push([point.x, point.y])
-      grouped[p.cluster_id].push([p.x, p.y])
+  })
-    })
+  chart.setOption({
-    
+    tooltip: { trigger: 'item' },
-    chart.setOption({
+    grid: { left: 36, right: 18, top: 20, bottom: 36, containLabel: true },
-      tooltip: { trigger: 'item' },
+    xAxis: { name: data.x_axis_name, splitLine: { lineStyle: { color: '#E5EBF2' } } },
-      xAxis: { name: data.x_axis_name },
+    yAxis: { name: data.y_axis_name, splitLine: { lineStyle: { color: '#E5EBF2' } } },
-      yAxis: { name: data.y_axis_name },
+    series: Object.entries(grouped).map(([id, points]) => ({
-      series: Object.entries(grouped).map(([id, points]) => ({
+      type: 'scatter',
-        type: 'scatter',
+      data: points,
-        data: points,
+      symbolSize: 9,
-        name: `群体${Number(id) + 1}`
+      name: `群体${Number(id) + 1}`
-      }))
+    }))
-    })
+  })
-  } catch (e) {
-    console.error(e)
-  }
 }
 async function loadClusterResult() {
-  try {
+  const data = await request.get(`/cluster/result?n_clusters=${nClusters.value}`)
-    const data = await request.get(`/cluster/result?n_clusters=${nClusters.value}`)
+  clusterData.value = data.clusters
-    clusterData.value = data.clusters
-  } catch (e) {
-    console.error(e)
-  }
 }
 </script>
 <style scoped>
-.chart {
+.cluster-hero {
-  height: 350px;
+  background:
    linear-gradient(135deg, rgba(194, 65, 12, 0.92), rgba(124, 58, 237, 0.88) 55%, rgba(30, 41, 59, 0.94));
 }
 .cluster-select {
  width: 100px;
 }
 .cluster-table {
  --el-table-border-color: transparent;
 }
 </style>
--- a/frontend/src/views/Dashboard.vue
+++ b/frontend/src/views/Dashboard.vue
@@ -1,8 +1,17 @@
 <template>
-  <div class="dashboard">
+  <div class="page-shell">
    <section class="page-hero">
      <div class="page-eyebrow">Overview</div>
      <h1 class="page-title">企业缺勤全景概览</h1>
      <p class="page-description">
        通过总量、时序、结构分布三个层面快速识别缺勤风险的整体轮廓，适合作为答辩时的第一屏总览。
      </p>
    </section>
    <el-row :gutter="20" class="kpi-row">
-      <el-col :span="6" v-for="kpi in kpiData" :key="kpi.title">
+      <el-col :xs="24" :sm="12" :lg="6" v-for="kpi in kpiData" :key="kpi.title">
-        <el-card class="kpi-card">
+        <el-card class="metric-card kpi-card" shadow="never">
          <div class="kpi-index">{{ kpi.index }}</div>
          <div class="kpi-title">{{ kpi.title }}</div>
          <div class="kpi-value">{{ kpi.value }}</div>
          <div class="kpi-unit">{{ kpi.unit }}</div>
@@ -11,39 +20,55 @@
    </el-row>
    <el-row :gutter="20">
-      <el-col :span="12">
+      <el-col :xs="24" :xl="12">
-        <el-card class="chart-card">
+        <el-card class="panel-card" shadow="never">
-          <template #header>
+          <div class="section-heading">
-            <span>月度缺勤趋势</span>
+            <div>
-          </template>
+              <h3 class="section-title">月度缺勤事件趋势</h3>
-          <div ref="trendChartRef" class="chart"></div>
+              <p class="section-caption">观察不同月份的事件量与时长波动。</p>
            </div>
            <span class="soft-tag">Trend</span>
          </div>
          <div ref="trendChartRef" class="chart-frame"></div>
        </el-card>
      </el-col>
-      <el-col :span="12">
+      <el-col :xs="24" :xl="12">
-        <el-card class="chart-card">
+        <el-card class="panel-card" shadow="never">
-          <template #header>
+          <div class="section-heading">
-            <span>星期分布</span>
+            <div>
-          </template>
+              <h3 class="section-title">星期分布</h3>
-          <div ref="weekdayChartRef" class="chart"></div>
+              <p class="section-caption">识别工作周内的缺勤集中区间。</p>
            </div>
            <span class="soft-tag">Weekday</span>
          </div>
          <div ref="weekdayChartRef" class="chart-frame"></div>
        </el-card>
      </el-col>
    </el-row>
-    <el-row :gutter="20" style="margin-top: 20px">
+    <el-row :gutter="20">
-      <el-col :span="12">
+      <el-col :xs="24" :xl="12">
-        <el-card class="chart-card">
+        <el-card class="panel-card" shadow="never">
-          <template #header>
+          <div class="section-heading">
-            <span>缺勤原因分布</span>
+            <div>
-          </template>
+              <h3 class="section-title">请假原因大类分布</h3>
-          <div ref="reasonChartRef" class="chart"></div>
+              <p class="section-caption">呈现引发缺勤的主要业务原因结构。</p>
            </div>
            <span class="soft-tag">Reason Mix</span>
          </div>
          <div ref="reasonChartRef" class="chart-frame"></div>
        </el-card>
      </el-col>
-      <el-col :span="12">
+      <el-col :xs="24" :xl="12">
-        <el-card class="chart-card">
+        <el-card class="panel-card" shadow="never">
-          <template #header>
+          <div class="section-heading">
-            <span>季节分布</span>
+            <div>
-          </template>
+              <h3 class="section-title">季节影响分布</h3>
-          <div ref="seasonChartRef" class="chart"></div>
+              <p class="section-caption">展示季节变化与缺勤总量之间的关系。</p>
            </div>
            <span class="soft-tag">Season</span>
          </div>
          <div ref="seasonChartRef" class="chart-frame"></div>
        </el-card>
      </el-col>
    </el-row>
@@ -53,7 +78,7 @@
 <script setup>
 import { ref, onMounted } from 'vue'
 import * as echarts from 'echarts'
-import { getStats, getTrend, getWeekday, getReasons, getSeasons } from '@/api/overview'
+import { getReasons, getSeasons, getStats, getTrend, getWeekday } from '@/api/overview'
 const trendChartRef = ref(null)
 const weekdayChartRef = ref(null)
@@ -61,20 +86,20 @@ const reasonChartRef = ref(null)
 const seasonChartRef = ref(null)
 const kpiData = ref([
-  { title: '总记录数', value: '-', unit: '条' },
+  { index: '01', title: '缺勤事件数', value: '-', unit: '条' },
-  { title: '员工总数', value: '-', unit: '人' },
+  { index: '02', title: '员工覆盖数', value: '-', unit: '人' },
-  { title: '平均缺勤时长', value: '-', unit: '小时' },
+  { index: '03', title: '平均缺勤时长', value: '-', unit: '小时' },
-  { title: '高风险占比', value: '-', unit: '%' }
+  { index: '04', title: '高风险事件占比', value: '-', unit: '%' }
 ])
 onMounted(async () => {
  try {
    const stats = await getStats()
    kpiData.value = [
-      { title: '总记录数', value: stats.total_records, unit: '条' },
+      { index: '01', title: '缺勤事件数', value: stats.total_records, unit: '条' },
-      { title: '员工总数', value: stats.total_employees, unit: '人' },
+      { index: '02', title: '员工覆盖数', value: stats.total_employees, unit: '人' },
-      { title: '平均缺勤时长', value: stats.avg_absent_hours, unit: '小时' },
+      { index: '03', title: '平均缺勤时长', value: stats.avg_absent_hours, unit: '小时' },
-      { title: '高风险占比', value: (stats.high_risk_ratio * 100).toFixed(1), unit: '%' }
+      { index: '04', title: '高风险事件占比', value: (stats.high_risk_ratio * 100).toFixed(1), unit: '%' }
    ]
  } catch (e) {
    console.error('Failed to load stats:', e)
@@ -88,102 +113,105 @@ onMounted(async () => {
 async function initTrendChart() {
  const chart = echarts.init(trendChartRef.value)
-  try {
+  const data = await getTrend()
-    const data = await getTrend()
+  chart.setOption({
-    chart.setOption({
+    tooltip: { trigger: 'axis' },
-      tooltip: { trigger: 'axis' },
+    grid: { left: 32, right: 18, top: 30, bottom: 30, containLabel: true },
-      xAxis: { type: 'category', data: data.months },
+    xAxis: { type: 'category', data: data.months, axisLine: { lineStyle: { color: '#B6C1CE' } } },
-      yAxis: { type: 'value', name: '小时' },
+    yAxis: { type: 'value', name: '小时', splitLine: { lineStyle: { color: '#E5EBF2' } } },
-      series: [{ type: 'line', smooth: true, data: data.total_hours, areaStyle: { opacity: 0.3 } }]
+    series: [{
-    })
+      type: 'line',
-  } catch (e) {
+      smooth: true,
-    console.error(e)
+      data: data.total_hours,
-  }
+      areaStyle: { opacity: 0.18, color: '#0F766E' },
+      lineStyle: { width: 3, color: '#0F766E' },
+      itemStyle: { color: '#0F766E' }
+    }]
+  })
 }
 async function initWeekdayChart() {
  const chart = echarts.init(weekdayChartRef.value)
-  try {
+  const data = await getWeekday()
-    const data = await getWeekday()
+  chart.setOption({
-    chart.setOption({
+    tooltip: { trigger: 'axis' },
-      tooltip: { trigger: 'axis' },
+    grid: { left: 32, right: 18, top: 30, bottom: 30, containLabel: true },
-      xAxis: { type: 'category', data: data.weekdays },
+    xAxis: { type: 'category', data: data.weekdays, axisLine: { lineStyle: { color: '#B6C1CE' } } },
-      yAxis: { type: 'value', name: '小时' },
+    yAxis: { type: 'value', name: '小时', splitLine: { lineStyle: { color: '#E5EBF2' } } },
-      series: [{ type: 'bar', data: data.total_hours, itemStyle: { color: '#409EFF' } }]
+    series: [{ type: 'bar', barWidth: 34, data: data.total_hours, itemStyle: { color: '#C2410C', borderRadius: [10, 10, 0, 0] } }]
-    })
+  })
-  } catch (e) {
-    console.error(e)
-  }
 }
 async function initReasonChart() {
  const chart = echarts.init(reasonChartRef.value)
-  try {
+  const data = await getReasons()
-    const data = await getReasons()
+  chart.setOption({
-    const topReasons = data.reasons.slice(0, 8)
+    tooltip: { trigger: 'item' },
-    chart.setOption({
+    legend: { bottom: 0, icon: 'circle' },
-      tooltip: { trigger: 'item' },
+    series: [{
-      legend: { orient: 'vertical', right: 10 },
+      type: 'pie',
-      series: [{
+      radius: ['42%', '72%'],
-        type: 'pie',
+      center: ['50%', '45%'],
-        radius: ['40%', '70%'],
+      data: data.reasons.map(item => ({ value: item.count, name: item.name }))
-        data: topReasons.map(r => ({ value: r.count, name: r.name }))
+    }]
-      }]
+  })
-    })
-  } catch (e) {
-    console.error(e)
-  }
 }
 async function initSeasonChart() {
  const chart = echarts.init(seasonChartRef.value)
-  try {
+  const data = await getSeasons()
-    const data = await getSeasons()
+  chart.setOption({
-    chart.setOption({
+    tooltip: { trigger: 'item' },
-      tooltip: { trigger: 'item' },
+    color: ['#0F766E', '#3B82F6', '#F59E0B', '#DC2626'],
-      series: [{
+    series: [{ type: 'pie', radius: ['38%', '70%'], data: data.seasons.map(item => ({ value: item.total_hours, name: item.name })) }]
-        type: 'pie',
+  })
-        data: data.seasons.map(s => ({ value: s.total_hours, name: s.name }))
-      }]
-    })
-  } catch (e) {
-    console.error(e)
-  }
 }
 </script>
 <style scoped>
 .kpi-row {
-  margin-bottom: 20px;
+  margin-bottom: 2px;
 }
 .kpi-card {
-  text-align: center;
+  position: relative;
-  padding: 10px;
+  overflow: hidden;
+  min-height: 156px;
 }
+.kpi-card::after {
+  content: '';
+  position: absolute;
+  right: -24px;
+  bottom: -24px;
+  width: 88px;
+  height: 88px;
+  border-radius: 24px;
+  background: linear-gradient(135deg, rgba(15, 118, 110, 0.12), rgba(194, 65, 12, 0.08));
+}
+.kpi-index {
+  margin-bottom: 14px;
+  font-size: 12px;
+  letter-spacing: 0.18em;
+  color: #91a0b5;
+}
 .kpi-title {
  font-size: 14px;
-  color: #909399;
+  color: var(--text-subtle);
 }
 .kpi-value {
-  font-size: 28px;
+  margin-top: 14px;
-  font-weight: bold;
+  font-size: 34px;
-  color: #409EFF;
+  font-weight: 700;
-  margin: 10px 0;
+  color: var(--text-main);
 }
 .kpi-unit {
-  font-size: 12px;
+  margin-top: 8px;
-  color: #909399;
+  font-size: 13px;
-}
+  color: #8a97ab;
-.chart-card {
-  height: 350px;
-}
-.chart {
-  height: 280px;
 }
 </style>
--- a/frontend/src/views/FactorAnalysis.vue
+++ b/frontend/src/views/FactorAnalysis.vue
@@ -1,32 +1,53 @@
 <template>
-  <div class="factor-analysis">
+  <div class="page-shell">
-    <el-card>
+    <section class="page-hero analysis-hero">
-      <template #header>
+      <div class="page-eyebrow">Analysis</div>
-        <span>特征重要性排序</span>
+      <h1 class="page-title">缺勤驱动因素洞察</h1>
-      </template>
+      <p class="page-description">
-      <div ref="importanceChartRef" class="chart"></div>
+        将模型特征重要性、变量相关关系与群体差异放在同一界面展示，形成更完整的解释链路。
+      </p>
+    </section>
+    <el-card class="panel-card" shadow="never">
+      <div class="section-heading">
+        <div>
+          <h3 class="section-title">缺勤影响因素排序</h3>
+          <p class="section-caption">用于展示模型最关注的驱动信号及其主次关系。</p>
+        </div>
+        <span class="soft-tag">Importance</span>
+      </div>
+      <div ref="importanceChartRef" class="chart-frame"></div>
    </el-card>
-    <el-row :gutter="20" style="margin-top: 20px">
+    <el-row :gutter="20">
-      <el-col :span="12">
+      <el-col :xs="24" :xl="12">
-        <el-card>
+        <el-card class="panel-card" shadow="never">
-          <template #header>
+          <div class="section-heading">
-            <span>相关性热力图</span>
+            <div>
-          </template>
+              <h3 class="section-title">核心特征相关性</h3>
-          <div ref="correlationChartRef" class="chart"></div>
+              <p class="section-caption">帮助说明关键指标之间的联动关系。</p>
+            </div>
+            <span class="soft-tag">Correlation</span>
+          </div>
+          <div ref="correlationChartRef" class="chart-frame"></div>
        </el-card>
      </el-col>
-      <el-col :span="12">
+      <el-col :xs="24" :xl="12">
-        <el-card>
+        <el-card class="panel-card" shadow="never">
-          <template #header>
+          <div class="section-heading">
-            <span>群体对比分析</span>
+            <div>
-          </template>
+              <h3 class="section-title">群体对比分析</h3>
-          <el-select v-model="dimension" @change="loadComparison" style="margin-bottom: 20px">
+              <p class="section-caption">从行业、排班和健康等维度比较平均缺勤时长。</p>
-            <el-option label="饮酒习惯" value="drinker" />
+            </div>
-            <el-option label="吸烟习惯" value="smoker" />
+            <el-select v-model="dimension" @change="loadComparison" class="dimension-select">
-            <el-option label="学历" value="education" />
+              <el-option label="所属行业" value="industry" />
-          </el-select>
+              <el-option label="班次类型" value="shift_type" />
-          <div ref="compareChartRef" class="chart"></div>
+              <el-option label="岗位序列" value="job_family" />
+              <el-option label="婚姻状态" value="marital_status" />
+              <el-option label="慢性病史" value="chronic_disease" />
+            </el-select>
+          </div>
+          <div ref="compareChartRef" class="chart-frame"></div>
        </el-card>
      </el-col>
    </el-row>
@@ -34,14 +55,14 @@
 </template>
 <script setup>
-import { ref, onMounted } from 'vue'
+import { onMounted, ref } from 'vue'
 import * as echarts from 'echarts'
 import request from '@/api/request'
 const importanceChartRef = ref(null)
 const correlationChartRef = ref(null)
 const compareChartRef = ref(null)
-const dimension = ref('drinker')
+const dimension = ref('industry')
 onMounted(() => {
  initImportanceChart()
@@ -51,39 +72,33 @@ onMounted(() => {
 async function initImportanceChart() {
  const chart = echarts.init(importanceChartRef.value)
-  try {
+  const data = await request.get('/analysis/importance')
-    const data = await request.get('/analysis/importance')
+  const features = data.features || []
-    const features = data.features || []
+  chart.setOption({
-    chart.setOption({
+    tooltip: { trigger: 'axis' },
-      tooltip: { trigger: 'axis' },
+    grid: { left: '24%', right: 18, top: 24, bottom: 16 },
-      grid: { left: '20%' },
+    xAxis: { type: 'value', splitLine: { lineStyle: { color: '#E5EBF2' } } },
-      xAxis: { type: 'value' },
+    yAxis: { type: 'category', data: features.map(item => item.name_cn).reverse(), axisLine: { show: false } },
-      yAxis: { type: 'category', data: features.map(f => f.name_cn).reverse() },
+    series: [{
-      series: [{
+      type: 'bar',
-        type: 'bar',
+      barWidth: 18,
-        data: features.map(f => f.importance).reverse(),
+      data: features.map(item => item.importance).reverse(),
-        itemStyle: { color: '#409EFF' }
+      itemStyle: { color: '#0F766E', borderRadius: [0, 10, 10, 0] }
-      }]
+    }]
-    })
+  })
-  } catch (e) {
-    console.error(e)
-  }
 }
 async function initCorrelationChart() {
  const chart = echarts.init(correlationChartRef.value)
-  try {
+  const data = await request.get('/analysis/correlation')
-    const data = await request.get('/analysis/correlation')
+  chart.setOption({
-    chart.setOption({
+    tooltip: {},
-      tooltip: {},
+    grid: { left: 40, right: 20, top: 20, bottom: 40 },
-      xAxis: { type: 'category', data: data.features },
+    xAxis: { type: 'category', data: data.features },
-      yAxis: { type: 'category', data: data.features },
+    yAxis: { type: 'category', data: data.features },
-      visualMap: { min: -1, max: 1, calculable: true, inRange: { color: ['#313695', '#fff', '#a50026'] } },
+    visualMap: { min: -1, max: 1, calculable: true, orient: 'horizontal', left: 'center', bottom: 0, inRange: { color: ['#14532d', '#f8fafc', '#7f1d1d'] } },
-      series: [{ type: 'heatmap', data: flattenMatrix(data.matrix, data.features) }]
+    series: [{ type: 'heatmap', data: flattenMatrix(data.matrix, data.features) }]
-    })
+  })
-  } catch (e) {
-    console.error(e)
-  }
 }
 function flattenMatrix(matrix, features) {
@@ -98,22 +113,24 @@ function flattenMatrix(matrix, features) {
 async function loadComparison() {
  const chart = echarts.init(compareChartRef.value)
-  try {
+  const data = await request.get(`/analysis/compare?dimension=${dimension.value}`)
-    const data = await request.get(`/analysis/compare?dimension=${dimension.value}`)
+  chart.setOption({
-    chart.setOption({
+    tooltip: { trigger: 'axis' },
-      tooltip: { trigger: 'axis' },
+    grid: { left: 32, right: 18, top: 30, bottom: 48, containLabel: true },
-      xAxis: { type: 'category', data: data.groups.map(g => g.name) },
+    xAxis: { type: 'category', data: data.groups.map(item => item.name), axisLabel: { interval: 0, rotate: 18 } },
-      yAxis: { type: 'value', name: '平均缺勤时长(小时)' },
+    yAxis: { type: 'value', name: '平均缺勤时长(小时)', splitLine: { lineStyle: { color: '#E5EBF2' } } },
-      series: [{ type: 'bar', data: data.groups.map(g => g.avg_hours), itemStyle: { color: '#67C23A' } }]
+    series: [{ type: 'bar', data: data.groups.map(item => item.avg_hours), itemStyle: { color: '#C2410C', borderRadius: [10, 10, 0, 0] } }]
-    })
+  })
-  } catch (e) {
-    console.error(e)
-  }
 }
 </script>
 <style scoped>
-.chart {
+.analysis-hero {
-  height: 300px;
+  background:
+    linear-gradient(135deg, rgba(30, 64, 175, 0.95), rgba(15, 118, 110, 0.92) 58%, rgba(30, 41, 59, 0.94));
 }
+.dimension-select {
+  width: 180px;
+}
 </style>
--- a/frontend/src/views/Prediction.vue
+++ b/frontend/src/views/Prediction.vue
@@ -1,221 +1,284 @@
 <template>
-  <div class="prediction">
+  <div class="page-shell prediction">
+    <section class="page-hero prediction-hero">
+      <div class="page-eyebrow">Prediction</div>
+      <h1 class="page-title">核心因子驱动的缺勤预测</h1>
+      <p class="page-description">
+        仅保留对结果最关键的输入项，让演示流程更聚焦，也让答辩老师更容易理解模型的业务逻辑。
+      </p>
+    </section>
    <el-row :gutter="20">
-      <el-col :span="14">
+      <el-col :xs="24" :xl="15">
-        <el-card>
+        <div class="prediction-input-grid">
-          <template #header>
+          <el-card class="panel-card intro-card" shadow="never">
-            <div style="display: flex; justify-content: space-between; align-items: center">
+            <div class="section-heading" style="margin-bottom: 0">
-              <span>参数输入</span>
+              <div>
+                <h3 class="section-title">中国企业缺勤风险输入</h3>
+                <p class="section-caption">使用卡片分区组织核心因子，演示时更清晰。</p>
+              </div>
              <el-button size="small" @click="resetForm">重置</el-button>
            </div>
-          </template>
+            <div class="form-tip">
-          <el-form :model="form" label-width="120px" size="small">
+              系统会自动补齐企业背景、健康生活与组织属性等次级信息，页面仅保留对预测结果影响最大的核心字段。
-            <el-divider content-position="left">时间信息</el-divider>
+            </div>
-            <el-row :gutter="20">
+          </el-card>
-              <el-col :span="12">
+
          <el-card class="panel-card factor-card" shadow="never">
            <div class="section-heading">
              <div>
                <h3 class="section-title">缺勤事件核心信息</h3>
                <p class="section-caption">决定本次缺勤时长的直接事件属性。</p>
              </div>
              <span class="soft-tag">Event</span>
            </div>
            <el-form :model="form" label-width="118px" size="small">
              <el-row :gutter="18">
                <el-col :span="12">
                <el-form-item label="请假类型">
                  <el-select v-model="form.leave_type" style="width: 100%">
                    <el-option v-for="item in leaveTypes" :key="item" :label="item" :value="item" />
                  </el-select>
                </el-form-item>
              </el-col>
                <el-col :span="12">
                <el-form-item label="原因大类">
                  <el-select v-model="form.leave_reason_category" style="width: 100%">
                    <el-option v-for="item in leaveReasons" :key="item" :label="item" :value="item" />
                  </el-select>
                </el-form-item>
                </el-col>
                <el-col :span="12">
                <el-form-item label="缺勤月份">
-                  <el-select v-model="form.month_of_absence" style="width: 100%">
+                  <el-select v-model="form.absence_month" style="width: 100%">
-                    <el-option v-for="m in 12" :key="m" :label="m + '月'" :value="m" />
+                    <el-option v-for="month in 12" :key="month" :label="`${month}月`" :value="month" />
                  </el-select>
                </el-form-item>
-              </el-col>
+                </el-col>
-              <el-col :span="12">
+                <el-col :span="12">
-                <el-form-item label="星期几">
+                <el-form-item label="星期">
-                  <el-select v-model="form.day_of_week" style="width: 100%">
+                  <el-select v-model="form.weekday" style="width: 100%">
-                    <el-option label="周一" :value="2" />
+                    <el-option v-for="item in weekdays" :key="item.value" :label="item.label" :value="item.value" />
-                    <el-option label="周二" :value="3" />
-                    <el-option label="周三" :value="4" />
-                    <el-option label="周四" :value="5" />
-                    <el-option label="周五" :value="6" />
                  </el-select>
                </el-form-item>
-              </el-col>
+                </el-col>
-              <el-col :span="12">
+                <el-col :span="12">
-                <el-form-item label="季节">
+                <el-form-item label="节假日前后">
-                  <el-select v-model="form.seasons" style="width: 100%">
+                  <el-radio-group v-model="form.near_holiday_flag">
-                    <el-option label="夏季" :value="1" />
+                    <el-radio :value="1">是</el-radio>
-                    <el-option label="秋季" :value="2" />
+                    <el-radio :value="0">否</el-radio>
-                    <el-option label="冬季" :value="3" />
+                  </el-radio-group>
-                    <el-option label="春季" :value="4" />
-                  </el-select>
                </el-form-item>
-              </el-col>
+                </el-col>
-              <el-col :span="12">
+                <el-col :span="12">
-                <el-form-item label="缺勤原因">
+                <el-form-item label="医院证明">
-                  <el-select v-model="form.reason_for_absence" style="width: 100%">
+                  <el-radio-group v-model="form.medical_certificate_flag">
-                    <el-option label="医疗咨询" :value="23" />
-                    <el-option label="牙科咨询" :value="28" />
-                    <el-option label="理疗" :value="27" />
-                    <el-option label="医疗随访" :value="22" />
-                    <el-option label="实验室检查" :value="25" />
-                    <el-option label="无故缺勤" :value="26" />
-                    <el-option label="献血" :value="24" />
-                    <el-option label="传染病" :value="1" />
-                    <el-option label="呼吸系统疾病" :value="10" />
-                    <el-option label="消化系统疾病" :value="11" />
-                    <el-option label="肌肉骨骼疾病" :value="13" />
-                  </el-select>
-                </el-form-item>
-              </el-col>
-            </el-row>
-            <el-divider content-position="left">个人信息</el-divider>
-            <el-row :gutter="20">
-              <el-col :span="12">
-                <el-form-item label="年龄">
-                  <el-input-number v-model="form.age" :min="18" :max="60" style="width: 100%" />
-                </el-form-item>
-              </el-col>
-              <el-col :span="12">
-                <el-form-item label="工龄">
-                  <el-input-number v-model="form.service_time" :min="1" :max="30" style="width: 100%" />
-                </el-form-item>
-              </el-col>
-              <el-col :span="12">
-                <el-form-item label="学历">
-                  <el-select v-model="form.education" style="width: 100%">
-                    <el-option label="高中" :value="1" />
-                    <el-option label="本科" :value="2" />
-                    <el-option label="研究生" :value="3" />
-                    <el-option label="博士" :value="4" />
-                  </el-select>
-                </el-form-item>
-              </el-col>
-              <el-col :span="12">
-                <el-form-item label="BMI指数">
-                  <el-input-number v-model="form.bmi" :min="18" :max="40" :precision="1" style="width: 100%" />
-                </el-form-item>
-              </el-col>
-            </el-row>
-            <el-divider content-position="left">工作信息</el-divider>
-            <el-row :gutter="20">
-              <el-col :span="12">
-                <el-form-item label="交通费用">
-                  <el-input-number v-model="form.transportation_expense" :min="100" :max="400" style="width: 100%" />
-                </el-form-item>
-              </el-col>
-              <el-col :span="12">
-                <el-form-item label="通勤距离">
-                  <el-input-number v-model="form.distance" :min="1" :max="60" style="width: 100%" />
-                </el-form-item>
-              </el-col>
-              <el-col :span="12">
-                <el-form-item label="工作负荷">
-                  <el-input-number v-model="form.work_load" :min="200" :max="350" style="width: 100%" />
-                </el-form-item>
-              </el-col>
-              <el-col :span="12">
-                <el-form-item label="达标率">
-                  <el-input-number v-model="form.hit_target" :min="80" :max="100" style="width: 100%" />
-                </el-form-item>
-              </el-col>
-              <el-col :span="12">
-                <el-form-item label="违纪记录">
-                  <el-radio-group v-model="form.disciplinary_failure">
-                    <el-radio :value="0">无</el-radio>
                     <el-radio :value="1">有</el-radio>
+                    <el-radio :value="0">无</el-radio>
                  </el-radio-group>
                </el-form-item>
-              </el-col>
+                </el-col>
-            </el-row>
+              </el-row>
-            
+            </el-form>
-            <el-divider content-position="left">生活习惯</el-divider>
+          </el-card>
-            <el-row :gutter="20">
+
-              <el-col :span="12">
+          <el-card class="panel-card factor-card" shadow="never">
-                <el-form-item label="饮酒习惯">
+            <div class="section-heading">
-                  <el-radio-group v-model="form.social_drinker">
+              <div>
-                    <el-radio :value="0">否</el-radio>
+                <h3 class="section-title">工作压力与排班</h3>
                <p class="section-caption">体现通勤、加班和排班对缺勤的影响。</p>
              </div>
              <span class="soft-tag">Workload</span>
            </div>
            <el-form :model="form" label-width="118px" size="small">
              <el-row :gutter="18">
                <el-col :span="12">
                <el-form-item label="班次类型">
                  <el-select v-model="form.shift_type" style="width: 100%">
                    <el-option v-for="item in shiftTypes" :key="item" :label="item" :value="item" />
                  </el-select>
                </el-form-item>
                </el-col>
                <el-col :span="12">
                <el-form-item label="夜班岗位">
                  <el-radio-group v-model="form.is_night_shift">
                    <el-radio :value="1">是</el-radio>
-                  </el-radio-group>
-                </el-form-item>
-              </el-col>
-              <el-col :span="12">
-                <el-form-item label="吸烟习惯">
-                  <el-radio-group v-model="form.social_smoker">
                     <el-radio :value="0">否</el-radio>
-                    <el-radio :value="1">是</el-radio>
                  </el-radio-group>
                </el-form-item>
-              </el-col>
+                </el-col>
-              <el-col :span="12">
+                <el-col :span="12">
                <el-form-item label="月均加班时长">
                  <el-input-number v-model="form.monthly_overtime_hours" :min="0" :max="100" style="width: 100%" />
                </el-form-item>
                </el-col>
                <el-col :span="12">
                <el-form-item label="通勤时长(分钟)">
                  <el-input-number v-model="form.commute_minutes" :min="5" :max="150" style="width: 100%" />
                </el-form-item>
                </el-col>
                <el-col :span="12">
                <el-form-item label="慢性病史">
                  <el-radio-group v-model="form.chronic_disease_flag">
                    <el-radio :value="1">有</el-radio>
                    <el-radio :value="0">无</el-radio>
                  </el-radio-group>
                </el-form-item>
                </el-col>
              </el-row>
            </el-form>
          </el-card>
          <el-card class="panel-card factor-card" shadow="never">
            <div class="section-heading">
              <div>
                <h3 class="section-title">家庭与补充因素</h3>
                <p class="section-caption">作为结果修正项，为预测增加业务语境。</p>
              </div>
              <span class="soft-tag">Context</span>
            </div>
            <el-form :model="form" label-width="118px" size="small">
              <el-row :gutter="18">
                <el-col :span="12">
                <el-form-item label="子女数量">
-                  <el-input-number v-model="form.son" :min="0" :max="5" style="width: 100%" />
+                  <el-input-number v-model="form.children_count" :min="0" :max="3" style="width: 100%" />
                </el-form-item>
-              </el-col>
+                </el-col>
-              <el-col :span="12">
+                <el-col :span="12">
-                <el-form-item label="宠物数量">
+                <el-form-item label="所属行业">
-                  <el-input-number v-model="form.pet" :min="0" :max="10" style="width: 100%" />
+                  <el-select v-model="form.industry" style="width: 100%">
+                    <el-option v-for="item in industries" :key="item" :label="item" :value="item" />
+                  </el-select>
                </el-form-item>
-              </el-col>
+                </el-col>
-            </el-row>
+                <el-col :span="12">
-            
+                <el-form-item label="婚姻状态">
-            <el-divider content-position="left">预测设置</el-divider>
+                  <el-select v-model="form.marital_status" style="width: 100%">
-            <el-row :gutter="20">
+                    <el-option v-for="item in maritalStatuses" :key="item" :label="item" :value="item" />
-              <el-col :span="12">
+                  </el-select>
                </el-form-item>
                </el-col>
              </el-row>
            </el-form>
          </el-card>
          <el-card class="panel-card factor-card action-card" shadow="never">
            <div class="section-heading">
              <div>
                <h3 class="section-title">预测设置</h3>
                <p class="section-caption">支持自动选择最优模型或查看模型对比。</p>
              </div>
              <span class="soft-tag">Action</span>
            </div>
            <el-form :model="form" label-width="118px" size="small">
              <el-row :gutter="18">
                <el-col :span="12">
                <el-form-item label="选择模型">
                  <el-select v-model="selectedModel" style="width: 100%" :loading="modelsLoading">
                    <el-option label="自动选择最优" value="" />
-                    <el-option 
+                    <el-option v-for="model in availableModels" :key="model.name" :label="model.name_cn" :value="model.name">
-                      v-for="model in availableModels" 
-                      :key="model.name" 
-                      :label="model.name_cn" 
-                      :value="model.name"
-                    >
                      <span>{{ model.name_cn }}</span>
-                      <span style="float: right; color: #909399; font-size: 12px">
+                      <span style="float: right; color: #909399; font-size: 12px">R²: {{ model.metrics?.r2?.toFixed(2) || '-' }}</span>
-                        R²: {{ model.metrics?.r2?.toFixed(2) || '-' }}
-                      </span>
                    </el-option>
                  </el-select>
                </el-form-item>
-              </el-col>
+                </el-col>
-              <el-col :span="12">
+                <el-col :span="12">
                <el-form-item label="模型对比">
                  <el-switch v-model="showCompare" active-text="显示" inactive-text="隐藏" />
                </el-form-item>
-              </el-col>
+                </el-col>
-            </el-row>
+              </el-row>
-            
+
-            <el-form-item style="margin-top: 20px">
+              <div class="action-row">
-              <el-button type="primary" @click="handlePredict" :loading="loading" size="default">
+                <el-button type="primary" @click="handlePredict" :loading="loading">开始预测</el-button>
-                开始预测
+                <el-button @click="handleCompare" :loading="compareLoading">模型对比</el-button>
-              </el-button>
+              </div>
-              <el-button @click="handleCompare" :loading="compareLoading" size="default">
+            </el-form>
-                模型对比
+          </el-card>
-              </el-button>
+        </div>
-            </el-form-item>
-          </el-form>
-        </el-card>
      </el-col>
-      <el-col :span="10">
+      <el-col :xs="24" :xl="9">
-        <el-card>
+        <el-card class="panel-card result-card merged-result-card" shadow="never">
          <template #header>
-            <span>预测结果</span>
+            <div class="section-heading" style="margin-bottom: 0">
-          </template>
+              <div>
-          <div v-if="result" class="result-container">
+                <h3 class="section-title">预测结果与风险说明</h3>
-            <div class="result-value">{{ result.predicted_hours }}</div>
+                <p class="section-caption">在同一张卡片内查看预测值、模型信息和风险区间说明。</p>
-            <div class="result-unit">小时</div>
+              </div>
-            <el-tag :type="riskTagType" size="large" style="margin-top: 20px">
+              <span class="soft-tag">Result</span>
-              {{ result.risk_label }}
-            </el-tag>
-            <div style="margin-top: 20px; color: #909399">
-              模型: {{ result.model_name_cn }}
-            </div>
-            <div style="margin-top: 8px; color: #909399; font-size: 12px">
+          </template>
-              置信度: {{ (result.confidence * 100).toFixed(0) }}%
+          <div class="merged-result-grid">
            <div>
              <div v-if="result" class="result-container">
                <div class="result-stack">
                  <div class="result-info-card result-info-primary">
                    <div class="mini-label">预测缺勤时长</div>
                    <div class="result-value">{{ result.predicted_hours }}</div>
                    <div class="result-unit">小时</div>
                  </div>
                  <div class="result-info-card">
                    <div class="mini-label">风险等级</div>
                    <el-tag :type="riskTagType" size="large">{{ result.risk_label }}</el-tag>
                  </div>
                  <div class="result-info-card">
                    <div class="mini-label">使用模型</div>
                    <div class="mini-value">{{ result.model_name_cn }}</div>
                  </div>
                  <div class="result-info-card">
                    <div class="mini-label">置信度</div>
                    <div class="mini-value">{{ (result.confidence * 100).toFixed(0) }}%</div>
                  </div>
                </div>
              </div>
              <el-empty v-else description="输入中国企业员工场景后开始预测" />
            </div>
            <div class="risk-legend-cards">
              <div class="risk-level-card risk-level-low">
                <div class="risk-card-head">
                  <el-tag type="success" size="small">低风险</el-tag>
                </div>
                <div class="risk-card-rule">缺勤时长 &lt; 4 小时</div>
                <div class="risk-card-desc">通常为短时请假或轻度波动。</div>
              </div>
              <div class="risk-level-card risk-level-medium">
                <div class="risk-card-head">
                  <el-tag type="warning" size="small">中风险</el-tag>
                </div>
                <div class="risk-card-rule">缺勤时长 4 - 8 小时</div>
                <div class="risk-card-desc">属于需要关注的常规风险区间。</div>
              </div>
              <div class="risk-level-card risk-level-high">
                <div class="risk-card-head">
                  <el-tag type="danger" size="small">高风险</el-tag>
                </div>
                <div class="risk-card-rule">缺勤时长 &gt; 8 小时</div>
                <div class="risk-card-desc">通常对应较强事件驱动或持续性风险。</div>
              </div>
            </div>
          </div>
-          <el-empty v-else description="请输入参数后点击预测" />
        </el-card>
-        
+
-        <el-card v-if="compareResults.length > 0" style="margin-top: 20px">
+        <el-card v-if="compareResults.length > 0" class="panel-card compare-card" shadow="never">
          <template #header>
-            <span>模型对比结果</span>
+            <div class="section-heading" style="margin-bottom: 0">
              <div>
                <h3 class="section-title">模型对比结果</h3>
                <p class="section-caption">选择最适合展示的候选模型。</p>
              </div>
              <span class="soft-tag">Compare</span>
            </div>
          </template>
          <el-table :data="compareResults" size="small" :row-class-name="getRowClass">
            <el-table-column prop="model_name_cn" label="模型" width="100" />
-            <el-table-column prop="predicted_hours" label="预测时长" width="80">
+            <el-table-column prop="predicted_hours" label="预测时长" width="90">
              <template #default="{ row }">{{ row.predicted_hours }}h</template>
            </el-table-column>
            <el-table-column prop="risk_label" label="风险等级" width="80">
@@ -233,54 +296,47 @@
            </el-table-column>
          </el-table>
        </el-card>
-        
+
-        <el-card style="margin-top: 20px">
-          <template #header>
-            <span>风险等级说明</span>
-          </template>
-          <div class="risk-legend">
-            <div class="risk-item">
-              <el-tag type="success" size="small">低风险</el-tag>
-              <span>缺勤时长 &lt; 4小时</span>
-            </div>
-            <div class="risk-item">
-              <el-tag type="warning" size="small">中风险</el-tag>
-              <span>缺勤时长 4-8小时</span>
-            </div>
-            <div class="risk-item">
-              <el-tag type="danger" size="small">高风险</el-tag>
-              <span>缺勤时长 &gt; 8小时</span>
-            </div>
-          </div>
-        </el-card>
      </el-col>
    </el-row>
  </div>
 </template>
 <script setup>
-import { ref, computed, onMounted } from 'vue'
+import { computed, onMounted, ref } from 'vue'
-import request from '@/api/request'
 import { ElMessage } from 'element-plus'
+import request from '@/api/request'
 const industries = ['制造业', '互联网', '零售连锁', '物流运输', '金融服务', '医药健康', '建筑工程']
 const shiftTypes = ['标准白班', '两班倒', '三班倒', '弹性班']
 const maritalStatuses = ['未婚', '已婚', '离异/其他']
 const leaveTypes = ['病假', '事假', '年假', '调休', '婚假', '丧假', '产检育儿假', '工伤假', '其他']
 const leaveReasons = ['身体不适', '家庭事务', '子女照护', '交通受阻', '突发事件', '职业疲劳', '就医复查']
 const weekdays = [
  { label: '周一', value: 1 },
  { label: '周二', value: 2 },
  { label: '周三', value: 3 },
  { label: '周四', value: 4 },
  { label: '周五', value: 5 },
  { label: '周六', value: 6 },
  { label: '周日', value: 7 }
 ]
 const defaultForm = {
-  reason_for_absence: 23,
+  industry: '制造业',
-  month_of_absence: 7,
+  shift_type: '标准白班',
-  day_of_week: 3,
+  marital_status: '已婚',
-  seasons: 1,
+  children_count: 1,
-  transportation_expense: 200,
+  monthly_overtime_hours: 26,
-  distance: 20,
+  commute_minutes: 42,
-  service_time: 5,
+  is_night_shift: 0,
-  age: 30,
+  chronic_disease_flag: 0,
-  work_load: 250,
+  absence_month: 5,
-  hit_target: 95,
+  weekday: 2,
-  disciplinary_failure: 0,
+  leave_type: '病假',
-  education: 1,
+  leave_reason_category: '身体不适',
-  son: 0,
+  near_holiday_flag: 0,
-  pet: 0,
+  medical_certificate_flag: 1
-  bmi: 25,
-  social_drinker: 0,
-  social_smoker: 0
 }
 const form = ref({ ...defaultForm })
@@ -317,8 +373,6 @@ async function loadModels() {
  try {
    const res = await request.get('/predict/models')
    availableModels.value = res.models || []
  } catch (e) {
    console.error('Failed to load models:', e)
  } finally {
    modelsLoading.value = false
  }
@@ -328,16 +382,11 @@ async function handlePredict() {
  loading.value = true
  try {
    const params = { ...form.value }
-    if (selectedModel.value) {
+    if (selectedModel.value) params.model_type = selectedModel.value
-      params.model_type = selectedModel.value
-    }
    result.value = await request.post('/predict/single', params)
-    
+    if (showCompare.value) await handleCompare()
-    if (showCompare.value) {
-      await handleCompare()
-    }
  } catch (e) {
-    ElMessage.error('预测失败: ' + e.message)
+    ElMessage.error(`预测失败: ${e.message}`)
  } finally {
    loading.value = false
  }
@@ -349,7 +398,7 @@ async function handleCompare() {
    const res = await request.post('/predict/compare', form.value)
    compareResults.value = res.results || []
  } catch (e) {
-    ElMessage.error('对比失败: ' + e.message)
+    ElMessage.error(`对比失败: ${e.message}`)
  } finally {
    compareLoading.value = false
  }
@@ -362,32 +411,156 @@ onMounted(() => {
 <style scoped>
 .result-container {
  display: block;
  padding: 8px 0 4px;
 }
 .prediction-hero {
  background:
    linear-gradient(135deg, rgba(15, 23, 42, 0.96), rgba(15, 118, 110, 0.92) 50%, rgba(194, 65, 12, 0.88));
 }
 .prediction-input-grid {
  display: grid;
  grid-template-columns: repeat(2, minmax(0, 1fr));
  gap: 20px;
 }
 .intro-card {
  grid-column: 1 / -1;
 }
 .action-card {
  grid-column: 1 / -1;
 }
 .result-card,
 .compare-card {
  height: 100%;
 }
 .compare-card {
  margin-top: 20px;
 }
 .merged-result-card {
  min-height: 100%;
 }
 .merged-result-grid {
  display: grid;
  grid-template-columns: repeat(2, minmax(0, 1fr));
  gap: 14px;
  align-items: start;
 }
 .result-stack {
  display: grid;
  grid-template-columns: repeat(2, minmax(0, 1fr));
  gap: 12px;
 }
 .result-info-card {
  padding: 18px 14px;
  border: 1px solid var(--line-soft);
  border-radius: 18px;
  background: rgba(255, 255, 255, 0.76);
  text-align: center;
-  padding: 30px 0;
+}
 .result-info-primary {
  grid-column: 1 / -1;
  padding: 24px 18px;
  border: 1px solid rgba(15, 118, 110, 0.14);
  background: linear-gradient(135deg, rgba(15, 118, 110, 0.12), rgba(58, 122, 254, 0.08));
 }
 .mini-label {
  margin-bottom: 10px;
  font-size: 12px;
  color: var(--text-subtle);
 }
 .mini-value {
  font-size: 16px;
  font-weight: 700;
  color: var(--text-main);
 }
 .form-tip {
  margin-top: 14px;
  padding: 12px 14px;
  font-size: 13px;
  line-height: 1.6;
  color: #606266;
  background: #f4f8ff;
  border-left: 3px solid #3A7AFE;
  border-radius: 6px;
 }
 .factor-card :deep(.el-form-item) {
  margin-bottom: 18px;
 }
 .action-row {
  display: flex;
  gap: 12px;
  margin-top: 14px;
 }
 .result-value {
  font-size: 48px;
  font-weight: bold;
-  color: #409EFF;
+  color: #3A7AFE;
 }
 .result-unit {
-  font-size: 16px;
+  margin-top: 8px;
  font-size: 14px;
  color: #909399;
 }
-.risk-legend {
+.risk-legend-cards {
-  font-size: 13px;
+  display: grid;
  gap: 12px;
 }
-.risk-item {
+.risk-level-card {
-  display: flex;
+  padding: 16px;
-  align-items: center;
+  border-radius: 18px;
-  gap: 10px;
+  border: 1px solid var(--line-soft);
  background: rgba(255, 255, 255, 0.76);
 }
 .risk-level-low {
  background: linear-gradient(135deg, rgba(34, 197, 94, 0.08), rgba(255, 255, 255, 0.9));
 }
 .risk-level-medium {
  background: linear-gradient(135deg, rgba(245, 158, 11, 0.1), rgba(255, 255, 255, 0.9));
 }
 .risk-level-high {
  background: linear-gradient(135deg, rgba(239, 68, 68, 0.1), rgba(255, 255, 255, 0.9));
 }
 .risk-card-head {
  margin-bottom: 10px;
 }
 .risk-card-rule {
  font-size: 15px;
  font-weight: 700;
  color: var(--text-main);
 }
 .risk-card-desc {
  margin-top: 6px;
  font-size: 13px;
  line-height: 1.6;
  color: var(--text-subtle);
 }
 .el-divider {
  margin: 15px 0;
 }
@@ -395,4 +568,29 @@ onMounted(() => {
 :deep(.recommended-row) {
  background-color: #f0f9eb;
 }
 :deep(.intro-card .el-card__header),
 :deep(.result-card .el-card__header),
 :deep(.compare-card .el-card__header) {
  padding-bottom: 0;
  border-bottom: none;
 }
 :deep(.factor-card .el-card__header) {
  border-bottom: none;
  padding-bottom: 0;
 }
@media (max-width: 1200px) {
  .prediction-input-grid {
    grid-template-columns: 1fr;
  }
 }
@media (max-width: 768px) {
  .merged-result-grid,
  .result-stack {
    grid-template-columns: 1fr;
  }
 }
 </style>
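The risk legend in the template above describes three bands (under 4 hours, 4 to 8 hours, over 8 hours). The mapping can be sketched as a small function. This is only an illustration: `riskLevel` is a hypothetical helper, not part of the diff, and the component itself renders `result.risk_label` as returned by the backend. Treating exactly 4 and exactly 8 hours as medium risk is an assumption, since the legend does not state how the boundaries are handled.

```javascript
// Hypothetical helper (not in the repository): maps predicted absence hours
// to the three bands shown in the risk legend cards.
// Assumption: exactly 4 h and exactly 8 h both count as medium risk.
function riskLevel(hours) {
  if (!Number.isFinite(hours) || hours < 0) {
    throw new RangeError('hours must be a non-negative finite number')
  }
  // tagType values mirror the el-tag types used in the legend.
  if (hours < 4) return { level: 'low', tagType: 'success' }
  if (hours <= 8) return { level: 'medium', tagType: 'warning' }
  return { level: 'high', tagType: 'danger' }
}

// Example: a 2.5-hour prediction falls in the low band.
console.log(riskLevel(2.5).level)
```

If the backend uses different boundary rules, only the two comparisons need to change.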
Author	SHA1	Message	Date
shenjianZ	eab1a62ffb	fix: backend/core/generate_evaluation_plots.py	2026-03-20 17:05:02 +08:00
shenjianZ	6d42d9dac3	fix: script entry path	2026-03-20 17:03:27 +08:00
shenjianZ	1e1d4b0d17	fix:generate_evaluation_plots	2026-03-20 17:01:30 +08:00
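shenjianZ	cc85e3807a	fix: evaluation alignment	2026-03-20 16:52:24 +08:00
shenjianZ	77e38fd15b	feat: upgrade the deep learning model to a Temporal Fusion Transformer architecture - refactor LSTMMLPRegressor into TemporalFusionRegressor, replacing the LSTM with a Transformer Encoder - add LearnedAttentionPooling and GatedResidualBlock modules to increase model expressiveness - improve the training strategy with the OneCycleLR scheduler and sample weighting - improve the absence-event sampling algorithm to compute absence duration more precisely from stress, health, family, and other dimensions - update .gitignore to exclude raw data files and delete the unused raw CSV file	2026-03-20 16:30:08 +08:00
shenjianZ	ff0fbf96f7	docs: add a document covering the project title, technical roadmap, and expected results - add the graduation project title description, technical roadmap plan, and expected results description - optimize the deep learning model code to support PyTorch as an optional dependency	2026-03-20 16:14:11 +08:00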
shenjianZ	844cf9a130	feat(training): strengthen lstm-mlp with embeddings and early stopping	2026-03-12 18:56:06 +08:00
shenjianZ	d70bd54c41	fix(training): patch lightgbm sklearn compatibility	2026-03-12 18:15:09 +08:00
shenjianZ	d7c8019f96	feat: fix doc	2026-03-11 10:47:15 +08:00
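shenjianZ	e63267cef6	feat: replace the foreign employee absenteeism dataset with simulated absence data for Chinese enterprises - add a generation script (generate_dataset.py) for the simulated Chinese enterprise employee absence dataset, covering 7 industries, 180 enterprises, and 2,600 employees - refactor config.py: update feature fields to Chinese names and adjust the target column, employee ID, industry type, and other settings - refactor clustering.py: simplify the clustering logic and update the clustering features and group names (high-pressure commuter type, health-fluctuation type, etc.) - refactor feature_mining.py: update the correlation analysis and group comparison dimensions (by industry, shift, marital status, etc.) - add model_features.py to define the model training features - update preprocessing.py and train_model.py for the new data structure - update the default parameters of the API routes (model: random_forest, dimension: industry) - update the frontend theme styles and view components for the Chinese fields - rename the system to China Enterprise Absence Analysis System	2026-03-11 10:46:58 +08:00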