### News category table (`news_category`)
```sql
CREATE TABLE news_category (
    id   INT         NOT NULL AUTO_INCREMENT COMMENT 'Category ID',
    name VARCHAR(50) NOT NULL COMMENT 'Category name',
    PRIMARY KEY (id),
    UNIQUE KEY uk_name (name)
) ENGINE=InnoDB
  DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_0900_ai_ci
  COMMENT='News category table';
```
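Seed data (the `name` values are stored in Chinese; English meanings are given for reference):

| id | name | Meaning |
|----|------|---------|
| 1 | 娱乐 | Entertainment |
| 2 | 体育 | Sports |
| 3 | 财经 | Finance |
| 4 | 科技 | Technology |
| 5 | 军事 | Military |
| 6 | 汽车 | Automotive |
| 7 | 政务 | Government affairs |
| 8 | 健康 | Health |
| 9 | AI | AI |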
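The seed rows above could be loaded with a single `INSERT`. This is a minimal sketch, assuming MySQL 8.0 and assuming the ids must be fixed exactly as listed (for example, if other code relies on these ids; that dependency is an assumption, not something this document states). Inserting explicit values into an `AUTO_INCREMENT` column is allowed, and MySQL moves the counter past the largest id inserted.

```sql
-- Seed news_category with fixed ids (assumed to matter to downstream code).
-- Names are stored as listed; English glosses are in the trailing comments.
INSERT INTO news_category (id, name) VALUES
    (1, '娱乐'),  -- Entertainment
    (2, '体育'),  -- Sports
    (3, '财经'),  -- Finance
    (4, '科技'),  -- Technology
    (5, '军事'),  -- Military
    (6, '汽车'),  -- Automotive
    (7, '政务'),  -- Government affairs
    (8, '健康'),  -- Health
    (9, 'AI');    -- AI
```

Running it a second time fails with a duplicate-key error on the primary key (and `uk_name`); if the script needs to be re-runnable, `INSERT IGNORE` is one option.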
### News table (`news`)
```sql
CREATE TABLE news (
    id           BIGINT       NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
    url          VARCHAR(500) NOT NULL COMMENT 'Original article URL',
    title        VARCHAR(255) NOT NULL COMMENT 'Article title',
    category_id  INT          NULL COMMENT 'News category ID',
    publish_time DATETIME     NULL COMMENT 'Publish time',
    author       VARCHAR(100) NULL COMMENT 'Author/source',
    source       VARCHAR(50)  NULL COMMENT 'News source (NetEase/36kr)',
    content      LONGTEXT     NOT NULL COMMENT 'Article body',
    content_hash CHAR(64)     NOT NULL COMMENT 'Hash of the article body, used for deduplication',
    created_at   TIMESTAMP    NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time the row was inserted',
    PRIMARY KEY (id),
    UNIQUE KEY uk_url (url),
    UNIQUE KEY uk_content_hash (content_hash),
    KEY idx_category_id (category_id),
    KEY idx_source (source)
) ENGINE=InnoDB
  DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_0900_ai_ci
  COMMENT='News table';
```
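The two unique keys (`uk_url`, `uk_content_hash`) are what reject duplicate articles. The sketch below shows one way an insert could rely on them. It assumes `content_hash` holds the lowercase hex SHA-256 of `content`; the document does not name the hash function, but `SHA2(..., 256)` returns exactly 64 hex characters, which fits `CHAR(64)`. All row values are placeholders. The final query illustrates how `category_id` relates to `news_category.id`; note that the schema declares no `FOREIGN KEY`, so that link holds only by convention.

```sql
-- Placeholder body; in practice this comes from the crawler.
SET @content = 'Full article body goes here';

-- Skip the row if the same URL or the same body hash already exists.
-- Assumption: content_hash = SHA-256 hex digest of content.
INSERT IGNORE INTO news
    (url, title, category_id, publish_time, author, source, content, content_hash)
VALUES
    ('https://example.com/news/1', 'Example title', 4,
     '2024-01-01 08:00:00', 'Example author', '36kr',
     @content, SHA2(@content, 256));

-- 1 if the row was inserted, 0 if a unique key rejected it.
SELECT ROW_COUNT();

-- Article count per category. LEFT JOIN keeps rows whose category_id is
-- NULL or has no matching news_category row; they group under a NULL name.
SELECT c.name AS category, COUNT(*) AS articles
FROM news AS n
LEFT JOIN news_category AS c ON c.id = n.category_id
GROUP BY c.name
ORDER BY articles DESC;
```

`INSERT IGNORE` also downgrades other errors (for example, an over-long `title`) to warnings instead of failing, so code that needs strict validation may prefer a plain `INSERT` and handle the duplicate-key error explicitly. If the hash is computed in application code instead, it should be taken over the UTF-8 bytes of the body so that it matches what `SHA2` computes on a `utf8mb4` string.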