HiveSQL → Python → Visualization
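Abstract: This post shows how to connect to a Hive database with PyHive, fetch sales data, and visualize it in Python. The workflow covers: 1) environment setup; 2) creating a Hive table and seeding it with data; 3) connecting to Hive from Python and pulling the data into Pandas; 4) drawing a basic line chart with Matplotlib; 5) polishing a bar chart with Seaborn; 6) generating an interactive chart with Plotly. The core toolchain is PyHive for fetching data, Pandas for processing it, and Matplotlib/Seabor…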
1 Environment setup
pip install "pyhive[hive]" pandas matplotlib seaborn plotly
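Before moving on, it can help to confirm that every package used later is importable. The sketch below is one way to do that with the standard library only. The helper name `missing_packages` is made up for this example, and the package list simply mirrors the imports used in this post (Plotly included, since section 6 imports it).

```python
from importlib.util import find_spec

# Top-level import names used in this post (assumed list, for illustration).
REQUIRED = ["pyhive", "pandas", "matplotlib", "seaborn", "plotly"]

def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

An empty result only says the packages can be located; it does not prove that a HiveServer2 connection will succeed.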
2 Create the Hive table and seed data (run once in Hive)
CREATE TABLE IF NOT EXISTS sales (
dt string,
amount int
);
INSERT INTO sales VALUES
('2024-01',120),('2024-02',150),('2024-03',130),('2024-04',180);
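If more test months are wanted, the INSERT statement can be generated instead of typed by hand. The following is a minimal sketch under stated assumptions: `build_insert` is a hypothetical helper, the extra rows are invented sample values, and the string formatting is only acceptable for trusted, hand-written test data (it performs no escaping).

```python
def build_insert(table, rows):
    """Render (month, amount) pairs as one Hive INSERT ... VALUES statement.

    Only meant for trusted test data: values are formatted directly
    into the SQL text, with no escaping.
    """
    values = ",".join(f"('{month}',{int(amount)})" for month, amount in rows)
    return f"INSERT INTO {table} VALUES\n{values};"

# Hypothetical extra months, following the pattern of the seed rows above.
extra_rows = [("2024-05", 160), ("2024-06", 175)]
statement = build_insert("sales", extra_rows)
print(statement)
```

The printed statement can be pasted into the Hive CLI or Beeline in the same way as the hand-written one.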
3 Connect to Hive from Python and fetch the data
from pyhive import hive
import pandas as pd
conn = hive.Connection(
    host='hive_host',  # replace with your HiveServer2 IP
    port=10000,
    username='hive_user',
    database='default',
    auth='NONE')
sql = """
SELECT substr(dt,1,7) AS month,
SUM(amount) AS total_sales
FROM sales
WHERE dt BETWEEN '2024-01' AND '2024-12'
GROUP BY substr(dt,1,7)
ORDER BY month
"""
df = pd.read_sql(sql, conn)
conn.close()
print(df.head())
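When no HiveServer2 is within reach, the query can be rehearsed locally. The sketch below swaps Hive for an in-memory SQLite database, which is an assumption made purely for illustration: SQLite is not Hive and the dialects differ, but this particular query (substr, BETWEEN, GROUP BY, ORDER BY) runs unchanged, so the shape of the result can be checked against the seed rows.

```python
import sqlite3

# Same query text as above; SQLite is used only as a local stand-in for Hive.
SQL = """
SELECT substr(dt,1,7) AS month,
       SUM(amount)    AS total_sales
FROM sales
WHERE dt BETWEEN '2024-01' AND '2024-12'
GROUP BY substr(dt,1,7)
ORDER BY month
"""

SEED_ROWS = [("2024-01", 120), ("2024-02", 150), ("2024-03", 130), ("2024-04", 180)]

def rehearse_query():
    """Run the query against an in-memory SQLite copy of the seed data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (dt TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", SEED_ROWS)
        return conn.execute(SQL).fetchall()
    finally:
        conn.close()  # always release the connection, even on errors

rows = rehearse_query()
print(rows)
```

Wrapping the result as `pd.DataFrame(rows, columns=['month', 'total_sales'])` gives a stand-in for `df`, so the plotting sections can be tried offline.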
4 Visualization (Matplotlib version)
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')
plt.figure(figsize=(6,4))
plt.plot(df['month'], df['total_sales'], marker='o')
plt.title('2024 Monthly Sales (from Hive)')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.tight_layout()
plt.savefig('hive_sales.png', dpi=200)
plt.show()
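On a server without a display, `plt.show()` has nothing to open, so the chart only needs to be written to disk. The sketch below is one possible variant under that assumption: it selects the non-interactive Agg backend and uses Matplotlib's object-oriented API. The function name, the output file name, and the stand-in lists are illustrative; in the real flow the values come from `df`.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt

# Stand-in values matching the seed rows; normally taken from df.
months = ["2024-01", "2024-02", "2024-03", "2024-04"]
totals = [120, 150, 130, 180]

def save_line_chart(months, totals, path):
    """Draw the monthly line chart on its own Figure and write it to `path`."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(months, totals, marker="o")
    ax.set_title("2024 Monthly Sales (from Hive)")
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales")
    fig.tight_layout()
    fig.savefig(path, dpi=200)
    plt.close(fig)  # release the figure so repeated calls do not pile up
    return path

out_path = save_line_chart(months, totals, "hive_sales_headless.png")
print(os.path.getsize(out_path), "bytes written to", out_path)
```

Working on an explicit `fig` and `ax` also keeps later charts from being drawn onto the same axes by accident.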
5 Going further: one-line polish with Seaborn
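import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(6, 4))  # fresh figure, so the bars are not drawn over the line chart
sns.barplot(x='month', y='total_sales', data=df, palette='Blues_d')
plt.title('2024 Monthly Sales (Bar)')
plt.tight_layout()
plt.savefig('hive_sales_bar.png', dpi=200)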
6 Interactive (Plotly)
import plotly.express as px
fig = px.line(df, x='month', y='total_sales', markers=True)
fig.update_layout(title='Interactive Hive Sales')
fig.write_html('hive_sales_interactive.html') # double-click the file to view it in a browser
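One-line summary
- PyHive pulls the Hive data into a Pandas DataFrame
- Pandas handles cleaning and aggregation
- Matplotlib / Seaborn / Plotly cover static, polished, or interactive visualization, depending on the scenario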
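The Pandas step is the one this walkthrough leaves implicit, since the aggregation already happens in HiveSQL. As a hedged illustration of what cleaning and further aggregation might look like, the sketch below derives month-over-month changes. The DataFrame is a stand-in built from the seed values, and the column names `mom_change` and `mom_pct` are invented for this example.

```python
import pandas as pd

# Stand-in for the DataFrame returned by the Hive query (seed values).
df = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-03", "2024-04"],
    "total_sales": [120, 150, 130, 180],
})

# Cleaning: make sure the totals are numeric and the rows are in month order.
df["total_sales"] = pd.to_numeric(df["total_sales"])
df = df.sort_values("month").reset_index(drop=True)

# Derived columns: absolute and percentage change versus the previous month.
df["mom_change"] = df["total_sales"].diff()
df["mom_pct"] = (df["total_sales"].pct_change() * 100).round(1)

print(df)
```

The first row has no previous month, so both derived columns are NaN there.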