1 Environment setup

pip install 'pyhive[hive]' pandas matplotlib seaborn plotly

2 Create the Hive table and sample data (run once in Hive)

CREATE TABLE IF NOT EXISTS sales (
    dt string,
    amount int
);

INSERT INTO sales VALUES
('2024-01',120),('2024-02',150),('2024-03',130),('2024-04',180);
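
The four hand-typed rows are enough for a first chart. For a fuller year of demo data, the INSERT statement can be generated rather than typed. The sketch below is one way to do that; the helper name `build_insert_sql`, the seed, and the 100-200 amount range are illustrative assumptions, and the values it produces are synthetic.

```python
import random


def build_insert_sql(table, year, seed=0):
    """Build one INSERT ... VALUES statement with a synthetic row per month.

    Hypothetical helper for demo data only: amounts are random integers,
    and values are formatted straight into the SQL text, so never feed it
    untrusted input.
    """
    rng = random.Random(seed)  # fixed seed -> the same rows on every run
    rows = [(f"{year}-{m:02d}", rng.randint(100, 200)) for m in range(1, 13)]
    values = ",".join(f"('{dt}',{amount})" for dt, amount in rows)
    return f"INSERT INTO {table} VALUES\n{values};"


print(build_insert_sql("sales", 2024))
```

The printed statement can be pasted into the Hive client in place of the four-row INSERT above.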

3 Connect to Hive from Python and pull the data

from pyhive import hive
import pandas as pd

conn = hive.Connection(
        host='hive_host',        # replace with your HiveServer2 host or IP
        port=10000,
        username='hive_user',
        database='default',
        auth='NONE')

sql = """
SELECT substr(dt,1,7) AS month,
       SUM(amount)     AS total_sales
FROM   sales
WHERE  dt BETWEEN '2024-01' AND '2024-12'
GROUP  BY substr(dt,1,7)
ORDER  BY month
"""

df = pd.read_sql(sql, conn)
conn.close()
print(df.head())
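
`pd.read_sql` is designed around SQLAlchemy connectables, and recent pandas versions emit a UserWarning when handed a raw DB-API connection such as PyHive's. One alternative is to go through the DB-API cursor directly. The sketch below assumes only the standard cursor interface (`execute`, `description`, `fetchall`); `query_to_dataframe` is a hypothetical helper name, and an in-memory SQLite database stands in for Hive so the example runs without a cluster.

```python
import sqlite3

import pandas as pd


def query_to_dataframe(conn, sql):
    """Run sql on any DB-API connection and return the rows as a DataFrame."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        # The first field of each description entry is the column name.
        columns = [col[0] for col in cursor.description]
        return pd.DataFrame(cursor.fetchall(), columns=columns)
    finally:
        cursor.close()


# SQLite stand-in for Hive (an assumption for local testing only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (dt TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01", 120), ("2024-02", 150), ("2024-03", 130), ("2024-04", 180)],
)

sql = """
SELECT substr(dt,1,7) AS month,
       SUM(amount)     AS total_sales
FROM   sales
WHERE  dt BETWEEN '2024-01' AND '2024-12'
GROUP  BY substr(dt,1,7)
ORDER  BY month
"""

df = query_to_dataframe(conn, sql)
conn.close()
print(df)
```

With PyHive, the same helper would be called on the `hive.Connection` object in place of the SQLite connection.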

4 Visualization (Matplotlib)

import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

plt.figure(figsize=(6,4))
plt.plot(df['month'], df['total_sales'], marker='o')
plt.title('2024 Monthly Sales (from Hive)')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.tight_layout()
plt.savefig('hive_sales.png', dpi=200)
plt.show()
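
Hive clients often run on servers without a display, where `plt.show()` has nothing to open. A minimal sketch for that case, assuming Matplotlib's non-interactive `Agg` backend and a small hard-coded DataFrame that mirrors the sample data (the output file name `hive_sales_headless.png` is illustrative):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no GUI needed

import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the DataFrame returned by the Hive query.
df = pd.DataFrame(
    {
        "month": ["2024-01", "2024-02", "2024-03", "2024-04"],
        "total_sales": [120, 150, 130, 180],
    }
)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(df["month"], df["total_sales"], marker="o")

# String x values are placed at positions 0, 1, 2, ... in order of appearance,
# so each point can be labelled by its integer position.
for i, y in enumerate(df["total_sales"]):
    ax.annotate(str(y), (i, y), textcoords="offset points", xytext=(0, 6), ha="center")

ax.set_title("2024 Monthly Sales (from Hive)")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
fig.tight_layout()
fig.savefig("hive_sales_headless.png", dpi=200)
plt.close(fig)  # release the figure's memory in long-running jobs
```

Using the explicit `fig`/`ax` objects rather than the global `plt` state keeps each chart isolated, which matters once a script produces more than one figure.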

5 Going further: quick styling with Seaborn

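import seaborn as sns
plt.figure(figsize=(6,4))   # start a fresh figure instead of drawing over the line chart
# Note: seaborn >= 0.13 warns when palette is set without hue; there, also pass hue='month', legend=False
sns.barplot(x='month', y='total_sales', data=df, palette='Blues_d')
plt.title('2024 Monthly Sales (Bar)')
plt.tight_layout()
plt.savefig('hive_sales_bar.png', dpi=200)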

6 Interactive charts (Plotly)

import plotly.express as px
fig = px.line(df, x='month', y='total_sales', markers=True)
fig.update_layout(title='Interactive Hive Sales')
fig.write_html('hive_sales_interactive.html')   # double-click the file to open it in a browser

Summary

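  • PyHive pulls the Hive query result into a Pandas DataFrame
  • Pandas handles cleaning and aggregation
  • Matplotlib, Seaborn, and Plotly cover static, styled, and interactive charts respectively, depending on the use case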