数据准备

本示例需要您从 OSS 上下载数据,并上传到您自己的 OSS bucket上。数据包含

分别上传到您 OSS bucket 指定目录的 userinfo 子目录,videoinfo 目录,playvideo 目录。例如 bucket example 下的 demo/userinfo 目录。

将下面创建表的 sql 中 [bucketname] 替换成您的 bucket 名字例如 example,[region] 替换成您用的 OSS 地域名如hangzhou,[bucketpath] 替换成您 OSS 的指定的路径前缀例如 demo。

段落 1 创建用户表

%hive
CREATE EXTERNAL TABLE user_info(id int,sex int,age int, marital_status int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'oss://[bucketname].oss-cn-[region]-internal.aliyuncs.com/[bucketpath]/userinfo'

段落 2 创建视频表

%hive
CREATE EXTERNAL TABLE video_info(id int,title string,type string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'oss://[bucketname].oss-cn-[region]-internal.aliyuncs.com/[bucketpath]/videoinfo'

段落 3 创建播放表

%hive
CREATE EXTERNAL TABLE play_video(user_id int,video_id int, play_time bigint) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'oss://[bucketname].oss-cn-[region]-internal.aliyuncs.com/[bucketpath]/playvideo'

段落 4 用户表计数

%sql select count(*) from user_info

段落 5 视频表计数

%sql select count(*) from video_info

段落 6 播放表计数

%sql select count(*) from play_video

段落 7 统计各类型视频播放数

%sql select video.type, count(video.type) as count from play_video play join video_info video on (play.video_id = video.id) group by video.type order by count desc

段落 8 播放数 top 10 的视频信息

%sql select video.id, video.title, video.type, video_count.count from (select video_id, count(video_id) as count from play_video group by video_id order by count desc limit 10) video_count join video_info video on (video_count.video_id = video.id) order by count desc

段落 9 播放数最高视频观看者的年龄分布

%sql select age , count(*) as count  from (select distinct(user_id) from play_video  where video_id =49 ) play join user_info userinfo on (play.user_id = userinfo.id) group by userinfo.age

段落 10 播放数最高视频观看者的性别,年龄段,婚姻状态分布汇总

%sql select if(sex=0,'女','男') as title, count(*) as count, '性别' as type from (select distinct(user_id) from play_video  where video_id =49 ) play join user_info userinfo on (play.user_id = userinfo.id) group by userinfo.sex
union all
select case when userinfo.age<15 then '小于15' when age<25 then '15-25' when age<35 then '25-35' else '大于35' end  , count(*) as count, '年龄段' as type from (select distinct(user_id) from play_video  where video_id =49) play join user_info userinfo on (play.user_id = userinfo.id) group by case when userinfo.age<15 then '小于15' when age<25 then '15-25' when age<35 then '25-35' else '大于35' end 
union all
select if(marital_status=0,'未婚','已婚') as title, count(*) as count, '婚否' as type from (select distinct(user_id) from play_video  where video_id =49 ) play join user_info userinfo on (play.user_id = userinfo.id) group by marital_status