数据分析

实验步骤
使用hive对数据进行操作

修改hosts文件

  • 首先修改三个节点的hosts文件,在配置文件后保存并退出:

  • 命令 vim /etc/hosts

  • 加入三台机器的ip(以具体实验中机器的ip地址为准)

开启zookeeper

  • 接下来进入zookeeper的安装目录,开启zookeeper(三台机器都需要启动):

  • 命令(打开zookeeper安装目录):cd /usr/zookeeper/zookeeper-3.4.10/

  • 命令(开启zookeeper服务):bin/zkServer.sh start

开启hadoop

  • 首先打开hadoop安装目录(仅在master上执行):

  • 命令(打开hadoop安装目录):cd /usr/hadoop/hadoop-2.7.3/

  • 命令(开启hadoop)sbin/start-all.sh

开启hbase

  • 进入到hbase安装目录,开启hbase(仅在master上执行)

  • 命令(打开hbase安装目录):cd /usr/hbase/hbase-1.2.4/

  • 命令(开启hbase):bin/start-hbase.sh

开启hive

  • 首先进入到hive的安装目录,修改配置文件,最后启动hive server(在slave1中执行):

  • 命令(打开slave1节点上的hive安装目录):cd /usr/hive/apache-hive-2.1.1-bin/conf/

  • 命令(编辑slave1中的hive-site.xml配置文件):vim hive-site.xml

  • 操作结果如下图所示:

  • 接下来修改master中的hive-site.xml文件:

  • 命令(打开master节点上的hive安装目录):cd /usr/hive/apache-hive-2.1.1-bin/conf/

  • 命令(编辑master中的hive-site.xml配置文件):vim hive-site.xml

  • 将原有配置文件中的所有内容删除(可以使用快捷键 dd 去删除整行配置文件)

  • 然后输入一下配置文件:

<configuration>  <!-- Hive产生的元数据存放位置--><property>    <name>hive.metastore.warehouse.dir</name>    <value>/user/hive_remote/warehouse</value></property>    <!--- 使用本地服务连接Hive,默认为true--><property>    <name>hive.metastore.local</name>    <value>false</value></property>    <!-- 连接服务器--><property>    <name>hive.metastore.uris</name><value>thrift://slave1:9083</value></property></configuration>
  • 最后保存退出即可,然后在slave1节点的hive安装目录下输入:bin/hive --service metastore 开启hive服务

  • 接下来在master节点中的hive安装目录下输入:bin/hive准备开启hive客户端

  • 但是此时程序报错,接下来将带大家解决一下错误:

  • 可以看到此错误为hadoop进入安全模式,所以我们需要将hadoop退出安全模式,此错误很可能是集群意外中断所导致,我们可以通过一下命令去退出hadoop的安全模式:

  • 命令(进入master的hive安装目录):cd /usr/hive/apache-hive-2.1.1-bin/conf/

  • 命令(退出hadoop安全模式):bin/hadoop dfsadmin -safemode leave、

  • 最后hive成功开启:

1.查看数据库:

create database hongya;
show databases;
use hongya;

create table match_data(user_id int,age_range int,gender int, merchant_id int, label int, activity_log varchar(1000)) row format delimited fields terminated by ',';

load data local inpath '/root/train_format2.csv' overwrite into table match_data;
select * from match_data limit 100;

CREATE TABLE RESULT AS
SELECT USER_ID,
SPLIT(LOG_SPLIT,':')[0] AS ITEM_ID,
SPLIT(LOG_SPLIT,':')[2] AS BRAND_ID,
SPLIT(LOG_SPLIT,':')[4] AS ATIION_TYPE
FROM (SELECT USER_ID,LOG_SPLIT
FROM match_data
LATERAL VIEW EXPLODE(SPLIT(ACTIVITY_LOG,'#')) ACTIVITY_LOG AS LOG_SPLIT ) T1
;

select * from RESULT limit 100;

CREATE TABLE CLICK AS
SELECT ITEM_ID,COUNT(1) COUNT_1
FROM RESULT
WHERE ATIION_TYPE = '0'
GROUP BY ITEM_ID
ORDER BY COUNT_1 DESC
LIMIT 100;

select * from click;

CREATE TABLE ADD_TO_CART AS
SELECT ITEM_ID,COUNT(1) COUNT_1
FROM RESULT
WHERE ATIION_TYPE = '1'
GROUP BY ITEM_ID
ORDER BY COUNT_1 DESC
LIMIT 100 ;

select * from ADD_TO_CART;

CREATE TABLE COLLECT AS
SELECT ITEM_ID,COUNT(1) COUNT_1
FROM RESULT
WHERE ATIION_TYPE = '3'
GROUP BY ITEM_ID
ORDER BY COUNT_1 DESC
LIMIT 100;
select * from COLLECT;

CREATE TABLE EMPTION AS
SELECT ITEM_ID,COUNT(1) COUNT_1
FROM RESULT
WHERE ATIION_TYPE = '2'
GROUP BY ITEM_ID
ORDER BY COUNT_1 DESC
LIMIT 100;

select * from EMPTION;

  1.计算转化率
  创建click_emp,写入商品点击购买转化率top10(点击转化率较高,建议top1000)
CREATE TABLE CLICK_EMP AS
SELECT ITEM_ID,SUM(IF(ATIION_TYPE = '0',1,0))/COUNT(1) CLICK_EMP_RATE
FROM RESULT T1
GROUP BY ITEM_ID
ORDER BY CLICK_EMP_RATE DESC;

select * from CLICK_EMP limit 100;

  2.创建表add_emp,写入商品加入购物车-购买转化率top10
CREATE TABLE ADD_EMP AS
SELECT ITEM_ID,SUM(IF(ATIION_TYPE = '1',1,0))/COUNT(1) CLICK_EMP_RATE
FROM RESULT T1
GROUP BY ITEM_ID
ORDER BY CLICK_EMP_RATE DESC;

select * from ADD_EMP limit 100;

  3.创建表collect_emp,写入商品收藏-购买转化率top10
CREATE TABLE COLLECT_EMP AS
SELECT ITEM_ID,SUM(IF(ATIION_TYPE = '1',1,0))/COUNT(1) CLICK_EMP_RATE
FROM RESULT T1
GROUP BY ITEM_ID
ORDER BY CLICK_EMP_RATE DESC;

select * from COLLECT_EMP limit 100;


Leave a Comment