Project Goal
Using automatically generated OLTP data, students write their own data-migration scripts to load that data into the prepared data warehouse. The load must be an incremental update, not a full reload.
Project Requirements
Each student must create a separate OLAP user and load the actual data from the OLTP database. The instructor-supplied data is in CSV format; a brand-new CSV file is generated every 20 minutes (at minutes 00, 20, and 40). The generated files are located at
192.168.10.7:/backup/orders.YYYYMMDDHHMMSS.csv
192.168.10.7:/backup/order_detail.YYYYMMDDHHMMSS.csv
The access account is backup/123456.
In the file names, HH ranges from 00 to 23.
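Because the YYYYMMDDHHMMSS timestamp in the file names sorts lexicographically, an import script can pick the newest export without parsing dates. A minimal sketch, where CSV_DIR, the demo file names, and the `latest_csv` helper are illustrative assumptions (the real files would first be copied from backup@192.168.10.7:/backup/ into a local directory, e.g. with scp):

```shell
#!/bin/sh
# Sketch: select the newest timestamped CSV for each table.
# CSV_DIR and the demo files below stand in for the real exports
# copied down from the backup server; they are not from the project.
CSV_DIR="${CSV_DIR:-/tmp/csv_demo}"
mkdir -p "$CSV_DIR"
# Demo files standing in for two consecutive 20-minute exports.
touch "$CSV_DIR/orders.20240101000000.csv" \
      "$CSV_DIR/orders.20240101002000.csv" \
      "$CSV_DIR/order_detail.20240101002000.csv"

latest_csv() {
    # The YYYYMMDDHHMMSS timestamp sorts lexicographically, so the
    # last name in sorted order is the newest export of table $1.
    ls "$CSV_DIR/$1".*.csv 2>/dev/null | sort | tail -n 1
}

latest_csv orders          # prints .../orders.20240101002000.csv
latest_csv order_detail
```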
Final interface requirements
1) Produce the top 10 products by sales volume over the past year, together with month-over-month sales-revenue figures for that year, rendered as FineReport charts; selecting a different product displays a different line chart.
2) Find the products whose sales revenue has grown for three consecutive years, take the top 10 by revenue, and show product name, year, and total revenue in the FineReport report, covering the most recent 5 years for each product.
3) FineReport developers only need to query the interface tables to get their data, so these tables must be refreshed every night at 00:00.
Students must create the OLAP user themselves
1) Create a staging (BASE) layer that receives all data as-is; its table structures are identical to the OLTP schema.
2) Create a DWD layer; no aggregation for now, but updates to the DWD layer must be incremental.
3) Run the data import automatically at minutes 5, 25, and 45.
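For requirement 3), a single crontab entry in the same style as the schedules later in this document is enough; the script name below is hypothetical, not from the project materials:

```shell
# Rebuild the interface tables every night at 00:00 (hypothetical script name)
0 0 * * * /root/bin/refresh_interface.sh >>/backup/refresh_interface.log 2>&1
```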
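The incremental-update requirement for the DWD layer is commonly met with a high-water mark: remember the largest ORDER_ID already loaded and only load CSV rows above it. A minimal sketch under that assumption; the paths, the watermark file, and the CSV layout (ORDER_ID as the first column) are illustrative, and the actual load into the DWD table via sqlplus/sqlldr is stubbed out:

```shell
#!/bin/sh
# Sketch of incremental loading with a high-water mark.
WORK=/tmp/incr_demo
mkdir -p "$WORK"
WATERMARK_FILE="$WORK/orders.watermark"
CSV="$WORK/orders.csv"

# Demo CSV standing in for one 20-minute export: ORDER_ID,ORDER_DATE,CUSTOMER_ID
cat > "$CSV" <<'EOF'
101,2024-01-01,5
102,2024-01-02,7
103,2024-01-03,9
EOF

# Pretend the previous run loaded everything up to ORDER_ID 101
# (a real script would initialise this to 0 on the first run).
echo 101 > "$WATERMARK_FILE"
LAST=$(cat "$WATERMARK_FILE")

# Keep only rows whose ORDER_ID is above the watermark.
NEW_ROWS="$WORK/orders.delta.csv"
awk -F, -v last="$LAST" '$1 + 0 > last + 0' "$CSV" > "$NEW_ROWS"

# ... load $NEW_ROWS into the DWD table here (sqlplus/sqlldr) ...

# Advance the watermark to the highest ORDER_ID just loaded.
NEW_MAX=$(awk -F, 'NR == 1 || $1 + 0 > m + 0 { m = $1 } END { print m }' "$NEW_ROWS")
[ -n "$NEW_MAX" ] && echo "$NEW_MAX" > "$WATERMARK_FILE"

wc -l < "$NEW_ROWS"      # prints 2 (rows 102 and 103)
cat "$WATERMARK_FILE"    # prints 103
```

Because each run only touches rows above the stored mark, re-running the script never reloads data that is already in the warehouse.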
How the tables and data were generated under the ZS user
Students do not need to do this part; it has already been done.
Two tables are created.
Create the orders table
CREATE TABLE "ZS"."ORDERS" (
"ORDER_ID" NUMBER NOT NULL ,
"ORDER_DATE" DATE ,
"CUSTOMER_ID" NUMBER(10)
)
Create the order detail table
DROP TABLE "ZS"."ORDER_DETAIL";
CREATE TABLE "ZS"."ORDER_DETAIL" (
"ORDER_ID" NUMBER ,
"QUANTITY" NUMBER ,
"PRICE" NUMBER ,
"PRODUCT_ID" NUMBER(10)
)
-- ----------------------------
-- Foreign Keys structure for table ORDER_DETAIL
-- ----------------------------
ALTER TABLE "ZS"."ORDER_DETAIL" ADD CONSTRAINT "SYS_C0011542" FOREIGN KEY ("ORDER_ID") REFERENCES "ORDERS" ("ORDER_ID")
Generating the data automatically
Create a sequence
CREATE SEQUENCE ORDERS_ORDERID_SEQ
Initialize 10,000 rows of order data
INSERT INTO ORDERS
SELECT ORDERS_ORDERID_SEQ.NEXTVAL ORDER_ID ,
TRUNC(ADD_MONTHS(SYSDATE, -DBMS_RANDOM.VALUE(0, 240 ))) + DBMS_RANDOM.VALUE(0,31)+ DBMS_RANDOM.VALUE(0,1) AS random_date ,
cast(dbms_random.value(1,100) as Number(10) ) CUSTOMER_ID
from dual
connect by level<=10000
Initialize the order detail data (the statement below generates about 200,000 rows)
insert into ORDER_DETAIL
SELECT
cast ( DBMS_RANDOM.VALUE((SELECT MIN(ORDER_ID) FROM ORDERS),(SELECT MAX(ORDER_ID) FROM ORDERS)) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,2000) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,500) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,(SELECT MAX(PRODUCT_ID) FROM PRODUCT)) as Number(20))
FROM DUAL
connect by level<200000
While creating the 200,000 rows, the following error appeared:
ORA-01653: unable to extend table ZS.ORDER_DETAIL by 128 in tablespace SYSTEM
This is likely a tablespace shortage; in theory user data should not be stored in SYSTEM, but extending it resolves the error here:
ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/XE/system.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED;
Running the 200,000-row insert again now succeeds.
insert into ORDER_DETAIL
SELECT
cast ( DBMS_RANDOM.VALUE((SELECT MIN(ORDER_ID) FROM ORDERS),(SELECT MAX(ORDER_ID) FROM ORDERS)) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,2000) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,500) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,(SELECT MAX(PRODUCT_ID) FROM PRODUCT)) as Number(20))
FROM DUAL
connect by level<200000
> Affected rows: 199999
> Time: 4.723s
-- If the insert fails (the randomly generated ORDER_ID may not exist in ORDERS,
-- violating the foreign key), the following form guarantees success by picking
-- a real ORDER_ID from the 1,000 most recently inserted orders
insert into ORDER_DETAIL
SELECT
(
SELECT ORDER_ID FROM (
SELECT ORDER_ID
FROM (
SELECT ORDER_ID,
ROWNUM AS row_num
FROM ORDERS
ORDER BY DBMS_RANDOM.RANDOM()
) t
WHERE row_num >= (SELECT COUNT(*) - 1000 FROM ORDERS)
) WHERE ROWNUM =1
),
cast ( DBMS_RANDOM.VALUE(1,2000) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,500) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,(SELECT MAX(PRODUCT_ID) FROM PRODUCT)) as Number(20))
FROM DUAL
connect by level<=10000;
Generating the product data
CREATE TABLE "ZS"."PRODUCT"
( "PRODUCT_ID" NUMBER,
"PRODUCT_NAME" VARCHAR2(255),
PRIMARY KEY ("PRODUCT_ID")
)
Just find an Excel file of product names online and import it.
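If the Excel file is first saved as a CSV, SQL*Loader is one way to do the import. A sketch, where the paths, the control-file name, and the CSV layout (PRODUCT_ID,PRODUCT_NAME) are all assumptions for illustration:

```shell
#!/bin/sh
# Sketch: load a product-name CSV (saved from the Excel file) into
# ZS.PRODUCT with SQL*Loader. Paths and column layout are assumed.
WORK=/tmp/product_demo
mkdir -p "$WORK"

cat > "$WORK/product.ctl" <<'EOF'
LOAD DATA
INFILE '/tmp/product_demo/product.csv'
APPEND
INTO TABLE PRODUCT
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(PRODUCT_ID, PRODUCT_NAME)
EOF

# Run the load only where an Oracle client is installed.
if command -v sqlldr >/dev/null 2>&1; then
    sqlldr userid=zs/123456@//192.168.10.7:1521/xe control="$WORK/product.ctl"
fi
```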
Testing the data
select product_name 产品名称 , to_char(order_date,'YYYY')销售年份 ,sum(price*QUANTITY) total,count(*) 销售次数, sum(QUANTITY) 销售数量
from ORDER_DETAIL
inner JOIN product on product.product_id=ORDER_DETAIL.product_id
inner join orders on orders.ORDER_ID= ORDER_DETAIL.ORDER_ID
group by product_name, to_char(order_date,'YYYY')
order by product_name, to_char(order_date,'YYYY')
The result set is as follows:
产品名称 销售年份 TOTAL 销售次数 销售数量
1,10-癸二醇 2004 1734073 6 6428
1,10-癸二醇 2005 2438717 7 7302
1,10-癸二醇 2006 1386903 6 4490
1,10-癸二醇 2007 577281 5 4408
1,10-癸二醇 2008 2101808 8 8893
1,10-癸二醇 2009 2061650 8 8734
1,10-癸二醇 2010 1999125 5 6004
1,10-癸二醇 2011 2426822 12 10541
Simulating ongoing inserts
Insert a batch of new rows every few minutes (the script below inserts about 500 orders and 10,000 order detail rows per run); a crontab entry takes care of the scheduling.
#!/bin/bash
DBCONN="zs/123456@192.168.10.7:1521/xe"
# generate ~500 orders
echo $DBCONN
sqlplus -S $DBCONN <<EOF
INSERT INTO ORDERS
SELECT ORDERS_ORDERID_SEQ.NEXTVAL ORDER_ID ,
TRUNC(ADD_MONTHS(SYSDATE, -DBMS_RANDOM.VALUE(0, 240 ))) + DBMS_RANDOM.VALUE(0,31)+ DBMS_RANDOM.VALUE(0,1) AS random_date ,
cast(dbms_random.value(1,100) as Number(10) ) CUSTOMER_ID
from dual
connect by level<500;
EOF
sqlplus -S $DBCONN <<EOF
insert into ORDER_DETAIL
SELECT
(
SELECT ORDER_ID FROM (
SELECT ORDER_ID
FROM (
SELECT ORDER_ID,
ROWNUM AS row_num
FROM ORDERS
ORDER BY DBMS_RANDOM.RANDOM()
) t
WHERE row_num >= (SELECT COUNT(*) - 1000 FROM ORDERS)
) WHERE ROWNUM =1
),
cast ( DBMS_RANDOM.VALUE(1,2000) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,500) as Number(20)),
cast ( DBMS_RANDOM.VALUE(1,(SELECT MAX(PRODUCT_ID) FROM PRODUCT)) as Number(20))
FROM DUAL
connect by level<=10000;
EOF
Automatically exporting full CSV files
Create a run-once shell script in the /root/bin directory, as follows, that exports all three CSV files in one pass.
The CSV files are placed in the /backup directory; only the newest files are kept there, and old files are deleted automatically.
#!/bin/bash
DBCONN="zs/123456@192.168.10.7:1521/xe"
# generate ~50 orders
function genorder()
{
sqlplus -S $DBCONN <<EOF
INSERT INTO ORDERS
SELECT ORDERS_ORDERID_SEQ.NEXTVAL ORDER_ID ,
TRUNC(ADD_MONTHS(SYSDATE, -DBMS_RANDOM.VALUE(0, 240 ))) + DBMS_RANDOM.VALUE(0,31)+ DBMS_RANDOM.VALUE(0,1) AS random_date ,
cast(dbms_random.value(1,100) as Number(10) ) CUSTOMER_ID
from dual
connect by level<50;
EOF
}
function genorderdetail()
{
sqlplus -S $DBCONN <<EOF
insert into ORDER_DETAIL
SELECT
(
SELECT ORDER_ID FROM (
SELECT ORDER_ID
FROM (
SELECT ORDER_ID,
ROWNUM AS row_num
FROM ORDERS
ORDER BY DBMS_RANDOM.RANDOM()
) t
WHERE row_num >= (SELECT COUNT(*) - 100 FROM ORDERS)
) WHERE ROWNUM =1
),
cast( DBMS_RANDOM.VALUE(1,2000) as Number(20)),
cast( DBMS_RANDOM.VALUE(1,500) as Number(20)),
cast( DBMS_RANDOM.VALUE(1,(SELECT MAX(PRODUCT_ID) FROM PRODUCT)) as Number(20))
FROM DUAL
connect by level<=1000;
EOF
}
genorder
genorderdetail
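The CSV export itself is not shown above; one way it might look is a SQL*Plus SPOOL per table with timestamped file names and a cleanup of older files. A sketch under these assumptions: /tmp/backup_demo stands in for /backup, SET MARKUP CSV ON needs a 12.2+ SQL*Plus client, and `head -n -1` / `xargs -r` are GNU options. When no client is installed, an empty placeholder file is written so the naming and cleanup logic can still be demonstrated.

```shell
#!/bin/sh
# Sketch of the timestamped CSV export and old-file cleanup.
OUT="${OUT:-/tmp/backup_demo}"
mkdir -p "$OUT"
TS=$(date +%Y%m%d%H%M%S)
DBCONN="zs/123456@192.168.10.7:1521/xe"

export_table() {
    # $1 = table name; writes $OUT/$1.$TS.csv
    FILE="$OUT/$1.$TS.csv"
    if command -v sqlplus >/dev/null 2>&1; then
        sqlplus -S "$DBCONN" <<EOF
SET MARKUP CSV ON
SPOOL $FILE
SELECT * FROM $1;
SPOOL OFF
EOF
    else
        : > "$FILE"   # placeholder when no Oracle client is available
    fi
    # Keep only the newest export for this table: the timestamps sort
    # lexicographically, so everything but the last sorted name is old.
    ls "$OUT/$1".*.csv | sort | head -n -1 | xargs -r rm -f
}

export_table orders
export_table order_detail
ls "$OUT"
```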
Crontab schedule for the full export
0,20,40 * * * * /root/bin/gen_data.sql >>/backup/gen_data.log 2>&1
2,22,42 * * * * /root/bin/oltp.sql >>/backup/oltp.log 2>&1
Last edited by: 严锋; updated 2024-04-12 13:29