2021.10
DQ Job 43M rows
Owl is commonly benchmarked against large daily datasets. In this case, a 43-million-row table with 12 columns completes in under 6 minutes (5:30). The best balance for this dataset was 3 executors, each with 10 GB of RAM.
./owlcheck \
  -u user -p password \
  -c jdbc:mysql://owldatalake.chzid9w0hpyi.us-east-1.rds.amazonaws.com:3306 \
  -q "select * from silo.account_large where acc_upd_ts > '2018-02-01 05:00:00'" \
  -rd 2019-02-02 \
  -ds account_large \
  -dc acc_upd_ts \
  -corroff \
  -histoff \
  -driver com.mysql.cj.jdbc.Driver \
  -lib "/home/ec2-user/owl/drivers/mysql/" \
  -master yarn \
  -deploymode client \
  -numexecutors 3 \
  -executormemory 10g \
  -loglevel DEBUG -readonly
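For a rough sense of what these figures imply, the throughput can be worked out from the numbers stated above (43 million rows, a 5:30 runtime, 3 executors at 10g each). This is a minimal sketch using integer shell arithmetic; the variable names are illustrative only and are not Owl settings.

```shell
# Back-of-the-envelope throughput for the run described above.
# All variable names are illustrative; none of them are Owl options.
rows=43000000                      # rows in the benchmarked table
runtime_sec=$((5 * 60 + 30))       # 5:30 wall-clock time, in seconds
executors=3                        # matches -numexecutors 3
executor_mem_gb=10                 # matches -executormemory 10g

# Integer division, so results are rounded down.
rows_per_sec=$((rows / runtime_sec))
rows_per_sec_per_executor=$((rows_per_sec / executors))
total_executor_mem_gb=$((executors * executor_mem_gb))

echo "rows/sec overall:      ${rows_per_sec}"
echo "rows/sec per executor: ${rows_per_sec_per_executor}"
echo "total executor memory: ${total_executor_mem_gb} GB"
```

That works out to roughly 130,000 rows per second overall, or about 43,000 rows per second per executor, with 30 GB of executor memory in total. Treat these as ballpark figures for this dataset and cluster rather than a general guarantee.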
Note: Not all Owl features were turned on during this run (for example, the command passes -corroff and -histoff). On large datasets, consider limiting the columns, Owl features, or lookbacks to those that are of interest.
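One way to limit the columns is to replace `select *` in the `-q` argument with an explicit column list. The sketch below only builds the query string. Apart from `acc_upd_ts`, which appears in the command above, the column names are hypothetical placeholders and should be replaced with columns that actually exist in the table.

```shell
# Build a narrower query for the -q option.
# acc_id and acc_status are HYPOTHETICAL column names used for illustration;
# only acc_upd_ts is taken from the original command.
columns="acc_id acc_status acc_upd_ts"

# Turn the space-separated list into a comma-separated SQL column list.
column_list=$(printf '%s' "$columns" | tr ' ' ',')

query="select ${column_list} from silo.account_large where acc_upd_ts > '2018-02-01 05:00:00'"

# The resulting string would be passed to owlcheck as: -q "$query"
echo "$query"
```

Scanning fewer columns generally means less data to read and profile, though the actual saving depends on the table and on which checks are enabled.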
Last modified 2yr ago