2021.10
Performance Tests

Cells Per Second Performance Theory (9.5M CPS)

Load and Profile

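| Dataset Name | GBs in Memory | Rows | Cols | Cells | Num Execs | Num Cores | Exec Memory | Network Time | Total Time |
|---|---|---|---|---|---|---|---|---|---|
| NYSE | 0.1G | 103K | 9 | 816K | 1 | 1 | 1G | 00:00:15 | 00:00:48 |
| AUM | 14G | 9M | 48 | 432M | 5 | 1 | 4G | 00:01:20 | 00:03:50 |
| ENERGY | 5G | 43M | 6 | 258M | 8 | 3 | 3G | 00:00:00 | 00:04:35 |
| INVEST_DATA | 20G | 3.8M | 158 | 590M | 3 | 2 | 3G | 00:00:40 | 00:03:32 |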
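
As a rough illustration of the arithmetic behind a cells-per-second figure, the sketch below divides each run's Cells value by its Total Time. The cell counts are the table's abbreviated values expanded (K = 1,000, M = 1,000,000), and the helper names `to_seconds` and `cells_per_second` are hypothetical. Dividing by Total Time is an assumption about how throughput is meant here, and the sketch does not attempt to reproduce the 9.5M CPS figure in the heading.

```python
# Cells and Total Time transcribed from the table above.
# Abbreviations are expanded: K = 1_000, M = 1_000_000.
RUNS = {
    "NYSE": (816_000, "00:00:48"),
    "AUM": (432_000_000, "00:03:50"),
    "ENERGY": (258_000_000, "00:04:35"),
    "INVEST_DATA": (590_000_000, "00:03:32"),
}


def to_seconds(hms):
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cells_per_second(cells, total_time):
    """End-to-end throughput: cells divided by total run time in seconds."""
    return cells / to_seconds(total_time)


if __name__ == "__main__":
    for name, (cells, total_time) in RUNS.items():
        print(f"{name}: {cells_per_second(cells, total_time):,.0f} cells/sec")
```

Because the table values are rounded, the results are estimates rather than exact measurements.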

NYSE

A single Postgres database call with no concurrent processing: the simple case on a small dataset.
    -bhtimeoff -numexecutors 1
    -lib "/opt/owl/drivers/postgres"
    -executormemory 1g
    -h metastore01.us-east1-b.c.owl-hadoop-cdh.internal:5432/dev?currentSchema=public
    -drivermemory 1g -master k8s:// -ds public.nyse_128 -deploymode cluster
    -q "select * from public.nyse" -bhlb 10 -rd "2020-10-26"
    -driver "org.postgresql.Driver" -bhminoff
    -loglevel INFO -cxn postgres-gcp -bhmaxoff

AUM

Postgres database call using parallel JDBC, with the read split on the aum_id serial ID column.
    -owluser kirk
    -lib "/opt/owl/drivers/postgres" -datashapeoff
    -numpartitions 6 -ds public.aum_dt2_50
    -deploymode cluster -bhlb 10 -bhminoff
    -cxn postgres-gcp -bhmaxoff -bhtimeoff
    -numexecutors 6
    -executormemory 4g -semanticoff
    -h metastore01.us-east1-b.c.owl-hadoop-cdh.internal:5432/dev?currentSchema=public
    -columnname aum_id -corroff -drivermemory 4g -master k8s://
    -q "select * from public.aum_dt2" -histoff -rd "2020-10-27"
    -driver "org.postgresql.Driver" -loglevel INFO -agentjobid 7664
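
One way to picture a parallel JDBC read split on aum_id is that each of the 6 partitions reads one contiguous range of ids. The sketch below is a hypothetical illustration of that idea, not the tool's actual implementation: the function `partition_predicates` is invented for this example, and the id bounds of 1 to 9,000,000 are an assumption based on the roughly 9M rows in the table and a gap-free serial id starting at 1.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Split the inclusive id range [lower, upper] into contiguous,
    non-overlapping ranges and return one SQL predicate per range."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    total = upper - lower + 1
    if total < num_partitions:
        raise ValueError("range is smaller than the number of partitions")

    # Spread any remainder over the first few partitions so sizes differ by at most 1.
    base, extra = divmod(total, num_partitions)
    predicates = []
    start = lower
    for index in range(num_partitions):
        size = base + (1 if index < extra else 0)
        end = start + size - 1
        predicates.append(f"{column} >= {start} AND {column} <= {end}")
        start = end + 1
    return predicates


if __name__ == "__main__":
    # Assumed bounds: ids 1..9,000,000; 6 partitions as in -numpartitions 6.
    for predicate in partition_predicates("aum_id", 1, 9_000_000, 6):
        print(f"select * from public.aum_dt2 where {predicate}")
```

Each predicate could then be run as its own query over a separate connection, which is what lets the read proceed in parallel. A serial id suits this because its values are roughly evenly distributed, so the ranges carry similar row counts.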

ENERGY

HDFS file with 43 million rows. A string date column is converted to a date type, and the job runs in client deploy mode.
    -f "hdfs:///demo/owl_usage_all.csv" \
    -rd "2019-02-02" \
    -ds energy_file \
    -loglevel DEBUG -readonly \
    -d "," -df dd-MMM-yy \
    -master yarn \
    -deploymode client \
    -numexecutors 3 \
    -executormemory 10g
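
To make the string-to-date conversion concrete, the sketch below parses a value shaped like the `dd-MMM-yy` pattern passed to `-df`. It uses Python's `%d-%b-%y` as a stand-in for that pattern, which is an assumption about equivalence. The sample value `02-Feb-19` is hypothetical and not taken from the file, and `%b` assumes English month abbreviations in the active locale.

```python
from datetime import date, datetime

# Assumed Python equivalent of the dd-MMM-yy pattern given to -df.
PY_FORMAT = "%d-%b-%y"


def parse_energy_date(text):
    """Parse a string such as '02-Feb-19' into a date object."""
    return datetime.strptime(text, PY_FORMAT).date()


if __name__ == "__main__":
    sample = "02-Feb-19"  # hypothetical value, not read from the real file
    parsed = parse_energy_date(sample)
    print(parsed.isoformat())
```

Doing this once per row across 43 million rows is part of the work the ENERGY run's total time reflects.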

Load, Profile, and Outliers

NYSE - 1:10 total runtime, including 20 seconds for outliers

    -bhtimeoff -owluser kirk -numexecutors 1
    -lib "/opt/owl/drivers/postgres" -executormemory 1g
    -dl -h metastore01.us-east1-b.c.owl-hadoop-cdh.internal:5432/dev?currentSchema=public
    -drivermemory 1g -master k8s:// -ds public.nyse_128 -deploymode cluster
    -q "select * from public.nyse" -bhlb 10
    -rd "2020-10-27" -driver "org.postgresql.Driver"
    -bhminoff -loglevel INFO -cxn postgres-gcp -bhmaxoff -agentjobid 7721
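
The runtime split for this NYSE run can be worked out from the two numbers quoted above. The sketch assumes that 1:10 means 1 minute 10 seconds; the helper name `mmss_to_seconds` is hypothetical.

```python
def mmss_to_seconds(text):
    """Convert an M:SS string such as '1:10' into seconds."""
    minutes, seconds = (int(part) for part in text.split(":"))
    return minutes * 60 + seconds


total = mmss_to_seconds("1:10")       # total runtime quoted for the run
outliers = 20                         # seconds quoted for outliers
load_and_profile = total - outliers   # remainder attributed to load and profile
outlier_share = outliers / total      # fraction of the run spent on outliers

if __name__ == "__main__":
    print(f"total: {total}s, load and profile: {load_and_profile}s")
    print(f"outliers: {outliers}s ({outlier_share:.1%} of the run)")
```

The remainder is close to the 00:00:48 total time listed for NYSE in the Load and Profile table, which suggests that outlier detection accounts for most of the added runtime.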