DQ Job S3
S3 permissions need to be set up appropriately.
S3 connections should be defined using the root bucket. Nested S3 connections are not supported.
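For example (hypothetical bucket and prefix names), the connection should point at the bucket root, not at a prefix inside it:

// supported: connection defined at the root of the bucket (hypothetical names)
val supportedConnection = "s3a://company-dq-bucket"
// not supported: a nested prefix used as the connection root
val unsupportedConnection = "s3a://company-dq-bucket/team-a/raw"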

Example Minimum Permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:athena:*:<AWSAccountID>:workgroup/primary",
        "arn:aws:s3:::<S3 bucket name>/*",
        "arn:aws:s3:::<S3 bucket name>",
        "arn:aws:glue:*:<AWSAccountID>:catalog",
        "arn:aws:glue:*:<AWSAccountID>:database/<database name>",
        "arn:aws:glue:*:<AWSAccountID>:table/<database name>/*"
      ]
    }
  ]
}
Requires the appropriate Hadoop AWS driver, for example hadoop-aws-2.7.3.2.6.5.0-292.jar, available from http://central.maven.org/maven2/org/apache/hadoop/hadoop-aws/. The driver directory is passed to the DQ job with the -lib option, as in the example below.
-f "s3a://s3-location/testfile.csv" \
-d "," \
-rd "2018-01-08" \
-ds "salary_data_s3" \
-deploymode client \
-lib /home/ec2-user/owl/drivers/aws/

Databricks Utils Or Spark Conf

// AWS credentials and the bucket to mount
val AccessKey = "xxx"
val SecretKey = "xxxyyyzzz"
//val EncodedSecretKey = SecretKey.replace("/", "%2F")
val AwsBucketName = "s3-location"
val MountName = "kirk"

// unmount first in case the mount point already exists
dbutils.fs.unmount(s"/mnt/$MountName")

// mount the S3 bucket under /mnt/<MountName>
dbutils.fs.mount(s"s3a://${AccessKey}:${SecretKey}@${AwsBucketName}", s"/mnt/$MountName")
//display(dbutils.fs.ls(s"/mnt/$MountName"))

//sse-s3 (server-side encryption) example
dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName", "sse-s3")
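As an alternative to mounting with dbutils, the same credentials can be supplied through the Spark/Hadoop configuration. A minimal sketch, assuming the standard Hadoop S3A property names and reusing the placeholder values from the example above:

// set S3A credentials on the cluster's Hadoop configuration (placeholder values from above)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", AccessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", SecretKey)

// read directly from the bucket without mounting it
val df = spark.read.text(s"s3a://$AwsBucketName/testfile.csv")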

Databricks Notebooks using S3 buckets

val AccessKey = "ABCDED"
val SecretKey = "aaasdfwerwerasdfB"
val EncodedSecretKey = SecretKey.replace("/", "%2F")
val AwsBucketName = "s3-location"
val MountName = "abc"

// bug if you don't unmount first
dbutils.fs.unmount(s"/mnt/$MountName")

// mount the s3 bucket
dbutils.fs.mount(s"s3a://${AccessKey}:${EncodedSecretKey}@${AwsBucketName}", s"/mnt/$MountName")
display(dbutils.fs.ls(s"/mnt/$MountName"))

// read the dataframe
val df = spark.read.text(s"/mnt/$MountName/atm_customer/atm_customer_2019_01_28.csv")
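Note that spark.read.text loads each line as a single string column. If the file should be parsed as delimited data, the CSV reader can be used instead; a minimal sketch, assuming the sample file has a header row:

// parse the same mounted file as CSV (header/inferSchema are assumptions about this sample file)
val csvDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(s"/mnt/$MountName/atm_customer/atm_customer_2019_01_28.csv")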