DQ Job S3
S3 permissions must be set up appropriately.
S3 connections should be defined against the root bucket (for example, s3://&lt;bucket&gt; rather than s3://&lt;bucket&gt;/&lt;prefix&gt;); nested S3 connections are not supported.

Example Minimum Permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:athena:*:<AWSAccountID>:workgroup/primary",
        "arn:aws:s3:::<S3 bucket name>/*",
        "arn:aws:s3:::<S3 bucket name>",
        "arn:aws:glue:*:<AWSAccountID>:catalog",
        "arn:aws:glue:*:<AWSAccountID>:database/<database name>",
        "arn:aws:glue:*:<AWSAccountID>:table/<database name>/*"
      ]
    }
  ]
}
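To sanity-check that the policy grants the expected access, a quick listing can be run from a Spark shell or Databricks notebook. This is a minimal sketch: the bucket name is a placeholder, and it assumes the hadoop-aws (s3a) connector is on the classpath and credentials are already configured.

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder bucket; substitute your own root bucket
val bucket = "s3a://<S3 bucket name>"

// Resolve a FileSystem handle through the cluster's Hadoop configuration
val fs = FileSystem.get(new URI(bucket), spark.sparkContext.hadoopConfiguration)

// A simple listing exercises s3:ListBucket and s3:GetBucketLocation
fs.listStatus(new Path(bucket + "/")).foreach(status => println(status.getPath))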
This requires the appropriate Hadoop AWS driver, e.g. hadoop-aws-2.7.3.2.6.5.0-292.jar, available from http://central.maven.org/maven2/org/apache/hadoop/hadoop-aws/.
-f "s3a://s3-location/testfile.csv" \
-d "," \
-rd "2018-01-08" \
-ds "salary_data_s3" \
-deploymode client \
-lib /home/ec2-user/owl/drivers/aws/

Databricks Utils or Spark Conf

val AccessKey = "xxx"
val SecretKey = "xxxyyyzzz"
// If the secret key contains "/", URL-encode it first:
// val EncodedSecretKey = SecretKey.replace("/", "%2F")
val AwsBucketName = "s3-location"
val MountName = "kirk"

// Unmount first in case the mount point already exists
dbutils.fs.unmount(s"/mnt/$MountName")

// Mount the bucket using the embedded credentials
dbutils.fs.mount(s"s3a://${AccessKey}:${SecretKey}@${AwsBucketName}", s"/mnt/$MountName")
// display(dbutils.fs.ls(s"/mnt/$MountName"))

// SSE-S3 (server-side encryption) example
dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName", "sse-s3")
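As the section title suggests, the same access can also be configured through the Spark/Hadoop configuration instead of a mount. A minimal sketch, assuming the s3a connector is available; the keys, bucket, and file are placeholders (fs.s3a.access.key and fs.s3a.secret.key are the standard Hadoop s3a property names):

val AccessKey = "xxx"
val SecretKey = "xxxyyyzzz"
val AwsBucketName = "s3-location"

// Set the s3a credentials on the cluster's Hadoop configuration
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", AccessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", SecretKey)

// Read the bucket directly, without mounting
val df = spark.read.option("header", "true").csv(s"s3a://$AwsBucketName/testfile.csv")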

Databricks Notebooks using S3 buckets

val AccessKey = "ABCDED"
val SecretKey = "aaasdfwerwerasdfB"
// URL-encode any "/" characters in the secret key
val EncodedSecretKey = SecretKey.replace("/", "%2F")
val AwsBucketName = "s3-location"
val MountName = "abc"

// Unmount first; mounting fails if the mount point already exists
dbutils.fs.unmount(s"/mnt/$MountName")

// Mount the S3 bucket
dbutils.fs.mount(s"s3a://${AccessKey}:${EncodedSecretKey}@${AwsBucketName}", s"/mnt/$MountName")
display(dbutils.fs.ls(s"/mnt/$MountName"))

// Read the file into a DataFrame (one string column per line)
val df = spark.read.text(s"/mnt/$MountName/atm_customer/atm_customer_2019_01_28.csv")
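Note that spark.read.text loads each line into a single string column. To parse the mounted file into typed columns instead, a CSV read like the following can be used (a sketch; the header and inferSchema options are assumptions about the file's layout):

// Parse the same mounted file as a CSV with a header row and inferred types
val csvDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(s"/mnt/$MountName/atm_customer/atm_customer_2019_01_28.csv")
csvDf.printSchema()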