Spark-shell Sample
./bin/spark-shell --jars /opt/owl/bin/owl-core-trunk-jar-with-dependencies.jar,/opt/owl/drivers/postgres/postgresql-42.2.5.jar --deploy-mode client --master local[*]
Import the libraries. If you get a dependency error, import them a second time.
import com.owl.core.util.{OwlUtils, Util}
import com.owl.common.domain2.OwlCheckQ
import com.owl.common.options._
Set up connection parameters for the database we want to scan (skip this step if you already have a dataframe).
val url = "jdbc:postgresql://xxx.xxx.xxx.xxx:xxxx/db?currentSchema=schema"
val connProps = Map(
  "driver" -> "org.postgresql.Driver",
  "user" -> "user",
  "password" -> "pwd",
  "url" -> url,
  "dbtable" -> "db.table"
)
Create a new OwlOptions object so we can assign properties
val opt = new OwlOptions()
Set up variables for ease of re-use
val dataset = "nyse_notebook_test_final"
val runId = "2017-12-18"
var date = runId
var query = s"""select * from <table> where <date_col> = '$date' """
val pgDatabase = "dev"
val pgSchema = "public"
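For illustration only, with a hypothetical table named nyse and a hypothetical date column trade_date, the query template above would be filled in like this:
var query = s"""select * from nyse where trade_date = '$date' """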
Set the OwlOptions values that point to the Owl metastore
opt.dataset = dataset
opt.runId = runId
opt.host = "xxx.xxx.xxx.xxx"
opt.pgUser = "xxxxx"
opt.pgPassword = "xxxxx"
opt.port = s"5432/$pgDatabase?currentSchema=$pgSchema"
Create a connection, build the dataframe, then register and run the check.
With inline processing you will already have a dataframe, so you can skip down to setting the OwlContext (a minimal inline sketch follows the JDBC example below).
val conn = connProps + ("dbtable" -> s"($query) $dataset")
val df = spark.read.format("jdbc").options(conn).load
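If you are doing inline processing you will already have a dataframe and can pass it straight to the OwlContext instead of running the JDBC read above. A minimal sketch, assuming a hypothetical CSV file at /tmp/nyse_sample.csv:
// hypothetical file path; any existing Spark dataframe can be used in place of the JDBC read
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/tmp/nyse_sample.csv")
df.show(5)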
val owl = OwlUtils.OwlContext(df, opt)
owl.register(opt)
owl.owlCheck
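owlCheck runs the data quality scan against the dataframe and writes the results to the Owl metastore configured above; once it completes, the run should appear in the Owl web UI under the dataset name and runId set earlier.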