Release Notes

2022.09

Enhancements

Rules

  • The Conditions column on the Rules tab now displays SQLG and SQLF rule definitions on hover.

DQ Job

  • The Jobs chart now shows a dotted gray line to represent jobs in Submitted status.
  • The Jobs chart now supports an hourly view option.
  • When you run a Pushdown Job that has a data set that returns 0 rows, an unclear message displays.

Schema

  • From the Config tab in Explorer, a Check Header checkbox under DQ Job is now available for when column names contain special characters. The Check Header checkbox is checked by default.
    • When checked, schema findings do not display when detected.
    • When unchecked, schema findings display when detected.

Behavior

  • Mean values are now rounded on the Findings page.

Explorer

  • SOH delimiters for files are now supported.
  • The Only checkbox on all Build Layer tabs is now removed.
  • The Profile activity is now always enabled and no longer has an on/off switch.
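
For readers unfamiliar with the SOH delimiter mentioned above: SOH (Start of Heading) is the ASCII control character with octal code 001. The following is a minimal sketch of what an SOH-delimited record looks like and how its fields can be split. The helper name and the sample record are hypothetical and are not part of Collibra DQ.

```shell
# SOH (Start of Heading) is the ASCII control character with octal code 001.
# Hypothetical helper: print the Nth field of an SOH-delimited line.
soh_field() {
  # $1 = the record, $2 = 1-based field number
  printf '%s\n' "$1" | cut -d "$(printf '\001')" -f "$2"
}

# Build a sample record with three fields: id<SOH>name<SOH>city
record=$(printf 'id42\001Ada\001London')
soh_field "$record" 2   # prints Ada
```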

Alerts

  • Only one email per alert is now sent when alerts are set up for a scheduled job.
  • When an alert fails to send, you can now check the logs to identify the failure and resend the email.

Scheduler

  • The Findings page now displays a green indicator next to the Schedule icon when you schedule a job to run automatically. If Scheduler is inactive, a red indicator displays.

API

  • The v2/gethoot API now properly returns rule dimension information for data sets. (ticket #89973)

Connections

  • The Databricks connection template has changed, due to an upgrade of the driver. Any existing connection that uses the old driver must be updated. Refer to the new template. (ticket #19950)
  • The drivers for Athena, BigQuery, MongoDB, GCS, and Hive/Impala were also upgraded, but no connection change is required.

Spark

  • The 2022.09 release uses Spark 3.2.2. We recommend using Spark 3.x for standalone installs/upgrades.

Fixes

Explorer

  • Fixed an issue that prevented the Job Estimator from properly displaying row estimates when the run date was modified during a new job run. (ticket #90860)
  • Fixed an issue that prevented DQ jobs created using NFS connection types from displaying under the Remote File Connections dropdown. (ticket #92479)
  • Fixed an issue that caused the file type parser to throw an error message when the default comma delimiter was not detected. The parser now detects a file's delimiter and updates the delimiter type in the UI automatically. (tickets #89489, 92480)
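
To illustrate the general idea of delimiter detection described above, here is a minimal sketch that counts candidate delimiters in a file's first line and picks the most frequent one. This is only an illustration under stated assumptions: the function name and the candidate list are hypothetical, and this is not the actual algorithm used by the Collibra DQ parser.

```shell
# Hypothetical helper: guess a file's delimiter from its first line by
# counting how often each candidate character occurs.
guess_delimiter() {
  first_line=$(head -n 1 "$1")
  best=','          # fall back to the default comma
  best_count=0
  for ch in ',' "$(printf '\t')" '|' ';'; do
    # Delete every byte except the candidate, then count what is left.
    count=$(( $(printf '%s' "$first_line" | tr -cd "$ch" | wc -c) ))
    if [ "$count" -gt "$best_count" ]; then
      best=$ch
      best_count=$count
    fi
  done
  printf '%s' "$best"
}

# Usage (the file name is an assumption): guess_delimiter ./data.txt
```

A first-line count is a crude heuristic; quoted fields that contain the delimiter would skew it, so treat the result as a hint only.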

Files

  • The error message for Failed Merging Schema now has extra logging to clarify the cause of failed schema merges for both Livy sessions and non-Livy paths. (ticket #92694)

Security

  • Fixed an issue with the v2/getcatalogtableshasrulesfromcxn API that triggered a 403 status code when Dataset Security was enabled. (tickets #93298, 94258)

Agent

  • Fixed an issue that caused the Agent Check to no longer attempt check-ins to the metastore on K8s deployments, which resulted in red (unhealthy) status. (tickets #92055, 92963)
  • Fixed an issue that prevented concurrent users from properly running Livy sessions. (tickets #92963, 90432)

Known Limitations

Rules

  • The Rule Builder page becomes unusable if the user creates, validates, and saves a new rule and then re-edits it.
    • The workaround for this limitation is to do a full page refresh.
  • When a user attempts to validate a rule that contains a stat, an exception error is returned.

Security

  • The Assignments Queue feature is only available for local users. Support for externally connected users, such as SAML and AD connector, is not currently available.

Alerts

  • When alert recipient email addresses are separated by semicolons ;, alert emails are not sent to the intended recipients.
    • A workaround for this limitation is to separate alert recipient email addresses with commas , instead of semicolons.
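
If you maintain recipient lists in semicolon-separated form, the workaround above can be applied with a small conversion step before pasting the list into the alert setup. This is a minimal sketch; the helper name and the addresses are hypothetical.

```shell
# Hypothetical helper: replace semicolon separators with commas.
normalize_recipients() {
  printf '%s' "$1" | tr ';' ','
}

normalize_recipients 'ana@example.com;raj@example.com'
# prints ana@example.com,raj@example.com
```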

Snowflake Pushdown

  • When you run a job on a data set that returns 0 rows, an unclear message displays.
  • When a native rule is created that contains an embedded stat, its calculated value will not display on the Job results page.
  • Data Set security is not supported.
  • Disabling autometrics does not take effect; all autometrics are still executed.
  • Creating a DQ job using only "SQL Query" workflow doesn't allow you to set the rundate value.

DQ Security Metrics

DQ security vulnerabilities over 5 months
Critical security vulnerabilities over 5 months

2022.08

New Features

Rules

Enhancements

Connections

  • You can now authenticate Oracle JDBC connections with Kerberos TGT, Keytab, and Password. (tickets #75267, 76030)
  • You can now authenticate SQL Server JDBC connections with Kerberos Keytab in addition to basic authentication.

Rules

  • Rule Summary enhancements:
    • You can now select different time periods for analysis.
    • You can now view charts from three different pages: Rule Detail Summary, Rule Breaks, and Rule Dimension Summary.

Security

  • Vulnerabilities identified by JFrog
    • Vulns 0, criticals 0, high severity 7
    • The majority of the current mediums are due to merging the dq-streaming module into core.
    • For a visual readout, see the DQ Security Metrics section below.

Agent

  • You can now optionally configure individual time zones of DQ Job, Web, and Agent. You should only use this configuration when your instance and containers run in different system time zones. (tickets #87024, 87155)

Behavior

  • The Behavior tab now has a new column, Delta Percent Change (Δ % Change).
  • You can now hover over new tooltips in the following columns:
    • Baseline
    • % Change
    • Δ % Change
    • Zscore
    • Score

Outliers

  • Outlier checks are now optimized to skip in certain circumstances. Outlier checks are only skipped when the history load of a specified date column is empty.
  • You can now update and modify record flags from the command line with -rc, -rcKeys, -rcDateCol, and -rcTbin.

API

  • The v2/gethoot API now properly returns rule dimension information for data sets.
  • The v3/jobs/run API now has improvements to the 400 Bad Request error messages in specific circumstances.

Reports

  • The PDF option is now removed from the Data Set Finding page. To print dynamic column tables, use CSV or Excel options instead. (ticket #89739)

DQ Connector

  • The version of Collibra Integration Library is now updated to 2.4.12.

Fixes

Connections

  • The new GCS jars are required to use GCS spark-history-server. (ticket #90623)

DQ Job

  • Fixed an issue that caused jobs using .TXT files to incorrectly render custom column names. (ticket #81808)
    • Files with .TXT extensions are now treated as delimited files. Files with .TXT extensions that are not delimited files should use their respective file type from the file type dropdown.
  • Fixed an issue with deployments on K8s where jobs failed when the volume name exceeded 63 characters. (ticket #85372)

Agent

  • Fixed an issue that caused the v2/updateagent API to fail when numCores was empty. (tickets #89737, 92404, 92680)
    • The numCores field is no longer a required field.

Validate Source

  • Fixed an issue that caused validate source jobs to fail when the pkey was mapped to different column names. (ticket #88778)

Rules

  • When using Freeform SQL rules with wild-card operators, rules again correctly pass validation. (ticket #89644)
  • Fixed an issue with regex rules that use a closing parenthesis ), comma ,, or semicolon ; in the rlike, which caused DQ to append spaces to those characters and prevented the regex from operating correctly. (tickets #89417, 92958)
  • Fixed an issue that caused rules with column values containing parentheses ( ) to break due to the addition of padding before and after closing parentheses. (ticket #85176)
  • Fixed an issue that caused rules with special characters such as @ to display incorrectly on the Rules page, Conditions tab, and when exported to Excel.
  • Fixed an issue that prevented data sets with attached rules and roles from being renamed. (tickets #85731, 92059, 94315)

Profile

  • Fixed an issue where certain results in TopN Values and Data Preview displayed in scientific notation. Scientific notation is now removed from the display. (tickets #82163, 89738)

Explorer

  • Fixed an issue that allowed CLOB data types to be visible in the Drag Columns to Target map in the Source tab. (ticket #86902)

API

  • The REST API endpoint v2/updateRoleDatasets again correctly saves roles to data sets.

Known Limitations

Rules

  • The Findings page displays results from computational stat rules on mean as a single-quoted string. For example, '573523.87' > 6763.
  • Column-level sorting for the Rule Summary feature is not currently available.

Admin

  • When adding a Sensitive Label or a Data Category, the Edit and Update functions do not display the selected record. To properly display the record, you must first refresh the page before editing or updating.

Session Activity

  • While the application UI is being redesigned, a session timeout on the legacy side of the application might not be reflected on the new React MUI side. This can happen when you have the DQ application open in multiple tabs.
    • We are not currently tracking session timeout from the legacy UI to React.

Beta features

DQ Job

  • Collibra is proud to launch a brand new feature, Snowflake Pushdown. Snowflake Pushdown allows for even faster processing and removes the need to set up a separate Spark compute platform to run Collibra Data Quality. Snowflake Pushdown is a private beta feature only available by request. Since this is a beta feature, some limitations are expected as we continue to improve its functionality. Contact your CSM to learn more about this feature.

DQ Security Metrics

A critical CVE, CVE-2016-1000027, shows up in the image scan due to the Spring version. This is a false positive and should be added to the exception list of your scan tools. We don’t use HttpInvokerServiceExporter anywhere in the application and are not impacted by it.
DQ security vulnerabilities over 5 months
Critical security vulnerabilities over 5 months

2022.07

Standalone packages for the 2022.07 release have a version naming convention of -RC. This will revert to the standard naming convention with the 2022.08 release, and has no impact on the safety or stability of standalone packages.

Fixes / Enhancements

  • DQ Job
    • Fixed an issue that prevented data from appearing in the Source tab when Source Observation RunID was clicked from the Assignments page.
    • Fixed an issue that caused Annotations with special characters to be truncated in the Labels tab.
    • Fixed an issue that caused the Column (name) column of the Rules tab to display incorrectly when Run Discovery was used.
    • Fixed an issue where the Retrain button on the Record tab was disabled.
    • You can again invalidate observations with single quotes ' from the Shapes tab.
    • The Hints tab now displays any available data.
    • You can no longer change agents from the Scheduler modal.
  • Rules
    • SQLF is now supported for Generic rules.
    • When running a custom rule through Rule Discovery, the column names Repo and Column again display correctly.
  • Alerts
    • You can now send emails using unauthenticated SMTP servers.
  • Security
    • Vulnerabilities identified by JFrog
      • Vulns 0, criticals 0, high severity 7
      • For a visual readout, see the DQ Security Metrics section below.
    • Fixed an issue that allowed jobs to be run from the command line regardless of connection permissions.
      • When Connection Security is enabled, lock the SQL Editor to prevent unauthorized access to other connections. (#87916)
    • Fixed an issue that allowed View Only users to access some profile results and export the data to a CSV file.
      • Added an authorization check for data set access to the profile export feature, which allows only users with data set access to export the profile. (#87720)
    • Backslashes \ are no longer supported characters for AD usernames without disabling XSS for the /v2/updateadsecurityconfiguration API. (#88499)
    • Fixed an issue that prevented navigation back to the login page when tenant access was denied. (#89024)
  • Profile
    • From the Labels tab, backslashes are now stripped from annotations when they are used for separation within strings.
  • Admin
    • From Audit Trail, when administrators modify roles mapped to data sets or data sets mapped to roles, changes are now documented automatically, and display original and updated values.
    • The Agent Group (H/A) and its associated endpoints are now deprecated.
    • From Usage, you can now access a table and tiles reflective of your monthly usage metrics.
    • Salesforce account ID can now be configured for use with Pendo logs.
    • *Tech Preview* [TP] ServiceNow integration
      • You can now assign incidents (validate action) to ServiceNow groups and users with the following fields included in the same request: caller_id, description, short_description, cmdb_ci.
  • Explorer
    • Fixed an issue with date range on Oracle connections, which caused end date to change to start date when Transform was selected.
    • The Job Estimate modal again displays the correct number of rows for Sybase connections.
    • Fixed an issue with Source to Target where double quotes " were removed from the source file in database to file targets.
  • Scorecards
    • Enhanced the layout of the Assignment Queues page.
  • API
    • v2/getallscheduledjobs is now available as an enhancement of the original, v2/getscheduledjobs.
      • A UI integration is planned for a future release.
  • Schedule
    • Added an Active column to the scheduler export.
      • The RunJob column was removed. (#88799)
  • Reporting
    • Fixed an issue that created misalignment of column headers in PDF exports. (#89739)

Known Limitations

  • Rules
    • To use the new SQLF feature for Generic rules, you must manually update the Generic rule type from SQLG to SQLF.
      • A UI feature for this is planned for a future release.
    • Stat rules such as $rowCount do not work for secondary data sets or previous runId of the same data set via @t1 syntax.
      • To work around this limitation, run a subquery to select count(*) from the secondary data set or the previous runId.
  • Explorer
    • Drill-ins and jobs on Sybase connections run successfully, but connections to Sybase with encrypted passwords are currently unsupported.
  • Files
    • When using CSV files, you cannot use a comma , in the name.
  • Admin
    • *Tech Preview* [TP] ServiceNow integration
      • Special characters, such as @#$%^&*(), in the description are not supported and will not persist to the ServiceNow assignment queue at this time.
      • Empty or invalid ServiceNow group name does not return an error in CDQ.
        • As a result, the ServiceNow assignment is generated with the default admin account as the owner if left empty or invalid.
        • You must have a valid ServiceNow group name or its related sys_id.
      • The new React UI is not yet supported for the ServiceNow Group integration.

DQ Security Metrics

A critical CVE, CVE-2016-1000027, shows up in the image scan due to the Spring version. This is a false positive and should be added to the exception list of your scan tools. We don’t use HttpInvokerServiceExporter anywhere in the application and are not impacted by it.
Vulns over time
Criticals table

2022.06

Fixes / Enhancements

  • DQ Job
    • Fixed an issue with the Learning Phase in the Behavior feature. (ticket #82907)
      • Once CDQ has the minimum number of completed successful scans, the learning status now changes to PASSING or BREAKING based on the results.
  • Outliers
    • Fixed an issue where file lookback did not identify expected outliers. (#87967)
  • Alerts
    • When configuring email alerts, the SMTP Username and SMTP Password fields are still required. (#86033)
      • Validation relaxation is planned for the 2022.07 release.
  • Rules
    • Fixed an issue which caused rule breaks to report the opposite of what was defined when a Generic Rule utilizing regex/rlike was created. (#86977)
    • Fixed an issue where Data Classes with Date column types selected did not detect timestamps. (#83000)
    • Fixed an issue where Data Classes using the operators <, > or = caused the inverse rule created from this process to throw exceptions. (#83000)
    • When switching a data class from a regex to expression and then editing again, the regex checkbox is now correctly checked.
  • Agent
    • The Explorer page and Scheduler modal now display the same agents. (#86175)
  • Security
    • Vulnerabilities identified by JFrog
      • Vulns 0, criticals 0, high severity 8
      • For a visual readout, see the DQ Security Metrics section below.
    • General advisory:
    • Major vulnerabilities related to Spring, ESAPI, and Swagger have been addressed.
    • No cross DB reference is allowed in explorer while accessing SQL database connections.
    • Sensitive UI fields such as username no longer allow autocomplete.
    • If configured, the ENV variable XSS_CANONICALIZE_INPUT_ENABLED should be removed from configmap or owl-env.sh.
    • When dataset security is turned on, you can now add role based authorization for editing existing datasets. (#87720)
    • You can now override the following mail settings from the App Config page within the Configuration section of the Admin Console:
      • "mail.transport.protocol" -- default = smtp
      • "mail.smtp.auth" -- default = true: If true, attempt to authenticate the user using the AUTH command
      • "mail.smtp.auth.login.disable" -- default = false: If true, prevents use of the AUTH LOGIN command
      • "mail.smtp.starttls.enable" -- default = true: If true, enables the use of the STARTTLS command (if supported by the server) to switch the connection to a TLS-protected connection before issuing any login commands.
      • "mail.smtp.ssl.enable" -- default = false: If set to true, use SSL to connect and use the SSL port by default. Defaults to false for the "smtp" protocol and true for the "smtps" protocol.
      • "mail.smtp.ehlo" -- default = true
      • "mail.debug" -- default = true
      • "mail.smtp.ssl.trust" -- default = : If set, and a socket factory hasn't been specified, enables use of a MailSSLSocketFactory. If set to "*", all hosts are trusted. If set to a whitespace separated list of hosts, those hosts are trusted. Otherwise, trust depends on the certificate the server presents. (#76775, 88089)
  • Profile
    • Mean value is now rounded appropriately within the Profile page.
      • For example: The value 2.4334334343345 is now rounded to 2.433.
  • Connections
    • From the Athena driver, you can now use MetadataRetrievalMethod=Query for database queries from the Connection URL. (#86139)
    • Fixed an issue where error messages on failed connections did not display informational text. (#85527)
    • Fixed an issue where NFS file connections under Remote File connections caused jobs to fail. (#88156)
      • Added File protocol for Spark load for NFS file system.
      • Added nfs:// prefix while adding an NFS connection.
        • This will prepend the URI with the file:// protocol when an NFS file connection is loaded via Spark.
  • Catalog
    • The Graph option is no longer available in Quick links.
  • Admin
    • The Pendo integration is now active by default.
      • No sensitive information is collected; only high-level usage stats are collected.
      • All new customers starting with 2022.06 onward will receive a new license.
      • If you install a standalone environment, modify the <install-dir>/config/owl-env.sh file by adding your license name: export DQ_INTEGRATION_PENDO_ACCOUNTID=<your-license-name>
      • This new integration will not block or impair the functionality of the app in any way.
      • For more information on Collibra's subprocessors, please review Collibra's Subprocessors page.
    • The Agent Group (H/A) and its associated endpoints are now deprecated. (#83086)
    • Fixed an issue where the "Add Data Category" button was missing without required permissions. (#86625)
    • When a session expires on an Admin page, you are now redirected to the login page.
    • The Admin Limits page now displays informational text indicating that only limits of Tenant - Admin type are displayed on the page.
    • Fixed an issue when editing an existing data category which caused the 'Add new' modal to open instead of the 'Edit' modal. (#89617)
    • From Configuration Settings, DB Limits is now called Data Retention Policy.
  • Explorer
    • You can now view calculated views for SAP Hana when creating a DQ Job on the Explorer page. (#83147, 84328)
    • Fixed an issue which caused the Date range condition to incorrectly display results when using an Oracle connection. (#85802)
    • Fixed an issue which threw an error message when Transform was checked with Date Range condition when using a Postgres connection. (#85802)
    • Fixed an issue where an equals sign = used in a -transform expression from Run CMD caused jobs to fail. (#71547)
    • Fixed an issue where schema and table names containing underscores _ were not accepted.
    • Fixed an issue that allowed jobs to run with a row limit of less than 1.
    • Fixed an issue where incorrect files loaded for preview from BLOB containers with Livy enabled.
    • CLOB data types are unsupported. (#86902)
    • Improved performance and logic when drilling into a database and schema from the Explorer page.
  • API
    • You can now access the API quick links page from the Admin Console React page.
    • When using Swagger, UI text now indicates when a field is case sensitive.
  • Reporting
    • *Tech Preview* [TP] Rule Summary page enhancements
      • You can now filter rule breaks by most frequent violations, most severe violations, and least violations.
      • You can now view interactive pie charts with rules and dimension summaries.
  • UI
    • The styling of the expandable legacy navigation pane and the React menu is now updated.
  • Legal
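
For the standalone Pendo setting listed under Admin above, the owl-env.sh change can be sketched as follows. This is a minimal sketch under stated assumptions: the helper name is hypothetical, the value is wrapped in double quotes as a precaution, and you must substitute your own install directory and license name.

```shell
# Hypothetical helper: append the Pendo account ID export to owl-env.sh
# unless the variable is already exported there.
add_pendo_account_id() {
  env_file=$1       # for example <install-dir>/config/owl-env.sh
  license_name=$2   # your license name
  if ! grep -q '^export DQ_INTEGRATION_PENDO_ACCOUNTID=' "$env_file"; then
    printf 'export DQ_INTEGRATION_PENDO_ACCOUNTID="%s"\n' "$license_name" >> "$env_file"
  fi
}

# Usage (both arguments are placeholders):
#   add_pendo_account_id <install-dir>/config/owl-env.sh <your-license-name>
```

The guard makes the helper safe to run more than once, because a second call leaves the file unchanged.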

Known Limitations

  • Validate Source
    • When comparing JDBC (target) to remote files such as S3 (source), there is a known Spark bug for "Recursive view detected".
      • This validate source combination is not possible in 2022.06 using Spark 3.2.
    • When using BigQuery as the source, the -libsrc needs to be manually modified to include the core (Spark BigQuery connector) directory.
      • For example, /home/centos/owl/drivers/bigquery/core
  • Profile
    • Spark does not currently support varchar data types. All varchar data types are converted to String. Other unsupported data types may also be converted incorrectly.
  • Security
    • Permissions on the Export task have not yet been addressed when dataset security is turned on and you add a role based authorization for editing existing datasets. (#87720)

DQ Security Metrics

A critical CVE, CVE-2016-1000027, shows up in the image scan due to the Spring version. This is a false positive and should be added to the exception list of your scan tools. We don’t use HttpInvokerServiceExporter anywhere in the application and are not impacted by it.
Vulns over time
Criticals table

2022.05

Fixes / Enhancements

  • DQ Job
    • You can no longer update the dataset name (-ds) from the command line.
      • A helpful error message now appears if changes are made to -ds.
    • Stop Job action is no longer enabled for K8s.
    • Fixed an issue for Dremio jobs where jobs hang when editing or cloning an existing dataset.
  • Outliers
    • Added "username" to outlier boundary table to track who creates the boundary.
      • The Outlier boundary again saves correctly after the addition of a username.
    • Fixed an issue that caused jobs to fail when Day was selected from the By dropdown.
  • Rules
    • Rules Preview drill-in capabilities are now improved:
      • You can now configure Preview Limits based on the individual rule.
        • Freeform and Simple rules are currently supported for the Preview Limit feature.
      • You can now set any positive number as the Rules Preview Limit.
        • When you update a Preview Limit value, you must re-run to apply the updated limit value.
      • On the DQ Job page, the details of an individual rule now display a paginated sub-table of all the break records.
      • When a rule is labeled as BREAKING for rule types other than Freeform and Simple, UI text now displays, "Data preview records are only available for Freeform and Simple rules."
    • You can now hover over stat rules to see their conditions.
    • Data Concepts is renamed Data Categories.
    • Semantics is renamed Data Classes.
    • When a Data Class is assigned to a dataset via Profile controls, a rule is now created.
  • Security
    • Vulnerabilities identified by JFrog
      • Vulns 0, criticals 0, high severity 9
      • For a visual readout, see the DQ Security Metrics section below.
    • The OS vulnerabilities from the images of Collibra DQ 2022.04 have been resolved by using the base image of RHEL8 to build the images for Collibra DQ 2022.05. The following OS utilities will not be available in the 2022.05 release images:
      • Unified, OpenSSL crypto/stack
      • Full YUM stack
      • OS tools, including tar, gzip, and vi
    • AD users can again use auth/signin REST API.
    • The Highcharts CVSS2: 9.3/CVSS3: 9.8 vulnerability is resolved.
    • The LOGJAM (CVE-2015-4000) SSL/TLS vulnerability is resolved.
    • The SpringShell (CVE-2022-22965) vulnerability is resolved.
    • TLS < 1.2 is no longer supported.
    • When Azure AD SSO sends a groups.link assertion, the application now tries to resolve the groups via the link.
      • You can now activate this setting by using the property, SAML_GROUP_LINK_PROP.
  • Profile
    • You can now edit or delete semantics by clicking anywhere in the semantics cell of the Profile column table.
    • You can now save annotations with special characters.
      • Special characters that are not currently supported include percent sign %, backslash \, and caret ^.
    • Fixed an issue where columns of broken rules were not highlighted.
  • Connections
    • You can now view a list of all packaged and optionally packaged drivers on our new Builds page.
    • The Databricks JDBC driver is now available.
    • You can now add Databricks datasets using the Databricks Simba driver.
  • Catalog
    • Fixed an issue where the deletion of a dataset caused orphaned links to datasets in other areas of Collibra DQ.
  • Admin
    • *Tech Preview* [TP] You can now use the ServiceNow integration through a proxy server from the Assignment Queues screen.
    • You can now access the new Usage page to view monthly historical usage statistics.
    • AD users with Admin privileges can now add Business Units.
    • AD users with Admin privileges can now manage local users.
    • The Agent Groups (H/A) feature is marked for deprecation and will be removed from the app in the 2022.06 release.
  • Explorer
    • You can again edit schema and table name from the Catalog page.
    • You can now navigate to a specific behavior tab directly from the Assignments page.
    • Fixed an issue when viewing Schemas in View Data wizard.
  • Scorecard
    • Single space, underscore _, and period . are now supported characters when saving a Scorecard name.
  • API
    • Improved API calls for the UserManagement Save function.
  • Reporting
    • *Tech Preview* [TP] Rule Summary page enhancements
      • You can now filter rule breaks by a specified date range and view charts for Most Used Rule Types, Dataset with Most Rule, and Top Rules Run.

Known Limitations

  • Delta Files
    • A bug was introduced as a result of removing CVEs in 2022.05. If you use Delta files (-delta), we advise against upgrading until an update is available.
  • Explorer
    • Except for underscore _, special characters are not currently supported in schema or table names.
  • Admin
    • *Tech Preview* [TP] ServiceNow integration
      • Only the local Docker container proxy has been tested and verified.
      • The Test Connection button's ability to validate credentials is currently limited when the ServiceNow URL is valid.
      • The Validate All Rules function currently results in a failure.
      • You cannot edit an active ServiceNow assignment.
        • Invalidate/Validate or Resolve actions result in a failure.
      • You can assign a ServiceNow ticket with an embedded URL when escaped with double quotes.
        • No assignment is sent without this process.
  • Multi-Tenant
    • Tenant names must be lowercase. Use lowercase characters when creating a tenant from the multi-tenant admin page. This limitation stems from the schema that is generated.
  • Reporting
    • *Tech Preview* Rule Summary page enhancements
      • Sorting any column returns an error.
      • You must use the date picker because manual date entry is not honored.
      • The start and end date are out of order when navigating to the page.
      • The last page on the paginated list does not change when date criteria is updated.
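
For the Multi-Tenant limitation above, a proposed tenant name can be lowercased before you enter it on the admin page. This is a minimal sketch; the helper name and the sample name are hypothetical.

```shell
# Hypothetical helper: convert a proposed tenant name to lowercase.
to_tenant_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

to_tenant_name 'FinanceEU'   # prints financeeu
```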

DQ Security Metrics

Vulns over time
Criticals table

2022.04

Install

For standalone installations, find and replace the spark_package variable in the setup.sh script.
Change spark-3.0.1-bin-hadoop3.2.tgz to spark-3.1.2-bin-hadoop3.2.tgz:
spark_package=${SPARK_PACKAGE:-"spark-3.0.1-bin-hadoop3.2.tgz"}
# replace with
spark_package=${SPARK_PACKAGE:-"spark-3.1.2-bin-hadoop3.2.tgz"}
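
The find/replace above can also be scripted. The following is a minimal sketch, assuming setup.sh contains the 3.0.1 package name exactly as shown; the helper name is hypothetical, and a backup copy is kept so the change can be reverted.

```shell
# Hypothetical helper: swap the Spark 3.0.1 package name for 3.1.2 in a
# setup.sh file. A backup copy is written next to the original first.
update_spark_package() {
  script=$1
  cp "$script" "$script.bak"
  sed 's/spark-3\.0\.1-bin-hadoop3\.2\.tgz/spark-3.1.2-bin-hadoop3.2.tgz/' \
    "$script.bak" > "$script"
}

# Usage (the path is an assumption): update_spark_package ./setup.sh
```

Writing the result back into the existing file, rather than moving a new file over it, keeps the script's original permissions.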

Fixes / Enhancements

  • DQ Job
    • Entering negative values for the downscore is no longer supported and will now produce an error message.
    • You can now invalidate schema with special characters.
    • Spark table names for loaded historical datasets and other Spark tables are now available in the Jobs Log table.
    • Long type values larger than Integer.Max no longer break the Profile.
    • View Findings now displays the user's full name, if applicable, in the Validate modal. The Assignment Queue page also displays the user's full name, if applicable.
  • Alerts
    • You can once again use the Cancel action button on the Alerts page.
    • You can now set up alerts to reach multiple email recipients.
    • If email_server table is not yet configured, a helpful message will now display in the Description column in the job log directing you to register an email Server under Admin - Alerts. The job will still run successfully.
  • Rules
    • You can now modify Rules definitions from the primary DQ Job dashboard without loading the Rules page.
    • Mean value check once again triggers correctly for Integer and Long columns.
      • This fix triggers the mean value check for Integer and Long columns and shows an infinity percentage change in behavior for a period, depending on -bhlb. After this period, it should disappear.
    • For Native SQL rules, jobs now behave the same whether or not a semicolon ";" is included in the SQL query.
    • You can now use a hyphen "-" in a dataset name.
      • Acceptable special characters now include a hyphen "-", period ".", and underscore "_".
    • When you hover over a condition in the Condition column, a tooltip now displays which condition a Stat rule checks in a DQ Job.
    • Improved the exception message for when there are no values for a specific column while using a Stat rule.
    • The WebUI passing boundaries range has been updated to ().
    • For Freeform rules, IS NULL and IS NOT NULL no longer return invalid results in the Validation tab.
    • Added a pop-up success message that displays after you click Validate when a Freeform rule with secondary datasets has correct syntax.
  • Security
    • Vulnerabilities identified by Jfrog
      • Vulns: 2 criticals, 2 high vulnerabilities
      • For a visual readout, see the DQ Security Metrics section below.
    • Authorization restriction is now enforced for the following endpoints:
      • /v2/deletefiledir
      • /v2/getRunIdsByDataset
      • /v2/putDatasetWeight
      • /v2/checkListofFilesPath
      • /v2/getlistagents
      • /v2/checkDriver
      • /v2/getconnectionssensitive
      • /v2/getemailgroups
      • /v2/getemailserver
      • /v2/addemailgroup
      • /v2/validateEmailAddress
      • /v2/getlistoffiles
      • /v2/getlistoffilespath
      • /v2/getDriverDir
      • /v2/getlistrolesbydataset
      • /v2/getlistrolesbydistnctdatasets
      • /v2/getlistrolesbyfunctiontypename
      • /v2/getlistusersbyauthority
      • /v2/getlocalDBRoles
      • /v2/getsecuritysettingsbytype
      • /v2/getowlcheckinventory
      • /v2/getconnectionspwdmgrsensitive
      • /v2/getsecuritysettingsbycoltype
      • /v2/getdbuserlist
      • /v2/getdbuserdetailsbyuser
      • /v2/getexternaladgroupstointernalroles
      • /v2/getlistdatasets
      • /v2/getlistdatasetsbyrole
      • /v2/getaudittrailitems
      • /v2/get-all-audit
      • /v2/get-datasets-audit-trail-items
      • /v2/get-all-dataset-audit
      • /v2/getactivityaudit
      • /v2/getallactivityaudit
      • /v2/getlocaldbrolesbyuser
      • /v2/getdatasetaclsecurity
      • /v2/getexternaladgrouplist
      • /v2/getexternaladuserlist
      • /v2//external-service-configuration
    • Local user accounts now have an account lockout feature implemented with the following restrictions:
      • A user's account is locked if the password is entered incorrectly more than 10 times (configurable via app config).
      • A locked account can only be unlocked by an Admin user on the user management screen.
      • If an Admin is locked, another Admin can unlock their account.
      • If all Admins are locked, enable the account via the DB (update the users table "accountNonLocked" column to "1").
      • A user cannot use forgot password to reset their password while the account is locked.
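The lockout rules above can be sketched roughly as follows. This is an illustrative model only, not DQ's implementation: the LocalAccount class, the unlock helper, and the choice to reset the failure counter on unlock are all hypothetical, and the limit of 10 is the default mentioned in these notes.

```python
from dataclasses import dataclass

# Default from the notes; the real limit is configurable via app config.
MAX_FAILED_ATTEMPTS = 10


@dataclass
class LocalAccount:
    """Hypothetical stand-in for a local user account."""
    username: str
    is_admin: bool = False
    failed_attempts: int = 0
    locked: bool = False

    def record_failed_login(self) -> None:
        """Count a wrong password; lock once the limit is exceeded."""
        self.failed_attempts += 1
        if self.failed_attempts > MAX_FAILED_ATTEMPTS:
            self.locked = True

    def can_reset_password(self) -> bool:
        """Forgot password is unavailable while the account is locked."""
        return not self.locked


def unlock(actor: LocalAccount, target: LocalAccount) -> bool:
    """Unlock target only if actor is an Admin who is not locked."""
    if not actor.is_admin or actor.locked:
        return False
    target.locked = False
    target.failed_attempts = 0  # assumption: counter resets on unlock
    return True


user = LocalAccount("alice")
admin = LocalAccount("root", is_admin=True)
for _ in range(MAX_FAILED_ATTEMPTS + 1):
    user.record_failed_login()
print(user.locked, user.can_reset_password())
print(unlock(admin, user), user.locked)
```

In this sketch the tenth wrong password does not lock the account; the eleventh does, which matches "more than 10 times".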
    • CORS restriction is now enforced for SAML and multi-tenancy.
      • This breaks SAML unless the IDP is configured as a trusted origin in DQ, so the following property must be added to environment variables in order for DQ and SAML to work: CORS_ALLOWED_ORIGINS=${IDP-BASE-URL},${DQ-BASE-URL}
        • Replace ${IDP-BASE-URL} with the value of the actual IDP URL (For example: https://ping.auth.com)
        • Replace ${DQ-BASE-URL} with the value of the actual DQ Base URL (For example: https://dq-env.com)
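As a minimal sketch, the CORS_ALLOWED_ORIGINS value described above is the IDP URL and the DQ base URL joined by a comma. The helper name below is hypothetical and not part of DQ, the URLs are the example values from these notes, and how the variable is actually set depends on your deployment.

```python
def build_cors_allowed_origins(idp_base_url: str, dq_base_url: str) -> str:
    """Return the comma-separated value for CORS_ALLOWED_ORIGINS."""
    if not idp_base_url or not dq_base_url:
        raise ValueError("both the IDP URL and the DQ base URL are required")
    return ",".join([idp_base_url, dq_base_url])


# Example values taken from the notes; replace them with your own URLs.
value = build_cors_allowed_origins("https://ping.auth.com", "https://dq-env.com")
print(f"CORS_ALLOWED_ORIGINS={value}")
```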
    • SAML login no longer automatically triggers on the login page during an existing session when accessing DQ base URL. For SAML login, you should instead use /saml/login.
      • API requests (v2/v3) now return a proper JSON response when they fail.
      • The auth/signin API now provides a JWT token for multi-tenant (MT) and local users.
  • Profile
    • Mean value once again displays in the Volume column.
    • When connecting to MSSQL server on Windows from a Linux DQ environment, the connection no longer fails.
      • We recommend, but do not require, that MSSQL connections from a DQ Linux environment use TLS only, with a properly signed certificate set up on the MSSQL server.
    • You can now edit annotations in the Labels tab.
  • S3
    • Added an enhancement for the -addlib flag.
  • Connections
    • Added a new Jconn4 driver for encrypted connections.
    • Tech Preview - You can now save a local (NFS) file directory as a connection type.
    • See our newest connections page for a definitive guide to driver support.
    • BigQuery is now certified for production, but has been removed from the packaged install for K8s Docker.
  • Explorer
    • When toggling between fullfile and Union LookBack options, -fullfile and -fllb flags can no longer be generated together in the DQ Job command line.
    • When Temp files load in Explorer, Data Preview now shows the columns in the same order as the original Temp file.
    • You can now drill in and search files within the connection.
    • You can now browse multiple local (NFS) file connections.
  • Scorecard
    • You can now create scorecards with special characters "^[A-Za-z0-9]+$" in their names.
  • Dupes
    • Added linkID column for exact match in both UI and REST API. linkID can now be either included or excluded from Dupes for exact match.
    • linkID is now shown at the aggregate level for Exact Match.
      • We recommend using this feature from a primary key perspective for its first iteration.
      • The aggregate function used is min().
        • For example: if you have 6 occurrences, you will get 1 example linkID, the min.
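The aggregate behavior described above can be illustrated with a short sketch: rows that match exactly are grouped, and each group reports its occurrence count and the minimum linkID. This is not how DQ computes Dupes internally; the function name, the row layout, and the sample values are hypothetical.

```python
from collections import defaultdict


def min_link_id_per_group(rows, key_fields, link_field="linkID"):
    """Group exact-match rows and return (occurrences, min linkID) per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row[link_field])
    # Only groups with more than one occurrence count as duplicates.
    return {
        key: (len(ids), min(ids))
        for key, ids in groups.items()
        if len(ids) > 1
    }


# Six occurrences of the same value, plus one row that is not a duplicate.
rows = [{"email": "a@example.com", "linkID": i} for i in (17, 4, 23, 8, 15, 42)]
rows.append({"email": "b@example.com", "linkID": 1})

print(min_link_id_per_group(rows, ["email"]))
# → {('a@example.com',): (6, 4)}
```

Because the aggregate is min(), the single example linkID is most useful when linkID is a primary key that can be looked up in the source data.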
  • API
    • Changed the /v2/getlistdataschemapreviewdbtablebycols API call method from GET to POST to support long queries (-q) and tables with a very large number of columns.
    • Added a new SAML load balancer so the system picks the appropriate schema and SAML server URL for Swagger.

Known Limitations

  • Profile
    • Special characters are not currently supported in annotations in the Labels tab.
  • Scorecard
    • Space " ", underscore "_", and period "." are not yet supported for scorecard edit.

DQ Security Metrics

Vulnerabilities over time
Criticals table

2022.03

Fixes / Enhancements

  • DQ Job
    • The -validatevaluesshowmissingkeys option now allows the extrapolation of missing keys between target and source.
    • Newly created jobs will no longer be marked incorrectly with enclosing double quotes.
    • File names with spaces are now handled with double quotes within the application.
  • Alerts
    • Email notifications now have Collibra branding and terminology.
    • Fixed the Cancel action for the Delete functionality on the Alerts page.
  • Outliers
    • Fixed an issue where the Numerical Outlier drill-in graph did not display when perChange was NaN.
  • Rules
    • Added additional HealthCare Data Classes to Rule Library.
    • Fixed input validation for the POST /v3/rules/ endpoints. The following validation rules now apply to the RuleDTO.ruleName field:
      • Maximum size is 100.
      • Must comply with the following regular expression: ^[a-zA-Z0-9_]+$
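A client can pre-check a rule name against these two constraints before calling the endpoint. The sketch below is an assumption-laden illustration, not the server's code: the function name is hypothetical, and the server's regex engine may treat edge cases such as trailing newlines differently.

```python
import re

RULE_NAME_MAX_LENGTH = 100
RULE_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_]+$")


def is_valid_rule_name(rule_name: str) -> bool:
    """Check a candidate ruleName against the two documented constraints."""
    if len(rule_name) > RULE_NAME_MAX_LENGTH:
        return False
    # fullmatch requires the whole string to match, so a trailing
    # newline is rejected as well.
    return RULE_NAME_PATTERN.fullmatch(rule_name) is not None


print(is_valid_rule_name("not_null_check_1"))  # True
print(is_valid_rule_name("not-null check"))    # False
```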
    • The rules on the Hoot page now show the correct exception data when expanded if there are two or more rules with exceptions attached to the dataset.
  • Security
    • Vulnerabilities identified by Jfrog
      • Vulns: 0 critical, 6 high vulnerabilities
    • The maximum password length has increased to 72 characters.
    • The Forgot Password screen now always shows a success message in the UI, regardless of whether the request succeeded or failed.
    • Fixed an issue where an error message was thrown when adding or editing user roles.
    • Added error checks for cases where the password manager script throws an error.