Collibra DQ User Guide

Release Notes

We've moved! To improve customer experience, the Collibra Data Quality User Guide has moved to the Collibra Documentation Center as part of the Collibra Data Quality 2022.11 release. To ensure a seamless transition, dq-docs.collibra.com will remain accessible, but the DQ User Guide is now maintained exclusively in the Documentation Center.

2022.12 (Coming Soon)

2022.11

The MS SQL driver that comes with JDK11 standalone packages does not currently work in the JDK11 environment. MSSQL requires a separate JAR for JDK11. Please contact your Customer Success Manager for the compatible driver.
Dremio is not currently supported for JDK11 standalone packages. If you plan to run JDK11, add -Dcdjd.io.netty.tryReflectionSetAccessible=true to owlmanage.sh as a JVM option for your Web and Spark instances. Please contact your Customer Success Manager for assistance.
As of October 18, 2022, all images for the 2022.10 release have a Critical CVE (CVE-2022-42889). If you picked up the 2022.10 release before October 18, 2022, there should be no issue with your scans. If issues persist, please contact your Customer Success Manager for a new build.
After you complete an upgrade or a new installation of Collibra DQ, you are now required to enter a license name in one of three ways: by following the one-time prompt on the login page, by setting the LICENSE_NAME environment variable in the environment variable file (owl-env.sh), or by setting the global.configMap.data.license_name Helm chart variable.

2022.10

New Features

For the Collibra Data Quality 2022.10 release, all Docker images run on JDK11. Standalone packages contain JDK8 and JDK11 options. If you are an existing customer who requires JDK11, please upgrade your runtime before upgrading to 2022.10. Most Hadoop environment versions (EMR/HDP/CDH) still run on JDK8, so customers using these environments can upgrade with the JDK8 packages. If you prefer to upgrade to JDK11, you must follow the documentation of your respective Hadoop environment to upgrade to JDK11 before deploying the 2022.10 release.
The MS SQL driver that comes with JDK11 standalone packages does not currently work in the JDK11 environment. MSSQL requires a separate JAR for JDK11. Please contact your Customer Success Manager for the compatible driver.
Dremio is not currently supported for JDK11 standalone packages. If you plan to run JDK11, add -Dcdjd.io.netty.tryReflectionSetAccessible=true to owlmanage.sh as a JVM option for your Web and Spark instances. Please contact your Customer Success Manager for assistance.
As of October 18, 2022, all images for the 2022.10 release have a Critical CVE (CVE-2022-42889). If you picked up the 2022.10 release before October 18, 2022, there should be no issue with your scans. If issues persist, please contact your Customer Success Manager for a new build.

Rules

  • You can now define a rule to detect the number of days a job runs without data by using $daysWithoutData.
  • You can now define a rule to detect the number of days a job runs with 0 rows by using $runsWithoutData.
  • You can now define a rule to detect the number of days since a job last ran by using $daysSinceLastRun.

Profile

  • You can now use a string length feature by toggling the Profile String Length checkbox when you create a data set.
    • When Profile String Length is checked, the min/max length of a string column is saved to the dataset_field table.

Validate Source

  • You can now write rules against a loaded source data frame when -postclearcache is configured in the agent.
The DQ UI will be converted to the React MUI framework with the 2022.11 release. Prior to the 2022.11 release, you can turn the React flag on, but note that some features may be temporarily limited.

Enhancements

DQ Job

  • Start Time and Update Time are now based on the server time zone of the DQ Web App.

Scheduler

  • The Job Schedule page now has pagination.

Scorecards

  • From Pulse View, you can now view missing runs, runs with 0 rows, and runs with failed scores.

Admin/Catalog

  • Connection details are now masked when non-admin users attempt to view or modify database connection details from the Catalog page. Only users with role_admin or role_connection_manager have the ability to view connection details on this page. (ticket #94430)

API

  • The /v2/getRunIdDetailsByDataset endpoint now provides the following:
    • The RunIDs for a given data set.
    • All completed DQ Jobs for a given data set.

Snowflake Pushdown (beta)

  • You can now detect shapes that do not conform to a data field. Pushdown jobs scan all columns for shapes by default.
  • You can now view Histogram and Data Preview details for the Profile activity.

Connections

  • The Snowflake JDBC driver is now updated to 3.13.14.

Fixes

Rules

  • Fixed an issue with the Rule Validator that resulted in missing table errors. The Validator now correctly detects columns. (ticket #93430)

DQ Job

  • Fixed an issue that caused queries with joins to fail on the load activity when Full Profile Pushdown was enabled. Pushdown profiling now supports SQL joins. (ticket #92409)
  • Fixed an issue that caused jobs to fail at the load activity when using the CTE query. Please note that CTE support is currently limited to Postgres connections. (ticket #88287, 89150)
  • Fixed an issue that caused inconsistencies between the time zones represented in the Start Time and Update Time columns.

Behavior

  • Fixed an issue that caused jobs with Adaptive Rules to become stuck in Unknown status as a result of an unsupported Time metric for Adaptive Rules. (ticket #95936)

Agent

  • Fixed the loadBalancerSourceRanges for web and spark_history services in EKS environments. (ticket #95398)
    • The helm property global.ingress.* has been removed to separate the config for web and spark_history. Please update the property to global.web.ingress.* and global.spark_history.ingress.* respectively.
  • Added support to specify the inbound CIDRs for the Ingress using the property .global.web.service.loadBalancerSourceRanges. (ticket #95398)
    • Though Ingress is supported as part of Helm charts, we recommend attaching your own Ingress to the deployment if you need further customization.
    • This requires a new helm chart.
  • Fixed an issue that caused Livy file estimates to fail for GCS on K8s deployments.
  • Fixed an issue that caused jobs to fail for GCS on K8s deployments.
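
The global.ingress.* split described above can be pictured as a key rename over a flattened view of your Helm values. The following is a minimal sketch, not a Collibra tool: the function name and the flattened dotted-key representation are assumptions for illustration, and the "enabled" suffix is hypothetical because the release note only names the global.ingress.* prefix.

```python
def split_ingress_keys(values: dict) -> dict:
    """Copy each flat 'global.ingress.*' key to the web and spark_history prefixes."""
    old_prefix = "global.ingress."
    new_prefixes = ("global.web.ingress.", "global.spark_history.ingress.")
    migrated = {}
    for key, value in values.items():
        if key.startswith(old_prefix):
            suffix = key[len(old_prefix):]
            # The removed property is replaced by one property per service.
            for prefix in new_prefixes:
                migrated[prefix + suffix] = value
        else:
            # Keys outside the removed prefix are left untouched.
            migrated[key] = value
    return migrated


# Hypothetical flattened values; "enabled" is an illustrative suffix only.
old_values = {"global.ingress.enabled": True, "some.other.key": "unchanged"}
for key, value in split_ingress_keys(old_values).items():
    print(key, "=", value)
```

Copying the same value to both services is only a starting point. The reason for the split is that web and spark_history can now be configured independently, so review each resulting property before deploying.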

Validate Source

  • The Add Column Names feature is scheduled for removal with the upcoming 2022.11 release. (ticket #96066)
    • This feature predates the ability to limit the query directly (srcq) and the addition of Update Scope.
    • Use the query to edit/limit columns and also use Update Scope.
  • Fixed an issue that caused the incorrect message to display for [VALUE_THRESHOLD] when validate source was specified for a matched case. (ticket #94435)

Dupes

  • The Advanced Filter is scheduled for removal from the Dupes page with the upcoming 2022.11 release. (ticket #96065)

Explorer

  • Fixed an issue that caused BigQuery connections to incorrectly update the library (-lib) path when a subset of columns was selected. (ticket #96768)

Scheduler

  • Fixed an issue that prevented the scheduler from running certain scheduled jobs in multi-tenancy setups. Email server information is now captured from the correct tenant. (ticket #92898)

Known Limitations

Rules

  • When a data set has 0 rows returned, stat rules applied to the data set are not executed. While a full fix is planned for a future release, this limitation is only partially fixed as of 2022.10.

DQ Job

  • CTE query support is currently limited to Postgres connections. DB2 and MSSQL are currently unsupported.

Catalog

  • When using the new bulk actions feature, updates to your job are not immediately visible in the UI. Once you apply a rule, run a DQ Job against that data set. From the Rules tab, a row with the newly applied rule is visible.

Snowflake Pushdown (beta)

  • Freeform (SQLF) rules cannot use a data set name but instead must use @dataset because Snowflake does not explicitly understand data set names.
  • When using the SQL Query workflow, a subset of columns selected in your SQL query must be enclosed in double quotes. Otherwise, the job runs indefinitely without failing.
  • Min/Max precision and scale are only calculated for double datatypes. All other datatypes are currently out of scope.
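
As an illustration of the double-quote requirement for column subsets in the SQL Query workflow, the sketch below builds a SELECT statement with every column name quoted. It is a hedged example only: the helper names, the table name, and the column names are hypothetical, and it does not reflect how Collibra DQ itself generates queries.

```python
def quote_columns(columns):
    """Wrap each column name in double quotes, doubling any embedded quote."""
    return ", ".join('"' + col.replace('"', '""') + '"' for col in columns)


def build_select(columns, table):
    """Build a SELECT over a subset of columns with quoted identifiers."""
    return f"SELECT {quote_columns(columns)} FROM {table}"


# Hypothetical table and column names, for illustration only.
print(build_select(["FIRST_NAME", "LAST_NAME"], "PUBLIC.CUSTOMERS"))
```

Keep in mind that Snowflake treats double-quoted identifiers as case-sensitive, so the quoted names need to match the stored column names exactly.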

DQ Security Metrics

DQ security vulnerabilities over 5 months
Critical security vulnerabilities over 5 months

2022.09

Enhancements

Rules

  • The Conditions column on the Rules tab now displays SQLG and SQLF rule definitions on hover.

DQ Job

  • The Jobs chart now shows a dotted gray line to represent jobs in Submitted status.
  • The Jobs chart now supports an hourly view option.
  • When you run a Pushdown Job that has a data set that returns 0 rows, an unclear message displays.

Schema

  • From the Config tab in Explorer, a Check Header checkbox under DQ Job is now available for when column names contain special characters. The Check Header checkbox is checked by default.
    • When checked, schema findings do not display when detected.
    • When unchecked, schema findings display when detected.

Behavior

  • Mean values are now rounded on the Findings page.

Explorer

  • SOH delimiters for files are now supported.
  • The Only checkbox on all Build Layer tabs is now removed.
  • The Profile activity is now always enabled and no longer has an on/off switch.

Alerts

  • Only one email per alert is now sent when alerts are set up for a scheduled job.
  • You can now check the logs to see when an alert fails to send so that you can resend the email.

Scheduler

  • The Findings page now displays a green indicator next to the Schedule icon when you schedule a job to run automatically. If Scheduler is inactive, a red indicator displays.

API

  • The v2/gethoot API now properly returns rule dimension information for data sets. (ticket #89973)

Connections

  • The Databricks connection template has changed, due to an upgrade of the driver. Any existing connection that uses the old driver must be updated. Refer to the new template. (ticket #19950)
  • The drivers for Athena, BigQuery, MongoDB, GCS, Hive/Impala were also upgraded but no connection change is required.

Spark

  • The 2022.09 release uses Spark 3.2.2.
We recommend using Spark 3.x for standalone installs/upgrades.

Fixes

Explorer

  • Fixed an issue that prevented the Job Estimator from properly displaying row estimates when the run date was modified during a new job run. (ticket #90860)
  • Fixed an issue that prevented DQ jobs created using NFS connection types from displaying under the Remote File Connections dropdown. (ticket #92479)
  • Fixed an issue that caused the file type parser to throw an error message when the default comma delimiter was not detected. The parser now detects a file's delimiter and updates the delimiter type in the UI automatically. (ticket #89489, 92480)
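
To give a feel for what automatic delimiter detection involves, here is a minimal sketch of one common heuristic: pick the candidate character that appears the same non-zero number of times on every sampled line. This is an assumption-based illustration, not the algorithm Collibra DQ uses. The candidate list, function name, and fallback behavior are all hypothetical.

```python
# Candidate delimiters; "\x01" is the SOH control character.
CANDIDATES = [",", "\t", "|", ";", "\x01"]


def guess_delimiter(sample_lines, candidates=CANDIDATES, default=","):
    """Return the candidate that splits every non-empty line into the same
    number of fields, preferring the one with the most fields."""
    best, best_count = default, 0
    for delim in candidates:
        # Collect the distinct per-line occurrence counts for this candidate.
        counts = {line.count(delim) for line in sample_lines if line}
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = delim, count
    return best


print(repr(guess_delimiter(["id|name|city", "1|Ada|London", "2|Alan|Wilmslow"])))
```

A heuristic like this can be fooled by delimiters that also appear inside quoted values, so a production parser would need to account for quoting as well.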

Files

  • The error message for Failed Merging Schema now has extra logging to clarify the cause of failed schema merges for both Livy sessions and non-Livy paths. (ticket #92694)

Security

  • Fixed an issue with the v2/getcatalogtableshasrulesfromcxn API that triggered a 403 status code when Dataset Security was enabled. (ticket #93298, 94258)

Agent

  • Fixed an issue that caused the Agent Check to no longer attempt check-ins to the metastore on K8s deployments, which resulted in red (unhealthy) status. (ticket #92055, 92963)
  • Fixed an issue that prevented concurrent users from properly running Livy sessions. (ticket #92963, 90432)

Known Limitations

Rules

  • The Rule Builder page becomes unusable if the user creates, validates, and saves a new rule and then re-edits it.
    • The workaround for this limitation is to do a full page refresh.
  • When a user attempts to validate a rule that contains a stat, an exception error is returned.

Security

  • The Assignments Queue feature is only available for local users. Support for externally connected users, such as SAML and AD connector, is not currently available.

Alerts

  • When alert recipient email addresses are separated by semicolons ;, alert emails are not sent to the intended recipients.
    • A workaround for this limitation is to separate alert recipient email addresses with commas , instead of semicolons.
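
If you maintain recipient lists outside of Collibra DQ, the comma workaround above can be applied with a small normalization step before you paste the list into the alert configuration. The helper below is a minimal sketch. The function name and the example addresses are hypothetical.

```python
def normalize_recipients(raw: str) -> str:
    """Convert a semicolon- or comma-separated recipient string to commas only."""
    parts = [part.strip() for part in raw.replace(";", ",").split(",")]
    # Drop empty entries left behind by trailing or doubled separators.
    return ",".join(part for part in parts if part)


# Hypothetical addresses, for illustration only.
print(normalize_recipients("first.user@example.com; second.user@example.com;"))
```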

Snowflake Pushdown

  • When you run a Job that has a data set that returns 0 rows, an unclear message displays.
  • When a native rule is created that contains an embedded stat, its calculated value will not display on the Job results page.
  • Data Set security is not supported.
  • Disabling autometrics does not take effect, so all autometrics are still executed.
  • Creating a DQ job using only "SQL Query" workflow doesn't allow you to set the rundate value.

DQ Security Metrics

DQ security vulnerabilities over 5 months
Critical security vulnerabilities over 5 months

2022.08

New Features

Rules

Enhancements

Connections

  • You can now authenticate Oracle JDBC connections with Kerberos TGT, Keytab, and Password. (tickets #75267, 76030)
  • You can now authenticate SQL Server JDBC connections with Kerberos Keytab in addition to basic authentication.

Rules

  • Rule Summary enhancements:
    • You can now select different time periods for analysis.
    • You can now view charts from three different pages, including Rule Detail Summary, Rule Breaks, and Rule Dimension Summary.

Security

  • Vulnerabilities identified by Jfrog
    • Vulns 0, criticals 0, high severity 7
    • The majority of the current mediums are due to merging the dq-streaming module into core.
    • For a visual readout, see the DQ Security Metrics section below.

Agent

  • You can now optionally configure individual time zones of DQ Job, Web, and Agent. You should only use this configuration when your instance and containers run in different system time zones. (tickets #87024, 87155)

Behavior

  • The Behavior tab now has a new column, Delta Percent Change (Δ % Change).
  • You can now hover over new tooltips in the following columns:
    • Baseline
    • % Change
    • Δ % Change
    • Zscore
    • Score

Outliers

  • Outlier checks are now optimized to skip in certain circumstances. Outlier checks are only skipped when the history load of a specified date column is empty.
  • You can now update and modify record flags from the command line with -rc, -rcKeys, -rcDateCol, and -rcTbin.

API

  • The v2/gethoot API now properly returns rule dimension information for data sets.
  • The v3/jobs/run API now has improvements to the 400 Bad Request error messages in specific circumstances.

Reports

  • The PDF option is now removed from the Data Set Finding page. To print dynamic column tables, use CSV or Excel options instead. (ticket #89739)

DQ Connector

  • The version of Collibra Integration Library is now updated to 2.4.12.

Fixes

Connections

  • The new GCS jars are required to use GCS spark-history-server. (ticket #90623)

DQ Job

  • Fixed an issue that caused jobs using .TXT files to incorrectly render custom column names. (ticket #81808)
    • Files with .TXT extensions are now treated as delimited files. Files with .TXT extensions that are not delimited files should use their respective file type from the file type dropdown.
  • Fixed an issue with deployments on K8s where jobs failed when the volume name exceeded 63 characters. (ticket #85372)

Agent

  • Fixed an issue that caused the v2/updateagent API to fail when numCores was empty. (tickets #89737, 92404, 92680)
    • The numCores field is no longer a required field.

Validate Source

  • Fixed an issue that caused validate source jobs to fail when the pkey was mapped to different column names. (ticket #88778)

Rules

  • When using Freeform SQL rules with wild-card operators, rules again correctly pass validation. (ticket #89644)
  • Fixed an issue with regex rules that use a closing parenthesis ), a comma , or a semicolon ; in the rlike, which caused DQ to append spaces to those characters and prevented the regex from operating correctly. (tickets #89417, 92958)
  • Fixed an issue that caused rules with column values containing parentheses ( ) to break due to the addition of padding before and after closing parentheses. (ticket #85176)
  • Fixed an issue that caused rules with special characters such as @ to display incorrectly on the Rules page, Conditions tab, and when exported to Excel.
  • Fixed an issue that prevented data sets with attached rules and roles from being renamed. (tickets #85731, 92059, 94315)

Profile

  • Fixed an issue where certain results in TopN Values and Data Preview displayed in scientific notation. Scientific notation is now removed from the display. (tickets #82163, 89738)

Explorer

  • Fixed an issue that allowed CLOB data types to be visible in the Drag Columns to Target map in the Source tab. (ticket #86902)

API

  • The REST API endpoint v2/updateRoleDatasets again correctly saves roles to data sets.

Known Limitations

Rules

  • The Findings page displays results from computational stat rules on mean as a single-quote string. For example, '573523.87' > 6763
  • Column-level sorting for the Rule Summary feature is not currently available.

Admin

  • When adding a Sensitive Label or a Data Category, the Edit and Update functions do not display the selected record. To properly display the record, you must first refresh the page before editing or updating.

Session Activity

  • While the application UI is being redesigned, a session timeout on the legacy side of the application might not be visible on the new React MUI side. This can happen when you have the DQ application open in multiple tabs.
    • We are not currently tracking session timeout from the legacy UI to React.

Beta features

DQ Job

  • Collibra is proud to launch a brand new feature, Snowflake Pushdown. Snowflake Pushdown allows for even faster processing and removes the need to set up a separate Spark compute platform to run Collibra Data Quality. Snowflake Pushdown is a private beta feature only available by request. Since this is a beta feature, some limitations are expected as we continue to improve its functionality. Contact your CSM to learn more about this feature.

DQ Security Metrics

A critical CVE, CVE-2016-1000027, shows up in the image scan due to the Spring version. This is a false positive and should be added to the exception list of the customer scan tools. We don’t use HttpInvokerServiceExporter anywhere in the application and are not impacted by it.
DQ security vulnerabilities over 5 months
Critical security vulnerabilities over 5 months

2022.07

Standalone packages for the 2022.07 release have a version naming convention of -RC. This will revert to the standard naming convention with the 2022.08 release, and has no impact on the safety or stability of standalone packages.

Fixes / Enhancements

  • DQ Job
    • Fixed an issue that prevented data from appearing in the Source tab when Source Observation RunID was clicked from the Assignments page.
    • Fixed an issue that caused Annotations with special characters to be truncated in the Labels tab.
    • Fixed an issue that caused the Column (name) column of the Rules tab to display incorrectly when Run Discovery was used.
    • Fixed an issue where the Retrain button on the Record tab was disabled.
    • You can again invalidate observations with single quotes ' from the Shapes tab.
    • The Hints tab now displays any available data.
    • You can no longer change agents from the Scheduler modal.
  • Rules
    • SQLF is now supported for Generic rules.
    • When running a custom rule through Rule Discovery, the column names Repo and Column again display correctly.
  • Alerts
    • You can now send emails using unauthenticated SMTP servers.
  • Security
    • Vulnerabilities identified by Jfrog
      • Vulns 0, criticals 0, high severity 7
      • For a visual readout, see the DQ Security Metrics section below.
    • Fixed an issue that allowed jobs to be run from the command line regardless of connection permissions.
      • When Connection Security is enabled, lock the SQL Editor to prevent unauthorized access to other connections. (#87916)
    • Fixed an issue that allowed View Only users to access some profile results and export the data to a CSV file.
      • Added an authorization check for data set access to the profile export feature, which allows only users with data set access to export the profile. (#87720)
    • Backslashes \ are no longer supported characters for AD usernames without disabling XSS for the /v2/updateadsecurityconfiguration API. (#88499)
    • Fixed an issue that prevented navigation back to the login page when tenant access was denied. (#89024)
  • Profile
    • From the Labels tab, backslashes are now stripped from annotations when they are used for separation within strings.
  • Admin
    • From Audit Trail, when administrators modify roles mapped to data sets or data sets mapped to roles, changes are now documented automatically, and display original and updated values.
    • The Agent Group (H/A) and its associated endpoints are now deprecated.
    • From Usage, you can now access a table and tiles reflective of your monthly usage metrics.
    • Salesforce account ID can now be configured for use with Pendo logs.
    • Tech Preview [TP] ServiceNow integration
      • You can now assign incidents (validate action) to ServiceNow groups and users with the following fields included in the same request: caller_id, description, short_description, cmdb_ci.
  • Explorer
    • Fixed an issue with date range on Oracle connections, which caused end date to change to start date when Transform was selected.
    • The Job Estimate modal again displays the correct number of rows for Sybase connections.
    • Fixed an issue with Source to Target where double quotes " were removed from the source file in database to file targets.
  • Scorecards
    • Enhanced the layout of the Assignment Queues page.
  • API
    • v2/getallscheduledjobs is now available as an enhancement of the original, v2/getscheduledjobs.
      • A UI integration is planned for a future release.
  • Schedule
    • Added an Active column to the scheduler export.
      • The RunJob column was removed. (#88799)
  • Reporting
    • Fixed an issue that created misalignment of column headers in PDF exports. (#89739)

Known Limitations

  • Rules
    • To use the new SQLF feature for Generic rules, you must manually update the Generic rule type from SQLG to SQLF.
      • A UI feature for this is planned for a future release.
    • Stat rules such as $rowCount do not work for secondary data sets or previous runId of the same data set via @t1 syntax.
      • To work around this limitation, run a subquery to select count(*) from the secondary data set or the previous runId.
  • Explorer
    • Drill-ins and jobs on Sybase connections run successfully, but connections to Sybase with encrypted passwords are currently unsupported.
  • Files
    • When using CSV files, you cannot use a comma , in the name.
  • Admin
    • Tech Preview [TP] ServiceNow integration
      • Special characters !@#$%^&*() in the description are not supported and will not persist to the ServiceNow assignment queue at this time.
      • Empty or invalid ServiceNow group name does not return an error in CDQ.
        • As a result, the ServiceNow assignment is generated with the default admin account as the owner if left empty or invalid.
        • You must have a valid ServiceNow group name or its related sys_id.
      • The new React UI is not yet supported for the ServiceNow Group integration.

DQ Security Metrics

A critical CVE, CVE-2016-1000027, shows up in the image scan due to the Spring version. This is a false positive and should be added to the exception list of the customer scan tools. We don’t use HttpInvokerServiceExporter anywhere in the application and are not impacted by it.
Vulns over time
Criticals table

2022.06

Fixes / Enhancements

  • DQ Job
    • Fixed an issue with the Learning Phase in the Behavior feature. (ticket #82907)
      • Once CDQ has the minimum number of completed successful scans, the learning status now changes to PASSING or BREAKING based on the results.
  • Outliers
    • Fixed an issue where file lookback did not identify expected outliers. (#87967)
  • Alerts
    • When configuring email alerts, SMTP Username and SMTP password fields are still required fields. (#86033)
      • Validation relaxation is planned for the 2022.07 release.
  • Rules
    • Fixed an issue which caused rule breaks to report the opposite of what was defined when a Generic Rule utilizing regex/rlike was created. (#86977)
    • Fixed an issue where Data Classes with Date column types selected did not detect timestamps. (#83000)
    • Fixed an issue where Data Classes using the operators <, > or = caused the inverse rule created from this process to throw exceptions. (#83000)
    • When switching a data class from a regex to expression and then editing again, the regex checkbox is now correctly checked.
  • Agent
    • The Explorer page and Scheduler modal now display the same agents. (#86175)
  • Security
    • Vulnerabilities identified by Jfrog
      • Vulns 0, criticals 0, high severity 8
      • For a visual readout, see the DQ Security Metrics section below.
    • General advisory:
    • Major vulnerabilities related to Spring, ESAPI, and Swagger have been addressed.
    • No cross DB reference is allowed in explorer while accessing SQL database connections.
    • Sensitive UI fields such as username no longer allow autocomplete.
    • If configured, the ENV variable XSS_CANONICALIZE_INPUT_ENABLED should be removed from configmap or owl-env.sh.
    • When dataset security is turned on, you can now add role based authorization for editing existing datasets. (#87720)
    • You can now override the following mail settings from the App Config page within the Configuration section of the Admin Console:
      • "mail.transport.protocol" -- default = smtp
      • "mail.smtp.auth" -- default = true: If true, attempt to authenticate the user using the AUTH command
      • "mail.smtp.auth.login.disable" -- default = false: If true, prevents use of the AUTH LOGIN command
      • "mail.smtp.starttls.enable" -- default = true: If true, enables the use of the STARTTLS command (if supported by the server) to switch the connection to a TLS-protected connection before issuing any login commands.
      • "mail.smtp.ssl.enable" -- default = false: If set to true, use SSL to connect and use the SSL port by default. Defaults to false for the "smtp" protocol and true for the "smtps" protocol.
      • "mail.smtp.ehlo" -- default = true
      • "mail.debug" -- default = true
      • "mail.smtp.ssl.trust" -- default = : If set, and a socket factory hasn't been specified, enables use of a MailSSLSocketFactory. If set to "*", all hosts are trusted. If set to a whitespace separated list of hosts, those hosts are trusted. Otherwise, trust depends on the certificate the server presents. (#76775, 88089)
  • Profile
    • Mean value is now rounded appropriately within the Profile page.
      • For example, the value 2.4334334343345 is now rounded to 2.433.
  • Connections
    • From the Athena driver, you can now use MetadataRetrievalMethod=Query for database queries from the Connection URL. (#86139)
    • Fixed an issue where error messages on failed connections did not display informational text. (#85527)
    • Fixed an issue where NFS file connections under Remote File connections caused jobs to fail. (#88156)
      • Added File protocol for Spark load for NFS file system.
      • Added nfs:// prefix while adding an NFS connection.
        • This will prepend the URI with the file:// protocol when an NFS file connection is loaded via Spark.
  • Catalog
    • The Graph option is no longer available in Quick links.
  • Admin
    • The Pendo integration is now active by default.
      • No sensitive information is collected; only high-level usage stats are collected.
      • All new customers starting with 2022.06 onward will receive a new license.
      • If you install a standalone environment, modify the <install-dir>/config/owl-env.sh file by adding your license name: export DQ_INTEGRATION_PENDO_ACCOUNTID=<your-license-name>
      • This new integration will not block or impair the functionality of the app in any way.
      • For more information on Collibra's subprocessors, please review Collibra's Subprocessors page.
    • The Agent Group (H/A) and its associated endpoints are now deprecated. (#83086)
    • Fixed an issue where the "Add Data Category" button was missing without required permissions. (#86625)
    • When a session expires on an Admin page, you are now redirected to the login page.
    • The Admin Limits page now displays informational text indicating that only limits of Tenant - Admin type are displayed on the page.
    • Fixed an issue when editing an existing data category which caused the 'Add new' modal to open instead of the 'Edit' modal. (#89617)
    • From Configuration Settings, DB Limits is now called Data Retention Policy.
  • Explorer
    • You can now view calculated views for SAP Hana when creating a DQ Job on the Explorer page. (#83147, 84328)
    • Fixed an issue which caused the Date range condition to incorrectly display results when using an Oracle connection. (#85802)
    • Fixed an issue which threw an error message when Transform was checked with Date Range condition when using a Postgres connection. (#85802)
    • Fixed an issue where an equals sign = used in a -transform expression from Run CMD caused jobs to fail. (#71547)
    • Fixed an issue where schema and table names containing underscores _ were not accepted.
    • Fixed an issue that allowed jobs to run with a row limit of less than 1.
    • Fixed an issue where incorrect files loaded for preview from BLOB containers with Livy enabled.
    • CLOB data types are unsupported. (#86902)
    • Improved performance and logic when drilling into a database and schema from the Explorer page.
  • API
    • You can now access API quick links page from the Admin Console React page.
    • When using Swagger, UI text now indicates when a field is case sensitive.
  • Reporting
    • Tech Preview [TP] Rule Summary page enhancements
      • You can now filter rule breaks by most frequent violations, most severe violations, and least violations.
      • You can now view interactive pie charts with rules and dimension summaries.
  • UI
    • The styling of the expandable legacy navigation pane and the React menu are now updated.
  • Legal
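
The mail settings listed under Security above can be read as a table of defaults that the App Config page lets you override. The sketch below models that as a plain dictionary merge. It is only an illustration of the defaults-plus-overrides idea: the function name is an assumption, the empty string for mail.smtp.ssl.trust reflects the blank default in the list, and the host name in the usage example is hypothetical.

```python
# Defaults as listed in the release note; values are kept as strings.
MAIL_DEFAULTS = {
    "mail.transport.protocol": "smtp",
    "mail.smtp.auth": "true",
    "mail.smtp.auth.login.disable": "false",
    "mail.smtp.starttls.enable": "true",
    "mail.smtp.ssl.enable": "false",
    "mail.smtp.ehlo": "true",
    "mail.debug": "true",
    "mail.smtp.ssl.trust": "",  # blank by default
}


def effective_mail_settings(overrides: dict) -> dict:
    """Merge overrides onto the documented defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(MAIL_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown mail setting(s): {sorted(unknown)}")
    return {**MAIL_DEFAULTS, **overrides}


# Hypothetical override: trust a single host and disable SMTP authentication.
settings = effective_mail_settings(
    {"mail.smtp.ssl.trust": "smtp.example.com", "mail.smtp.auth": "false"}
)
for name, value in settings.items():
    print(f"{name}={value}")
```

Rejecting unknown keys is a design choice of this sketch. It catches typos in property names early instead of silently ignoring them.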

Known Limitations

  • Validate Source
    • When comparing JDBC (target) to remote files such as S3 (source), there is a known Spark bug for "Recursive view detected".
      • This validate source combination is not possible in 2022.06 using Spark 3.2.
    • When using BigQuery as the source, the -libsrc needs to be manually modified to include the core (Spark BigQuery connector) directory.
      • For example, /home/centos/owl/drivers/bigquery/core
  • Profile
    • Spark does not currently support varchar data types. All varchar data types are converted to String. Other unsupported data types may also be converted incorrectly.
  • Security
    • Permissions on the Export task have not yet been addressed when dataset security is turned on and you add a role based authorization for editing existing datasets. (#87720)

DQ Security Metrics

A critical CVE, CVE-2016-1000027, shows up in the image scan due to the Spring version. This is a false positive and should be added to the exception list of the customer scan tools. We don’t use HttpInvokerServiceExporter anywhere in the application and are not impacted by it.
Vulns over time
Criticals table

2022.05

Fixes / Enhancements

  • DQ Job
    • You can no longer update the dataset name (-ds) from the command line.
      • A helpful error message now appears if changes are made to -ds.
    • Stop Job action is no longer enabled for K8s.
    • Fixed an issue for Dremio jobs where jobs hang when editing or cloning an existing dataset.
  • Outliers
    • Added "username" to outlier boundary table to track who creates the boundary.
      • The Outlier boundary again saves correctly after the addition of a username.
    • Fixed an issue that caused jobs to fail when Day was selected from the By dropdown.
  • Rules
    • Rules Preview drill-in capabilities are now improved:
      • You can now configure Preview Limits based on the individual rule.
        • Freeform and Simple rules are currently supported for the Preview Limit feature.
      • You can now set any positive number as the Rules Preview Limit.
        • When you update a Preview Limit value, you must re-run to apply the updated limit value.
      • On the DQ Job page, the details of an individual rule now display a paginated sub-table of all the break records.
      • When a rule is labeled as BREAKING for rule types other than Freeform and Simple, the UI text now displays, "Data preview records are only available for Freeform and Simple rules."
    • You can now hover over stat rules to see their conditions.
    • Data Concepts is renamed Data Categories.
    • Semantics is renamed Data Classes.
    • When a Data Class is assigned to a dataset via Profile controls, a rule is now created.
  • Security
    • Vulnerabilities identified by JFrog
      • Vulns 0, criticals 0, high severity 9
      • For a visual readout, see the DQ Security Metrics section below.
    • The OS vulnerabilities from the images of Collibra DQ 2022.04 have been resolved by using the base image of RHEL8 to build the images for Collibra DQ 2022.05. The following OS utilities will not be available in the 2022.05 release images:
      • Unified, OpenSSL crypto/stack
      • Full YUM stack
      • OS tools, including tar, gzip, and vi
    • AD users can again use the auth/signin REST API.
    • The Highcharts CVSS2: 9.3/CVSS3: 9.8 vulnerability is resolved.
    • The LOGJAM (CVE-2015-4000) SSL/TLS vulnerability is resolved.
    • The SpringShell (CVE-2022-22965) vulnerability is resolved.
    • TLS < 1.2 is no longer supported.
    • When Azure AD SSO sends a groups.link assertion, the application now tries to resolve the groups via the link.
      • You can now activate this setting by using the property, SAML_GROUP_LINK_PROP.
  • Profile
    • You can now edit or delete semantics by clicking anywhere in the semantics cell of the Profile column table.
    • You can now save annotations with special characters.
      • Special characters that are not currently supported include percent sign %, backslash \, and caret ^.
    • Fixed an issue where columns of broken rules were not highlighted.
  • Connections
    • You can now view a list of all packaged and optionally packaged drivers on our new Builds page.
    • The Databricks JDBC driver is now available.
    • You can now add Databricks datasets using the Databricks Simba driver.
  • Catalog
    • Fixed an issue where the deletion of a dataset caused orphaned links to datasets in other areas of Collibra DQ.
  • Admin
    • *Tech Preview* [TP] You can now use the ServiceNow integration through a proxy server from the Assignment Queues screen.
    • You can now access the new Usage page to view monthly historical usage statistics.
    • AD users with Admin privileges can now add Business Units.
    • AD users with Admin privileges can now manage local users.
    • The Agent Groups (H/A) feature is marked for deprecation and will be removed from the app in the 2022.06 release.
  • Explorer
    • You can again edit schema and table name from the Catalog page.
    • You can now navigate to a specific behavior tab directly from the Assignments page.
    • Fixed an issue when viewing Schemas in the View Data wizard.
  • Scorecard
    • Single space, underscore (_), and period (.) are now supported characters when saving a Scorecard name.
  • API
    • Improved API calls for the UserManagement Save function.
  • Reporting
    • *Tech Preview* [TP] Rule Summary page enhancements
      • You can now filter rule breaks by a specified date range and view charts for Most Used Rule Types, Dataset with Most Rule, and Top Rules Run.
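
Because TLS versions below 1.2 are no longer supported (see the Security items above), scripts that call the Collibra DQ REST API need a client that negotiates TLS 1.2 or later. The following is a minimal client-side sketch using Python's standard ssl module. The function name make_tls12_context is hypothetical, and nothing here is specific to Collibra DQ.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Build a client SSL context that refuses TLS versions below 1.2.

    Hypothetical helper for illustration: it starts from Python's default
    client settings (certificate and hostname verification enabled) and
    raises the minimum protocol version.
    """
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_tls12_context()
    # Show the lowest protocol version this context will negotiate.
    print(ctx.minimum_version.name)
```

A context built this way can be passed to standard-library clients, for example through the context parameter of urllib.request.urlopen.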

Known Limitations

  • Delta Files
    • A bug was introduced as a result of removing CVEs in 2022.05. If you use Delta files (-delta), upgrading is not advised until an update is available.