Duplicates (advanced)
This is an advanced, opt-in feature.

General Ledger. Accounting use-case

https://owl-analytics.com/general-ledger
Whether you're looking for a fuzzy match percentage or a single client cleanup, Owl's duplicate detection can help you sort and rank records by their likelihood of being duplicates.
-f "file:///home/ec2-user/single_customer.csv" \
-d "," \
-ds customers \
-rd 2018-01-08 \
-dupe \
-dupenocase \
-depth 4

User table has a duplicate user entry

Carrisa Rimmer vs Carrissa Rimer
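
To make the idea of a match percent concrete, here is a minimal Python sketch that scores the two names above. It is not Owl's scoring algorithm: it uses the standard library's `difflib.SequenceMatcher` as a stand-in, and the function name `match_percent` and its `ignore_case` parameter are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def match_percent(left: str, right: str, ignore_case: bool = True) -> float:
    """Return a 0-100 similarity score for two strings.

    Assumption: difflib's ratio is used as a stand-in for a fuzzy
    match percent; Owl's own scoring may differ.
    """
    if ignore_case:
        # Compare case-insensitively, similar in spirit to a no-case option.
        left, right = left.lower(), right.lower()
    return 100.0 * SequenceMatcher(None, left, right).ratio()


score = match_percent("Carrisa Rimmer", "Carrissa Rimer")
print(f"Carrisa Rimmer vs Carrissa Rimer: {score:.1f}% match")
```

Two near-identical spellings like these score high but below 100, which is the kind of pair a duplicate check is meant to surface.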

ATM customer data with only an 88% match

As the 88% match above suggests, a match below 90% is a false positive in most cases. Each dataset is a bit different, but in many cases you should tune duplicate detection to a match of roughly 90% or higher to get interesting findings.
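
The effect of a match cutoff can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Owl's implementation: the similarity measure is `difflib.SequenceMatcher` from the standard library, and the names `rank_duplicates` and `min_percent` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def rank_duplicates(values, min_percent=90.0):
    """Score every pair of values and keep pairs at or above min_percent.

    Assumption: difflib's ratio stands in for a fuzzy match percent.
    Returns (percent, left, right) tuples, highest score first.
    """
    scored = []
    for left, right in combinations(values, 2):
        # Case-insensitive comparison of the two values.
        ratio = SequenceMatcher(None, left.lower(), right.lower()).ratio()
        percent = 100.0 * ratio
        if percent >= min_percent:
            scored.append((percent, left, right))
    # Rank the most likely duplicates first.
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Illustrative sample values, not real customer data.
customers = ["Carrisa Rimmer", "Carrissa Rimer", "John Smith"]
for percent, left, right in rank_duplicates(customers):
    print(f"{percent:.1f}%  {left}  <->  {right}")
```

Lowering `min_percent` lets more pairs through, which is where false positives tend to appear. Comparing all pairs is quadratic in the number of values, so this sketch is only suitable for small samples.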

Simple DataFrame Example
