How to group by with similar group_name in sql

How can I perform a `GROUP BY` in *SQL* when the `group_name` values are similar but not exactly the same? In my dataset, the `group_name` values may differ slightly (e.g., "Apple Inc.", "AAPL", "Apple"), but conceptually they refer to the same entity. The variations are not always obvious or consistent, so I may need to define a custom rule or function, such as `is_similar()`, to cluster them.

For simple cases, I can extract a common pattern with regex or string functions (e.g., strip suffixes, lowercase, take prefixes). But how should I handle more complex cases, such as fuzzy or semantic similarity?

Example input:

| group_name   | val |
|--------------|-----|
| 'Apple Inc.' | 100 |
| 'AAPL'       | 50  |
| 'Apple'      | 30  |
| 'Microsoft'  | 80  |
| 'MSFT'       | 70  |

Desired output:

| new_group_name | total_val |
|----------------|-----------|
| 'Apple'        | 180       |
| 'Microsoft'    | 150       |

What are the best approaches to achieve this in *SQL*? And how would I write a query like this:

`SELECT some_characteristic(group_name) AS new_group_name, SUM(val) AS total_val FROM tb1 GROUP BY new_group_name;`
Asked by Ahamad (1 rep)
May 14, 2025, 08:59 AM
Last activity: May 15, 2025, 05:31 AM
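
For the simple, rule-based case, the normalization can live directly in the grouping key, which is exactly the shape of the `some_characteristic(group_name)` query above. The sketch below uses Python's built-in `sqlite3` module only to make the SQL runnable end to end; the table name `tb1` comes from the question, while the single ` Inc.` suffix rule is an assumption standing in for whatever suffix list your data actually needs. Note that some databases (for example SQL Server) do not accept a `SELECT` alias in `GROUP BY`; there, repeat the expression instead.

```python
import sqlite3

# Sample data from the question, loaded into the question's table name tb1.
ROWS = [("Apple Inc.", 100), ("AAPL", 50), ("Apple", 30),
        ("Microsoft", 80), ("MSFT", 70)]

# Rule-based key: drop a " Inc." suffix, trim spaces, lowercase.
# The suffix rule is an assumption; extend it for your own data.
QUERY = """
SELECT lower(trim(replace(group_name, ' Inc.', ''))) AS new_group_name,
       SUM(val) AS total_val
FROM tb1
GROUP BY new_group_name
ORDER BY new_group_name;
"""


def run():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb1 (group_name TEXT, val INTEGER)")
    conn.executemany("INSERT INTO tb1 VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run():
        print(row)
```

On the question's data this merges 'Apple Inc.' with 'Apple' (130) but leaves 'AAPL' and 'MSFT' as separate groups, because a string rule like this has no way to know that a ticker belongs to a company name. Dialects with regular-expression functions (for example PostgreSQL's `regexp_replace`) let you strip a whole list of suffixes in one expression.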
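
When variants such as tickers cannot be derived from the name itself, a common approach is to store that knowledge explicitly in an alias (mapping) table and join to it. In this sketch the table `group_alias` and its columns `alias_name` and `canonical_name` are hypothetical; `COALESCE` falls back to the original `group_name` for any value without an alias entry, so unmapped names still show up as their own group. Python's `sqlite3` is used only to run the SQL.

```python
import sqlite3

# Sample data from the question (table tb1).
ROWS = [("Apple Inc.", 100), ("AAPL", 50), ("Apple", 30),
        ("Microsoft", 80), ("MSFT", 70)]

# Hypothetical alias table contents: every known variant -> one canonical name.
ALIASES = [("Apple Inc.", "Apple"), ("AAPL", "Apple"), ("Apple", "Apple"),
           ("Microsoft", "Microsoft"), ("MSFT", "Microsoft")]

QUERY = """
SELECT COALESCE(a.canonical_name, t.group_name) AS new_group_name,
       SUM(t.val) AS total_val
FROM tb1 AS t
LEFT JOIN group_alias AS a ON a.alias_name = t.group_name
GROUP BY COALESCE(a.canonical_name, t.group_name)  -- unmapped names keep their own group
ORDER BY new_group_name;
"""


def run():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb1 (group_name TEXT, val INTEGER)")
    conn.execute("CREATE TABLE group_alias ("
                 "alias_name TEXT PRIMARY KEY, canonical_name TEXT NOT NULL)")
    conn.executemany("INSERT INTO tb1 VALUES (?, ?)", ROWS)
    conn.executemany("INSERT INTO group_alias VALUES (?, ?)", ALIASES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run())
```

This returns `('Apple', 180)` and `('Microsoft', 150)`, matching the desired output. Values that still lack a mapping can be listed with `SELECT DISTINCT t.group_name FROM tb1 AS t LEFT JOIN group_alias AS a ON a.alias_name = t.group_name WHERE a.alias_name IS NULL;`, which gives a review queue; rule-based or fuzzy matching can then be used to suggest new entries for the table rather than deciding the grouping at query time.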
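
For the fuzzy case, the `is_similar()` idea from the question can be implemented as a user-defined function. This sketch registers a hypothetical `is_similar(a, b)` with SQLite through `sqlite3.Connection.create_function`, defines it as a Levenshtein edit distance of at most `MAX_EDITS` (an assumed threshold) after trimming and lowercasing, and maps each distinct spelling to the similar spelling with the largest total. The sample rows are hypothetical typo variants, not the question's data; edit distance is just one swapped-in choice of fuzzy measure, and it cannot reliably link tickers such as 'AAPL' to 'Apple'.

```python
import sqlite3

# Hypothetical sample rows with typo variants (not the question's data).
ROWS = [("Microsoft", 80), ("Microsft", 20), ("microsoft", 5),
        ("Apple", 30), ("Aple", 10)]

MAX_EDITS = 1  # assumed threshold; tune it for real data


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum inserts, deletes and substitutions (cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def is_similar(a, b) -> int:
    """Hypothetical is_similar(): 1 if the trimmed, lowercased names are close."""
    if a is None or b is None:
        return 0
    return int(levenshtein(a.strip().lower(), b.strip().lower()) <= MAX_EDITS)


QUERY = """
WITH names AS (      -- one row per distinct spelling, with its total
    SELECT group_name, SUM(val) AS name_total
    FROM tb1
    GROUP BY group_name
),
mapping AS (         -- each spelling -> similar spelling with the largest total
    SELECT n1.group_name AS group_name,
           (SELECT n2.group_name
            FROM names AS n2
            WHERE is_similar(n1.group_name, n2.group_name)
            ORDER BY n2.name_total DESC, n2.group_name
            LIMIT 1) AS new_group_name
    FROM names AS n1
)
SELECT m.new_group_name, SUM(t.val) AS total_val
FROM tb1 AS t
JOIN mapping AS m ON m.group_name = t.group_name
GROUP BY m.new_group_name
ORDER BY m.new_group_name;
"""


def run():
    conn = sqlite3.connect(":memory:")
    conn.create_function("is_similar", 2, is_similar)  # expose the UDF to SQL
    conn.execute("CREATE TABLE tb1 (group_name TEXT, val INTEGER)")
    conn.executemany("INSERT INTO tb1 VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run())
```

Two caveats: similarity is not transitive (A may match B and B match C while A does not match C), so this is a per-name nearest-representative lookup rather than true clustering, and the pairwise comparison grows quadratically with the number of distinct names. In PostgreSQL, the `fuzzystrmatch` (`levenshtein`) and `pg_trgm` (`similarity`) extensions provide comparable functions natively.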