How to GROUP BY similar group_name values in SQL

2 votes

1 answer

57 views

How can I perform a GROUP BY in *SQL* when the group_name values are similar but not exactly the same?

In my dataset, the group_name values may differ slightly (e.g., "Apple Inc.", "AAPL", "Apple"), but conceptually they refer to the same entity. The similarity might not be obvious or consistent, so I might need to define a custom rule or function like is_similar() to cluster them.

For simple cases, I can extract a common pattern using regex or string functions (e.g., strip suffixes, lowercase, take prefixes). But how should I handle more complex scenarios, like fuzzy or semantic similarity?
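To make the "simple case" concrete, here is a minimal sketch that normalizes names with string functions before grouping. It assumes SQLite (run through Python's `sqlite3` module so it is self-contained) and assumes that stripping the literal suffix `Inc.` and lowercasing is the right rule, which is only an illustration. The normalization expression is repeated in `GROUP BY` for portability.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (group_name TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO tb1 VALUES (?, ?)",
    [("Apple Inc.", 100), ("AAPL", 50), ("Apple", 30),
     ("Microsoft", 80), ("MSFT", 70)],
)

# Assumed normalization rule: drop the 'Inc.' suffix, trim, lowercase.
normalized_rows = conn.execute(
    """
    SELECT LOWER(TRIM(REPLACE(group_name, 'Inc.', ''))) AS new_group_name,
           SUM(val) AS total_val
    FROM tb1
    GROUP BY LOWER(TRIM(REPLACE(group_name, 'Inc.', '')))
    ORDER BY new_group_name
    """
).fetchall()

print(normalized_rows)
# [('aapl', 50), ('apple', 130), ('microsoft', 80), ('msft', 70)]
```

This merges 'Apple Inc.' with 'Apple', but 'AAPL' and 'MSFT' stay separate, which is exactly the gap the question is about.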

Case: 

group_name     | val
---------------|-----
'Apple Inc.'   | 100
'AAPL'         | 50
'Apple'        | 30
'Microsoft'    | 80
'MSFT'         | 70

What I want to achieve: 

new_group_name  | total_val
----------------|----------
'Apple'         | 180
'Microsoft'     | 150

What are the best approaches to achieve this in *SQL*?
And how would I write a query like this:

    SELECT some_characteristic(group_name) AS new_group_name,
           SUM(val) AS total_val
    FROM tb1
    GROUP BY new_group_name;
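One possible shape for such a query, sketched under assumptions: `some_characteristic()` is replaced by a join against a hand-maintained lookup table. The table `name_alias` and its columns `alias` and `canonical_name` are hypothetical names introduced here, not part of the original schema. The example uses SQLite through Python's `sqlite3` module so it can be run as is. Names without an alias entry fall back to themselves via `COALESCE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tb1 (group_name TEXT, val INTEGER);
    INSERT INTO tb1 VALUES
        ('Apple Inc.', 100), ('AAPL', 50), ('Apple', 30),
        ('Microsoft', 80), ('MSFT', 70);

    -- Hypothetical lookup table: each known variant maps to one canonical name.
    CREATE TABLE name_alias (
        alias          TEXT PRIMARY KEY,
        canonical_name TEXT NOT NULL
    );
    INSERT INTO name_alias VALUES
        ('Apple Inc.', 'Apple'), ('AAPL', 'Apple'), ('MSFT', 'Microsoft');
    """
)

# Unmapped names ('Apple', 'Microsoft') keep their own value via COALESCE.
grouped_rows = conn.execute(
    """
    SELECT COALESCE(a.canonical_name, t.group_name) AS new_group_name,
           SUM(t.val) AS total_val
    FROM tb1 AS t
    LEFT JOIN name_alias AS a ON a.alias = t.group_name
    GROUP BY COALESCE(a.canonical_name, t.group_name)
    ORDER BY new_group_name
    """
).fetchall()

print(grouped_rows)
# [('Apple', 180), ('Microsoft', 150)]
```

The expression is repeated in `GROUP BY` rather than referring to the alias `new_group_name`, because some engines (SQL Server, for example) reject a SELECT alias there. A lookup table handles variants like ticker symbols that no string rule would catch, at the cost of maintaining the mapping by hand.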

Asked by Ahamad (1 rep)

May 14, 2025, 08:59 AM
Last activity: May 15, 2025, 05:31 AM
