How to fuzzy query a directory structure in PostgreSQL?

I put together a rough idea for a PostgreSQL query over a nodes table, which contains id, parent__id, slug, and file_url (optional). A node is considered a file if it has a file_url; otherwise it's a directory.

    -- Enable the pg_trgm extension for fuzzy matching
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    WITH RECURSIVE path_cte AS (
      -- Base case: start with root nodes
      -- (cast to text so the column type matches the CONCAT result below)
      SELECT id, slug, parent__id, slug::text AS path
      FROM nodes
      WHERE parent__id IS NULL

      UNION ALL

      -- Recursive case: find children using similarity matching with the parent slug
      SELECT n.id, n.slug, n.parent__id, CONCAT(pc.path, '/', n.slug) AS path
      FROM nodes n
      JOIN path_cte pc
        ON n.parent__id = pc.id
       AND similarity(n.slug, pc.slug) > 0.3 -- Adjust the threshold as needed
    )
    -- Select the final paths for each node
    SELECT id, path
    FROM path_cte
    ORDER BY path;

This is as far as the AI and I got. However, we duly note:

> Keep in mind that the similarity condition might introduce cases where nodes do not match their parent due to the similarity threshold. If no similar match is found at a given level, recursion will stop for that branch.

In VSCode, I can search like this:

[screenshot]

Or even:

[screenshot]

How can I replicate that in PostgreSQL? Given my hierarchical nodes table:

    CREATE TABLE nodes (
      id SERIAL PRIMARY KEY,                                   -- Unique identifier for each node
      slug VARCHAR(255) NOT NULL,                              -- Name or slug of the node
      parent__id INT REFERENCES nodes(id) ON DELETE CASCADE,   -- Parent node ID, referencing the same table
      file_url VARCHAR(255)                                    -- Optional URL for file associated with the node
    );

Or, if that is not a good table structure, what is a good one for searching like this in PostgreSQL?

Ideally, a user can pass in a path like a Unix file path. The query would perhaps split it at the / slashes (or not), and then do a fuzzy search on each segment relative to its parent, recursively, without returning things that don't match.

Can it be done? If so, how? If not, where does it become impossible, and what is possible that comes close?

Otherwise, this is what I'm doing in JS, which matches the whole search term against a single stored path column:

    export async function search({ searchTerm }: { searchTerm: string }) {
      return await db
        .selectFrom('image')
        .select('path')
        .where(
          sql`similarity(path, ${sql.lit(searchTerm)}::text) > ${sql.lit(similarityThreshold)}::float`,
        )
        .orderBy(
          sql`similarity(path, ${sql.lit(searchTerm)}::text)`,
          'desc',
        )
        .execute()
    }

That doesn't seem like it will cut it.
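To pin down the behavior being asked for (split the query at the slashes, then fuzzy-match each segment against the children of the previous match), here is a minimal in-memory sketch in TypeScript. It is not a PostgreSQL solution. The names `TreeNode`, `trigrams`, `similarity`, `searchPath`, and `sampleNodes` are all hypothetical, and the trigram scoring only approximates what pg_trgm does (for instance, it does not split words on punctuation), so scores will not match the database exactly.

```typescript
// Hypothetical in-memory mirror of the nodes table from the question.
interface TreeNode {
  id: number;
  slug: string;
  parentId: number | null;
  fileUrl?: string;
}

interface Match {
  id: number;
  path: string;
  score: number; // sum of per-segment similarities
}

// Distinct 3-character windows over the lowercased, space-padded text.
// Assumption: padding of two leading spaces and one trailing space,
// in the spirit of pg_trgm; word splitting is deliberately omitted.
function trigrams(text: string): string[] {
  const padded = "  " + text.toLowerCase() + " ";
  const out: string[] = [];
  for (let i = 0; i + 3 <= padded.length; i++) {
    const gram = padded.slice(i, i + 3);
    if (out.indexOf(gram) === -1) out.push(gram);
  }
  return out;
}

// Shared trigrams divided by all distinct trigrams, a value in [0, 1].
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  const shared = ta.filter((t) => tb.indexOf(t) !== -1).length;
  return shared / (ta.length + tb.length - shared);
}

// Match query segment k only against children of nodes that matched
// segment k - 1, so a branch is dropped as soon as one level fails.
function searchPath(nodes: TreeNode[], query: string, threshold = 0.3): Match[] {
  const segments = query.split("/").filter((s) => s.length > 0);
  if (segments.length === 0) return [];

  // Depth 0: candidates are the root nodes.
  let frontier: Match[] = nodes
    .filter((n) => n.parentId === null)
    .map((n) => ({ id: n.id, path: n.slug, score: similarity(n.slug, segments[0]) }))
    .filter((m) => m.score > threshold);

  for (let depth = 1; depth < segments.length; depth++) {
    const next: Match[] = [];
    for (const parent of frontier) {
      for (const child of nodes) {
        if (child.parentId !== parent.id) continue;
        const s = similarity(child.slug, segments[depth]);
        if (s > threshold) {
          next.push({
            id: child.id,
            path: parent.path + "/" + child.slug,
            score: parent.score + s,
          });
        }
      }
    }
    frontier = next;
  }
  return frontier.sort((a, b) => b.score - a.score);
}

// Made-up sample tree, for illustration only.
const sampleNodes: TreeNode[] = [
  { id: 1, slug: "src", parentId: null },
  { id: 2, slug: "docs", parentId: null },
  { id: 3, slug: "components", parentId: 1 },
  { id: 4, slug: "utils", parentId: 1 },
  { id: 5, slug: "button.tsx", parentId: 3, fileUrl: "/files/5" },
  { id: 6, slug: "banner.tsx", parentId: 3, fileUrl: "/files/6" },
  { id: 7, slug: "guide.md", parentId: 2, fileUrl: "/files/7" },
];

// A misspelled query still resolves level by level.
console.log(searchPath(sampleNodes, "src/componets/buton.tsx"));
```

The key difference from the recursive CTE above is that each child slug is compared with the query segment at the same depth, not with its parent's slug. In SQL, the same idea would presumably carry the depth through the recursion and index into the split search path, but that variant is untested here.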
Asked by Lance Pollard (221 rep)
Nov 5, 2024, 01:32 PM
Last activity: Nov 5, 2024, 03:29 PM