-
Notifications
You must be signed in to change notification settings - Fork 468
Add vertex and edge composite types with direct field access optimization #2303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request introduces vertex and edge as PostgreSQL composite types to optimize graph query performance. The changes enable direct field access for properties and accessor functions instead of rebuilding agtype structures.
Changes:
- Introduces vertex (id, label, properties) and edge (id, label, start_id, end_id, properties) composite types
- Optimizes property access to use direct field selection instead of agtype rebuild
- Changes _label_name to return agtype instead of cstring for type consistency
- Adds implicit casts from vertex/edge to agtype and json
- Fixes MERGE clause type mismatches with volatile wrapper
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/include/utils/agtype.h | Added OID accessors for VERTEXOID and EDGEOID composite types |
| src/include/parser/cypher_expr.h | Added helper functions for entity type checking and field extraction |
| src/include/parser/cypher_clause.h | Added field access helper functions for composite types |
| src/include/catalog/ag_label.h | Added get_label_name function signature |
| src/backend/utils/adt/agtype.c | Implemented composite type OID caching, vertex/edge to agtype casts, label parameter type change |
| src/backend/parser/cypher_expr.c | Implemented accessor function optimization and entity type coercion |
| src/backend/parser/cypher_clause.c | Updated entity expression builders to use RowExpr, added field selection helpers |
| src/backend/executor/*.c | Updated executors to use agtype labels instead of cstrings |
| src/backend/catalog/ag_label.c | Changed _label_name to return agtype, added cache-based get_label_name |
| sql/agtype_graphid.sql | Defined vertex/edge composite types and cast functions |
| sql/age_scalar.sql | Added _get_vertex_by_graphid helper function |
| sql/age_main.sql | Removed duplicate _label_name definition |
| regress/sql/*.sql | Added comprehensive tests for accessor optimizations and composite types |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4dab54d to
9f4904e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/backend/utils/adt/agtype.c:2394
- Potential memory leak: pnstrdup allocates memory for label at line 2377, but if an error occurs after this point (e.g., at line 2391 when properties validation fails), the allocated memory for label is not freed before the error is thrown. Consider using a PG_TRY/PG_CATCH block or allocating in the current memory context that will be cleaned up on error.
label = pnstrdup(label_value->val.string.val, label_value->val.string.len);
if (fcinfo->args[2].isnull)
{
bstate = init_agtype_build_state(0, AGT_FOBJECT);
properties = build_agtype(bstate);
pfree_agtype_build_state(bstate);
}
else
{
properties = AG_GET_ARG_AGTYPE_P(2);
if (!AGT_ROOT_IS_OBJECT(properties))
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("_agtype_build_vertex() properties argument must be an object")));
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /* For startNode/endNode functions on edges, look up the vertex */ | ||
| if (is_node_func && entity_type == EDGEOID) | ||
| { | ||
| Oid func_oid; | ||
| Node *graph_name; | ||
|
|
||
| graph_name = (Node *) makeConst(TEXTOID, -1, InvalidOid, -1, | ||
| CStringGetTextDatum(cpstate->graph_name), | ||
| false, false); | ||
|
|
||
| /* Call _get_vertex_by_graphid(graph_name, vertex_id) */ | ||
| func_oid = get_ag_func_oid("_get_vertex_by_graphid", 2, TEXTOID, GRAPHIDOID); | ||
| result = (Node *)makeFuncExpr(func_oid, AGTYPEOID, | ||
| list_make2(graph_name, field), | ||
| InvalidOid, InvalidOid, | ||
| COERCE_EXPLICIT_CALL); | ||
| } | ||
| else | ||
| { | ||
| result = field; | ||
| } |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition at line 2176 checks for is_node_func AND EDGEOID, but startNode/endNode can only be called on edges, never on vertices. When entity_type is VERTEXOID and is_node_func is true, neither the if block (line 2176) nor the else block (line 2192) is entered, so result remains uninitialized. The logic should handle VERTEXOID with is_node_func as an error case.
| STABLE | ||
| RETURNS NULL ON NULL INPUT | ||
| PARALLEL SAFE | ||
| AS 'MODULE_PATHNAME'; |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new function _get_vertex_by_graphid(text, graphid) exposes a low-level vertex lookup to any caller without performing any privilege or RLS checks on the underlying label tables. Its C implementation uses get_vertex, which directly calls table_open / table_beginscan on the graph’s label tables by name, bypassing PostgreSQL GRANT and row-level security, so any role with EXECUTE on _get_vertex_by_graphid can fetch vertex data from graphs whose schemas/tables they may not otherwise be allowed to SELECT. To prevent unauthorized graph data access, restrict EXECUTE on _get_vertex_by_graphid (and similar helpers) to trusted roles only, or add explicit ACL/RLS checks before opening the target relation (e.g., verifying the caller has appropriate privileges on the resolved label table).
| AS 'MODULE_PATHNAME'; | |
| AS 'MODULE_PATHNAME'; | |
| REVOKE ALL ON FUNCTION ag_catalog._get_vertex_by_graphid(text, graphid) FROM PUBLIC; |
|
@MuhammadTahaNaveed Cursory thoughts so far -
|
|
@MuhammadTahaNaveed Ambiguous operator added - Current master branch PR 2303 Edge too |
…tion - Introduce vertex and edge as pg composite types vertex: (id, label, properties) edge: (id, label, start_id, end_id, properties) - Property access (a.name) now directly uses a.properties for agtype_access_operator instead of rebuilding via _agtype_build_vertex/edge - Optimize accessor functions (id, properties, label, type, start_id, end_id) to use direct FieldSelect on composite types instead of agtype functions - Add casts: vertex/edge to agtype, vertex/edge to json/jsonb - Add eq and not eq ops for vertex/edge composite types. - Fix label_name specific routine to use cache instead of ag_label scan - Write/update clauses have executors strictly tied to agtype, due to which the variables after any write/update clause are carried forward as agtype. - Allows users to completely skip agtype build functions and return vertex/edge for pure read queries. - Change _label_name to return agtype since record comparisons are not allowed with cstring. Consequently, _agtype_build_vertex/edge now accept agtype as label. - Fix MERGE clause type mismatch when accessing properties from previous MATCH clauses by wrapping columns with agtype_volatile_wrapper before namespace lookup. - Update expression index in pgvector.sql, since now it uses raw properties column instead of _agtype_build_vertex/edge. - Add regression tests Assisted-by AI
9f4904e to
9d6a08b
Compare
Updated
The changed functions (_label_name, _agtype_build_vertex, _agtype_build_edge) are intended for internal use only. They are invoked by the parser/executor and are not meant to be called directly by users. The upgrade script in age--1.6.0--y.y.y.sql manages the transition by dropping the old function signatures and creating the new ones. No data migration is required, as the underlying table structure remains unchanged. From a user perspective, existing queries that return agtype continue to work without any changes. The composite types are introduced to improve internal query transformation and serve as an additional output option, rather than a breaking change.
The composite types don’t increase storage, as they are only a query-time representation. The underlying tables continue to store the same data as before. The agtype representation is still required for paths (which are agtype arrays of vertices/edges), for storing entities in agtype containers, and for compatibility with existing agtype operators. When agtype is needed, the implicit casts(backed by functions vertex_to_agtype and edge_to_agtype) handle the conversion.
Good catch. Fixed it. |
vertex: (id, label, properties)
edge: (id, label, end_id, start_id, properties)
Assisted-by AI