This is a tracking issue for xvc data and its subcommands.
The goal is to add metadata and labels to files and to run queries on them.
xvc data label KEY=VALUE <targets>: Add a label to a set of file targets. Each invocation adds a single label, so a file may end up with multiple labels. Labeling may be recorded as an event, like store events. There may be implicit labels, like updated=<timestamp>, for the metadata we track with XvcMetadata.
xvc data attach <text-file> <targets>: Attach a file containing metadata (or labels, annotations, searchable text) to targets. If the text file is JSON, YAML, or TOML, it can be parsed as labels. It must have a definite structure though. A dictionary in any of these formats is OK.
xvc data query --select [path, name, label,...] --from <targets> --where QUERY: List the requested information about the targets that satisfy the query.
QUERY can be a complex expression that supports the AND, OR, and NOT operators, parentheses, ==, and ~= (regex match). For numeric values, we can also add numerical operations.
--select and --from are optional in xvc data query. It's possible to write xvc data query QUERY to run a query over all data.
xvc data query --name can be used to give a name to a query and run it later as a target with --from. For example, xvc data query --where 'class ~= .*berry' --name berries saves a query that can be used later as xvc data query --select name --from berries.
xvc data operations can be run on the results of a query by supplying an additional --query option, e.g. xvc data move --query berries berries/
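The label and attach commands above could feed the same internal representation. The following is a minimal Python sketch, assuming labels are kept as a flat string-to-string dictionary per file and that only JSON is handled. The function names parse_label and labels_from_json are hypothetical and are not part of xvc.

```python
import json


def parse_label(arg: str) -> tuple[str, str]:
    """Split a KEY=VALUE command-line argument into a (key, value) pair."""
    key, sep, value = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {arg!r}")
    return key, value


def labels_from_json(text: str) -> dict[str, str]:
    """Parse an attached JSON document as labels.

    Only a top-level dictionary is accepted, following the "definite
    structure" requirement. Storing every value as a string is an
    assumption of this sketch, not a decision made in the proposal.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("attached metadata must be a dictionary")
    return {
        str(k): v if isinstance(v, str) else json.dumps(v)
        for k, v in data.items()
    }


# Hypothetical usage: one explicit label plus labels from an attached file.
labels: dict[str, str] = {}
key, value = parse_label("class=strawberry")
labels[key] = value
labels.update(labels_from_json('{"source": "farm-a", "ripe": true}'))
print(labels)
```

YAML and TOML could be handled the same way once parsed into a dictionary; they are left out here to keep the sketch dependency-free.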
We also need an xvc data/file export <targets> command to copy a set of files to a location outside the workspace. This can be used to create directories that contain subsets of the data.
We may also need move, copy, and remove commands in xvc data/file to make subsets of the dataset and update its metadata.
xvc data commands will run xvc file operations after running the query. xvc file won't operate with queries or metadata; xvc data will. This is the difference between the two commands.
xvc data query
--select should have some implicit columns, such as ['path', 'filename', 'created', 'updated', ...]. Selecting labels will show all labels.
--from should accept files, directories, and globs. It will walk the targets similarly to xvc file list and run the query on these elements.
--where accepts a single query. The query language can be:
jaq can be embedded as a query language. In this case, JSON documents can be more complex, but I'm not sure if this complexity is necessary.
A simple homemade query language with a syntax similar to Bash or Python, for example:
labels IN [strawberry, blueberry]
created < 2020-12-12 12:12:12
changed > 2022-10-31 00:00:00
name ~= .*berry
path *= images/*/*berry.png
(labels ~= .*berry) AND (created >= 2020-12-12 00:00:00)
I think this last option may be more flexible. It can also embed queries in SQL, jql, or some other query language through additional operators, for example:
(attached JAQ '.[][key] == value')
An MVP can be built with only field OP value expressions; other features (parentheses, logical operators, etc.) can be added later.
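To make the field OP value idea concrete, here is a minimal Python sketch of such an evaluator. It is only an illustration under several assumptions: a file's metadata is a plain dictionary, ~= must match the whole value as a regular expression, *= is a shell-style glob, and timestamps are strings in the YYYY-MM-DD HH:MM:SS form so they compare correctly as text. The <= operator is added as a complement to the >= used in the examples. None of the names below come from xvc.

```python
import fnmatch
import operator
import re

# One "field OP value" expression; longer operators are listed first.
QUERY = re.compile(r"^\s*(\w+)\s*(==|~=|\*=|>=|<=|<|>|IN\b)\s*(.+?)\s*$")
COMPARE = {"<": operator.lt, "<=": operator.le,
           ">": operator.gt, ">=": operator.ge}


def parse(query):
    """Split a query string into (field, op, value)."""
    m = QUERY.match(query)
    if m is None:
        raise ValueError(f"expected 'field OP value', got {query!r}")
    return m.group(1), m.group(2), m.group(3)


def matches(record, query):
    """Return True if the record (a dict) satisfies the query."""
    field, op, value = parse(query)
    actual = record.get(field)
    if actual is None:
        return False
    # A list-valued field (e.g. labels) matches if any element matches.
    values = actual if isinstance(actual, list) else [actual]
    if op == "IN":
        wanted = {v.strip() for v in value.strip("[]").split(",")}
        return any(v in wanted for v in values)
    if op == "==":
        return any(v == value for v in values)
    if op == "~=":
        return any(re.fullmatch(value, v) for v in values)
    if op == "*=":
        return any(fnmatch.fnmatchcase(v, value) for v in values)
    # Assumption: fixed-width timestamps compare correctly as strings.
    return any(COMPARE[op](v, value) for v in values)


# Hypothetical record for one tracked file.
record = {
    "path": "images/red/strawberry.png",
    "name": "strawberry",
    "labels": ["strawberry"],
    "created": "2021-03-01 10:00:00",
}
for q in ["labels IN [strawberry, blueberry]", "name ~= .*berry",
          "path *= images/*/*berry.png", "created < 2020-12-12 12:12:12"]:
    print(q, "->", matches(record, q))
```

Parentheses and AND, OR, NOT could later be layered on top by parsing a full expression into a tree whose leaves are evaluated with matches.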
This discussion was converted from issue #144 on January 24, 2023 06:59.