Gmail-style advanced search for dandiset listing #2814
Conversation
Recognize `key:value` operators inside the existing `/api/dandisets/?search=` parameter so users can filter by structured properties without leaving the listing's search box. The same syntax works from the dandi CLI, curl, or any HTTP client.

v1 operators (AND-combined; quoted multi-word values supported):
- `created_before` / `created_after` — filter on `Dandiset.created`
- `modified_before` / `modified_after` — filter on most-recent `Version.modified`
- `published_before` / `published_after` — filter on most-recent published `Version.created` (draft-only dandisets excluded)
- `has_species` — substring match on `AssetSearch.species`
- `has_approach` / `has_technique` / `has_standard` — case-insensitive substring match against any element of the `asset_metadata` JSON array's `.name`, via Postgres `jsonb_path_exists` with a parameterized `like_regex`
- `has_file_type` — `encodingFormat` istartswith, with the existing nwb/image/text/video aliases

Free text outside operators continues through the existing full-text path. Unknown operators fall through to free text (Gmail behavior); malformed dates fail closed (return 0 results) so typos don't silently expand the result set.

The frontend search box gets an updated placeholder and a help popover listing the available operators with examples. The `searchDandisets` route already forwards the raw `search=` string, so no other UI changes were needed.

Includes a `create_search_test_data` management command that seeds a matrix of dandisets covering every operator (idempotent — safe to re-run).

Tests:
- 13 unit tests for the parser (tokenization, quoted phrases, unknown operators, malformed dates, regex metacharacters in values)
- 17 integration tests against `/api/dandisets/?search=` covering each operator, multi-element JSON array matching, repeated operators (intersection / annotation reuse), free text + operator combinations, embargoed-asset visibility, and fail-closed date parsing
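A minimal sketch of the tokenization described above, assuming the names `OPERATOR_KEYS` and `parse_search` (the PR's actual code may differ in structure); `shlex` handles the quoted multi-word values:

```python
import re
import shlex

# Operator keys from the PR description.
OPERATOR_KEYS = {
    'created_before', 'created_after',
    'modified_before', 'modified_after',
    'published_before', 'published_after',
    'has_species', 'has_approach', 'has_technique',
    'has_standard', 'has_file_type',
}
_OPERATOR_RE = re.compile(r'^([a-z_]+):(.+)$')


def parse_search(query: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Split a query into (operators, free_text).

    shlex collapses quoted multi-word values: has_technique:"spike sorting"
    arrives as the single token 'has_technique:spike sorting'.
    """
    operators, free_text = [], []
    for token in shlex.split(query):
        match = _OPERATOR_RE.match(token)
        if match and match.group(1) in OPERATOR_KEYS:
            operators.append((match.group(1), match.group(2)))
        else:
            # v1 Gmail-style behavior: unknown keys fall through to free text
            free_text.append(token)
    return operators, free_text
```

For example, `parse_search('mouse has_technique:"spike sorting" created_after:2023-01-01')` yields two operators plus the free-text token `mouse`.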
The seeder used to call delete_dandiset(user=owner, ...) but switched to raw ORM .delete() to bypass the published-version guard. The parameter was left behind with a noqa: ARG001 comment, but the project's ruff config doesn't enable ARG001, so the noqa itself failed RUF100 in CI. Just drop the parameter and the comment.
The Playwright test located the search field by accessibility name "Search Dandisets by name,", which was the original full placeholder. The advanced-search PR shortened that placeholder to surface operator hints, breaking the locator. Switch to the shorter substring "Search Dandisets" so the test survives future copy tweaks.
@yarikoptic I'd love your thoughts on this. I thought this would be enough for a v1. In the future, I'd like to expand this to other types of metadata, including brain area, but I'd like to take it one step at a time. |
|
If there is interest, I'd be happy to create a full semantics guide in the docs |
yarikoptic
left a comment
My initial human review; I'd then hunt for AI issues in the next iteration, after the initial concerns are addressed/responded to
```
@@ -0,0 +1,247 @@
"""Seed a matrix of dandisets that exercises every advanced-search operator.

Idempotent — re-running deletes any prior dandisets it created (those whose
```
this helper command seems to be just for local testing, and is nowhere used in the unit tests — is that correct?
Could/should it be used in the tests, or why do we need it then? As soon as the feature is "merged" we have access to the wide range of data in the archive, and the unit tests should cover testing. Without being exercised regularly this code would rot and eventually become unrunnable, or could even be run by mistake on the main instance and cause "troubles" (imagine the test prefix not matching and it going through deleting...). So I would be wary of adding this in the "final merge".
this is for generating fake data so you can test out the feature. I'd be fine with deleting or somehow archiving it before merge.
Removed in e087cdf — the seeder is gone from the PR.
oh no! we will then rely on your demos ;) that is why I noted to remove it before the final merge. ok either way, as long as you keep showing the demos (unlikely I would try it directly myself, although who knows)
```python
from dataclasses import dataclass, field
import re
import shlex
```
hm, potentially could be cute to "relate" to good old @djarecka's
as in specifying those queries in actual POSIX shell CLI syntax ;)
```python
'has_species',
'has_approach',
'has_technique',
'has_standard',
```
do you see any benefit from the explicit has_ prefix?
IMO, if you omit the verb the implication is "is", which implies that the options are mutually exclusive. "has" implies (correctly) that a dandiset may be associated with more than one standard/approach/technique/species. I'd be fine with removing the "has_" if you think it's confusing
Replied above — the verb explicitly distinguishes "is" (mutually exclusive) from "has" (one of many). Happy to drop it if you'd rather, but I'd prefer to keep it.
good counter-point. I did a little digging. Gmail, GitHub, and Google Scholar all have tag-like fields and none of them uses the prefix "has" the way I used it here. I'm removing it.
```python
if (match := _OPERATOR_RE.match(token)) and match.group(1) in OPERATOR_KEYS:
    parsed.operators.append((match.group(1), match.group(2)))
else:
    parsed.free_text.append(token)
```
here, again, I would prefer "explicit is better than implicit": if there is "some-operator:value" with some unknown (today!) operator, it should not make it "transparently" into free_text -- it should return an error that such an operator is unknown, and potentially use difflib (or whichever battery in Python that is) to suggest a close candidate. This helps with typos a great deal:

```
❯ git commmit
git: 'commmit' is not a git command. See 'git --help'.

The most similar command is
	commit
```

so envision people mistyping operators...
Yes, that makes sense. I'm just worried about the case where the user actually wants to include a colon character in the free text. I think that would be pretty rare so I am not so worried about it. Maybe we can protect that ability by allowing the user to enter it in quotes?
Done in e087cdf. Unknown operators raise SearchSyntaxError with a difflib-based "Did you mean?" suggestion. To preserve the colon-in-free-text use case, wrapping the token in quotes ("foo:bar") is the documented escape — covered by a parser test and an integration test.
yeap, coolio, could you show a screenshot of how it would look channeled back to the user in the web UI?
```python
return {r['identifier'] for r in response.json()['results']}


@pytest.mark.django_db
```
there is no policy/agreement in dandi-archive on this, but I do prefer explicit annotation of AI-produced tests with @pytest.mark.ai_generated, see e.g. dandi-cli. This allows deciding later how close attention to pay to the test bodies; here I immediately would have spotted lots of duplication and recommended using parametrization... I tried to find where in my CLAUDE*.md files I have instructions for that but failed! But I do frequently prompt claude code to establish proper parametric tests when testing various values/boundary conditions etc. I feel like here it could be the case too
Done in e087cdf. Every advanced-search test now carries @pytest.mark.ai_generated, and the marker is registered in pyproject.toml so --strict-markers accepts it. Also collapsed the parser tests to two parametrized tests (10 happy-path cases, 6 error cases).
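The parametrized shape might look roughly like this (the parser here is a tiny self-contained stand-in so the example runs; the real tests exercise the PR's `parse_search` against its full operator set):

```python
import re
import shlex

import pytest


class SearchSyntaxError(ValueError):
    """Illustrative stand-in for the PR's parser error type."""


_KEYS = {'species', 'created_after'}


def parse(query):
    # Tiny stand-in parser so the parametrized shape below is runnable.
    ops = []
    for token in shlex.split(query):
        if m := re.match(r'^([a-z_]+):(.+)$', token):
            if m.group(1) not in _KEYS:
                raise SearchSyntaxError(f'unknown operator {m.group(1)!r}')
            ops.append((m.group(1), m.group(2)))
    return ops


@pytest.mark.ai_generated
@pytest.mark.parametrize(
    ('query', 'expected'),
    [
        ('species:mouse', [('species', 'mouse')]),
        ('species:"house mouse"', [('species', 'house mouse')]),
        ('created_after:2023-01-01', [('created_after', '2023-01-01')]),
    ],
)
def test_parse_happy_path(query, expected):
    assert parse(query) == expected


@pytest.mark.ai_generated
@pytest.mark.parametrize('query', ['speciez:mouse', 'unknown:x'])
def test_parse_unknown_operator_raises(query):
    with pytest.raises(SearchSyntaxError):
        parse(query)
```

One function per outcome (happy path vs. error), with the case matrix living in the `parametrize` table, keeps each added boundary condition a one-line diff.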
FYI and FTR: in dandi-cli we switched from pyproject.toml marker annotation to https://github.com/dandi/dandi-cli/blob/master/dandi/pytest_plugin.py with additional goodies added to it... maybe we should come up with some dandi-core library to use uniformly for such things across repos...
Per @yarikoptic's review of dandi#2814:

- Drop the create_search_test_data management command. It was a dev-only seeder, never exercised by the test suite, and posed a risk of bit-rot and accidental destructive use against a real DB.
- Tighten the parser's error handling: unbalanced quotes and unknown operator keys now raise SearchSyntaxError instead of silently degrading to free text. Unknown keys get a "Did you mean?" suggestion via difflib.get_close_matches. Malformed dates also raise (was previously fail-closed). The view layer translates SearchSyntaxError into a DRF ValidationError so the user sees a 400 with the error message.
- Add an explicit escape hatch: wrapping a token in double quotes opts out of operator parsing, so users can search for a literal colon (e.g. `"foo:bar"`).
- Add a 1024-char length cap on the search term as DoS hardening (an unauthenticated endpoint).
- Move the function-local `from django.db import connection` to the top-of-file imports.
- Mark every advanced-search test with `@pytest.mark.ai_generated` and register the marker in pyproject so `--strict-markers` accepts it.
- Restructure the parser tests as two parametrized tests (10 happy-path cases, 6 error cases) instead of 13 individual tests.

Test count: 35 (16 parametrized parser cases + 19 integration). All pass; ruff check + format clean.
Round-2 review

TL;DR: round-1 feedback addressed cleanly in e087cdf — strict parser, AI-marked tests, cleanup. Two small open threads from round-1 (

Round-1 status
New round-2 findings

1.
Per @yarikoptic's review: GitHub, Jira, Linear, Notion, etc. all use bare operators like `label:bug` for multi-valued filters; Gmail reserves `has:` for boolean checks (`has:attachment`), not for valued ones. The `has_` prefix here was a redundant disambiguator that the broader ecosystem doesn't need.

Renames:
- has_species → species
- has_approach → approach
- has_technique → technique
- has_standard → standard
- has_file_type → file_type

Date operators (`created_before:` etc.) keep their suffix because that suffix conveys comparison direction, not is/has semantics.

Also surface backend SearchSyntaxError 400s inline in DandisetsPage.vue instead of letting them bubble up to the global "something went wrong" snackbar.
`species:` was still using the AssetSearch.species denormalized column, which the materialized view defines as `wasAttributedTo[0].species.name` (only the first attributed subject). That diverged from how the other asset operators (approach/technique/standard) now match — they scan the full array via Postgres jsonpath. An asset attributed to multiple species (xenotransplantation, multi-species recordings) would silently miss `species:` matches against any but the first. Switch `species:` to the same `jsonb_path_exists` path-scan against `$.wasAttributedTo[*].species.name`. Drops the special case in `_apply_asset_filter` and unifies the four NAME_PATH operators behind a single jsonpath dispatch. Adds a regression test that asserts a multi-subject asset matches `species:Human` AND `species:mouse`.
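A sketch of building that jsonpath for `jsonb_path_exists` (the helper name and SQL fragment are illustrative; per the PR, the jsonpath string is bound as a query parameter rather than interpolated into SQL):

```python
import re

# Path taken from the commit message above; the other NAME_PATH operators
# (approach/technique/standard) dispatch through the same shape with their
# own paths into asset_metadata.
SPECIES_PATH = '$.wasAttributedTo[*].species.name'


def species_jsonpath(value: str) -> tuple[str, str]:
    """Return (sql, jsonpath) for a case-insensitive substring scan.

    re.escape neutralizes regex metacharacters in the user value, so a
    query like species:"C. elegans" matches the dot literally inside
    like_regex instead of as a wildcard.
    """
    pattern = re.escape(value)
    # [*] scans every attributed subject, not just the first — the fix
    # this commit makes over the denormalized AssetSearch.species column.
    jsonpath = f'{SPECIES_PATH} ? (@ like_regex "{pattern}" flag "i")'
    # Bound as a parameter (%s placeholder), never string-interpolated.
    sql = 'jsonb_path_exists(asset_metadata, %s::jsonpath)'
    return sql, jsonpath
```

With this, a multi-subject asset matches `species:` against any element of `wasAttributedTo`, which is exactly what the regression test pins down.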
- Drop the `\"` claim from the unbalanced-quote message; the parser doesn't support that escape (no_R2#2).
- Empty operator value (e.g. `species:" "`) now raises 400 instead of being silently dropped (no_R2#6).
- Override OpenAPI help text on the `search` parameter to document the operator vocabulary instead of "Search terms to filter the results" (no_R2#5).
- Acknowledge sequential-scan trade-off on jsonpath operators with a comment, and document cross-key AND-on-same-asset semantics in the apply_search_filters body (no_R2#3, dandi#4 — comment-only, indexes deferred until needed).
- Pin same-key AND semantics with a new test test_advanced_search_repeated_same_key_operator_combines_with_and.
- Nits: drop no-op re.DOTALL on _BARE_OP_RE; fix "incompatible previously generated" → "incompatible with previously generated" comment in views/dandiset.py; trim __all__ in services.search to the public surface (parse_search, SearchSyntaxError); add a clarifying comment to the cross-key intersect test.
Round-2 status

Thanks for the thorough re-review. Pushed

Test count: 38 (16 parametrized parser cases + 22 integration tests, all carrying `@pytest.mark.ai_generated`).

Frontend error rendering — typing
(Will attach a screenshot once the deploy preview rebuilds.)
Screen.Recording.2026-05-06.at.3.57.03.PM.mov
```python
def _parse_date(value: str) -> datetime | None:
    try:
        return datetime.strptime(value, '%Y-%m-%d').replace(tzinfo=UTC)
```
since it's one-time use -- just move this try/except with datetime.strptime in place of the _parse_date call, and raise directly from the except instead of doing the dance of returning None just so the caller can raise
Done in 334938c — strptime + try/except inlined directly, raises SearchSyntaxError from the except. Helper deleted.
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Per @yarikoptic: it's a one-time-use helper that returns None just so the caller can raise. Drop the indirection — strptime + try/except in place, raise SearchSyntaxError directly from the except.
we will though need to test-drive it well while in the sandbox, maybe even give it to some claude or a skill to see how it behaves, pushing into the limits etc. -- better on the sandbox than on a real instance.
On the apparent duplication with
| Operator | Pre-existing REST equivalent |
|---|---|
| `created_before`/`after`, `modified_before`/`after`, `published_before`/`after` | none — server-side date filtering didn't exist |
| `owner:<anything but me>` | none — `?user=me` is a `ChoiceField(choices=['me'])`, deliberately limited |
| `owner:me` | `?user=me` (full overlap, two spellings) |
| `species:`, `approach:`, `technique:`, `standard:`, `file_type:` | partial — discrete params on `/api/dandisets/search/` |
Even within the asset-filter overlap, the matching semantics differ: the discrete params do exact match against a controlled vocabulary (MultipleChoiceField with choices populated from AssetSearch.species), while the operators do case-insensitive substring / jsonpath scans. ?species=mouse returns 400 (not in choices); ?search=species:mouse matches "House mouse", "Mus musculus - House mouse", etc. Different tools for different jobs.
Two URL shapes is a normal design pattern
Major search platforms commonly expose both an operator string and a structured-form/discrete-param shape, addressing different UX:
- Google Scholar: the top search bar accepts `intitle:`, `author:`, `source:` operators inside `q=`; the advanced search form uses discrete params (`as_q=`, `as_sauthors=PJ`, `as_ylo=2020`, ...). Both work, both shipped.
- Gmail: the simple search box accepts `from:`, `subject:`, `has:`; the advanced search dropdown is a form that emits the same operator string under the hood. One canonical URL shape, two ways to construct it.
- GitHub: the search box uses `?q=is:open+assignee:foo`. They removed the discrete-param advanced form but the operator string survived.
- Twitter/X, Slack, Linear, Sourcegraph: operator-string search box; a few have separate filter UIs that produce the same query string.
DANDI today already has the dual entry points:
- Top search box on the dandiset list → operator string (this PR)
- Faceted SearchView with checkboxes → discrete REST params on `/api/dandisets/search/` (pre-existing, drives the sidebar checkbox UI)
The duplication is the cost of supporting both UX patterns. Scholar pays it. Gmail pays it (via different mechanism). It's the standard trade-off when you want both quick free-form typing and a guided checkbox experience.
Why not consolidate now?
The cleanest long-term move is a Gmail-style convergence: have the faceted UI emit the operator string under the hood, so there's one canonical filter language with two front-ends. That requires:
- Loosening `?species=` from `MultipleChoiceField` (exact) to `CharField` (substring) — a behavior change for existing API consumers.
- Rewriting `web/src/views/SearchView/SearchView.vue` to construct the operator string instead of building discrete params.
- Deciding whether `/api/dandisets/search/` itself stays or gets folded into the listing endpoint.
That's a worthwhile follow-up, but it's a separate decision about the faceted UI and changes pre-existing API behavior. Bundling it with this PR would conflate "introduce operator search" with "change how the existing search endpoint works" — two reviews in one. Easier to ship the additive feature, see how people actually use the operator syntax, then consolidate intentionally.
Net
The PR adds a search syntax users have learned everywhere else (Gmail/GitHub/Slack), adds missing capabilities (date filtering, anyone-not-me owner filtering), and overlaps with one existing endpoint in a narrow, semantically distinct way that mirrors what Scholar and Gmail already do. The "single source of truth" cleanup is a follow-up worth doing on its own merits.
ok great, let's move this to the sandbox so we can all test it!





Summary
Adds Gmail-style
key:valueoperator parsing to the existing/api/dandisets/?search=parameter, so users can filter the dandisetlisting by structured properties (dates, species, approach, etc.) from
the same search box. The same syntax works from any HTTP client.
Example query:
v1 operators
created_before:/created_after:(YYYY-MM-DD)Dandiset.createdmodified_before:/modified_after:Version.modifiedpublished_before:/published_after:Version.created(draft-only dandisets excluded)has_species:AssetSearch.specieshas_approach:/has_technique:/has_standard:.name(Postgresjsonb_path_exists+like_regex)has_file_type:encodingFormatstartswith, withnwb/image/text/videoaliasesFree text outside operators continues through the existing full-text path. Unknown operators fall through to free text (Gmail behavior); malformed dates fail closed (return zero results) so a typo doesn't silently expand the set. Multi-word values support quoting (
has_technique:"spike sorting").Negation (
-key:value) andORare intentionally deferred to v2.Frontend
The search field on the dandiset listing gets an updated placeholder and a help popover listing the operators with examples. No routing changes — the existing
searchDandisetsroute forwards the rawsearch=string, so the parser does the work.Other
Adds a
./manage.py create_search_test_data --owner <email>management command that seeds a matrix of dandisets covering every operator. Idempotent — safe to re-run while iterating.Test plan
tox -e test -- dandiapi/api/tests/test_search_parser.py— 13 unit tests for the parsertox -e test -- dandiapi/api/tests/test_dandiset.py -k advanced_search— 17 integration tests covering each operator, multi-element JSON array matching, repeated operators (intersection / annotation reuse), free text + operator combinations, embargoed-asset visibility, and fail-closed date parsingtox -e lint