
Gmail-style advanced search for dandiset listing #2814

Merged
bendichter merged 9 commits into dandi:master from bendichter:advanced-search
May 7, 2026

Conversation

@bendichter
Member

Summary

Adds Gmail-style key:value operator parsing to the existing
/api/dandisets/?search= parameter, so users can filter the dandiset
listing by structured properties (dates, species, approach, etc.) from
the same search box. The same syntax works from any HTTP client.

Example query:

hippocampus has_species:mouse has_approach:electrophysiological created_after:2024-01-01

v1 operators

- created_before: / created_after: (YYYY-MM-DD) → Dandiset.created
- modified_before: / modified_after: → most-recent Version.modified
- published_before: / published_after: → most-recent published Version.created (draft-only dandisets excluded)
- has_species: → substring on AssetSearch.species
- has_approach: / has_technique: / has_standard: → substring against any element of the asset_metadata JSON array's .name (Postgres jsonb_path_exists + like_regex)
- has_file_type: → encodingFormat startswith, with nwb/image/text/video aliases

Free text outside operators continues through the existing full-text path. Unknown operators fall through to free text (Gmail behavior); malformed dates fail closed (return zero results) so a typo doesn't silently expand the set. Multi-word values support quoting (has_technique:"spike sorting").

Negation (-key:value) and OR are intentionally deferred to v2.
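The tokenization behavior described above (key:value recognition, quoted multi-word values, unknown operators falling through to free text) can be sketched roughly as follows. This is a hypothetical reconstruction, not the PR's actual parser (which lives in dandiapi/api/services/search/parser.py); names like ParsedSearch follow identifiers quoted later in this thread.

```python
import re
from dataclasses import dataclass, field

# v1 operator allowlist (pre-rename; a later commit drops the has_ prefix)
OPERATOR_KEYS = {
    'created_before', 'created_after',
    'modified_before', 'modified_after',
    'published_before', 'published_after',
    'has_species', 'has_approach', 'has_technique',
    'has_standard', 'has_file_type',
}

# A token is key:"quoted value", "quoted text", or a bare word.
_TOKEN_RE = re.compile(r'([a-z_]+):"([^"]*)"|"([^"]*)"|(\S+)')
_BARE_OP_RE = re.compile(r'^([a-z_]+):(.+)$')


@dataclass
class ParsedSearch:
    operators: list[tuple[str, str]] = field(default_factory=list)
    free_text: list[str] = field(default_factory=list)


def parse_search(query: str) -> ParsedSearch:
    parsed = ParsedSearch()
    for m in _TOKEN_RE.finditer(query):
        op_key, op_value, quoted, bare = m.groups()
        if op_key is not None and op_key in OPERATOR_KEYS:
            # key:"multi word value"
            parsed.operators.append((op_key, op_value))
        elif quoted is not None:
            # Quoted tokens always stay free text (the "foo:bar" escape hatch).
            parsed.free_text.append(quoted)
        elif (
            bare is not None
            and (bm := _BARE_OP_RE.match(bare))
            and bm.group(1) in OPERATOR_KEYS
        ):
            parsed.operators.append((bm.group(1), bm.group(2)))
        else:
            # Unknown operators fall through to free text (v1 Gmail behavior).
            parsed.free_text.append(m.group(0))
    return parsed
```

For example, parse_search('hippocampus has_species:mouse has_technique:"spike sorting"') yields two operator tuples plus one free-text term.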

Frontend

The search field on the dandiset listing gets an updated placeholder and a help popover listing the operators with examples. No routing changes — the existing searchDandisets route forwards the raw search= string, so the parser does the work.

Other

Adds a ./manage.py create_search_test_data --owner <email> management command that seeds a matrix of dandisets covering every operator. Idempotent — safe to re-run while iterating.

Test plan

  • tox -e test -- dandiapi/api/tests/test_search_parser.py — 13 unit tests for the parser
  • tox -e test -- dandiapi/api/tests/test_dandiset.py -k advanced_search — 17 integration tests covering each operator, multi-element JSON array matching, repeated operators (intersection / annotation reuse), free text + operator combinations, embargoed-asset visibility, and fail-closed date parsing
  • tox -e lint
  • Manually exercised every operator end-to-end against a local stack with seeded fixtures

Recognize key:value operators inside the existing /api/dandisets/?search=
parameter so users can filter by structured properties without leaving the
listing's search box. Same syntax works from the dandi CLI, curl, or any
HTTP client.

v1 operators (AND-combined; quoted multi-word values supported):

- created_before / created_after — filter on Dandiset.created
- modified_before / modified_after — filter on most-recent Version.modified
- published_before / published_after — filter on most-recent published
  Version.created (draft-only dandisets excluded)
- has_species — substring match on AssetSearch.species
- has_approach / has_technique / has_standard — case-insensitive substring
  match against any element of the asset_metadata JSON array's `.name`,
  via Postgres jsonb_path_exists with a parameterized like_regex
- has_file_type — encodingFormat istartswith, with the existing
  nwb/image/text/video aliases
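The "parameterized like_regex" bit deserves a sketch: in Postgres SQL/JSON path, the like_regex pattern has to live inside the jsonpath string itself, so the user value must be escaped (as a regex and as a JSON string) before being interpolated into the path, which is then bound as a single SQL parameter. This is an illustrative helper, not the PR's actual code; the function name is assumed.

```python
import re


def build_name_jsonpath(json_key: str, value: str) -> str:
    """Render a jsonpath like:
        $.approach[*].name ? (@ like_regex "..." flag "i")
    with the user-supplied value doubly escaped (sketch only)."""
    # Neutralize regex metacharacters in the user value first.
    regex = re.escape(value)
    # Then escape for the double-quoted string literal inside the path.
    literal = regex.replace('\\', '\\\\').replace('"', '\\"')
    return f'$.{json_key}[*].name ? (@ like_regex "{literal}" flag "i")'
```

The rendered path would then be bound as a parameter, e.g. via something like RawSQL('jsonb_path_exists(asset_metadata, %s::jsonpath)', [path]), keeping user input out of the SQL text itself.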

Free text outside operators continues through the existing full-text path.
Unknown operators fall through to free text (Gmail behavior); malformed
dates fail closed (return 0 results) so typos don't silently expand the
result set.

Frontend search box gets an updated placeholder and a help popover
listing the available operators with examples. The `searchDandisets`
route already forwards the raw `search=` string, so no other UI changes
were needed.

Includes a `create_search_test_data` management command that seeds a
matrix of dandisets covering every operator (idempotent — safe to re-run).

Tests:

- 13 unit tests for the parser (tokenization, quoted phrases, unknown
  operators, malformed dates, regex metacharacters in values)
- 17 integration tests against /api/dandisets/?search= covering each
  operator, multi-element JSON array matching, repeated operators
  (intersection / annotation reuse), free text + operator combinations,
  embargoed-asset visibility, and fail-closed date parsing
@bendichter
Member Author

image

Here are the front-end changes, but the search doesn't work on the CI sandbox preview because it also requires changes to the backend.

bendichter added 2 commits May 4, 2026 21:34
The seeder used to call delete_dandiset(user=owner, ...) but switched to
raw ORM .delete() to bypass the published-version guard. The parameter
was left behind with a noqa: ARG001 comment, but the project's ruff
config doesn't enable ARG001, so the noqa itself failed RUF100 in CI.
Just drop the parameter and the comment.
The Playwright test located the search field by accessibility name
"Search Dandisets by name,", which was the original full placeholder.
The advanced-search PR shortened that placeholder to surface operator
hints, breaking the locator. Switch to the shorter substring "Search
Dandisets" so the test survives future copy tweaks.
@bendichter bendichter requested a review from yarikoptic May 5, 2026 01:52
@bendichter
Member Author

@yarikoptic I'd love your thoughts on this. I thought this would be enough for a v1.

In the future, I'd like to expand this to other types of metadata, including brain area, but I'd like to take it one step at a time.

@bendichter
Member Author

If there is interest, I'd be happy to create a full semantics guide in the docs

@yarikoptic yarikoptic added the search Search and filtering of dandisets/assets label May 5, 2026
Member

@yarikoptic yarikoptic left a comment

My initial human review; I will then hunt AI on it in the next iteration, after the initial concerns are addressed/responded to

@@ -0,0 +1,247 @@
"""Seed a matrix of dandisets that exercises every advanced-search operator.

Idempotent — re-running deletes any prior dandisets it created (those whose
Member

this helper command seems to be just for local testing, and is nowhere used in the unit tests — is that correct?

could/should it be used in the tests, or why do we need it then? As soon as the feature is merged we have access to the wide range of data in the archive, and the unit tests should cover testing. Without being exercised regularly this code would rot and eventually become unrunnable — or could even be run by mistake on the main instance and cause "troubles" (imagine a test prefix not matching and it going through deleting...). So I would be wary of including this in the final merge.

Member Author

this is for generating fake data so you can test out the feature. I'd be fine with deleting or somehow archiving it before merge.

Member Author

Removed in e087cdf — the seeder is gone from the PR.

Member

oh no! then we will rely on your demos ;) that is why I suggested removing it before the final merge. ok either way, as long as you keep showing the demos (it's unlikely I would try it directly myself, although who knows)

Comment thread dandiapi/api/services/search/parser.py Outdated

from dataclasses import dataclass, field
import re
import shlex
Member

hm, this could potentially be a cute way to "relate" to good old @djarecka's

as in, specifying those queries in an actual POSIX shell CLI ;)

Comment thread dandiapi/api/services/search/parser.py Outdated
'has_species',
'has_approach',
'has_technique',
'has_standard',
Member

do you see any benefit from the explicit has_ prefix?

Member Author

IMO, if you omit the verb the implication is "is", which implies that the options are mutually exclusive. "has" implies (correctly) that a dandiset may be associated with more than one standard/approach/technique/species. I'd be fine with removing the "has_" if you think it's confusing

Member Author

Replied above — the verb explicitly distinguishes "is" (mutually exclusive) from "has" (one of many). Happy to drop it if you'd rather, but I'd prefer to keep it.

Member

I don't feel strongly about this yet, but consider e.g. GitHub search, which is of a similar kind and doesn't bother with a has_ prefix for the same semantics:

image

Member Author

good counter-point. I did a little digging. GMail, GitHub, and Google Scholar all have tag-like fields and none of them use the prefix "has" in the way I used it here. I'm removing it.

Comment thread dandiapi/api/services/search/parser.py Outdated
Comment thread dandiapi/api/services/search/parser.py Outdated
if (match := _OPERATOR_RE.match(token)) and match.group(1) in OPERATOR_KEYS:
parsed.operators.append((match.group(1), match.group(2)))
else:
parsed.free_text.append(token)
Member

here, again, I would prefer "explicit is better than implicit":

if there is a "some-operator:value" token, an operator unknown (today!) should not make it "transparently" into free_text — it should return an error that such an operator is unknown, and potentially use difflib (or whatever that battery in Python is) to suggest a close candidate. This helps a great deal with typos:

❯ git commmit
git: 'commmit' is not a git command. See 'git --help'.

The most similar command is
	commit

so envision people mistyping operators...
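A minimal difflib-based sketch of that git-style suggestion, using this PR's operator names (the helper name here is assumed, not the PR's actual function):

```python
import difflib

OPERATOR_KEYS = [
    'created_before', 'created_after', 'modified_before', 'modified_after',
    'published_before', 'published_after', 'has_species', 'has_approach',
    'has_technique', 'has_standard', 'has_file_type',
]


def unknown_operator_message(key: str) -> str:
    """Build the error message for an unknown operator key, with a
    'Did you mean?' hint when a close match exists."""
    msg = f'Unknown search operator "{key}".'
    close = difflib.get_close_matches(key, OPERATOR_KEYS, n=1, cutoff=0.6)
    if close:
        msg += f' Did you mean "{close[0]}"?'
    return msg
```

For instance, unknown_operator_message('has_specie') suggests "has_species", while a wildly different key gets no suggestion.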

Member Author

Yes, that makes sense. I'm just worried about the case where the user actually wants to include a colon character in the free text. I think that would be pretty rare, so I'm not too worried about it. Maybe we can preserve that ability by letting the user wrap the term in quotes?

Member Author

Done in e087cdf. Unknown operators raise SearchSyntaxError with a difflib-based "Did you mean?" suggestion. To preserve the colon-in-free-text use case, wrapping the token in quotes ("foo:bar") is the documented escape — covered by a parser test and an integration test.

Member

yeap, coolio — could you show a screenshot of how it would look channeled back to the user in the web UI?

Member Author

image

Comment thread dandiapi/api/tests/test_dandiset.py Outdated
Comment thread dandiapi/api/tests/test_dandiset.py
return {r['identifier'] for r in response.json()['results']}


@pytest.mark.django_db
Member

there is no policy/agreement in dandi-archive on this, but I do prefer explicit annotation of AI-produced tests with @pytest.mark.ai_generated — see e.g. dandi-cli. This lets us decide later how closely to scrutinize the test bodies; here, for instance, I would immediately have spotted lots of duplication and recommended parametric tests. I tried to find where in my CLAUDE*.md files I have instructions for that, but failed! I do frequently prompt Claude Code to establish proper parametric tests when exercising various values/boundary conditions, and I feel like that could be the case here too.

Member Author

Done in e087cdf. Every advanced-search test now carries @pytest.mark.ai_generated, and the marker is registered in pyproject.toml so --strict-markers accepts it. Also collapsed the parser tests to two parametrized tests (10 happy-path cases, 6 error cases).

Member

FYI and FTR: in dandi-cli we switched from pyproject.toml marker annotation to https://github.com/dandi/dandi-cli/blob/master/dandi/pytest_plugin.py, with additional goodies in it... maybe we should come up with some dandi-core library to use uniformly for such things across repos...

Comment thread dandiapi/api/tests/test_search_parser.py Outdated
Comment thread dandiapi/api/views/dandiset.py Outdated
Per @yarikoptic's review of dandi#2814:

- Drop the create_search_test_data management command. It was a dev-only
  seeder, never exercised by the test suite, and posed a risk of bit-rot
  and accidental destructive use against a real DB.

- Tighten the parser's error handling: unbalanced quotes and unknown
  operator keys now raise SearchSyntaxError instead of silently degrading
  to free text. Unknown keys get a "Did you mean?" suggestion via
  difflib.get_close_matches. Malformed dates also raise (was previously
  fail-closed). The view layer translates SearchSyntaxError into a DRF
  ValidationError so the user sees a 400 with the error message.

- Add an explicit escape hatch: wrapping a token in double quotes opts
  out of operator parsing, so users can search for a literal colon
  (e.g. `"foo:bar"`).

- Add a 1024-char length cap on the search term as DoS hardening
  (an unauthenticated endpoint).

- Move the function-local `from django.db import connection` to the
  top-of-file imports.

- Mark every advanced-search test with `@pytest.mark.ai_generated` and
  register the marker in pyproject so `--strict-markers` accepts it.

- Restructure the parser tests as two parametrized tests (10 happy-path
  cases, 6 error cases) instead of 13 individual tests.

Test count: 35 (16 parametrized parser cases + 19 integration). All
pass; ruff check + format clean.
@yarikoptic-gitmate

Round-2 review

TL;DR: round-1 feedback addressed cleanly in e087cdf — strict parser, AI-marked tests, cleanup. Two small open threads from round-1 (has_ prefix, error-UI screenshot). Also one substantive new bug, plus a few smaller things.

Round-1 status

- parser.py:32 drop has_ prefix → open; leaning your way (GitHub's search uses bare language: etc.) but not a blocker
- parser.py:59 unknown-operator → 400 with "Did you mean?" → done; please post the UI screenshot showing how the 400 surfaces in the web search field
- parser.py:43 strict on unbalanced quotes → done
- parser.py:13 shlex CLI tie-in → non-blocking aside
- views/dandiset.py:155 injection audit → done, incl. 1024-char DoS cap
- tests/test_dandiset.py:1592 move import to top → done
- tests/test_dandiset.py:1597 reuse fixtures → partially; added _seed_dandiset_with_asset helper but didn't find a shared one
- tests/test_dandiset.py:1626 @pytest.mark.ai_generated → done, marker registered in pyproject
- tests/test_search_parser.py:21 parametrize → done
- create_search_test_data.py drop seeder → done

New round-2 findings

1. has_species: only matches the FIRST attributed species — diverges from how has_approach/has_technique/has_standard work  (real bug, IMHO worth fixing before merge)

The species column on the asset_search materialized view is denormalized
from a single path:

-- dandiapi/search/migrations/0002_denormalize_species.py
COALESCE({ASSET_TABLE}.metadata->'wasAttributedTo'->0->'species'->>'name', '') AS species

So has_species:rat will not match an asset whose
wasAttributedTo[1].species.name = "Norway rat" if wasAttributedTo[0] is
something else. Meanwhile has_approach: / has_technique: /
has_standard: use jsonb_path_exists with $.approach[*].name and DO
match any element. The PR description even calls out this asymmetry:

has_approach: / has_technique: / has_standard: (substring against any element of the asset_metadata JSON array's .name)

so I think the asymmetry is unintentional. Two ways out:

  1. Lift has_species onto the same jsonb_path_exists machinery, scanning
    $.wasAttributedTo[*].species.name. Cleanest, no migration needed.
  2. Change the materialized view to string_agg all species names into the
    denormalized column, then keep species__icontains. Cheapest at query
    time but needs a migration + view rebuild + regen on changes you'd
    otherwise leave to existing refresh cadence.

The integration tests don't catch this because every fixture sets
wasAttributedTo to a single-element list. A [mouse, rat] test case
would surface it.
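The asymmetry can be reproduced in miniature with plain Python over the asset metadata dict. These are illustrative helpers (names assumed), contrasting what the denormalized column captures versus what a $.wasAttributedTo[*].species.name scan would see:

```python
def first_species_name(metadata: dict) -> str:
    # What the denormalized asset_search.species column captures:
    # only wasAttributedTo[0].species.name (per the COALESCE above).
    try:
        return metadata['wasAttributedTo'][0]['species']['name']
    except (KeyError, IndexError, TypeError):
        return ''


def any_species_matches(metadata: dict, needle: str) -> bool:
    # What a $.wasAttributedTo[*].species.name scan sees: every element.
    return any(
        needle.lower() in str(subject.get('species', {}).get('name', '')).lower()
        for subject in metadata.get('wasAttributedTo', [])
    )


# The [mouse, rat] fixture suggested above:
asset = {'wasAttributedTo': [
    {'species': {'name': 'House mouse'}},
    {'species': {'name': 'Norway rat'}},
]}
```

Here first_species_name(asset) yields only "House mouse", so a substring search for "rat" misses, while any_species_matches(asset, 'rat') succeeds.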

2. The "unbalanced quote" error advertises an escape (\") the parser doesn't actually support

dandiapi/api/services/search/parser.py:

def _check_balanced_quotes(query: str) -> None:
    if query.count('"') % 2 != 0:
        raise SearchSyntaxError(
            'Unbalanced quote in search query. Use \\" to include a literal double quote, '
            'or remove the stray one.'
        )

But _TOKEN_RE uses [^"]* for the quoted-string body, with no escape
handling — so has_species:"foo\"bar" closes the quote early, parsing as
has_species:"foo\" and leaving a stray quote behind. Either:

  • drop the \" half of the message and just say "remove the stray one or
    wrap the term in quotes," or
  • actually implement \" (and probably \\) inside [^"]* and add a
    parser test.

The minimal fix is the message change.

3. Repeated same-key asset operators have surprising semantics — single-asset AND, not OR-of-values

has_species:mouse has_species:rat parses to two operator tuples (the
parser test confirms this) and apply_search_filters chains them as

asset_qs = asset_qs.filter(species__icontains='mouse')
asset_qs = asset_qs.filter(species__icontains='rat')

i.e. a single AssetSearch row must contain both substrings. Most
other search UIs (Gmail, GitHub) treat repeated keys as OR. The current
behavior does give same-key-on-same-asset semantics, which for the
non-species keys can feel reasonable
(has_approach:electrophysiological has_approach:behavioral =
"asset uses both"). But for has_species it almost certainly isn't what
users will expect.

Easiest path: document the AND-on-same-asset behavior in the help popover
and the API help text. Better path: collapse repeated same-key operators
to OR per key, AND across keys. Either is fine — but the no-test-coverage
status quo is the worst of three options.
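The "better path" (OR within a repeated key, AND across keys) starts with a grouping step like the sketch below. This is a hypothetical helper, not the PR's shipped behavior, which chains plain AND filters:

```python
from collections import defaultdict


def group_operators(operators: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collapse repeated operator keys so each key maps to all of its
    values; values within a key can then be OR'd, and the per-key
    groups AND'd across keys."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for key, value in operators:
        grouped[key].append(value)
    return dict(grouped)
```

With Django, each group could then become a single combined filter, e.g. functools.reduce(operator.or_, (Q(species__icontains=v) for v in values)), applied once per key so cross-key AND is preserved.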

4. has_approach/has_technique/has_standard can't use any existing index — sequential scan over asset_search per query  (performance, not correctness)

The migration creates a GIN on asset_metadata->'measurementTechnique'
and a btree on species, but jsonb_path_exists(asset_metadata, '...')
with a runtime-built jsonpath cannot use either. Every advanced-search
hit on these three operators will scan the whole materialized view. On
the live archive that's manageable today; for an unauthenticated endpoint
it's worth a thought before this becomes an attack-surface concern.

Two cheap-ish options:

  • Add expression GIN indexes with gin_trgm_ops on
    (asset_metadata->'approach'), (asset_metadata->'measurementTechnique'),
    (asset_metadata->'dataStandard') rendered to text.
  • Or denormalize the name arrays into text columns on the materialized
    view (similar to species), and switch to __icontains against
    trigram-GIN'd text.

Either way, please add an index — or a comment acknowledging the
sequential scan and the assumed row count.
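The second option's denormalized column would hold something like the output of this sketch: all .name values of a metadata array flattened into one searchable text blob (the SQL equivalent being a string_agg over jsonb_array_elements). Hypothetical helper, for illustration only:

```python
def flatten_names(asset_metadata: dict, key: str) -> str:
    """Flatten a JSON array's .name values into one text blob, the shape
    a string_agg-style denormalized column could hold; __icontains
    against it is then trigram-GIN indexable."""
    entries = asset_metadata.get(key) or []
    return ' | '.join(
        e.get('name', '') for e in entries if isinstance(e, dict)
    )
```

A substring test against the flattened text then matches any element, mirroring the jsonpath scan's semantics at index-friendly cost.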

5. OpenAPI search help text not updated

DandisetSearchQueryParameterSerializer inherits search from
DandisetQueryParameterSerializer (dandiapi/api/views/serializers.py:301),
whose help text is still

help_text='Search terms to filter the results.',

Anyone reading the swagger doc still sees a generic search field; the
operator vocabulary is invisible. Override search on the search-listing
serializer with a help_text describing the operator syntax (or pointing
at a docs section).

6. has_species: with empty value is silently dropped

apply_search_filters does if not value: continue after value.strip(),
so has_species: foo (note the space) and has_species:"" parse without
error and quietly become a no-op. Consistent with the rest of the
"explicit > implicit" stance pushed in round-1 would be to raise
SearchSyntaxError("has_species expects a non-empty value").

7. Frontend: how does the 400 surface?  (this is the screenshot ask from round-1)

I looked at web/src/components/DandisetSearchField.vue and the search
store at web/src/components/Search/store.ts:

async function updateSearchQuery() {
  loading.value = true;
  const results = await dandiRest.searchDandisets(...);
  searchResults.value = results;
  loading.value = false;
}

There's no try/catch here, and no error display in the search field
template. Whatever happens on a 400 depends entirely on the global axios
interceptor — likely a generic error toast. That's not a great UX for the
nice "Did you mean has_species" message you went to the trouble of
producing on the backend. Worth either:

  • showing the 400 message inline under the search field (best), or
  • at minimum confirming the toast renders the backend message verbatim
    (and posting that screenshot here).
Nits
  • parser.py: _BARE_OP_RE = re.compile(r'^([a-z_]+):(.+)$', re.DOTALL) — bare tokens come from \S+, so they can't contain newlines; re.DOTALL is a no-op. Drop it.
  • views/dandiset.py:148: comment "the generated SQL is incompatible previously generated clauses" is missing "with".
  • services/search/__init__.py re-exports OPERATOR_KEYS and ParsedSearch but they aren't used outside the package. Either drop them from __all__ or document the public surface.
  • tests/test_dandiset.py: the test named ..._repeated_asset_operators_intersect actually asserts "single asset has BOTH species and approach." Worth a one-line comment in the test stating that semantic explicitly, so future readers know it's deliberate (and so the per-asset semantics noted in finding #3 are pinned by a test).

has_ prefix (round-1 carryover)

Just to record where I'm landing: GitHub's search box uses
language:python user:foo without has_, and I find the missing prefix
unsurprising in practice (approach:electrophysiology reads naturally as
"with this approach"). I won't insist, but if you do drop it the
allowlist + suggestions also get smaller and the error messages get
nicer. Up to you.

bendichter added 3 commits May 5, 2026 16:43
Per @yarikoptic's review: GitHub, Jira, Linear, Notion, etc. all use
bare operators like `label:bug` for multi-valued filters; Gmail reserves
`has:` for boolean checks (`has:attachment`), not for valued ones. The
`has_` prefix here was a redundant disambiguator that the broader
ecosystem doesn't need.

Renames:
  has_species   → species
  has_approach  → approach
  has_technique → technique
  has_standard  → standard
  has_file_type → file_type

Date operators (`created_before:` etc.) keep their suffix because that
suffix conveys comparison direction, not is/has semantics.

Also surface backend SearchSyntaxError 400s inline in DandisetsPage.vue
instead of letting them bubble up to the global "something went wrong"
snackbar.
`species:` was still using the AssetSearch.species denormalized column,
which the materialized view defines as `wasAttributedTo[0].species.name`
(only the first attributed subject). That diverged from how the other
asset operators (approach/technique/standard) now match — they scan the
full array via Postgres jsonpath. An asset attributed to multiple
species (xenotransplantation, multi-species recordings) would silently
miss `species:` matches against any but the first.

Switch `species:` to the same `jsonb_path_exists` path-scan against
`$.wasAttributedTo[*].species.name`. Drops the special case in
`_apply_asset_filter` and unifies the four NAME_PATH operators behind a
single jsonpath dispatch.

Adds a regression test that asserts a multi-subject asset matches
`species:Human` AND `species:mouse`.
- Drop the `\"` claim from the unbalanced-quote message; the parser
  doesn't support that escape (R2 #2).
- Empty operator value (e.g. `species:" "`) now raises 400 instead of
  being silently dropped (R2 #6).
- Override OpenAPI help text on the `search` parameter to document the
  operator vocabulary instead of "Search terms to filter the results"
  (R2 #5).
- Acknowledge sequential-scan trade-off on jsonpath operators with a
  comment, and document cross-key AND-on-same-asset semantics in the
  apply_search_filters body (R2 #3, #4 — comment-only, indexes
  deferred until needed).
- Pin same-key AND semantics with a new test
  test_advanced_search_repeated_same_key_operator_combines_with_and.
- Nits: drop no-op re.DOTALL on _BARE_OP_RE; fix "incompatible
  previously generated" → "incompatible with previously generated"
  comment in views/dandiset.py; trim __all__ in services.search to
  the public surface (parse_search, SearchSyntaxError); add a clarifying
  comment to the cross-key intersect test.
@bendichter
Member Author

Round-2 status

Thanks for the thorough re-review. Pushed ca7f2837 (plus the earlier f39473d5 and 4dd19c18) addressing the round-2 findings:

- 1 species: only first-element → fixed in f39473d5; switched to jsonb_path_exists against $.wasAttributedTo[*].species.name; new test_advanced_search_species_matches_any_attributed_subject covers it
- 2 \" escape advertised but not implemented → fixed; dropped the \" half of the message
- 3 same-key semantics undocumented → documented; added an explicit comment in apply_search_filters and a new test_advanced_search_repeated_same_key_operator_combines_with_and test pinning AND-on-same-asset (matches GitHub's default for repeated keys)
- 4 seq-scan on jsonpath operators → acknowledged in a NOTE perf: comment with the two index strategies you suggested; skipping the actual indexes for now since the view is small enough, happy to add them if you'd rather block on that
- 5 OpenAPI help text → fixed; overrode search on DandisetQueryParameterSerializer with the operator vocabulary, MIME aliases, and behavior on invalid syntax
- 6 empty value silently dropped → fixed; now raises SearchSyntaxError → 400 with Operator "species" requires a value (e.g. species:something).; new test_advanced_search_empty_operator_value_returns_400 covers it
- 7 frontend 400 surfacing → fixed in 4dd19c18; added a try/catch around the dandiset list call in DandisetsPage.vue. An axios 400 with shape { search: "..." } renders inline as a tonal v-alert above the result list (red text, with the actual server message); other errors bubble to the global handler as before. Screenshot below 👇
- has_ prefix (round-1 carryover) → dropped in 4dd19c18; operators are now species:, approach:, technique:, standard:, file_type:
- Nits (DOTALL, comment grammar, __all__, intersect test comment) → all fixed in ca7f2837

Test count: 38 (16 parametrized parser cases + 22 integration tests, all carrying @pytest.mark.ai_generated). Ruff check + format clean.

Frontend error rendering — typing species:mouse into the search box would normally return results, but for specie:mouse:

⚠️ Unknown search operator "specie". Did you mean "species"? Wrap the term in double quotes (e.g. "foo:bar") to search for it as text.

(Will attach a screenshot once the deploy preview rebuilds.)

@bendichter
Member Author

image

@bendichter
Member Author

image

@bendichter
Member Author

Screen.Recording.2026-05-06.at.3.57.03.PM.mov

Member

@yarikoptic yarikoptic left a comment

minor changes requested

Comment thread dandiapi/api/services/search/filters.py Outdated

def _parse_date(value: str) -> datetime | None:
try:
return datetime.strptime(value, '%Y-%m-%d').replace(tzinfo=UTC)
Member

since it's one-time use — just move the try/except with datetime.strptime in place of the _parse_date call, and raise directly from the except there, instead of doing the dance of returning None just to raise out there
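The suggested inlining would look roughly like this (a sketch; SearchSyntaxError and the message wording are assumed from the rest of the thread):

```python
from datetime import datetime, timezone


class SearchSyntaxError(ValueError):
    """Malformed search input (name taken from the PR's parser module)."""


def parse_date_operator(key: str, value: str) -> datetime:
    # Inline strptime: raise directly from the except instead of
    # returning None for the caller to re-raise.
    try:
        # timezone.utc is equivalent to datetime.UTC in the quoted snippet.
        return datetime.strptime(value, '%Y-%m-%d').replace(tzinfo=timezone.utc)
    except ValueError as exc:
        raise SearchSyntaxError(
            f'Operator "{key}" expects a YYYY-MM-DD date, got "{value}".'
        ) from exc
```

The view layer would translate the raised SearchSyntaxError into a DRF ValidationError (a 400), as described earlier in the thread.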

Member Author

Done in 334938c — strptime + try/except inlined directly, raises SearchSyntaxError from the except. Helper deleted.

Comment thread web/src/components/DandisetSearchField.vue Outdated
bendichter and others added 2 commits May 6, 2026 16:45
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Per @yarikoptic: it's a one-time-use helper that returns None just so
the caller can raise. Drop the indirection — strptime + try/except in
place, raise SearchSyntaxError directly from the except.
@yarikoptic
Member

we will, though, need to test-drive it well while in the sandbox — maybe even hand it to some Claude or a skill to see how it behaves, pushing it to its limits, etc. Better on the sandbox than on a real instance.

@bendichter
Member Author

On the apparent duplication with /api/dandisets/search/

Quick note for reviewers worried about asset-filter logic existing in two places (the new ?search=species:mouse operator and the existing ?species=... REST param on /api/dandisets/search/).

The overlap is narrower than it looks

Of the operators in this PR, only the asset filters overlap with anything pre-existing. The rest are net-new:

- created_before/after, modified_before/after, published_before/after → none; server-side date filtering didn't exist
- owner:<anything but me> → none; ?user=me is a ChoiceField(choices=['me']), deliberately limited
- owner:me → ?user=me (full overlap, two spellings)
- species:, approach:, technique:, standard:, file_type: → partial; discrete params on /api/dandisets/search/

Even within the asset-filter overlap, the matching semantics differ: the discrete params do exact match against a controlled vocabulary (MultipleChoiceField with choices populated from AssetSearch.species), while the operators do case-insensitive substring / jsonpath scans. ?species=mouse returns 400 (not in choices); ?search=species:mouse matches "House mouse", "Mus musculus - House mouse", etc. Different tools for different jobs.

Two URL shapes is a normal design pattern

Major search platforms commonly expose both an operator string and a structured-form/discrete-param shape, addressing different UX:

  • Google Scholar: top search bar accepts intitle:, author:, source: operators inside q=; the advanced search form uses discrete params (as_q=, as_sauthors=PJ, as_ylo=2020, ...). Both work, both shipped.
  • Gmail: simple search box accepts from:, subject:, has:; the advanced search dropdown is a form that emits the same operator string under the hood. One canonical URL shape, two ways to construct it.
  • GitHub: search box uses ?q=is:open+assignee:foo. They removed the discrete-param advanced form but the operator string survived.
  • Twitter/X, Slack, Linear, Sourcegraph: operator-string search box; a few have separate filter UIs that produce the same query string.

DANDI today already has the dual entry points:

  • Top search box on the dandiset list → operator string (this PR)
  • Faceted SearchView with checkboxes → discrete REST params on /api/dandisets/search/ (pre-existing, drives the sidebar checkbox UI)

The duplication is the cost of supporting both UX patterns. Scholar pays it. Gmail pays it (via different mechanism). It's the standard trade-off when you want both quick free-form typing and a guided checkbox experience.

Why not consolidate now?

The cleanest long-term move is a Gmail-style convergence: have the faceted UI emit the operator string under the hood, so there's one canonical filter language with two front-ends. That requires:

  1. Loosening ?species= from MultipleChoiceField (exact) to CharField (substring) — a behavior change for existing API consumers.
  2. Rewriting web/src/views/SearchView/SearchView.vue to construct the operator string instead of building discrete params.
  3. Deciding whether /api/dandisets/search/ itself stays or gets folded into the listing endpoint.

That's a worthwhile follow-up, but it's a separate decision about the faceted UI and changes pre-existing API behavior. Bundling it with this PR would conflate "introduce operator search" with "change how the existing search endpoint works" — two reviews in one. Easier to ship the additive feature, see how people actually use the operator syntax, then consolidate intentionally.

Net

The PR adds a search syntax users have learned everywhere else (Gmail/GitHub/Slack), addresses one missing capability (date filtering, anyone-not-me owner filtering), and overlaps with one existing endpoint in a narrow, semantically-distinct way that mirrors what Scholar and Gmail already do. The "single source of truth" cleanup is a follow-up worth doing on its own merits.

@bendichter
Member Author

ok great, let's move this to the sandbox so we can all test it!

@bendichter bendichter merged commit 2692ef6 into dandi:master May 7, 2026
7 checks passed
@jjnesbitt jjnesbitt added the needs-kitware-review This was merged without Kitware approval and should be reviewed by them label May 7, 2026
@jjnesbitt jjnesbitt added needs-kitware-review This was merged without Kitware approval and should be reviewed by them and removed needs-kitware-review This was merged without Kitware approval and should be reviewed by them labels May 7, 2026