dify - Allow Knowledge (dataset) API keys to be scoped to a single knowledge base

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Knowledge (dataset) API keys are scoped to the whole workspace (tenant), not to an individual knowledge base, so one key can read and write every knowledge base in the workspace.

When a dataset API key is created (POST /console/api/datasets/api-keys), only api_token.tenant_id is set; there is no dataset binding. Authorization (validate_dataset_token in api/controllers/service_api/wraps.py) then checks only Dataset.tenant_id == api_token.tenant_id. As a result, any dataset key reaches any knowledge base in the tenant, and there is no way to issue a key limited to a single knowledge base.

This is a data-isolation / least-privilege gap: a customer who wants to give an integration access to ONE knowledge base is forced to grant it access to ALL of them.

What I'd like: at key-creation time, choose a scope:

  • "Workspace" (default, current behavior), or
  • "This knowledge base only" (binds the key to a single dataset_id).

A KB-bound key should be rejected with 403 when used against any other knowledge base.
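The requested authorization rule can be sketched as a single check. This is a minimal illustration under stated assumptions, not Dify's code: `ApiTokenRecord`, `DatasetRecord`, `Forbidden`, and `check_dataset_access` are hypothetical stand-ins for `ApiToken`, `Dataset`, a 403 error, and the logic inside `validate_dataset_token`. How the real code reports a cross-workspace miss (403 vs. not found) may differ.

```python
from dataclasses import dataclass
from typing import Optional


class Forbidden(Exception):
    """Hypothetical stand-in for an HTTP 403 error such as werkzeug.exceptions.Forbidden."""


@dataclass(frozen=True)
class ApiTokenRecord:
    """Hypothetical view of the api_tokens columns this check needs."""
    tenant_id: str
    dataset_id: Optional[str] = None  # None => workspace-scoped (all existing keys)


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical view of the knowledge base being requested."""
    id: str
    tenant_id: str


def check_dataset_access(token: ApiTokenRecord, dataset: DatasetRecord) -> None:
    """Return normally if the key may use this knowledge base, else raise Forbidden."""
    # Current rule: the knowledge base must belong to the key's workspace.
    if dataset.tenant_id != token.tenant_id:
        raise Forbidden("knowledge base belongs to another workspace")
    # Proposed opt-in rule: a KB-bound key may only reach its own knowledge base.
    if token.dataset_id is not None and token.dataset_id != dataset.id:
        raise Forbidden("API key is scoped to a different knowledge base")


if __name__ == "__main__":
    kb_key = ApiTokenRecord(tenant_id="t1", dataset_id="kb-a")
    check_dataset_access(kb_key, DatasetRecord(id="kb-a", tenant_id="t1"))  # allowed
    try:
        check_dataset_access(kb_key, DatasetRecord(id="kb-b", tenant_id="t1"))
    except Forbidden as exc:
        print(exc)  # API key is scoped to a different knowledge base
```

Because the binding check runs only after the tenant check, a KB-bound key can never reach more than a workspace key could, and when `dataset_id` is `None` the function reduces to today's tenant-only comparison.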

2. Additional context or comments

  • Backward compatible: existing keys would have NULL dataset_id => workspace-scoped, behavior unchanged.
  • Per-KB scoping actually used to exist: the api_tokens.dataset_id column was dropped in migration 2e9819ca5b28_add_tenant_id_in_api_token (2023). This proposes reintroducing it as an opt-in.
  • There is also leftover per-dataset console scaffolding in api/controllers/console/apikey.py (DatasetApiKeyListResource / DatasetApiKeyResource) that references a non-existent ApiToken.dataset_id and currently fails with a 500 error; reviving and fixing it would be part of this work.

Rough implementation sketch:

  • Data model: a nullable, indexed dataset_id on api_tokens (NULL = workspace-scoped).
  • API: a scope selector on the create endpoint; consolidate onto one controller.
  • Auth: in validate_dataset_token, if api_token.dataset_id is set and the requested dataset differs, raise Forbidden; NULL keeps the existing tenant-only check.
  • Cache: add dataset_id to CachedApiToken (de)serialization in api/services/api_token_service.py.
  • Frontend: pass datasetId into the API Keys modal, add a scope selector on create, and show a "Scope" column (bound KB name or "All knowledge bases").
  • Migration: a new Alembic migration adding the nullable column and index; no backfill is needed.

Open question for maintainers: how should list-all endpoints (those with no dataset_id in the path, e.g. listing datasets) behave for a KB-bound key: return only its knowledge base, or respond with 403?
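The cache step matters because a cached token that loses `dataset_id` would silently turn a KB-bound key back into a workspace-wide one. A minimal sketch, assuming a JSON cache payload; the `CachedApiToken` fields shown and the `to_cache` / `from_cache` helpers are hypothetical and do not reproduce the actual class in `api/services/api_token_service.py`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedApiToken:
    """Hypothetical subset of a cached token; the real class may carry more fields."""
    id: str
    tenant_id: str
    dataset_id: Optional[str] = None  # new field: None => workspace-scoped


def to_cache(token: CachedApiToken) -> str:
    """Serialize every field, including dataset_id even when it is None (JSON null)."""
    return json.dumps(asdict(token))


def from_cache(raw: str) -> Optional[CachedApiToken]:
    """Deserialize a cached token, or return None to force a database reload.

    Entries written before this change have no "dataset_id" key. Treating them
    as a cache miss, rather than defaulting to workspace scope, avoids widening
    a KB-bound key's access because of a stale entry.
    """
    data = json.loads(raw)
    if "dataset_id" not in data:
        return None
    return CachedApiToken(
        id=data["id"],
        tenant_id=data["tenant_id"],
        dataset_id=data["dataset_id"],
    )
```

Failing closed on old-format entries costs one extra database read per stale key, which seems a reasonable trade against ever caching an over-broad scope.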

3. Can you help us with this feature?

  • I am interested in contributing to this feature.
