transformers - ✅ (Solved) Fix [rag-end2end-retriever] Broken Google Drive link for SQuAD dataset and hyperparameters [1 pull request, 1 participant]

GitHub stats
huggingface/transformers#44868 (fetched 2026-04-08 01:03:23)
Comments: 0 · Participants: 1 · Timeline events: 2 · Reactions: 0
Timeline (top): mentioned ×1, subscribed ×1

The README for the rag-end2end-retriever research project contains a broken Google Drive link to the SQuAD training dataset, knowledge base, and hyperparameters used in the experiments.

Location: https://github.com/huggingface/transformers-research-projects/tree/main/rag-end2end-retriever

Broken link text:

Training dataset, the knowledge-base, and hyperparameters used in experiments can be accessed from here.

Issue: The Google Drive folder returns "Sorry, the file you have requested does not exist" (404).

This link is essential for anyone trying to reproduce the results reported in:

Siriwardhana et al., Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering, TACL 2023. https://aclanthology.org/2023.tacl-1.1/

Could @shamanez please restore or update the link?

Thank you.

PR fix notes

PR #5: docs(rag-end2end-retriever): replace broken Google Drive link

Description (problem / solution / changelog)

The rag-end2end-retriever/README.md links to a Google Drive folder for the SQuAD training dataset, knowledge-base, and hyperparameters used in the original experiments. The folder no longer exists - the URL returns HTTP 404 ("Sorry, the file you have requested does not exist") - which blocks anyone trying to reproduce the results.

This was reported in huggingface/transformers#44868 (filed against the wrong repo; the README lives here).

Verified the link is broken:

$ curl -sL -o /dev/null -w "%{http_code}\n" "https://drive.google.com/drive/folders/1qyzV-PaEARWvaU_jjpnU_NUS3U_dSjtG?usp=sharing"
404

The fix replaces the dead link with an actionable note pointing readers to:

  • The existing rag-end2end-retriever/finetune_rag_ray_end2end.sh script in this directory for the hyperparameters.
  • The official SQuAD release for the dataset.

This keeps reproducibility instructions but removes the misleading URL.

Refs huggingface/transformers#44868.

This contribution was developed with AI assistance (Claude Code).

Changed files

  • rag-end2end-retriever/README.md (modified, +1/-1)

extent analysis

Fix Plan

To fix the broken link, we need to update the README with a new, working link to the SQuAD training dataset, knowledge base, and hyperparameters.

Steps

  • Check if the Google Drive folder still exists and is accessible.
  • If the folder exists, update the link in the README to the new sharing link.
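  • If the folder does not exist, only its original owner can re-upload the files to a new Google Drive folder and update the link in the README. Failing that, replace the dead link with pointers to material that is still available, as PR #5 does: the finetune_rag_ray_end2end.sh script for the hyperparameters and the official SQuAD release for the dataset.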
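
The first step can also be automated. The following is a minimal Python sketch, not the method used in the PR (which relied on the curl command shown earlier): it extracts inline Markdown links from README text and reports those that do not answer with HTTP 200. The names find_markdown_links, http_status, and broken_links are hypothetical helpers introduced here for illustration, and the sketch assumes the README uses inline [label](url) links.

```python
import re
import urllib.error
import urllib.request

# Matches inline Markdown links of the form [label](url) with an http(s) URL.
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def find_markdown_links(text):
    """Return (label, url) pairs for every inline http(s) link in text."""
    return LINK_RE.findall(text)


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check"})
    try:
        # urlopen follows redirects by default, similar to curl -L.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as exceptions
    except OSError:
        return None  # DNS failure, refused connection, timeout, and so on


def broken_links(text):
    """Return (label, url, status) for each link whose status is not 200."""
    results = []
    for label, url in find_markdown_links(text):
        status = http_status(url)
        if status != 200:
            results.append((label, url, status))
    return results
```

Calling broken_links on the contents of rag-end2end-retriever/README.md would be expected to list the Google Drive link as long as it still returns 404. Some hosts answer automated requests differently from browsers, so a reported failure is worth confirming by hand.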

Example Code (Markdown)

Update the README.md file with the new link:

Training dataset, the knowledge-base, and hyperparameters used in experiments 
can be accessed from [here](NEW_LINK_TO_GOOGLE_DRIVE_FOLDER).

Replace NEW_LINK_TO_GOOGLE_DRIVE_FOLDER with the actual link to the new Google Drive folder.

Verification

  • Open the updated README.md file and click on the link to verify that it works.
  • Check that the link points to the correct Google Drive folder containing the SQuAD training dataset, knowledge base, and hyperparameters.

Extra Tips

  • Consider hosting the necessary files somewhere more durable than a personal Google Drive folder, such as a GitHub repository or a dedicated dataset hosting service, to avoid broken links in the future.
  • Update the link in any other relevant documentation or references to ensure consistency.
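
To find other documentation that still references the dead URL, a local checkout can be scanned for it. This is a rough sketch under stated assumptions: files_mentioning is a hypothetical helper, and the set of file suffixes treated as documentation is a guess to adjust for the repository at hand.

```python
from pathlib import Path


def files_mentioning(root, needle, suffixes=(".md", ".rst", ".txt")):
    """Return (relative_path, line_number) for each doc line containing needle."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        # Only inspect regular files that look like documentation.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                matches.append((str(path.relative_to(root)), number))
    return matches
```

Passing the repository root and a distinctive fragment of the old URL (for example the folder ID from the curl command above) returns every documentation line that still needs updating.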
