pytorch - ✅ (Solved) Fix: Increase number of shards for TraceType (and VarType?) [1 pull request, 1 comment, 2 participants]

GitHub stats
pytorch/pytorch#178666 (fetched 2026-04-08 01:40:24)
Comments: 1 · Participants: 2 · Timeline: 40 · Reactions: 0
Timeline (top): subscribed ×19, labeled ×8, mentioned ×8, unlabeled ×2

PR fix notes

PR #178894: [autograd] Increase VariableType and TraceType num_shards from 5 to 10

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

  • -> #178894

These generated files have grown large enough to cause slow compilation. Doubling the shard count reduces per-file size and improves parallel build times.
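The effect of doubling the shard count on per-file size can be estimated with simple arithmetic. This is a minimal sketch that assumes generated lines are split evenly across shards. Real hash-based shards are only roughly even. `average_lines_per_shard` is a hypothetical helper, not part of PyTorch.

```cpp
#include <cassert>

// Hypothetical helper: mean generated lines per shard, assuming an even split
// (hash-based sharding only approximates this in practice).
double average_lines_per_shard(long total_lines, int num_shards) {
    assert(num_shards > 0);
    return static_cast<double>(total_lines) / num_shards;
}
```

The issue's table reports 90,398 TraceType lines today. With 5 shards that averages about 18,080 lines per file. With 10 shards the average drops to about 9,040, which is below the 2021 five-shard average of 11,989 (59,945 / 5).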

Fixes #178666.

🤖 Generated with Claude Code

Changed files

  • buckbuild.bzl (modified, +5/-0)
  • build.bzl (modified, +10/-0)
  • build_variables.bzl (modified, +10/-0)
  • caffe2/CMakeLists.txt (modified, +10/-0)
  • pt_template_srcs.bzl (modified, +10/-0)
  • tools/autograd/gen_trace_type.py (modified, +1/-1)
  • tools/autograd/gen_variable_type.py (modified, +1/-1)
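The build descriptions (Bazel/Buck `.bzl` files and CMake) appear to list each generated shard source explicitly. Raising the count therefore also means adding the extra file names to those lists. One plausible reading of the +10/-0 counts above is five new TraceType entries plus five new VariableType entries. That reading is inferred from the diff stats, not confirmed. The sketch below enumerates the added names. `new_shard_files` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: shard sources that appear when the count grows from
// old_shards to new_shards, e.g. ("TraceType", 5, 10) -> TraceType_5.cpp .. TraceType_9.cpp.
std::vector<std::string> new_shard_files(const std::string& base, int old_shards, int new_shards) {
    std::vector<std::string> files;
    for (int i = old_shards; i < new_shards; ++i) {
        files.push_back(base + "_" + std::to_string(i) + ".cpp");
    }
    return files;
}
```

Calling it for both "TraceType" and "VariableType" with 5 → 10 yields ten names.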
Issue #178666 description

Recently, fresh compiles of individual TraceType files have been causing OOMs on CI runners. (Repeated) reruns that fill the sccache work around this, but that is both wasteful and annoying. A simple remedy would be to increase the number of shards. That reduces the size of each file and counteracts the growth in file size since the shard count was chosen in August 2021.

September 2021 (2,140 native functions) vs. now (2,681 native functions), both with 5 shards:

| File | Lines (2021) | Size (2021) | Lines (now) | Size (now) | Growth |
|---|---|---|---|---|---|
| TraceType_0.cpp | 11,608 | 500K | 19,349 | 847K | +69% |
| TraceType_1.cpp | 11,984 | 514K | 18,664 | 820K | +60% |
| TraceType_2.cpp | 12,889 | 554K | 20,104 | 884K | +60% |
| TraceType_3.cpp | 13,042 | 557K | 17,068 | 745K | +34% |
| TraceType_4.cpp | 10,422 | 450K | 15,213 | 662K | +47% |
| Total | 59,945 | 2.6M | 90,398 | 3.9M | +51% |

The shards have grown ~51% in lines and ~50% in file size since the shard count was set in August 2021. The largest shard (TraceType_2) went from 554K to 884K. The hash-based bucketing also shows increasing imbalance — the largest shard is now 33% bigger than the smallest, up from 23% in 2021.
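Hash-based bucketing can be sketched as follows. Each key goes to shard `hash(key) % num_shards`, and imbalance is how much larger the biggest shard is than the smallest. The code makes two assumptions. It uses 64-bit FNV-1a as a stand-in stable hash, since PyTorch's codegen uses its own stable string hash. It also treats keys as operator names. `shard_loads` and `imbalance` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in stable string hash (64-bit FNV-1a), swapped in for illustration only.
std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hypothetical helper: put each key (e.g. an operator name) in shard
// hash % num_shards and add its weight (e.g. generated lines) to that shard.
std::vector<long> shard_loads(const std::vector<std::string>& keys,
                              const std::vector<long>& weights, int num_shards) {
    assert(keys.size() == weights.size() && num_shards > 0);
    std::vector<long> loads(static_cast<std::size_t>(num_shards), 0L);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        loads[fnv1a(keys[i]) % static_cast<std::uint64_t>(num_shards)] += weights[i];
    }
    return loads;
}

// Imbalance as the issue reports it: how much larger the largest shard is
// than the smallest (0.33 means 33% larger). Assumes no empty shard.
double imbalance(const std::vector<long>& loads) {
    assert(!loads.empty());
    auto mm = std::minmax_element(loads.begin(), loads.end());
    assert(*mm.first > 0);
    return static_cast<double>(*mm.second) / static_cast<double>(*mm.first) - 1.0;
}
```

Feeding in the current sizes from the table ({847, 820, 884, 745, 662}) gives about 0.335, which matches the "33% bigger" figure above. In this sketch, 10 is a multiple of 5, so `h % 5 == (h % 10) % 5`. Old shard j therefore splits exactly into new shards j and j + 5. Doubling halves the shards, but it does not by itself make the hash distribution more even.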

cc @malfet @seemethere @pytorch/pytorch-dev-infra @ezyang @bhosmer @bdhirsh @kadeng @bobrenjc93 @atalman

Extent analysis

Fix Plan

To address the issue of increasing file size and OOMs on CI runners, we will increase the number of shards.

Steps to Increase Shards

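  • Raise num_shards in tools/autograd/gen_trace_type.py and tools/autograd/gen_variable_type.py (PR #178894 uses 10) and add the new shard sources to the build file lists
  • Regenerate and rebuild the TraceType and VariableType shards with the new count
  • Monitor the file sizes and CI runner performance to determine whether further adjustments are needed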

Example Code Changes

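```cpp
// Sketch only: in PyTorch the shard count is the num_shards argument in
// tools/autograd/gen_trace_type.py and gen_variable_type.py, not a C++ constant.
#include <iostream>

constexpr int NUM_SHARDS = 10;  // raised from 5 by PR #178894

int main() {
    // One generated translation unit per shard; the build compiles them in parallel.
    for (int i = 0; i < NUM_SHARDS; ++i) {
        std::cout << "TraceType_" << i << ".cpp\n";
    }
}
```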

Verification

  • Verify that the file sizes have decreased and are more balanced across shards
  • Monitor CI runner performance to ensure that OOMs are no longer occurring
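The first verification step could be automated. This is a minimal sketch assuming C++17 `<filesystem>`. `shard_sizes` and `all_within_budget` are hypothetical helpers. The directory holding the generated shards depends on your build setup, and so does the byte budget you choose.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: byte sizes of the .cpp files in `dir` whose names start
// with `prefix` (e.g. "TraceType_").
std::vector<std::uintmax_t> shard_sizes(const fs::path& dir, const std::string& prefix) {
    std::vector<std::uintmax_t> sizes;
    for (const auto& entry : fs::directory_iterator(dir)) {
        const std::string name = entry.path().filename().string();
        if (entry.is_regular_file() && name.rfind(prefix, 0) == 0 &&
            entry.path().extension() == ".cpp") {
            sizes.push_back(entry.file_size());
        }
    }
    return sizes;
}

// True if every shard is at or below max_bytes; log the sizes in CI to track growth.
bool all_within_budget(const std::vector<std::uintmax_t>& sizes, std::uintmax_t max_bytes) {
    for (std::uintmax_t s : sizes) {
        if (s > max_bytes) return false;
    }
    return true;
}
```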

Extra Tips

  • Continuously monitor file sizes and adjust the shard count as needed to prevent future OOMs
  • Consider implementing automated scripts to periodically review and adjust the shard count based on file size growth.
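Such a script needs some rule for choosing a shard count. One hypothetical heuristic is to pick the smallest count whose estimated largest shard, (total / n) × imbalance, fits a per-file budget. `recommend_num_shards` is an illustration, not how PR #178894 chose its value.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical heuristic: smallest n with (total_size / n) * imbalance_factor <= budget.
int recommend_num_shards(double total_size, double budget_per_shard, double imbalance_factor) {
    assert(total_size > 0 && budget_per_shard > 0 && imbalance_factor >= 1.0);
    return static_cast<int>(std::ceil(total_size * imbalance_factor / budget_per_shard));
}
```

Take the per-shard sizes in the table, which sum to 3,958K. Use the 2021 TraceType_2 size (554K) as the budget and an imbalance factor of about 1.34 (884K / 662K). The sketch then returns 10, which happens to agree with the PR.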
