pytorch - Fix ONNX export mismatch for `avg_pool2d` with `ceil_mode=True` and `count_include_pad=True`

pytorch, 2026-05-13 08:29:34

pytorch/pytorch#183528 (fetched 2026-05-14 03:28:31)

Author: tinywisdom

🐛 Describe the bug

Summary

I found a mismatch between PyTorch eager execution and the ONNX model exported by torch.onnx.export for torch.nn.functional.avg_pool2d when ceil_mode=True and count_include_pad=True are both set.

The mismatch is confined to the extra right/bottom boundary windows introduced by ceil_mode=True, i.e. the last row and last column of the 3x3 output; the other four values agree.

For the following input:

x = torch.arange(1, 17, dtype=torch.float32).reshape(1, 1, 4, 4)

and operation:

F.avg_pool2d(
    x,
    kernel_size=3,
    stride=2,
    padding=1,
    ceil_mode=True,
    count_include_pad=True,
)

PyTorch eager returns:

[1.5555556, 3.3333333, 2.0,
 6.3333335, 11.0, 6.0,
 4.5, 7.5, 4.0]

The exported ONNX model, when run with ONNXRuntime, returns:

[1.5555556, 3.3333333, 1.3333334,
 6.3333335, 11.0, 4.0,
 3.0, 5.0, 1.7777778]

The ceil_mode=False control case matches between PyTorch and ONNXRuntime.
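To make the "boundary windows introduced by ceil_mode=True" concrete, here is a small pure-Python sketch of the per-axis output length and window extents for this configuration. It follows the output-size rule documented for nn.AvgPool2d (dilation 1; with ceil_mode=True a window may run past the end, but it is dropped if it would start in the right padding). The helper names pool_out_size and window_extents are hypothetical, for illustration only.

```python
def pool_out_size(n, k, s, p, ceil_mode):
    """Output length along one axis (dilation=1), following the rule documented
    for nn.AvgPool2d: with ceil_mode=True a window may run off the end, but it
    is dropped if it would start in the right padding."""
    num = n + 2 * p - k
    out = (-(-num // s) if ceil_mode else num // s) + 1  # ceil vs floor division
    if ceil_mode and (out - 1) * s >= n + p:
        out -= 1
    return out


def window_extents(n, k, s, p, ceil_mode):
    """For each output index: (start, end) in unpadded coordinates (end is
    exclusive) and how many of the k taps lie past the explicit right padding,
    i.e. the overhang that only exists because of ceil_mode."""
    rows = []
    for o in range(pool_out_size(n, k, s, p, ceil_mode)):
        start = o * s - p
        end = start + k
        overhang = max(0, end - (n + p))
        rows.append((start, end, overhang))
    return rows


# Same geometry as the report: 4 elements per axis, kernel 3, stride 2, padding 1.
for ceil_mode in (False, True):
    print(ceil_mode, pool_out_size(4, 3, 2, 1, ceil_mode), window_extents(4, 3, 2, 1, ceil_mode))
```

With ceil_mode=True the third window covers unpadded indices 3..5: index 3 is real input, index 4 is the explicit padding, and index 5 lies past pads=[1, 1, 1, 1]. The reported numbers are consistent with the two runtimes disagreeing only on whether that overhanging tap counts toward the divisor.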

Minimal reproducer

import json
import tempfile

import numpy as np
import torch
import torch.nn.functional as F
import onnx
import onnxruntime as ort
from onnx import helper


OPSET = 18


class AvgPoolModel(torch.nn.Module):
    def __init__(self, ceil_mode: bool):
        super().__init__()
        self.ceil_mode = ceil_mode

    def forward(self, x):
        return F.avg_pool2d(
            x,
            kernel_size=3,
            stride=2,
            padding=1,
            ceil_mode=self.ceil_mode,
            count_include_pad=True,
        )


def get_averagepool_attrs(model):
    out = []
    for node in model.graph.node:
        if node.op_type == "AveragePool":
            attrs = {}
            for a in node.attribute:
                v = helper.get_attribute_value(a)
                if isinstance(v, bytes):
                    v = v.decode("utf-8")
                attrs[a.name] = v
            out.append(attrs)
    return out


def run_ort(model_path, x):
    sess = ort.InferenceSession(
        model_path,
        providers=["CPUExecutionProvider"],
    )
    return sess.run(None, {sess.get_inputs()[0].name: x.detach().cpu().numpy()})[0]


def export_and_run(ceil_mode, x):
    model = AvgPoolModel(ceil_mode=ceil_mode).eval()

    with torch.no_grad():
        torch_out = model(x).detach().cpu().numpy()

    with tempfile.TemporaryDirectory() as tmp:
        onnx_path = f"{tmp}/model.onnx"

        torch.onnx.export(
            model,
            (x,),
            onnx_path,
            opset_version=OPSET,
            input_names=["input"],
            output_names=["output"],
            do_constant_folding=True,
        )

        exported = onnx.load(onnx_path)
        onnx.checker.check_model(exported)
        attrs = get_averagepool_attrs(exported)

        ort_out = run_ort(onnx_path, x)

    return torch_out, ort_out, attrs


def main():
    x = torch.arange(1, 17, dtype=torch.float32).reshape(1, 1, 4, 4)

    for ceil_mode in [False, True]:
        torch_out, ort_out, attrs = export_and_run(ceil_mode, x)

        print(f"\nceil_mode={ceil_mode}")
        print("Exported AveragePool attrs:")
        print(json.dumps(attrs, indent=2, sort_keys=True))
        print("PyTorch:", torch_out.reshape(-1).tolist())
        print("ORT:", ort_out.reshape(-1).tolist())
        print("allclose:", np.allclose(torch_out, ort_out))
        print("max_abs_diff:", float(np.max(np.abs(torch_out - ort_out))))


if __name__ == "__main__":
    main()

Actual behavior

For ceil_mode=False, PyTorch and ONNXRuntime match:

PyTorch:
[1.5555556, 3.3333333, 6.3333335, 11.0]

ONNXRuntime:
[1.5555556, 3.3333333, 6.3333335, 11.0]

allclose: true
max_abs_diff: 0.0

For ceil_mode=True, PyTorch and ONNXRuntime differ in the last row and last column of the output:

PyTorch:
[1.5555556, 3.3333333, 2.0,
 6.3333335, 11.0, 6.0,
 4.5, 7.5, 4.0]

ONNXRuntime:
[1.5555556, 3.3333333, 1.3333334,
 6.3333335, 11.0, 4.0,
 3.0, 5.0, 1.7777778]

max_abs_diff: 2.5
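Comparing the two reported outputs element by element shows where they diverge and by what factor. This sketch uses only the numbers printed above (numpy, which the reproducer already imports).

```python
import numpy as np

# Values copied from the ceil_mode=True run above, reshaped to the 3x3 output.
torch_out = np.array([1.5555556, 3.3333333, 2.0,
                      6.3333335, 11.0, 6.0,
                      4.5, 7.5, 4.0], dtype=np.float32).reshape(3, 3)
ort_out = np.array([1.5555556, 3.3333333, 1.3333334,
                    6.3333335, 11.0, 4.0,
                    3.0, 5.0, 1.7777778], dtype=np.float32).reshape(3, 3)

# Boolean mask of output positions where the two runtimes disagree.
mismatch = ~np.isclose(torch_out, ort_out)
print("mismatching (row, col):", [tuple(int(v) for v in idx) for idx in np.argwhere(mismatch)])
print("max_abs_diff:", float(np.abs(torch_out - ort_out).max()))

# For equal window sums, ORT / PyTorch = (PyTorch divisor) / (ORT divisor).
print(np.round(ort_out / torch_out, 4))
```

The mismatches are exactly the last row and last column. The ORT/PyTorch ratio is 6/9 on the edge windows and 4/9 at the corner, which is what you would get if PyTorch divided those windows by 3x2 (or 2x2) taps while ONNXRuntime divided every window by the full 3x3 kernel.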

The exported AveragePool node has the following attributes:

{
  "auto_pad": "NOTSET",
  "ceil_mode": 1,
  "count_include_pad": 1,
  "kernel_shape": [3, 3],
  "pads": [1, 1, 1, 1],
  "strides": [2, 2]
}
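The exported attributes mirror the PyTorch call (kernel 3x3, stride 2, pads 1 on every side, ceil_mode=1, count_include_pad=1), so the divergence looks like a difference in the divisor rather than in the window layout. The numpy sketch below reproduces both reported outputs from one pooling loop and two divisor rules. The helper avg_pool2d_ref and the rule names are hypothetical, and the assumption that ONNXRuntime divides by the full kernel size in this configuration is inferred from the numbers in this report, not taken from PyTorch or ONNXRuntime source.

```python
import numpy as np


def avg_pool2d_ref(x, k, s, p, ceil_mode, divisor_rule):
    """Average-pool one H x W plane with a square kernel (hypothetical helper).

    divisor_rule="pytorch":     divide by the window clipped to input + padding,
                                so ceil_mode overhang past the padding is not
                                counted (matches the eager numbers here).
    divisor_rule="full_kernel": always divide by k * k (matches the ONNXRuntime
                                numbers here; inferred, not read from ORT).
    """
    h, w = x.shape

    def out_size(n):
        num = n + 2 * p - k
        o = (-(-num // s) if ceil_mode else num // s) + 1
        if ceil_mode and (o - 1) * s >= n + p:  # window would start in right padding
            o -= 1
        return o

    out = np.zeros((out_size(h), out_size(w)), dtype=np.float64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            h0, w0 = i * s - p, j * s - p
            # Clip the window end to the padded extent (input + right padding).
            h1, w1 = min(h0 + k, h + p), min(w0 + k, w + p)
            if divisor_rule == "pytorch":
                div = (h1 - h0) * (w1 - w0)
            elif divisor_rule == "full_kernel":
                div = k * k
            else:
                raise ValueError(f"unknown divisor_rule: {divisor_rule}")
            # Padding contributes zeros, so only real input elements are summed.
            out[i, j] = x[max(h0, 0):min(h1, h), max(w0, 0):min(w1, w)].sum() / div
    return out


x = np.arange(1, 17, dtype=np.float32).reshape(4, 4)
for rule in ("pytorch", "full_kernel"):
    print(rule, np.round(avg_pool2d_ref(x, 3, 2, 1, True, rule).ravel(), 4).tolist())
```

With ceil_mode=False both rules give identical results, because no window extends past the explicit padding; that is consistent with the control case matching.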

Versions

PyTorch version:  2.11.0
Is debug build: True
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Is CUDA available: True

ONNX: 1.19.1
ONNX opset: 18
ONNXRuntime: 1.23.2
ONNXRuntime providers: AzureExecutionProvider, CPUExecutionProvider

cc @justinchuby @titaiwangms
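A possible workaround to evaluate (an untested assumption, not a confirmed fix) is to take the padding out of the pooling op: pad explicitly with F.pad(x, (1, 1, 1, 1)) and call F.avg_pool2d with padding=0, ceil_mode=True and count_include_pad=False. With no implicit padding, the only positions excluded from the divisor are the ceil_mode overhang, which matches the eager behavior reported here. Whether the exporter then emits a Pad followed by an AveragePool, and whether ONNXRuntime agrees with eager, still has to be confirmed with the minimal reproducer. The numpy sketch below only checks the arithmetic for this input; pool_valid_only is a hypothetical helper.

```python
import numpy as np


def pool_valid_only(xp, k, s):
    """Average pooling with padding=0 and ceil_mode=True on one plane, dividing
    each window by the number of its taps that fall inside xp
    (count_include_pad=False semantics)."""
    h, w = xp.shape

    def out_size(n):
        o = -(-(n - k) // s) + 1  # ceil((n - k) / s) + 1
        if (o - 1) * s >= n:  # drop a window that would start past the end
            o -= 1
        return o

    out = np.zeros((out_size(h), out_size(w)), dtype=np.float64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Slicing clips the window at the edge, so win.size counts only in-bounds taps.
            win = xp[i * s:min(i * s + k, h), j * s:min(j * s + k, w)]
            out[i, j] = win.sum() / win.size
    return out


x = np.arange(1, 17, dtype=np.float32).reshape(4, 4)
# Step 1 of the hypothetical workaround: explicit zero padding, as F.pad would do.
xp = np.pad(x, 1, mode="constant")
# Step 2: pool with no implicit padding; the explicit zeros now count as data.
print(np.round(pool_valid_only(xp, k=3, s=2).ravel(), 4).tolist())
```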
