pytorch - ✅(Solved) Fix [Stable C Shim] Ability to retrieve error messages [2 pull requests, 1 comment, 2 participants]

Source: pytorch/pytorch#179427 (fetched 2026-04-08 02:51:46)

Error Message

/// Retrieve a pointer to the string that holds the most recent exception's message.
/// This pointer is a borrowed pointer and is invalidated when the next exception occurs.
AOTI_TORCH_EXPORT const char* aoti_torch_exception_get_what();

/// Retrieve a pointer to the string that holds the most recent exception's message and backtrace.
/// This pointer is a borrowed pointer and is invalidated when the next exception occurs.
/// This may be the same as the less detailed aoti_torch_exception_get_what() in case more information
/// is not available.
AOTI_TORCH_EXPORT const char* aoti_torch_exception_get_what_with_backtrace();

Root Cause

In the callback-based design explored in the issue (an error handler registered in the style of X11's XSetErrorHandler), the default callback can perform the current behaviour of logging directly to stderr. But because the callback would be called from a catch block, we don't actually gain much from having it: the stack no longer reaches all the way to where the error occurred, which is what I was hoping to get from this approach. A correct stack to the source of the error is very helpful for debugging, but that only works if the callback is triggered at the source of the error, not from the catch block as it would be in our situation.

PR fix notes

PR #180135: [Stable C Shim] Shim functions to retrieve error message

Description (problem / solution / changelog)

Issue

Fixes https://github.com/pytorch/pytorch/issues/179427. Filing as a draft PR for discussion. FYI @janeyx99

Summary

This PR adds two new C shim functions:

  • const char* torch_exception_get_what()
  • const char* torch_exception_get_what_with_backtrace()

These functions can be used to retrieve information about the c10::Error exceptions caught by the AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE macro. A new macro, STABLE_TORCH_ERROR_CHECK, is introduced in stable/macros.h; it uses these functions to raise an error that says what went wrong. Tests for both are added to test_libtorch_agnostic.py.
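The following is a minimal sketch of how a checking macro along these lines could surface the stored message. It is not the PR's actual STABLE_TORCH_ERROR_CHECK definition. The names SKETCH_ERROR_CHECK, kSketchSuccess, stub_exception_get_what, stub_failing_call, stub_ok_call and sketch_message_for_failure are hypothetical stand-ins so that the example compiles without libtorch, and it assumes that any code other than the success value signals failure.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the shim; the real functions live in libtorch.
constexpr std::int32_t kSketchSuccess = 0;
inline const char* stub_exception_get_what() { return "shape mismatch"; }
inline std::int32_t stub_failing_call() { return 1; }
inline std::int32_t stub_ok_call() { return kSketchSuccess; }

// Evaluates `call` once. On failure, the borrowed message pointer is copied
// into a std::string right away, before any later shim call can invalidate it.
#define SKETCH_ERROR_CHECK(call)                                        \
  do {                                                                  \
    const std::int32_t sketch_status_ = (call);                         \
    if (sketch_status_ != kSketchSuccess) {                             \
      throw std::runtime_error(                                         \
          std::string(#call " failed: ") + stub_exception_get_what());  \
    }                                                                   \
  } while (0)

// Usage: turn a failing shim call into an exception carrying the message.
inline std::string sketch_message_for_failure() {
  try {
    SKETCH_ERROR_CHECK(stub_failing_call());
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return std::string();
}
```

Copying the message at the point of failure matters because the declarations describe the returned pointer as borrowed and valid only until the next exception occurs.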

Checklist

  • Passes lint (spin fixlint)
  • Added/updated tests
  • Updated documentation (if applicable)
  • Included benchmark results (for PRs impacting perf)

Benchmarked with this code: https://gist.github.com/iwanders/8d35de7aed67ae50767b838625b94902

Master (61fdec7ddb5d6bb5745681f2898ebe3a6e671711):
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_WithoutException       85.5 ns         85.5 ns      8203711
BM_Exception            983010 ns       982994 ns          711

This PR (6df5de7b1eea204812f58721fea1c41cce6ac95c):
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_WithoutException       85.8 ns         85.8 ns      8102151
BM_Exception            980399 ns       980392 ns          713

BC-breaking?

Not strictly breaking, but a change in behaviour; the macro AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE no longer prints its backtrace.

Changed files

  • test/cpp_extensions/libtorch_agn_2_12_extension/csrc/my_exception_what.cpp (added, +45/-0)
  • test/cpp_extensions/test_libtorch_agnostic.py (modified, +52/-0)
  • torch/csrc/inductor/aoti_torch/utils.h (modified, +27/-10)
  • torch/csrc/shim_common.cpp (modified, +25/-0)
  • torch/csrc/shim_common.h (added, +13/-0)
  • torch/csrc/stable/c/shim.h (modified, +25/-0)
  • torch/csrc/stable/c/shim_function_versions.txt (modified, +6/-0)
  • torch/csrc/stable/macros.h (modified, +23/-2)

PR #180404: [Stable C shim] Ensure shim usage linter can handle return types of multiple words.

Description (problem / solution / changelog)

Issue

Relates to https://github.com/pytorch/pytorch/issues/179427. This is a bugfix for the stable shim usage issue encountered in https://github.com/pytorch/pytorch/pull/180135#discussion_r3068591046; @janeyx99 suggested filing it as a separate PR.

Summary

The current stable_shim_usage test doesn't correctly match functions whose return type consists of two words, like const char*. This PR adds a failing test to stable_shim_usage_linter_data and modifies the regular expression to make it pass.
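To illustrate the kind of mismatch described, here is a small C++ sketch using std::regex. The real linter is a Python script whose pattern is not shown on this page, so both patterns below are assumptions made purely for illustration, as are the names kNarrowPattern, kBroadPattern and sketch_shim_function_name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Assumed "before" pattern: exactly one word for the return type.
const std::regex kNarrowPattern(
    R"re(^\s*AOTI_TORCH_EXPORT\s+\w+\s+(\w+)\s*\()re");

// Assumed "after" pattern: one or more return-type tokens, each followed by
// whitespace or pointer/reference punctuation, then the function name.
const std::regex kBroadPattern(
    R"re(^\s*AOTI_TORCH_EXPORT\s+(?:[\w:]+[ \t*&]+)+(\w+)\s*\()re");

// Returns the declared function name, or an empty string if nothing matches.
inline std::string sketch_shim_function_name(
    const std::string& line, const std::regex& pattern) {
  std::smatch match;
  if (std::regex_search(line, match, pattern)) {
    return match[1].str();
  }
  return std::string();
}
```

With these assumed patterns, a declaration returning const char* is missed by the narrow pattern and picked up by the broad one, while single-word return types match both.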

Checklist

  • Passes lint (spin fixlint)
  • Added/updated tests
  • Updated documentation (if applicable)
  • Included benchmark results (for PRs impacting perf): N/A, test only, trivial change.

BC-breaking?

Nope, test only.

Changed files

  • tools/linter/adapters/stable_shim_usage_linter.py (modified, +1/-1)
  • tools/test/stable_shim_usage_linter_data/sample_shim.h (modified, +7/-0)
  • tools/test/test_stable_shim_usage_linter.py (modified, +2/-0)

Code Example

/// Retrieve a pointer to the string that holds the most recent exception's message.
/// This pointer is a borrowed pointer and is invalidated when the next exception occurs.
AOTI_TORCH_EXPORT const char* aoti_torch_exception_get_what();

/// Retrieve a pointer to the string that holds the most recent exception's message and backtrace.
/// This pointer is a borrowed pointer and is invalidated when the next exception occurs.
/// This may be the same as the less detailed aoti_torch_exception_get_what() in case more information
/// is not available.
AOTI_TORCH_EXPORT const char* aoti_torch_exception_get_what_with_backtrace();

🚀 The feature, motivation and pitch

When using the C shim functions provided by the Stable API, it is currently impossible to get information about why a particular function returned AOTI_TORCH_FAILURE. A backtrace is printed to stderr, but there's no way to retrieve it programmatically. This prevents consumers from providing detailed error messages.

See also https://github.com/pytorch/pytorch/issues/174507#issuecomment-4150977835 where I originally raised this, and @janeyx99 's comment below it that it could be improved.

Alternatives

In my original comment, I stated I expected something like aoti_torch_recent_error_in_thread(char*, size_t len); write the most recent error message to a (provided) buffer. I've since done some exploration & prototyping:

Currently the exception is caught and logged to stderr in what appears to be an internal header. Modifying that header to also contain a catch (const c10::Error& e) clause gives us easy access to the useful information. I think the methods what and what_without_backtrace are the most relevant ones. The what_without_backtrace() value is what I'd consider the useful error message, while what(), with the full backtrace, may be good to keep as the in-depth information that helps developers figure out what actually happened on the other side of the C shims.

I originally explored something akin to how XSetErrorHandler works in X11: you register an error handler, and that handler is called with a handle from which information can be retrieved.
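A registration API of that kind might look roughly like the sketch below. It is a simplified illustration, not the prototype from the issue: the handler here receives the message directly instead of a handle, and every name (SketchErrorHandler, sketch_set_error_handler, sketch_guard, sketch_recording_handler, sketch_recorded) is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical handler type: receives the message of the caught exception.
using SketchErrorHandler = void (*)(const char* message);

// Default behaviour: log to stderr.
inline void sketch_default_handler(const char* message) {
  std::fprintf(stderr, "%s\n", message);
}

inline SketchErrorHandler& sketch_current_handler() {
  static SketchErrorHandler handler = sketch_default_handler;
  return handler;
}

// Installs a handler and returns the previous one, as XSetErrorHandler does.
// Passing nullptr restores the default handler.
inline SketchErrorHandler sketch_set_error_handler(SketchErrorHandler handler) {
  SketchErrorHandler previous = sketch_current_handler();
  sketch_current_handler() = handler ? handler : sketch_default_handler;
  return previous;
}

// Converts exceptions to an error code and notifies the handler.
template <typename Fn>
int sketch_guard(Fn&& fn) {
  try {
    fn();
    return 0;
  } catch (const std::exception& e) {
    // The handler runs inside the catch block, after the stack has unwound,
    // so it cannot observe the frames where the exception was thrown.
    sketch_current_handler()(e.what());
    return 1;
  }
}

// Usage example: a handler that records the message instead of printing it.
inline std::string& sketch_recorded() {
  static std::string recorded;
  return recorded;
}
inline void sketch_recording_handler(const char* message) {
  sketch_recorded() = message;
}
```

The comment in sketch_guard marks the limitation discussed next: because the handler fires after unwinding, it offers little beyond what a plain message getter provides.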

The default callback can perform the current behaviour of logging directly to stderr. But because the callback would be called from a catch block, we don't actually gain much from having the callback (the stack doesn't provide information all the way to where the error occurred), which is what I was hoping to get from this approach. It's very helpful for debugging to get the correct stack to the source of the error, but that only works if the callback would be triggered from the source of the error, not from the catch as we would have in our situation.

This is also a lot of extra additions & complexity, for arguably minimal gains for the stable API interface. So while a good first exploration, I'm not sold on it.

As an alternative, which I explored in 6206ee142bc4ccb4749791646ce05d2bc368a8b5, we could simplify all of this to just two functions:

/// Retrieve a pointer to the string that holds the most recent exception's message.
/// This pointer is a borrowed pointer and is invalidated when the next exception occurs.
AOTI_TORCH_EXPORT const char* aoti_torch_exception_get_what();

/// Retrieve a pointer to the string that holds the most recent exception's message and backtrace.
/// This pointer is a borrowed pointer and is invalidated when the next exception occurs.
/// This may be the same as the less detailed aoti_torch_exception_get_what() in case more information
/// is not available.
AOTI_TORCH_EXPORT const char* aoti_torch_exception_get_what_with_backtrace();

To make this work, we create two new thread_local std::string variables in csrc/shim_common.cpp and modify the try-catch block to populate them. It's simple, avoids overcomplicating things, and could be extended in the future with additional functions that retrieve other aspects of the exception.
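A minimal, self-contained sketch of that mechanism follows. It uses std::exception in place of c10::Error so that it builds without libtorch, and all identifiers (kSketchSuccess, kSketchFailure, sketch_convert_exception, sketch_exception_get_what, sketch_exception_get_what_with_backtrace, sketch_check_non_negative) are hypothetical stand-ins rather than the names in shim_common.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <string>

constexpr std::int32_t kSketchSuccess = 0;
constexpr std::int32_t kSketchFailure = 1;

namespace {
// One slot per thread, so concurrent callers never overwrite each other.
thread_local std::string sketch_last_what;
thread_local std::string sketch_last_what_with_backtrace;
} // namespace

// Runs `fn`, converts any exception to an error code, and stores its message.
template <typename Fn>
std::int32_t sketch_convert_exception(Fn&& fn) {
  try {
    fn();
    return kSketchSuccess;
  } catch (const std::exception& e) {
    // With c10::Error, what_without_backtrace() and what() would fill the
    // two slots separately; std::exception only has what(), so both match.
    sketch_last_what = e.what();
    sketch_last_what_with_backtrace = e.what();
  } catch (...) {
    sketch_last_what = "unknown exception";
    sketch_last_what_with_backtrace = sketch_last_what;
  }
  return kSketchFailure;
}

// Borrowed pointers: valid until the next exception on the same thread.
inline const char* sketch_exception_get_what() {
  return sketch_last_what.c_str();
}
inline const char* sketch_exception_get_what_with_backtrace() {
  return sketch_last_what_with_backtrace.c_str();
}

// Example shim-style function that fails on negative input.
inline std::int32_t sketch_check_non_negative(double x) {
  return sketch_convert_exception([&] {
    if (x < 0) {
      throw std::invalid_argument("x must be non-negative");
    }
  });
}
```

Because the strings are thread_local, a caller has to query the message on the same thread that received the failure code, and should copy it before making further shim calls.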

I'm not sure yet how we'd use it on the C++ side though. The TORCH_ERROR_CODE_CHECK macro is defined in the 'headeronly' headers, and we'd have to forward-declare aoti_torch_exception_get_what() if we wanted to use it there, but then we can hardly call the header headeronly. So in this branch I added TORCH_ERROR_CODE_CHECK_DETAILED to torch/csrc/stable/macros.h instead...

The TORCH_ERROR_CODE macro is used in many places, avoiding changing its behaviour is probably desired, so perhaps we need to do something with a global / thread local / environment flag to toggle the printing behaviour of the backtrace. Not sure what's best here.

There may also be ways to convey the information through the C shims that I haven't thought of. Obtaining the information seems pretty straightforward because of c10::Error; the main choices are how it is exposed through the stable API and how to consume it.

Additional context

No response

cc @chauhang @penguinwu @avikchaudhuri @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4 @desertfire @yushangdi @benjaminglass1 @jataylo @iupaikov-amd

extent analysis

TL;DR

To address the issue of retrieving error information when a function returns AOTI_TORCH_FAILURE, consider adding two new functions, aoti_torch_exception_get_what() and aoti_torch_exception_get_what_with_backtrace(), to the Stable API.

Guidance

  1. Simplify error handling: Introduce two new functions, aoti_torch_exception_get_what() and aoti_torch_exception_get_what_with_backtrace(), to retrieve the most recent exception's message and backtrace, respectively.
  2. Modify try-catch block: Update the try-catch block to populate the thread_local std::string variables with the exception information.
  3. Use thread-local storage: Utilize thread-local storage to store the exception information, ensuring that each thread has access to its own exception data.
  4. Introduce a new macro: Consider introducing a new macro, such as TORCH_ERROR_CODE_CHECK_DETAILED, to handle the detailed error checking and provide a way to toggle the printing behavior of the backtrace.

Example

AOTI_TORCH_EXPORT const char* aoti_torch_exception_get_what();
AOTI_TORCH_EXPORT const char* aoti_torch_exception_get_what_with_backtrace();

Notes

The proposed solution aims to provide a simple and effective way to retrieve error information without overcomplicating the Stable API. However, the implementation details, such as how to use the new functions on the C++ side and how to handle the printing behavior of the backtrace, require further consideration.

Recommendation

Apply the workaround by introducing the two new functions, aoti_torch_exception_get_what() and aoti_torch_exception_get_what_with_backtrace(), to provide a way to retrieve error information when a function returns AOTI_TORCH_FAILURE. This approach offers a straightforward solution to the problem while avoiding unnecessary complexity.
