Skip to content

Generated GPT_OSS model files through porter script.#2384

Merged
sachinprasadhs merged 33 commits intokeras-team:masterfrom
laxmareddyp:test_gpt_oss_model
Dec 17, 2025
Merged

Generated GPT_OSS model files through porter script.#2384
sachinprasadhs merged 33 commits intokeras-team:masterfrom
laxmareddyp:test_gpt_oss_model

Conversation

@laxmareddyp
Copy link
Copy Markdown
Collaborator

@laxmareddyp laxmareddyp commented Sep 5, 2025

@divyashreepathihalli @mattdangerw @abheesht17 Could you please check and provide your feedback on the quality of this code generated through script.

I assume that 80-85% the code is matching and backbone files import successfully and it's possible to instantiate a backbone model. There still were some errors , which might be alleviated with a stronger model.

The converter and weight conversion scripts are still in development. Generating a workable solution is complex because it requires providing the model with a comprehensive understanding of the entire architectural layout to handle the intricate dependencies of the model's layers and weights.

Numeric verifications:


Screenshot 2025-12-09 at 1 00 10 AM

Matching output of both the models :

Screenshot 2025-12-09 at 1 00 45 AM

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@divyashreepathihalli
Copy link
Copy Markdown
Collaborator

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new GPT-OSS model, seemingly generated by a script. While the overall structure aligns with the repository's conventions, there are several critical issues that need to be addressed. The weight conversion scripts in both utils/transformers and tools/checkpoint_conversion are non-functional due to incorrect layer and weight names, likely a result of being copied from another model's implementation without proper adaptation. Additionally, there's a critical code duplication issue with the GptOssCausalLMPreprocessor class being defined in two separate files. Several style guide violations were also found, including the use of type hints in function signatures, incorrect docstring formatting, and missing docstrings or examples in several new components. These issues must be resolved to ensure the model is correct, maintainable, and usable.

@laxmareddyp
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the GPT_OSS model, including its backbone, causal language model task, preprocessor, and tokenizer. The overall structure aligns well with the project's conventions. However, there are several areas that need attention.

Most critically, the Hugging Face weight conversion script in convert_gpt_oss.py is incorrect and incompatible with the implemented model architecture, particularly for the Mixture-of-Experts (MoE) layers. This will prevent loading pre-trained weights correctly. I've provided a corrected implementation for the convert_weights function.

Additionally, there are multiple instances of incorrect docstring formatting across the new files, which violates the project's style guide. I've pointed these out with suggestions for correction. A minor but important clarification is also needed in a test file comment to improve maintainability.

@laxmareddyp
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new GptOss model, including its backbone, causal language model, preprocessor, and associated tests and conversion scripts. The code is script-generated, and this review focuses on its quality, correctness, and adherence to the project's style guide.

The model definition and tests are generally well-structured. However, there are a few issues in the GptOssBackbone implementation regarding initialization and configuration that need to be addressed. The most critical issue lies in the Hugging Face conversion script (convert_gpt_oss.py), which appears to contain significant copy-paste errors from another model's converter. This will prevent correct weight loading and needs a substantial revision. My detailed comments provide specific suggestions to fix these issues.

@mattdangerw
Copy link
Copy Markdown
Member

@laxmareddyp does it work? Looks like it probably doesn't even output valid code yet right? Going by the test output.

The overall code style looks ok, so this will come down to how accurate the code is.

I would view scripts like this as a developer tool rather than an automated workflow. So maybe the thing to do here is to try getting this code end to end working. Then you can have some first hand experience on how useful this was at saving time. If there's tons of hard to find errors in the code, this might be slower than doing it yourself. If it's pretty accurate, maybe this is saving time and worth putting forward as a tool for contributors to use.

@laxmareddyp
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new GptOss model, which is a significant contribution. The code is mostly well-structured and follows the project's conventions. However, I've found a few critical issues, primarily related to the tokenizer usage in tests and a bug in the weight conversion logic for the MoE layers. There are also some inconsistencies in the checkpoint conversion script and a minor point of confusion in a test file's comments. Addressing these points will be crucial for the model's correctness and maintainability.

@laxmareddyp laxmareddyp added the WIP Pull requests which are work in progress and not ready yet for review. label Sep 24, 2025
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Nov 18, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Nov 18, 2025
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Nov 22, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Nov 22, 2025
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Dec 8, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Dec 8, 2025
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Dec 9, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Dec 9, 2025
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Dec 9, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Dec 9, 2025
Copy link
Copy Markdown
Collaborator

@sachinprasadhs sachinprasadhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, overall the code and result looks good. Suggested some minor changes on the code comments.

@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Dec 17, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Dec 17, 2025
Copy link
Copy Markdown
Collaborator

@sachinprasadhs sachinprasadhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks

@sachinprasadhs sachinprasadhs merged commit b238c7d into keras-team:master Dec 17, 2025
11 checks passed
Vivek1106-04 pushed a commit to Vivek1106-04/keras-hub that referenced this pull request Jan 14, 2026
* Test GPT_OSS files through porter

* generate API and moved files to respective folders

* Fix format issues

* Add gpt_oss to preset loader and Fix format issues

* Add gpt_oss to preset loader

* generated files through 2.5-pro model

* Format fix

* Add converter, RoPE update

* Fix format

* Fix BPE tests

* Update converter

* Fix converter, checkpoints conversion and attention

* Fix the parameter count and debug code

* Add dequantization logic to converter

* Add YaRN support,Fix Serialisation,Fix dequantization

* Fixed several pytest tests

* Address gpt_oss_causal_lm tests

* Fix format issues

* Address review comments

* set start token id to None to match the HF output

* Fix test cases

* Fix test

* Fix error

* Fix

* Address all  comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants