Generated GPT_OSS model files through porter script. #2384

sachinprasadhs merged 33 commits into keras-team:master from
Conversation
/gemini review
Code Review
This pull request introduces a new GPT-OSS model, seemingly generated by a script. While the overall structure aligns with the repository's conventions, there are several critical issues that need to be addressed. The weight conversion scripts in both utils/transformers and tools/checkpoint_conversion are non-functional due to incorrect layer and weight names, likely a result of being copied from another model's implementation without proper adaptation. Additionally, there's a critical code duplication issue with the GptOssCausalLMPreprocessor class being defined in two separate files. Several style guide violations were also found, including the use of type hints in function signatures, incorrect docstring formatting, and missing docstrings or examples in several new components. These issues must be resolved to ensure the model is correct, maintainable, and usable.
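To illustrate the type-hint and docstring style violations the review refers to, here is a minimal sketch of the kind of fix involved. The function name, signature, and math below are hypothetical examples, not code from this PR:

```python
# Hypothetical example: the style guide disallows type hints in
# signatures and expects Google-style docstrings with an `Args:` section.

# Before (flagged by review): type hints in the signature.
# def compute_rope_scaling(dim: int, base: float = 10000.0) -> float:

def compute_rope_scaling(dim, base=10000.0):
    """Compute a rotary embedding inverse frequency.

    Args:
        dim: int. The head dimension.
        base: float. The RoPE frequency base. Defaults to `10000.0`.

    Returns:
        The inverse frequency for the last rotary dimension pair.
    """
    return base ** (-(dim - 2) / dim)
```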
/gemini review
Code Review
This pull request introduces the GPT_OSS model, including its backbone, causal language model task, preprocessor, and tokenizer. The overall structure aligns well with the project's conventions. However, there are several areas that need attention.
Most critically, the Hugging Face weight conversion script in convert_gpt_oss.py is incorrect and incompatible with the implemented model architecture, particularly for the Mixture-of-Experts (MoE) layers. This will prevent loading pre-trained weights correctly. I've provided a corrected implementation for the convert_weights function.
Additionally, there are multiple instances of incorrect docstring formatting across the new files, which violates the project's style guide. I've pointed these out with suggestions for correction. A minor but important clarification is also needed in a test file comment to improve maintainability.
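As an illustration of the kind of MoE weight-mapping fix such a review points at, here is a minimal sketch of splitting fused expert tensors into per-expert kernels. The fused layout (contiguous gate and up halves) and every name here are assumptions for illustration only, not the actual gpt-oss checkpoint format or the corrected `convert_weights` from the review:

```python
import numpy as np


def convert_moe_expert_weights(hf_gate_up, hf_down, num_experts):
    """Split fused expert tensors into per-expert kernel lists.

    Hypothetical shapes, assuming the checkpoint stores experts fused:
    `hf_gate_up` of shape `(num_experts, hidden_dim, 2 * intermediate_dim)`
    with the gate projection in the first half of the last axis, and
    `hf_down` of shape `(num_experts, intermediate_dim, hidden_dim)`.
    """
    gate_kernels, up_kernels, down_kernels = [], [], []
    for e in range(num_experts):
        fused = hf_gate_up[e]  # (hidden_dim, 2 * intermediate_dim)
        half = fused.shape[-1] // 2
        gate_kernels.append(fused[:, :half])
        up_kernels.append(fused[:, half:])
        down_kernels.append(hf_down[e])
    return gate_kernels, up_kernels, down_kernels
```

The real converter would then assign each per-expert kernel to the matching Keras layer variable; getting those target names right is exactly the part the review flags as error-prone.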
/gemini review
Code Review
This pull request introduces a new GptOss model, including its backbone, causal language model, preprocessor, and associated tests and conversion scripts. The code is script-generated, and this review focuses on its quality, correctness, and adherence to the project's style guide.
The model definition and tests are generally well-structured. However, there are a few issues in the GptOssBackbone implementation regarding initialization and configuration that need to be addressed. The most critical issue lies in the Hugging Face conversion script (convert_gpt_oss.py), which appears to contain significant copy-paste errors from another model's converter. This will prevent correct weight loading and needs a substantial revision. My detailed comments provide specific suggestions to fix these issues.
@laxmareddyp does it work? Looks like it probably doesn't even output valid code yet, right? Going by the test output. The overall code style looks ok, so this will come down to how accurate the code is. I would view scripts like this as a developer tool rather than an automated workflow. So maybe the thing to do here is to try getting this code working end to end. Then you can have some first-hand experience on how useful this was at saving time. If there are tons of hard-to-find errors in the code, this might be slower than doing it yourself. If it's pretty accurate, maybe this is saving time and worth putting forward as a tool for contributors to use.
/gemini review
Code Review
This pull request introduces a new GptOss model, which is a significant contribution. The code is mostly well-structured and follows the project's conventions. However, I've found a few critical issues, primarily related to the tokenizer usage in tests and a bug in the weight conversion logic for the MoE layers. There are also some inconsistencies in the checkpoint conversion script and a minor point of confusion in a test file's comments. Addressing these points will be crucial for the model's correctness and maintainability.
keras_hub/src/models/gpt_oss/gpt_oss_causal_lm_preprocessor_test.py
sachinprasadhs left a comment
Thanks, overall the code and results look good. I've suggested some minor changes to the code comments.
* Test GPT_OSS files through porter
* Generate API and moved files to respective folders
* Fix format issues
* Add gpt_oss to preset loader and fix format issues
* Add gpt_oss to preset loader
* Generated files through 2.5-pro model
* Format fix
* Add converter, RoPE update
* Fix format
* Fix BPE tests
* Update converter
* Fix converter, checkpoint conversion and attention
* Fix the parameter count and debug code
* Add dequantization logic to converter
* Add YaRN support, fix serialisation, fix dequantization
* Fixed several pytest tests
* Address gpt_oss_causal_lm tests
* Fix format issues
* Address review comments
* Set start token id to None to match the HF output
* Fix test cases
* Fix test
* Fix error
* Fix
* Address all comments
@divyashreepathihalli @mattdangerw @abheesht17 Could you please review and share your feedback on the quality of the code generated through the script?
I estimate that 80-85% of the code matches; the backbone files import successfully and it's possible to instantiate a backbone model. There were still some errors, which might be alleviated with a stronger model.
The converter and weight conversion scripts are still in development. Generating a workable solution is complex because the model needs a comprehensive understanding of the entire architectural layout to handle the intricate dependencies among the model's layers and weights.
Numeric verification:
Matching outputs of both models:
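A minimal sketch of how such a numeric check might look, assuming both models' final logits are available as numpy arrays. The function name and tolerance are illustrative; the actual verification script used here is not shown in this thread:

```python
import numpy as np


def assert_outputs_match(keras_logits, hf_logits, atol=1e-4):
    """Compare final logits from the KerasHub and HF models.

    Both inputs are expected as numpy arrays of shape
    `(batch, seq_len, vocab_size)` computed on the same input ids.
    """
    # Elementwise closeness within an absolute tolerance.
    np.testing.assert_allclose(keras_logits, hf_logits, atol=atol)
    # Token-level agreement of the argmax predictions is a coarser
    # but more forgiving sanity check.
    assert (keras_logits.argmax(-1) == hf_logits.argmax(-1)).all()
```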
Checklist