There may be an issue affecting the `serialize_model` methods of the Pydantic models in this library.
Taking the `DocumentContent` model as an example, we see:
```python
class DocumentContent(BaseModel):
    full_text_list: Annotated[
        Optional[List[str]], pydantic.Field(alias="fullTextList")
    ] = None
    r"""The plaintext content of the document."""

    @model_serializer(mode="wrap")
    def serialize_model(self, handler):
        optional_fields = set(["fullTextList"])
        serialized = handler(self)

        m = {}

        for n, f in type(self).model_fields.items():
            k = f.alias or n
            val = serialized.get(k)

            if val != UNSET_SENTINEL:
                if val is not None or k not in optional_fields:
                    m[k] = val

        return m
```
This model uses a field alias so that, when the Pydantic object is constructed from an API response, the `fullTextList` key of the JSON object is mapped to the `full_text_list` field of the Pydantic object.
However, the model serializer uses:
```python
...
k = f.alias or n
val = serialized.get(k)
...
```
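which means that the value is always looked up under the field alias (`fullTextList`) rather than the Pydantic field name. With a default `model_dump()` call (no `by_alias=True`), `handler(self)` returns a dictionary keyed by field names (`full_text_list`), so `val` comes back as `None` and the key is missing from the returned dictionary `m` whenever a field's name and its alias differ.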
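The mechanism can be reproduced with Pydantic alone, without the Glean client. The sketch below is a trimmed copy of the generated `DocumentContent` model; as an assumption, the `UNSET_SENTINEL` check is dropped because it does not affect this code path. It shows the alias being honored on validation while the field disappears from a default `model_dump()`:

```python
from typing import Annotated, List, Optional

import pydantic
from pydantic import BaseModel, model_serializer


class DocumentContent(BaseModel):
    # Trimmed copy of the generated model (UNSET_SENTINEL handling omitted).
    full_text_list: Annotated[
        Optional[List[str]], pydantic.Field(alias="fullTextList")
    ] = None

    @model_serializer(mode="wrap")
    def serialize_model(self, handler):
        optional_fields = {"fullTextList"}
        # Keys here are field names unless model_dump(by_alias=True) is used.
        serialized = handler(self)
        m = {}
        for n, f in type(self).model_fields.items():
            k = f.alias or n  # always the alias when one is defined
            val = serialized.get(k)
            if val is not None or k not in optional_fields:
                m[k] = val
        return m


# Validation accepts the alias and populates the Python attribute.
content = DocumentContent.model_validate({"fullTextList": ["This is a test document."]})
print(content.full_text_list)             # ['This is a test document.']
print(content.model_dump())               # {}
print(content.model_dump(by_alias=True))  # {'fullTextList': ['This is a test document.']}
```

Passing `by_alias=True` makes `handler(self)` emit alias keys, so the lookup succeeds. Calling `documents_response.model_dump(by_alias=True)` may therefore work as a stopgap, with the caveat that the resulting keys are the camelCase aliases.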
To support this claim, I have attached a `documents.json` file containing an anonymized response collected from the Glean API (`/rest/api/v1/getdocuments` endpoint).
Below is a simple `debug.py` script to run alongside it:
```python
import pathlib

from glean.api_client import models
from glean.api_client.utils.unmarshal_json_response import unmarshal_json_response


class DummyHttpResponse:
    def __init__(self, text):
        self.status_code = 200
        self.text = text


with pathlib.Path("documents.json").open("r") as f:
    http_res = DummyHttpResponse(
        text=f.read(),
    )

documents_response = unmarshal_json_response(models.GetDocumentsResponse, http_res)

assert isinstance(documents_response, models.GetDocumentsResponse)
assert documents_response.documents is not None
assert isinstance(
    documents_response.documents["https://company.com/Test"].content,
    models.DocumentContent,
)
assert (
    documents_response.documents["https://company.com/Test"].content.full_text_list[0]
    == "This is a test document."
)

serialized_document_response = documents_response.model_dump()

assert isinstance(serialized_document_response, dict)
assert serialized_document_response["documents"] is not None
# Here's the problem: no `full_text_list` or `fullTextList` in the serialized response!
assert (
    len(serialized_document_response["documents"]["https://company.com/Test"]["content"])
    > 0
)
```
Running it yields:
```
$ ls
debug.py  documents.json
$ python debug.py
Traceback (most recent call last):
  File "/workspace/app/debug/debug.py", line 40, in <module>
    len(serialized_document_response["documents"]["https://company.com/Test"]["content"])
    > 0
AssertionError
```
Note that I'm using:
```
pydantic_core==2.41.5
pydantic==2.12.5
glean-api-client==0.11.27
```
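One possible direction for a fix, sketched here as an assumption rather than the library's actual code, is to read each value under whichever key `handler(self)` actually produced: the alias when serializing by alias, the field name otherwise. In this sketch, `UNSET_SENTINEL` is a hypothetical stand-in for the library's sentinel, and `optional_fields` is keyed by field name rather than alias:

```python
from typing import Annotated, List, Optional

import pydantic
from pydantic import BaseModel, model_serializer

# Hypothetical stand-in for the library's UNSET_SENTINEL (its real definition is not shown here).
UNSET_SENTINEL = object()


class DocumentContent(BaseModel):
    full_text_list: Annotated[
        Optional[List[str]], pydantic.Field(alias="fullTextList")
    ] = None

    @model_serializer(mode="wrap")
    def serialize_model(self, handler):
        optional_fields = {"full_text_list"}  # field names, not aliases
        serialized = handler(self)
        m = {}
        for n, f in type(self).model_fields.items():
            # handler() emits alias keys only when serializing by alias,
            # so use the alias if it is present, else fall back to the field name.
            k = f.alias if f.alias is not None and f.alias in serialized else n
            val = serialized.get(k)
            if val is not UNSET_SENTINEL and (val is not None or n not in optional_fields):
                m[k] = val
        return m


content = DocumentContent.model_validate({"fullTextList": ["This is a test document."]})
print(content.model_dump())               # {'full_text_list': ['This is a test document.']}
print(content.model_dump(by_alias=True))  # {'fullTextList': ['This is a test document.']}
print(DocumentContent().model_dump())     # {}
```

This keeps the existing behavior of dropping unset optional fields while making the output keys follow the caller's `by_alias` choice.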