JSONDecodeError when using a small background list #300

Open
wongkarenhy-hex opened this issue Feb 6, 2025 · 13 comments

Comments

@wongkarenhy-hex

Setup

I am reporting a problem with GSEApy version, Python version, and operating
system as follows:

>>> import sys; print(sys.version)
3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
>>> import platform; print(platform.python_implementation()); print(platform.platform())
CPython
macOS-14.2.1-arm64-arm-64bit
>>> import gseapy; print(gseapy.__version__)
1.1.5

Expected behaviour

enr_bg = gp.enrichr(gene_list=gene_list,
                 gene_sets=['MSigDB_Hallmark_2020','KEGG_2021_Human'],
                 # organism='human', # organism argument is ignored because user input a background
                 background="tests/data/background.txt",
                 outdir=None, # don't write to disk
                )

The above is copied directly from the GSEApy documentation, using the same gene_list and background provided there. However, it raises an error when I switch to gene_sets=['MGI_Mammalian_Phenotype_2017'] (see below).

Actual behaviour

JSONDecodeError                           Traceback (most recent call last)
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/requests/models.py:963, in Response.json(self, **kwargs)
    962 try:
--> 963     return complexjson.loads(self.content.decode(encoding), **kwargs)
    964 except UnicodeDecodeError:
    965     # Wrong UTF codec detected; usually because it's not UTF-8
    966     # but some other 8-bit codec.  This is an RFC violation,
    967     # and the server didn't bother to tell us what codec *was*
    968     # used.

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/__init__.py:514, in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, allow_nan, **kw)
    510 if (cls is None and encoding is None and object_hook is None and
    511         parse_int is None and parse_float is None and
    512         parse_constant is None and object_pairs_hook is None
    513         and not use_decimal and not allow_nan and not kw):
--> 514     return _default_decoder.decode(s)
    515 if cls is None:

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/decoder.py:386, in JSONDecoder.decode(self, s, _w, _PY3)
    385     s = str(s, self.encoding)
--> 386 obj, end = self.raw_decode(s)
    387 end = _w(s, end).end()

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/decoder.py:416, in JSONDecoder.raw_decode(self, s, idx, _w, _PY3)
    415         idx += 3
--> 416 return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 1 column 10266 (char 10265)

During handling of the above exception, another exception occurred:

JSONDecodeError                           Traceback (most recent call last)
Cell In[7], line 1
----> 1 enr_bg = gp.enrichr(gene_list=gene_list,
      2                  gene_sets=['MGI_Mammalian_Phenotype_2017'],
      3                  # organism='human', # organism argment is ignored because user input a background
      4                  background="tests/data/background.txt",
      5                  outdir=None, # don't write to disk
      6                 )

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/__init__.py:554, in enrichr(gene_list, gene_sets, organism, outdir, background, cutoff, format, figsize, top_term, no_plot, verbose)
    552 # set organism
    553 enr.set_organism()
--> 554 enr.run()
    556 return enr

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/enrichr.py:652, in Enrichr.run(self)
    650 # whether user input background
    651 if isinstance(bg, set) and len(bg) > 0:
--> 652     shortID, res = self.get_results_with_background(genes_list, bg)
    653 else:
    654     shortID, res = self.get_results(genes_list)

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/enrichr.py:297, in Enrichr.get_results_with_background(self, gene_list, background)
    293     self._logger.error("Error fetching enrichment results: %s" % self._gs)
    295 # print(response.text[5700:5900])
--> 297 data = response.json()
    298 # Note: missig Overlap column
    299 colnames = [
    300     "Rank",
    301     "Term",
   (...)
    308     "Old adjusted P-value",
    309 ]

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
    969             pass
    970         except JSONDecodeError as e:
--> 971             raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
    973 try:
    974     return complexjson.loads(self.text, **kwargs)

JSONDecodeError: Expecting value: line 1 column 10266 (char 10265)

Steps to reproduce

Just switch out the gene_sets param from the above example and it should hit the JSONDecodeError.

Looking at the response.text object, some results apparently look like this:

[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ] 

I think the two Infinity values are the offending ones here; they appear to be the odds ratio and the combined score. Presumably, this happens because the background gene list does not contain any of the genes in the given gene set.
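
To illustrate why the Infinity tokens matter: they are not part of the JSON spec, so whether the response parses depends on how strict the decoder is. Below is a minimal sketch using only Python's built-in json module and a shortened payload modelled on the row above (not the full Enrichr response). The `reject_constant` helper is hypothetical and only mimics a strict, spec-only decoder.

```python
import json
import math

# Shortened payload modelled on the row above (not the full Enrichr response).
payload = (
    '[35, "MP:0008729 decreased memory B cell number", 3.392253592257852E-5, '
    'Infinity, Infinity, ["PTPRC", "JAK3", "TLR4"], 0.0024278843567445483, 0, 0]'
)

# The built-in decoder accepts the non-standard Infinity token by default.
row = json.loads(payload)
lenient_ok = math.isinf(row[3]) and math.isinf(row[4])


def reject_constant(name):
    """Hypothetical hook that mimics a strict, spec-only decoder."""
    raise ValueError(f"non-standard JSON constant: {name}")


# With the hook installed, the same payload fails to parse.
try:
    json.loads(payload, parse_constant=reject_constant)
    strict_ok = True
except ValueError:
    strict_ok = False

print(lenient_ok, strict_ok)
```

If the lenient parse succeeds and the strict one fails, the payload itself is fine apart from the Infinity tokens, which points at the decoder configuration rather than at a truncated or corrupted response.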

This becomes a much bigger problem when analyzing gene sets from proteomics experiments, which typically detect far fewer than 10k genes.

Thanks!

@zqfang
Owner

zqfang commented Feb 6, 2025

I can't reproduce the bug on my end. Can you re-run the code and try again?

I'm on macOS with an M3 chip.

@wongkarenhy-hex
Author

I'm still able to reproduce the same error when I swap the gene set to MGI_Mammalian_Phenotype_2017:

enr_bg = gp.enrichr(gene_list=gene_list,
                 gene_sets=['MGI_Mammalian_Phenotype_2017'],
                 # organism='human', # organism argument is ignored because user input a background
                 background="tests/data/background.txt",
                 outdir=None, # don't write to disk
                )

Another way to reproduce a similar JSONDecodeError is to replace the background with the original gene_list:

enr_bg = gp.enrichr(gene_list=gene_list,
                 gene_sets=['MGI_Mammalian_Phenotype_2017'],
                 # organism='human', # organism argument is ignored because user input a background
                 background="tests/data/gene_list.txt",
                 outdir=None, # don't write to disk
                )

You should run into the same Infinity values in the response with the above test case.

FWIW I'm on Apple M3 Max chip.

Thanks!

@zqfang
Owner

zqfang commented Feb 6, 2025

That's weird. With my test dataset it works, even after trying 5 times.

Image

@wongkarenhy-hex
Author

Huh, did you find the following record in your output? I suppose the json module in Python doesn't support decoding Infinity.

[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ]

@zqfang
Owner

zqfang commented Feb 6, 2025

Is your requests package outdated?

Image

@wongkarenhy-hex
Author

We are on the same version. How about simplejson?

(hx) (base) karenwong@Karen-Wong-Macbook hx % pixi list | grep requests          
requests                              2.32.3          pyhd8ed1ab_1              57.3 KiB   conda  requests
(hx) (base) karenwong@Karen-Wong-Macbook hx % pixi list | grep simplejson
simplejson                            3.19.3          py311h460d6c5_1           129.8 KiB  conda  simplejson

@zqfang
Owner

zqfang commented Feb 7, 2025

I don't have simplejson installed, so the bug comes from simplejson.

Image

@wongkarenhy-hex
Author

wongkarenhy-hex commented Feb 7, 2025

Thanks for checking! Would you be able to support simplejson as well?

According to this, allow_nan defaults to False...

@zqfang
Owner

zqfang commented Feb 7, 2025

simplejson is an optional dependency of requests. I think you can submit an issue to the simplejson/requests teams to fix this.

@wongkarenhy-hex
Author

Thanks for the quick reply!

The simplejson module recently updated the default value of allow_nan from True to False in its latest version. Supporting NaN, Infinity, and -Infinity is actually outside the JSON spec, so they may have decided to change the default behavior to align with that.

Looking at the documentation for both simplejson and the built-in json module, we can simply add allow_nan=True to this line of your code to ensure compatibility with both versions. I've tested it on different systems, both with and without simplejson and it works well.

@zqfang
Owner

zqfang commented Feb 9, 2025

Thanks, but allow_nan breaks my codebase, so I reverted it back to the default:

TypeError: JSONDecoder.__init__() got an unexpected keyword argument 'allow_nan'
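
For context on the TypeError above: as the traceback earlier in the thread shows, Response.json(**kwargs) forwards its keyword arguments to whichever loads function requests picked up. When simplejson is not installed that is the built-in json.loads, which hands unknown keywords to JSONDecoder, and JSONDecoder has no allow_nan parameter (at least on the Python 3.11 used in this thread). A minimal sketch reproducing this with the standard library alone, without requests or simplejson:

```python
import json

# json.loads passes unrecognised keyword arguments on to JSONDecoder,
# which does not define allow_nan, so this raises TypeError.
try:
    json.loads("[1, Infinity]", allow_nan=True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

This is why passing allow_nan=True unconditionally only works in environments where simplejson is the active backend.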

zqfang pushed a commit that referenced this issue Feb 9, 2025
@wongkarenhy-hex
Author

Does the following work?

if 'simplejson' in requests.compat.json.__name__:
    data = response.json(allow_nan=True)
else:
    data = response.json()

@zqfang
Owner

zqfang commented Feb 10, 2025

I think the better solution is this:

data = json.loads(response.content)

I prefer to use the built-in library instead of testing the new simplejson as a dependency.
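
A minimal sketch of that approach, assuming only the built-in json module. The `content` bytes below are a shortened stand-in for response.content, modelled on the row quoted earlier in the thread rather than a real Enrichr response. It shows that json.loads accepts UTF-8 bytes directly and turns the Infinity tokens into float("inf"), which downstream code can then detect.

```python
import json
import math

# Stand-in for response.content: UTF-8 bytes, as requests would return them.
content = (
    b'[[35, "MP:0008729 decreased memory B cell number", 3.39e-05, '
    b'Infinity, Infinity, ["PTPRC", "JAK3", "TLR4"], 0.0024, 0, 0]]'
)

# The built-in decoder takes bytes and accepts Infinity by default,
# regardless of whether simplejson is installed.
data = json.loads(content)

# Locate the fields of the first row that came back as infinite floats.
first_row = data[0]
infinite_positions = [
    i for i, value in enumerate(first_row)
    if isinstance(value, float) and math.isinf(value)
]

print(infinite_positions)
```

Calling the standard library directly sidesteps the question of which JSON backend requests selected, at the cost of no longer going through Response.json().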

zqfang pushed a commit that referenced this issue Feb 10, 2025