Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slows down OLAF app start time #14

Open
ryanpdx opened this issue Jan 24, 2024 · 4 comments
Open

Slows down OLAF app start time #14

ryanpdx opened this issue Jan 24, 2024 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@ryanpdx
Copy link
Member

ryanpdx commented Jan 24, 2024

On a OreSat card, loading in configs for an OreSat mission takes ~14 seconds. This is way too slow, should to be less than 100 milliseconds.

@ryanpdx ryanpdx added the enhancement New feature or request label Jan 24, 2024
@ryanpdx ryanpdx self-assigned this Feb 4, 2024
@ryanpdx ryanpdx moved this from Todo to In Progress in Space Segment Feb 4, 2024
@ThirteenFish
Copy link
Contributor

It looks like the bulk of time spent is in the OreSatConfig() constructor. A pair of patches on the fast-oresatconfigs branch has decreased construction time by ~85% when measured on the C3, from ~75s to 9s. The bulk of the slowness looks like it comes from two places: pyyaml and dataclasses_json.

I'm using two scripts for this investigation:
timing.py: Takes a list of yaml files and times how long it takes to parse them, and then also builds an OreSatConfig and times how long that takes, printing the average times of each. This makes it easy to compare how much overhead there is in OreSatConfig that isn't caused by yaml parsing.

import sys 
from timeit import timeit
from yaml import load, Loader
from oresat_configs import OreSatConfig

def read_configs():
    configs = []
    for f in sys.argv[1:]:
        configs.append(load(f, Loader=Loader))

iterations = 10
duration = timeit(read_configs, number=iterations)
print(duration/iterations)

duration = timeit(lambda: OreSatConfig("0.5"), number=iterations)
print(duration/iterations)

prof.py: Takes a profile of the construction of OreSatConfig and prints the top 30 functions by cumulative time spent.

import cProfile
from pstats import Stats
from oresat_configs import OreSatConfig

with cProfile.Profile() as pr: 
    OreSatConfig("0.5")
    pr.create_stats()
    Stats(pr).sort_stats("cumtime").print_stats(30)

Running timing.py on the branch yaml-dataclass-tests returns:

find oresat_configs/ -name *.yaml | xargs python timing.py
0.0999185250002483
73.04428008790019

The find here lists all the yaml files in oresat_configs and then xargs passes them all to timing.py on the command line. The results here show that it takes ~100ms to parse all the yaml files once but that it takes ~73 seconds to construct an OreSatConfig, implying almost no time is spent loading yaml. I'm not sure I got the loading yaml part right because that is suspiciously fast, and we'll see later that reducing it makes a sizeable difference.

Running prof.py returns:

python prof.py
         3180397 function calls (3075778 primitive calls) in 165.244 seconds

   Ordered by: cumulative time
   List reduced from 308 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001  165.760  165.760 oresat_configs/__init__.py:24(__init__)
       35    0.009    0.000  142.164    4.062 oresat_configs/card_config.py:247(from_yaml)
        1    0.017    0.017  140.583  140.583 oresat_configs/_yaml_to_od.py:550(_load_configs)
       38    0.133    0.003  104.846    2.759 yaml/__init__.py:74(load)
       38    0.003    0.000  104.555    2.751 yaml/constructor.py:47(get_single_data)
       38    0.008    0.000   99.063    2.607 yaml/composer.py:29(get_single_node)
       38    0.004    0.000   98.809    2.600 yaml/composer.py:50(compose_document)
 11421/38    2.374    0.000   98.790    2.600 yaml/composer.py:63(compose_node)
   911/68    0.942    0.001   98.374    1.447 yaml/composer.py:117(compose_mapping_node)
   629/85    0.372    0.001   97.492    1.147 yaml/composer.py:99(compose_sequence_node)
    33678    2.224    0.000   90.399    0.003 yaml/parser.py:94(check_event)
    86395    6.898    0.000   71.734    0.001 yaml/scanner.py:113(check_token)
       68    0.014    0.000   52.814    0.777 dataclasses_json/api.py:65(from_dict)
   873/68    2.170    0.002   52.801    0.776 dataclasses_json/core.py:143(_decode_dataclass)
    17068    3.556    0.000   50.690    0.003 yaml/scanner.py:156(fetch_more_tokens)
 2588/282    0.819    0.000   50.043    0.177 dataclasses_json/core.py:266(_decode_generic)
 3866/352    0.948    0.000   49.392    0.140 dataclasses_json/core.py:360(_decode_items)
6287/1057    0.167    0.000   49.185    0.047 dataclasses_json/core.py:399(<genexpr>)
 2421/705    0.287    0.000   49.151    0.070 dataclasses_json/core.py:370(_decode_item)
     4614    0.873    0.000   37.701    0.008 yaml/parser.py:446(parse_block_mapping_value)
     9762    0.935    0.000   32.000    0.003 yaml/scanner.py:668(fetch_plain)
     5525    1.033    0.000   31.760    0.006 yaml/parser.py:427(parse_block_mapping_key)
     9762    7.227    0.001   28.821    0.003 yaml/scanner.py:1270(scan_plain)
      873    2.133    0.002   22.054    0.025 python3.9/typing.py:1369(get_type_hints)
    11421    4.350    0.000   21.493    0.002 yaml/parser.py:273(parse_node)
     9228    0.633    0.000   17.760    0.002 yaml/parser.py:270(parse_block_node_or_indentless_sequence)
   140556    9.035    0.000   17.056    0.000 yaml/scanner.py:145(need_more_tokens)
     1695    0.344    0.000   15.173    0.009 yaml/parser.py:381(parse_block_sequence_entry)
        1    0.362    0.362   13.170   13.170 oresat_configs/_yaml_to_od.py:581(_gen_od_db)
    11974    4.560    0.000   12.278    0.001 yaml/scanner.py:1311(scan_plain_spaces)

Because of the overhead of the profiler it is running about twice as slow, but relative timings should be mostly preserved. This shows about 60% of the time is spent in yaml.load() (4th line in the table) and surprisingly ~30% of the time converting the returned dict into dataclass from dataclasses_json.from_dict() (13th line). That's a very slow conversion.

Instead of trying to figure out why dataclasses_json.from_dict() is so slow I ripped it out and replaced it with dacite, a library dedicated to converting dict to dataclass. Running the testing scripts again:

find oresat_configs/ -name *.yaml | xargs python timing.py
0.08758852089995343
48.91220097660007

python prof.py
         2219837 function calls (2159674 primitive calls) in 116.163 seconds

   Ordered by: cumulative time
   List reduced from 306 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001  116.227  116.227 oresat_configs/__init__.py:24(__init__)
       38    0.134    0.004  104.210    2.742 yaml/__init__.py:74(load)
       38    0.003    0.000  103.919    2.735 yaml/constructor.py:47(get_single_data)
       38    0.010    0.000   98.900    2.603 yaml/composer.py:29(get_single_node)
       38    0.004    0.000   98.644    2.596 yaml/composer.py:50(compose_document)
 11421/38    2.382    0.000   98.620    2.595 yaml/composer.py:63(compose_node)
   911/68    0.910    0.001   98.207    1.444 yaml/composer.py:117(compose_mapping_node)
   629/85    0.370    0.001   97.372    1.146 yaml/composer.py:99(compose_sequence_node)
       35    0.009    0.000   97.280    2.779 oresat_configs/card_config.py:240(from_yaml)
        1    0.015    0.015   96.204   96.204 oresat_configs/_yaml_to_od.py:552(_load_configs)
    33678    2.201    0.000   90.380    0.003 yaml/parser.py:94(check_event)
    86395    6.969    0.000   71.971    0.001 yaml/scanner.py:113(check_token)
    17068    3.272    0.000   50.728    0.003 yaml/scanner.py:156(fetch_more_tokens)
     4614    0.829    0.000   37.268    0.008 yaml/parser.py:446(parse_block_mapping_value)
     9762    0.938    0.000   32.271    0.003 yaml/scanner.py:668(fetch_plain)
     5525    0.976    0.000   32.022    0.006 yaml/parser.py:427(parse_block_mapping_key)
     9762    7.292    0.001   29.102    0.003 yaml/scanner.py:1270(scan_plain)
    11421    4.244    0.000   21.482    0.002 yaml/parser.py:273(parse_node)
     9228    0.588    0.000   17.716    0.002 yaml/parser.py:270(parse_block_node_or_indentless_sequence)
   140556    9.007    0.000   17.164    0.000 yaml/scanner.py:145(need_more_tokens)
     1695    0.292    0.000   15.266    0.009 yaml/parser.py:381(parse_block_sequence_entry)
    11974    4.544    0.000   12.206    0.001 yaml/scanner.py:1311(scan_plain_spaces)
        1    0.410    0.410   11.248   11.248 oresat_configs/_yaml_to_od.py:583(_gen_od_db)
    80360    7.579    0.000    8.671    0.000 yaml/reader.py:99(forward)
        2    0.005    0.002    7.436    3.718 oresat_configs/_yaml_to_od.py:473(_load_std_objs)
   311610    6.969    0.000    6.973    0.000 yaml/reader.py:87(peek)
    17068    2.746    0.000    6.557    0.000 yaml/scanner.py:752(scan_to_next_token)
   146298    6.126    0.000    6.126    0.000 yaml/scanner.py:279(stale_possible_simple_keys)
    24056    1.758    0.000    5.233    0.000 yaml/scanner.py:135(get_token)
       38    0.071    0.002    5.016    0.132 yaml/constructor.py:54(construct_document)

It's about 30% faster, the amount taken up by dataclasses_json.from_dict() before. The new from_dict() doesn't even show up in the profile output.

Looking at the profile output now yaml loading takes up most of the bulk of it. Specifically yaml.load() is called 38 times, but we only have at most 16 unique yaml files. It should only be called at most once per yaml file. Instead of carefully ensuring that it is done, using python's functools.cache decorator to memoize the construction of CardConfig can cheat it in. That will prevent most of the duplication. The testing scripts show that lead to a big improvement:

find oresat_configs/ -name *.yaml | xargs python timing.py
0.09680294579993642
9.058323363499948

python prof.py
         1277903 function calls (1232274 primitive calls) in 64.360 seconds

   Ordered by: cumulative time
   List reduced from 306 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001   64.392   64.392 oresat_configs/__init__.py:24(__init__)
       14    0.074    0.005   54.146    3.868 yaml/__init__.py:74(load)
       14    0.003    0.000   54.009    3.858 yaml/constructor.py:47(get_single_data)
       14    0.002    0.000   51.449    3.675 yaml/composer.py:29(get_single_node)
       14    0.002    0.000   51.360    3.669 yaml/composer.py:50(compose_document)
  6068/14    1.217    0.000   51.354    3.668 yaml/composer.py:63(compose_node)
   470/44    0.448    0.001   51.030    1.160 yaml/composer.py:117(compose_mapping_node)
   366/29    0.206    0.001   50.716    1.749 yaml/composer.py:99(compose_sequence_node)
    17934    1.180    0.000   46.946    0.003 yaml/parser.py:94(check_event)
        1    0.009    0.009   45.886   45.886 oresat_configs/_yaml_to_od.py:552(_load_configs)
       11    0.003    0.000   45.875    4.170 oresat_configs/card_config.py:241(from_yaml)
    45910    3.558    0.000   37.206    0.001 yaml/scanner.py:113(check_token)
     9130    1.793    0.000   26.144    0.003 yaml/scanner.py:156(fetch_more_tokens)
     2426    0.430    0.000   19.456    0.008 yaml/parser.py:446(parse_block_mapping_value)
     5190    0.495    0.000   16.747    0.003 yaml/scanner.py:668(fetch_plain)
     2896    0.508    0.000   16.564    0.006 yaml/parser.py:427(parse_block_mapping_key)
     5190    3.764    0.001   15.084    0.003 yaml/scanner.py:1270(scan_plain)
     6068    2.198    0.000   11.157    0.002 yaml/parser.py:273(parse_node)
        1    0.343    0.343   11.083   11.083 oresat_configs/_yaml_to_od.py:583(_gen_od_db)
     4852    0.314    0.000    9.217    0.002 yaml/parser.py:270(parse_block_node_or_indentless_sequence)
    74706    4.843    0.000    9.105    0.000 yaml/scanner.py:145(need_more_tokens)
      821    0.167    0.000    7.643    0.009 yaml/parser.py:381(parse_block_sequence_entry)
        2    0.004    0.002    7.175    3.588 oresat_configs/_yaml_to_od.py:473(_load_std_objs)
     6491    2.441    0.000    6.379    0.001 yaml/scanner.py:1311(scan_plain_spaces)
25795/669    2.496    0.000    5.164    0.008 python3.9/copy.py:128(deepcopy)
  607/241    0.174    0.000    4.866    0.020 python3.9/copy.py:258(_reconstruct)
 2249/669    0.760    0.000    4.706    0.007 python3.9/copy.py:226(_deepcopy_dict)
    41021    3.837    0.000    4.376    0.000 yaml/reader.py:99(forward)
        1    0.002    0.002    3.760    3.760 oresat_configs/_yaml_to_od.py:693(_gen_fw_base_od)
        1    0.000    0.000    3.587    3.587 oresat_configs/beacon_config.py:90(from_yaml)

yaml.load() is only called 14 times now, reducing the amount of time spent loading yaml by 50%.

Combining both results in an overall reduction by ~85%.

@ThirteenFish
Copy link
Contributor

The next low hanging fruit is that pyyaml isn't using the C bindings to libyaml on the C3. Getting it installed correctly cuts OreSatConfig init time by another 60%, from 9s to 3.3s.

I think pyyaml isn't getting the bindings by default because the aarch wheel on pipy is built without the bindings. This can be tested by seeing if the import of CLoader fails: from yaml import CLoader. This succeeds on my local x86 machine, but fails on the C3. If libyaml is preinstalled, then building pyyaml locally will generate the C bindings: pip install --no-binary pyyaml. I think the flag can be specified in requirements.txt too?

With the properly installed pyyaml using CLoader directly these are the timing and profile I got:

find oresat_configs/ -name *.yaml | xargs python timing.py
0.08745663749998585
3.2795491295000145

python prof.py 
         344227 function calls (305415 primitive calls) in 15.631 seconds

   Ordered by: cumulative time
   List reduced from 214 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001   15.659   15.659 oresat_configs/__init__.py:24(__init__)
        1    0.319    0.319    7.859    7.859 oresat_configs/_yaml_to_od.py:583(_gen_od_db)
        1    0.009    0.009    6.134    6.134 oresat_configs/_yaml_to_od.py:552(_load_configs)
       11    0.003    0.000    6.125    0.557 oresat_configs/card_config.py:241(from_yaml)
       14    0.039    0.003    5.585    0.399 yaml/__init__.py:74(load)
       14    0.001    0.000    5.543    0.396 yaml/constructor.py:47(get_single_data)
25795/669    2.479    0.000    5.110    0.008 python3.9/copy.py:128(deepcopy)
  607/241    0.154    0.000    4.779    0.020 python3.9/copy.py:258(_reconstruct)
 2249/669    0.755    0.000    4.639    0.007 python3.9/copy.py:226(_deepcopy_dict)
       14    0.031    0.002    3.004    0.215 yaml/constructor.py:54(construct_document)
       14    1.367    0.098    2.537    0.181 {method 'get_single_node' of 'yaml._yaml.CParser' objects}
  940/908    0.044    0.000    2.504    0.003 yaml/constructor.py:410(construct_yaml_map)
      470    0.059    0.000    2.445    0.005 yaml/constructor.py:215(construct_mapping)
   444/44    0.490    0.001    2.366    0.054 dacite/core.py:39(from_dict)
      470    0.287    0.001    2.235    0.005 yaml/constructor.py:132(construct_mapping)
     6068    0.784    0.000    2.025    0.000 yaml/constructor.py:67(construct_object)
 3715/209    0.394    0.000    1.968    0.009 dacite/core.py:87(_build_value)
   390/43    0.091    0.000    1.877    0.044 dacite/core.py:138(_build_value_for_collection)
 1534/327    0.049    0.000    1.777    0.005 dacite/core.py:154(<genexpr>)
        2    0.004    0.002    1.194    0.597 oresat_configs/_yaml_to_od.py:473(_load_std_objs)
33881/30793    0.628    0.000    1.119    0.000 {built-in method builtins.isinstance}
    91188    1.108    0.000    1.108    0.000 {method 'get' of 'dict' objects}
        1    0.002    0.002    0.900    0.900 oresat_configs/_yaml_to_od.py:693(_gen_fw_base_od)
     4555    0.182    0.000    0.802    0.000 yaml/constructor.py:402(construct_yaml_str)
       16    0.008    0.000    0.760    0.048 oresat_configs/_yaml_to_od.py:456(_add_all_rpdo_data)
       66    0.216    0.003    0.759    0.011 oresat_configs/_yaml_to_od.py:314(_add_rpdo_data)
     6068    0.544    0.000    0.751    0.000 yaml/resolver.py:143(resolve)
     5232    0.465    0.000    0.738    0.000 constructor.py:173(construct_scalar)
        1    0.000    0.000    0.705    0.705 oresat_configs/beacon_config.py:90(from_yaml)
     8206    0.441    0.000    0.655    0.000 canopen/objectdictionary/__init__.py:104(__getitem__)

yaml.load() is still prominent, but it only takes up about 1/3 of the total time. Constructing dataclasses from dicts creeps back in, taking up about 15%. The new interesting thing is that now the deepcopy()s that are being done takes up another 1/3 of the time. At the very bottom a canopen method starts to sneak in too.

@ryanpdx ryanpdx removed their assignment Feb 24, 2024
@ryanpdx ryanpdx moved this from In Progress to Todo in Space Segment Feb 24, 2024
@ryanpdx ryanpdx moved this from Todo to In Progress in Space Segment Jun 2, 2024
@ThirteenFish
Copy link
Contributor

The final change I had wanted to try was custom yaml tags (types) to directly go from yaml to the final python type without intermediate conversion. That yielded notable improvements but the downside was non-standard yaml and it was still kinda slow because of the parsing.

What if we just didn't do the parsing? At runtime at least. Like fw-files, what if we had a yaml -> python generator (py-files?) that ran while building the oresat-configs package and then users import the OD already written in python? The contents of _yaml_to_od.py and pyyaml wouldn't have to run at application startup.

@ryanpdx
Copy link
Member Author

ryanpdx commented Oct 27, 2024

Yea, we could go the code generation route. That would help a lot.

I was thing of caching the ODs as JSONs (where the JSON have all the synchronization/generated stuff from _yaml_to_od.py already done, so the contents of JSON are just strait dumped of the OD and non-c3 node could just load their JSON OD and no other config). It would keep the oresat-config package roughly the same size as it is currently. And it could always to the current method as a backup, if the cache was corrupted somehow.

Either way would be a great improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Implementing
Development

No branches or pull requests

2 participants