Task #1263
closed
repository.py needs some refactoring
0%
Description
The main problem is the poor performance of the get_hook_file function, caused by redundant calls to glob with similar patterns. This could be fixed by calling glob only twice, once in each of the 2 directories concerned, and then searching the resulting file lists in memory to find the required filenames. Some extra work is required for the optimization to be complete, as the function calls itself recursively.

More generally, the whole module is becoming obfuscated and needs simplification. The get_hook_file function is an example. In my test case, it is called 13 times from an external loop, contains 2 nested loops, and calls itself recursively. As a consequence, it took 0.7 s to retrieve ~20 filenames from 1 directory. save_all_string takes 1.3 s to complete, and most of that time is spent globbing instead of reading files. The limiting factor should be the file reading.

Repository.py is 369 lines long and could probably be reduced to <200, maybe less, while being faster and easier to read. Another example is that the code is read in the base class, which prevents retrieving it by any means other than local files. My suggestion is to rethink the structure and separate it as follows:
- Begin with the API functions, limited to what is strictly needed: get_code, get_doc, get_directive, get_deps and get_hook are probably all we need. They are implemented once and for all in the base class and work on dictionaries.
- A driver function in the base class, called by init, containing the main loop that fills in the dictionaries.
- Utility functions in the base class that do the in-memory pattern search following the pipelet naming convention, on file lists exposed by the specialized class.
- Utility functions in the specialized class that do the globbing (or code retrieving ...).
- A utility function in the specialized class that implements the file reading.
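The separation listed above could look roughly like the following. This is only a minimal sketch, not the actual pipelet code. The class names Repository and LocalRepository, the methods list_files, read, _match and _fill, and the seg_<segment>_<kind>.py filename pattern are all assumptions made for illustration. Only get_code and get_hook are shown; get_doc, get_directive and get_deps would follow the same dictionary-lookup pattern.

```python
import fnmatch
import glob
import os


class Repository:
    """Base class: API functions, driver loop and in-memory pattern search."""

    def __init__(self, segments):
        self._code = {}   # segment name -> code string
        self._hooks = {}  # segment name -> {hook name -> code string}
        self._fill(segments)

    # API functions: plain dictionary lookups, implemented once here.
    def get_code(self, seg):
        return self._code.get(seg)

    def get_hook(self, seg, hook):
        return self._hooks.get(seg, {}).get(hook)

    # To be implemented by the specialized class.
    def list_files(self):
        raise NotImplementedError

    def read(self, name):
        raise NotImplementedError

    # In-memory pattern search on a file list (no filesystem access).
    @staticmethod
    def _match(files, pattern):
        return sorted(f for f in files
                      if fnmatch.fnmatchcase(os.path.basename(f), pattern))

    # Driver: list the files once, then fill the dictionaries.
    def _fill(self, segments):
        files = self.list_files()
        for seg in segments:
            # Hypothetical naming convention: seg_<segment>_<kind>.py
            prefix = "seg_%s_" % seg
            self._hooks[seg] = {}
            for path in self._match(files, prefix + "*.py"):
                kind = os.path.basename(path)[len(prefix):-len(".py")]
                text = self.read(path)
                if kind == "code":
                    self._code[seg] = text
                else:
                    self._hooks[seg][kind] = text


class LocalRepository(Repository):
    """Specialized class: globbing and file reading on local directories."""

    def __init__(self, segments, directories):
        self._directories = directories
        super().__init__(segments)

    def list_files(self):
        # One glob call per directory; all later matching is done in memory.
        files = []
        for directory in self._directories:
            files.extend(glob.glob(os.path.join(directory, "*.py")))
        return files

    def read(self, name):
        with open(name) as f:
            return f.read()
```

With this layout, a repository that fetches code by other means than local files only has to provide its own list_files and read. Note that the simple prefix match used here would confuse a segment named a with one named a_b, so a real implementation would need a stricter match.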