Python Analyzers

The PythonAstAnalyzer can be used to analyze Python source code files. An lxml.etree.Element representation of each file's abstract syntax tree (AST) is passed to its FeatureFinders for analysis:

codesurvey.analyzers.python.PythonAstAnalyzer

Bases: FileAnalyzer[Element]

Analyzer that finds .py files and parses them into lxml documents representing Python abstract syntax trees for feature analysis.

Source code in codesurvey/analyzers/python/core.py
class PythonAstAnalyzer(FileAnalyzer[Element]):
    """Analyzer that finds .py files and parses them into lxml documents
    representing Python abstract syntax trees for feature analysis."""
    default_name = 'python'
    default_file_glob = '**/*.py'
    default_file_filters = [
        py_site_packages_filter,
    ]
    """Excludes files under a `site-packages` directory that are unlikely
    to belong to the Repo under analysis."""

    def _subprocess_parse_ast(self, *, queue: Queue, file_text: str):
        try:
            file_tree = ast.parse(file_text)
        except Exception as ex:
            queue.put(ex)
        else:
            queue.put(file_tree)

    def prepare_file(self, file_info: FileInfo) -> Optional[Element]:
        with open(file_info.abs_path, 'r') as f:
            file_text = f.read()

        # Parse ast in a subprocess, as sufficiently complex files can
        # crash the interpreter:
        # https://docs.python.org/3/library/ast.html#ast.parse
        queue: Queue = Queue()
        process = Process(target=self._subprocess_parse_ast, kwargs=dict(
            queue=queue,
            file_text=file_text,
        ))
        try:
            process.start()
            result = queue.get()
            process.join()
            if isinstance(result, Exception):
                raise result
        except (SyntaxError, ValueError) as ex:
            logger.error((f'Skipping Python file "{file_info.rel_path}" in '
                          f'repo "{file_info.repo}" that could not be parsed: {ex}'))
            return None

        file_tree = result
        file_xml = astpath.convert_to_xml(file_tree)
        return file_xml
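
The error handling in prepare_file can be sketched with the standard library alone. This is a simplified illustration rather than CodeSurvey's implementation: `try_parse` is a hypothetical helper, and it parses in the current process, leaving out the subprocess isolation used above.

```python
import ast
from typing import Optional


def try_parse(file_text: str) -> Optional[ast.Module]:
    """Return the parsed module, or None if the text cannot be parsed.

    Hypothetical helper for illustration; unlike prepare_file, it does
    not isolate the parse in a subprocess.
    """
    try:
        return ast.parse(file_text)
    except (SyntaxError, ValueError):
        # The same exception types that prepare_file treats as
        # "log the problem and skip this file".
        return None


valid_tree = try_parse("x = 1")
invalid_tree = try_parse("def broken(:")
```

Returning None instead of raising mirrors the `Optional[Element]` return type of prepare_file, where None signals that the file was skipped.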

default_file_glob = '**/*.py' class-attribute instance-attribute

default_file_filters = [py_site_packages_filter] class-attribute instance-attribute

Excludes files under a site-packages directory that are unlikely to belong to the Repo under analysis.

__init__(feature_finders: Sequence[FeatureFinder], *, file_glob: Optional[str] = None, file_filters: Optional[Sequence[Callable[[FileInfo], bool]]] = None, name: Optional[str] = None)

Parameters:

  • feature_finders (Sequence[FeatureFinder]) –

    The FeatureFinders for analyzing each source-code file.

  • file_glob (Optional[str], default: None ) –

    Glob pattern for finding source-code files within the Repo.

  • file_filters (Optional[Sequence[Callable[[FileInfo], bool]]], default: None ) –

    Filters to identify files to exclude from analysis. Each filter is a function that takes a FileInfo and returns True if the file should be excluded. file_filters cannot be lambdas, as they need to be pickled when passed to sub-processes.

  • name (Optional[str], default: None ) –

    Name to identify the Analyzer. If None, defaults to the Analyzer type's default_name.
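
A file filter can be sketched as a named, module-level function, which keeps it picklable. The example below is an assumption-laden illustration: `exclude_tests_dir` and `StubFileInfo` are hypothetical names, and the stub models only the `rel_path` attribute of FileInfo.

```python
from dataclasses import dataclass


@dataclass
class StubFileInfo:
    """Hypothetical stand-in for FileInfo exposing only rel_path."""
    rel_path: str


def exclude_tests_dir(file_info) -> bool:
    """Return True (exclude the file) if it sits under a tests directory."""
    parts = file_info.rel_path.replace("\\", "/").split("/")
    # Only directory components are checked, not the filename itself.
    return "tests" in parts[:-1]


excluded = exclude_tests_dir(StubFileInfo("pkg/tests/test_core.py"))
kept = exclude_tests_dir(StubFileInfo("pkg/core.py"))
```

Such a function would be passed as `file_filters=[exclude_tests_dir]`. Since a non-None `file_filters` replaces `default_file_filters` rather than extending it, any default filter that should still apply needs to be listed explicitly.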

Source code in codesurvey/analyzers/core.py
def __init__(self, feature_finders: Sequence[FeatureFinder], *,
             file_glob: Optional[str] = None,
             file_filters: Optional[Sequence[Callable[[FileInfo], bool]]] = None,
             name: Optional[str] = None):
    """
    Args:
        feature_finders:
            The [FeatureFinders][codesurvey.analyzers.FeatureFinder]
            for analyzing each source-code file.
        file_glob: Glob pattern for finding source-code files within
            the Repo.
        file_filters: Filters to identify files to exclude from analysis.
            Each filter is a function that takes a
            [`FileInfo`][codesurvey.analyzers.FileInfo] and
            returns `True` if the file should be excluded. file_filters
            cannot be lambdas, as they need to be pickled when passed to
            sub-processes.
        name: Name to identify the Analyzer. If `None`, defaults to the
            Analyzer type's default_name.

    """
    super().__init__(feature_finders=feature_finders, name=name)
    self.file_glob = self.default_file_glob if file_glob is None else file_glob
    self.file_filters = self.default_file_filters if file_filters is None else file_filters

test(code_snippet: str, *, test_filename: str = 'test_file.txt') -> Dict[str, Feature]

Utility for directly analyzing a string of source-code.

A Repo will be created in a temporary directory to perform analysis of a file created with the given code_snippet.

Parameters:

  • code_snippet (str) –

    String of source-code to analyze.

  • test_filename (str, default: 'test_file.txt' ) –

    Optional custom filename used for the test file.

Returns:

  • Dict[str, Feature]

    A dictionary mapping feature names to Feature results.

Source code in codesurvey/analyzers/core.py
def test(self, code_snippet: str, *, test_filename: str = 'test_file.txt') -> Dict[str, Feature]:
    """Utility for directly analyzing a string of source-code.

    A Repo will be created in a temporary directory to perform
    analysis of a file created with the given `code_snippet`.

    Args:
        code_snippet: String of source-code to analyze.
        test_filename: Optional custom filename used for the test file.

    Returns:
        A dictionary mapping feature names to
            [`Feature`][codesurvey.analyzers.Feature] results.

    """
    source = TestSource({test_filename: code_snippet})
    repo = next(source.repo_generator())
    code = self.analyze_code(
        repo=repo,
        code_key=test_filename,
        features=self.get_feature_names(),
    )
    return code.features

Built-In Feature Finders

CodeSurvey comes equipped with the following FeatureFinders that can be used with PythonAstAnalyzer:

has_for_else

FeatureFinder for else clauses in for loops.

Source code in codesurvey/analyzers/python/features.py
@py_ast_feature_finder_with_transform('for_else', xpath='For/orelse')
def has_for_else(orelse_el):
    """FeatureFinder for else clauses in for loops."""
    if len(orelse_el) == 0:
        return None
    return orelse_el

has_try_finally

FeatureFinder for finally clauses in try statements.

Source code in codesurvey/analyzers/python/features.py
@py_ast_feature_finder_with_transform('try_finally', xpath='Try/finalbody')
def has_try_finally(finalbody_el):
    """FeatureFinder for finally clauses in try statements."""
    if len(finalbody_el) == 0:
        return None
    return finalbody_el
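
Both finders above discard empty elements because, in Python's AST, every `For` node has an `orelse` field and every `Try` node has a `finalbody` field, each an empty list when the clause is absent. A minimal sketch of the same check using only the standard `ast` module (the function names are hypothetical, and this is not how CodeSurvey evaluates features):

```python
import ast


def count_for_else(code: str) -> int:
    """Count for loops that have a non-empty else clause."""
    return sum(
        1 for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.For) and node.orelse
    )


def count_try_finally(code: str) -> int:
    """Count try statements that have a non-empty finally clause."""
    return sum(
        1 for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.Try) and node.finalbody
    )


plain_loop = "for x in items:\n    pass\n"
loop_with_else = "for x in items:\n    pass\nelse:\n    done = True\n"
try_with_finally = "try:\n    work()\nfinally:\n    cleanup()\n"
try_without_finally = "try:\n    work()\nexcept Exception:\n    pass\n"
```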

has_type_hint

FeatureFinder for type hints.

Source code in codesurvey/analyzers/python/features.py
@py_ast_feature_finder_with_transform('type_hint', xpath='FunctionDef/args/arguments//annotation')
def has_type_hint(annotation_el):
    """FeatureFinder for type hints."""
    if annotation_el.find('*') is not None:
        return annotation_el
    return None
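
The xpath above targets annotations on function arguments. As a rough standard-library analogue (a sketch with a hypothetical function name, not CodeSurvey's implementation), the argument nodes of each `FunctionDef` can be inspected directly, since an unannotated argument has `annotation` set to None:

```python
import ast
from typing import List


def annotated_arg_names(code: str) -> List[str]:
    """Return the names of function arguments that carry a type hint."""
    names = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.FunctionDef):
            continue
        spec = node.args
        candidates = spec.posonlyargs + spec.args + spec.kwonlyargs
        # *args and **kwargs are single optional nodes, not lists.
        candidates += [a for a in (spec.vararg, spec.kwarg) if a is not None]
        names.extend(a.arg for a in candidates if a.annotation is not None)
    return names


source = "def f(a: int, b, *args: str, c: float = 1.0, **kw) -> bool:\n    return True\n"
hinted = annotated_arg_names(source)
```

The return annotation (`-> bool`) is stored on the function node rather than under its arguments, so this sketch does not count it.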

has_set_function

FeatureFinder for the set function.

Source code in codesurvey/analyzers/python/features.py
@py_ast_feature_finder_with_transform('set_function', xpath='Call/func/Name')
def has_set_function(func_name_el):
    """FeatureFinder for the set function."""
    if func_name_el.get('id') == 'set':
        return func_name_el
    return None

has_set_value = py_ast_feature_finder('set_value', xpath='Set') module-attribute

FeatureFinder for set literals.

has_set = union_feature_finder('set', [has_set_function, has_set_value]) module-attribute

FeatureFinder for sets.
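
The idea behind this union, counting either a `set(...)` call or a set literal, can be illustrated with the standard `ast` module. This is a simplified sketch with a hypothetical function name, not the behaviour of union_feature_finder itself:

```python
import ast


def uses_set(code: str) -> bool:
    """Return True if the code contains a set literal or a call to set()."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Set):
            return True  # set literal such as {1, 2}
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "set"):
            return True  # call such as set(items)
    return False


literal = uses_set("s = {1, 2}")
call = uses_set("s = set([1])")
empty_braces = uses_set("d = {}")  # {} is a dict, not a set
comprehension = uses_set("s = {x for x in y}")  # parses to ast.SetComp
```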

has_fstring = py_ast_feature_finder('fstring', xpath='FormattedValue') module-attribute

FeatureFinder for f-strings.

has_ternary = py_ast_feature_finder('ternary', xpath='IfExp') module-attribute

FeatureFinder for ternary expressions.

has_pattern_matching = py_ast_feature_finder('pattern_matching', xpath='Match') module-attribute

FeatureFinder for pattern matching.

has_walrus = py_ast_feature_finder('walrus', xpath='NamedExpr') module-attribute

FeatureFinder for the walrus operator.
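
The xpath values used by these finders are AST node type names. A quick way to check which node types a snippet produces, using only the standard library (a sketch with a hypothetical helper; `Match` is left out because it only parses on Python 3.10 and later):

```python
import ast
from typing import Set


def node_type_names(code: str) -> Set[str]:
    """Return the names of all AST node types that occur in the code."""
    return {type(node).__name__ for node in ast.walk(ast.parse(code))}


samples = {
    "FormattedValue": 'greeting = f"hello {name}"',
    "IfExp": "label = 'even' if n % 2 == 0 else 'odd'",
    "NamedExpr": "if (n := len(data)) > 10:\n    pass",
}
found = {name: name in node_type_names(code) for name, code in samples.items()}

# An f-string without placeholders contains no FormattedValue node.
plain_fstring_types = node_type_names('s = f"plain"')
```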

Custom Python Feature Finders

The following utilities can be used to define simple FeatureFinders that can be used with PythonAstAnalyzer to analyze Python abstract syntax trees:

codesurvey.analyzers.python.py_ast_feature_finder(name: str, *, xpath: str) -> FeatureFinder[lxml.etree.Element]

Defines a FeatureFinder that looks for elements in a Python AST matching the given xpath query.

To explore the AST structure of the code constructs you are interested in identifying, consider using a tool like: https://python-ast-explorer.com/

Source code in codesurvey/analyzers/python/features.py
def py_ast_feature_finder(name: str, *, xpath: str) -> FeatureFinder[lxml.etree.Element]:
    """Defines a FeatureFinder that looks for elements in a Python AST
    matching the given xpath query.

    To explore the AST structure of the code constructs you are
    interested in identifying, consider using a tool like:
    https://python-ast-explorer.com/

    """
    return partial_feature_finder(name, _py_ast_feature_finder, xpath=xpath)
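
To see how an xpath query relates to Python code, the sketch below converts an AST to XML and queries it using only the standard library. `ast_to_xml` is a hypothetical, simplified stand-in for `astpath.convert_to_xml` (whose real output may carry more detail), and `xml.etree.ElementTree` is used in place of lxml, so only simple path expressions are supported. Prefixing the query with `.//` is an assumption made for this illustration.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node: ast.AST) -> ET.Element:
    """Simplified AST-to-XML conversion (hypothetical helper).

    Node types become elements, node-valued fields become child
    elements, and scalar fields become attributes.
    """
    element = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            ET.SubElement(element, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            container = ET.SubElement(element, field)
            for item in value:
                if isinstance(item, ast.AST):
                    container.append(ast_to_xml(item))
        elif value is not None:
            element.set(field, str(value))
    return element


code = "label = 'big' if n > 10 else 'small'\nunique = set(items)"
xml = ast_to_xml(ast.parse(code))

ternaries = xml.findall(".//IfExp")
called_names = [el.get("id") for el in xml.findall(".//Call/func/Name")]
```

Here the `IfExp` query matches the ternary expression, and `Call/func/Name` reaches the name of the called function, whose `id` attribute holds `set`.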

codesurvey.analyzers.python.py_ast_feature_finder_with_transform(name: str, *, xpath: str) -> Callable[[ElementTransform], FeatureFinder[lxml.etree.Element]]

Decorator for defining a FeatureFinder that looks for elements in a Python AST matching the given xpath query, transforming found elements with the decorated function.

The function should receive and return an lxml.etree.Element, or return None if the element should not be considered an occurrence of the feature.

Example usage to look for function calls where the function name is 'set':

@py_ast_feature_finder_with_transform('set_function', xpath='Call/func/Name')
def has_set_function(func_name_el):
    if func_name_el.get('id') == 'set':
        return func_name_el
    return None

To explore the AST structure of the code constructs you are interested in identifying, consider using a tool like: https://python-ast-explorer.com/

Source code in codesurvey/analyzers/python/features.py
def py_ast_feature_finder_with_transform(name: str, *, xpath: str) -> Callable[[ElementTransform], FeatureFinder[lxml.etree.Element]]:
    """Decorator for defining a FeatureFinder that looks for elements in a
    Python AST matching the given xpath query, transforming found elements
    with the decorated function.

    The function should receive and return an `lxml.etree.Element`, or
    return `None` if the element should not be considered an
    occurrence of the feature.

    Example usage to look for function calls where the function name
    is 'set':

    ```python
    @py_ast_feature_finder_with_transform('set_function', xpath='Call/func/Name')
    def has_set_function(func_name_el):
        if func_name_el.get('id') == 'set':
            return func_name_el
        return None
    ```

    To explore the AST structure of the code constructs you are
    interested in identifying, consider using a tool like:
    https://python-ast-explorer.com/

    """

    def decorator(func: ElementTransform) -> FeatureFinder[lxml.etree.Element]:

        @wraps(func)
        @feature_finder(name)
        def decorated(xml):
            return _py_ast_feature_finder(xml, xpath=xpath, transform=func)

        return decorated

    return decorator
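
The find-then-transform pattern described above can be sketched independently of CodeSurvey. The example below is an illustration under stated assumptions: `find_with_transform` and `keep_set_calls` are hypothetical names, the `ElementTransform` alias is defined locally, the XML is hand-written to resemble an AST, and `xml.etree.ElementTree` stands in for lxml.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Local alias for illustration; takes an element, returns it or None.
ElementTransform = Callable[[ET.Element], Optional[ET.Element]]


def find_with_transform(xml: ET.Element, xpath: str,
                        transform: ElementTransform) -> List[ET.Element]:
    """Return transformed matches, dropping those the transform rejects."""
    results = []
    for element in xml.findall(f".//{xpath}"):
        transformed = transform(element)
        if transformed is not None:
            results.append(transformed)
    return results


def keep_set_calls(func_name_el: ET.Element) -> Optional[ET.Element]:
    """Keep only Name elements that refer to 'set'."""
    return func_name_el if func_name_el.get("id") == "set" else None


# Hand-written XML resembling the AST of: set(a); list(b)
xml = ET.fromstring(
    "<Module><body>"
    "<Expr><value><Call><func><Name id='set'/></func></Call></value></Expr>"
    "<Expr><value><Call><func><Name id='list'/></func></Call></value></Expr>"
    "</body></Module>"
)
matches = find_with_transform(xml, "Call/func/Name", keep_set_calls)
```

Returning None from the transform is what filters out the `list` call, leaving a single match.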

codesurvey.analyzers.python.py_module_feature_finder(name: str, *, modules: Sequence[str]) -> FeatureFinder

Defines a FeatureFinder that looks for import statements of one or more target Python modules.

Example usage:

has_dataclasses = py_module_feature_finder('dataclasses_module', modules=['dataclasses'])

Source code in codesurvey/analyzers/python/features.py
def py_module_feature_finder(name: str, *, modules: Sequence[str]) -> FeatureFinder:
    """Defines a FeatureFinder that looks for import statements of one or
    more target Python `modules`.

    Example usage:

    ```python
    has_dataclasses = py_module_feature_finder('dataclasses_module', modules=['dataclasses'])
    ```

    """
    return partial_feature_finder(name, _py_module_feature_finder, modules=modules)
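
The kind of import matching that py_module_feature_finder describes can be approximated with the standard `ast` module. This sketch is not the library's implementation: `imported_targets` is a hypothetical name, and treating submodule imports (such as `os.path` for target `os`) as matches is an assumption.

```python
import ast
from typing import Sequence, Set


def imported_targets(code: str, modules: Sequence[str]) -> Set[str]:
    """Return which of the target modules the code imports."""
    targets = set(modules)
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from module import ..." statements only.
            names = [node.module]
        else:
            continue
        for name in names:
            # Match the target itself or any of its submodules.
            found.update(
                t for t in targets if name == t or name.startswith(t + ".")
            )
    return found


plain = imported_targets("import dataclasses", ["dataclasses"])
from_style = imported_targets("from dataclasses import dataclass", ["dataclasses"])
submodule = imported_targets("import os.path\nimport json", ["os", "re"])
lookalike = imported_targets("import dataclasses_json", ["dataclasses"])
```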