Python Analyzers

The PythonAstAnalyzer can be used to analyze Python source code files. An lxml.etree.Element representation of each file's abstract syntax tree (AST) is passed to its FeatureFinders for analysis:

`codesurvey.analyzers.python.PythonAstAnalyzer`

Bases: FileAnalyzer[Element]

Analyzer that finds .py files and parses them into lxml documents representing Python abstract syntax trees for feature analysis.
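To give a feel for the kind of tree a FeatureFinder receives, here is a minimal sketch that turns a Python AST into XML using only the standard library. It is an approximation for illustration: `ast_to_xml` is a hypothetical helper, `xml.etree.ElementTree` stands in for lxml, and the analyzer itself relies on `astpath.convert_to_xml`, whose output may differ in detail.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node: ast.AST) -> ET.Element:
    """Hypothetical helper: one element per AST node, named after its class."""
    element = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # Single child node: wrap it in an element named after the field.
            ET.SubElement(element, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            # List of nodes (e.g. a body): one wrapper element, many children.
            wrapper = ET.SubElement(element, field)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(ast_to_xml(item))
        elif value is not None:
            # Plain values (identifiers, constants) become attributes.
            element.set(field, str(value))
    return element


root = ast_to_xml(ast.parse("unique = set([1, 2, 2])"))
print(ET.tostring(root, encoding="unicode"))
```

In a tree shaped like this, a query such as `Call/func/Name` reaches the element whose `id` attribute holds the called function's name.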

Source code in codesurvey/analyzers/python/core.py

class PythonAstAnalyzer(FileAnalyzer[Element]):
    """Analyzer that finds .py files and parses them into lxml documents
    representing Python abstract syntax trees for feature analysis."""
    default_name = 'python'
    default_file_glob = '**/*.py'
    default_file_filters = [
        py_site_packages_filter,
    ]
    """Excludes files under a `site-packages` directory that are unlikely
    to belong to the Repo under analysis."""

    def _subprocess_parse_ast(self, *, queue: Queue, file_text: str):
        try:
            file_tree = ast.parse(file_text)
        except Exception as ex:
            queue.put(ex)
        else:
            queue.put(file_tree)

    def prepare_file(self, file_info: FileInfo) -> Optional[Element]:
        with open(file_info.abs_path, 'r') as f:
            file_text = f.read()

        # Parse ast in a subprocess, as sufficiently complex files can
        # crash the interpreter:
        # https://docs.python.org/3/library/ast.html#ast.parse
        queue: Queue = Queue()
        process = Process(target=self._subprocess_parse_ast, kwargs=dict(
            queue=queue,
            file_text=file_text,
        ))
        try:
            process.start()
            result = queue.get()
            process.join()
            if isinstance(result, Exception):
                raise result
        except (SyntaxError, ValueError) as ex:
            logger.error((f'Skipping Python file "{file_info.rel_path}" in '
                          f'repo "{file_info.repo}" that could not be parsed: {ex}'))
            return None

        file_tree = result
        file_xml = astpath.convert_to_xml(file_tree)
        return file_xml
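The idea of isolating the parse from the main interpreter can be sketched with the standard library alone. The example below is not the analyzer's implementation: it swaps `multiprocessing.Process` and a `Queue` for `subprocess.run`, and `parses_cleanly` is a hypothetical helper that only reports whether a separate interpreter managed to parse the text.

```python
import subprocess
import sys

# Child program: read source text from stdin and try to parse it.
_CHILD = "import ast, sys; ast.parse(sys.stdin.read())"


def parses_cleanly(file_text: str, timeout: float = 30.0) -> bool:
    """Return True if a separate interpreter can parse file_text."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=file_text,
            text=True,
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # A SyntaxError or a hard crash in the child gives a non-zero exit code.
    return completed.returncode == 0


print(parses_cleanly("x = 1"))   # True
print(parses_cleanly("def (:"))  # False
```

Because the child is a separate process, a crash while parsing cannot take down the process that is running the survey; the `timeout` guards against a child that never finishes.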

`default_file_glob = '**/*.py'` `class-attribute` `instance-attribute`

`default_file_filters = [py_site_packages_filter]` `class-attribute` `instance-attribute`

Excludes files under a site-packages directory that are unlikely to belong to the Repo under analysis.
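As a rough illustration of how the glob and an exclusion filter combine, the sketch below builds a throwaway directory, collects files with `'**/*.py'`, and drops anything under `site-packages`. `in_site_packages` is a hypothetical stand-in that takes a plain relative path; the real `py_site_packages_filter` receives a FileInfo and may be implemented differently.

```python
import tempfile
from pathlib import Path


def in_site_packages(rel_path: str) -> bool:
    """Hypothetical filter: True if the file should be excluded."""
    return "site-packages" in Path(rel_path).parts


with tempfile.TemporaryDirectory() as tmp:
    repo_dir = Path(tmp)
    for rel in ["app.py", "pkg/util.py", "venv/lib/site-packages/dep.py", "README.md"]:
        path = repo_dir / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")

    # '**/*.py' matches .py files at any depth, including the top level.
    candidates = sorted(
        p.relative_to(repo_dir).as_posix() for p in repo_dir.glob("**/*.py")
    )
    kept = [rel for rel in candidates if not in_site_packages(rel)]

print(kept)  # ['app.py', 'pkg/util.py']
```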

`__init__(feature_finders: Sequence[FeatureFinder], *, file_glob: Optional[str] = None, file_filters: Optional[Sequence[Callable[[FileInfo], bool]]] = None, name: Optional[str] = None)`

Parameters:

feature_finders (Sequence[FeatureFinder]) – The FeatureFinders for analyzing each source-code file.

file_glob (Optional[str], default: None) – Glob pattern for finding source-code files within the Repo.

file_filters (Optional[Sequence[Callable[[FileInfo], bool]]], default: None) – Filters to identify files to exclude from analysis. Each filter is a function that takes a FileInfo and returns True if the file should be excluded. file_filters cannot be lambdas, as they need to be pickled when passed to sub-processes.

name (Optional[str], default: None) – Name to identify the Analyzer. If None, defaults to the Analyzer type's default_name.
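The restriction on lambdas comes from how `pickle` works: functions are pickled by name, and a lambda has no importable name. The sketch below shows this with the standard library; `exclude_tests` and `is_picklable` are hypothetical helpers, and a `SimpleNamespace` stands in for a FileInfo, of which only `rel_path` is used.

```python
import pickle
from types import SimpleNamespace


def exclude_tests(file_info) -> bool:
    """Hypothetical filter: exclude files that live under a tests directory."""
    return "tests" in file_info.rel_path.split("/")


def is_picklable(obj) -> bool:
    """Report whether obj survives pickle.dumps, as sub-processes require."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Stand-in for a FileInfo; only rel_path matters for this filter.
info = SimpleNamespace(rel_path="pkg/tests/test_core.py")
print(exclude_tests(info))                   # True
print(is_picklable(lambda file_info: True))  # False
```

A named function defined at module level, such as `exclude_tests`, is the kind of callable that can be passed as `file_filters=[exclude_tests]`. Note that, going by the `__init__` source, passing `file_filters` replaces `default_file_filters` rather than extending them.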

Source code in codesurvey/analyzers/core.py

def __init__(self, feature_finders: Sequence[FeatureFinder], *,
             file_glob: Optional[str] = None,
             file_filters: Optional[Sequence[Callable[[FileInfo], bool]]] = None,
             name: Optional[str] = None):
    """
    Args:
        feature_finders:
            The [FeatureFinders][codesurvey.analyzers.FeatureFinder]
            for analyzing each source-code file.
        file_glob: Glob pattern for finding source-code files within
            the Repo.
        file_filters: Filters to identify files to exclude from analysis.
            Each filter is a function that takes a
            [`FileInfo`][codesurvey.analyzers.FileInfo] and
            returns `True` if the file should be excluded. file_filters
            cannot be lambdas, as they need to be pickled when passed to
            sub-processes.
        name: Name to identify the Analyzer. If `None`, defaults to the
            Analyzer type's default_name.

    """
    super().__init__(feature_finders=feature_finders, name=name)
    self.file_glob = self.default_file_glob if file_glob is None else file_glob
    self.file_filters = self.default_file_filters if file_filters is None else file_filters

`test(code_snippet: str, *, test_filename: str = 'test_file.txt') -> Dict[str, Feature]`

Utility for directly analyzing a string of source-code.

A Repo will be created in a temporary directory to perform analysis of a file created with the given code_snippet.

Parameters:

code_snippet (str) – String of source-code to analyze.

test_filename (str, default: 'test_file.txt') – Optional custom filename used for the test file.

Returns:

Dict[str, Feature] – A dictionary mapping feature names to Feature results.

Source code in codesurvey/analyzers/core.py

def test(self, code_snippet: str, *, test_filename: str = 'test_file.txt') -> Dict[str, Feature]:
    """Utility for directly analyzing a string of source-code.

    A Repo will be created in a temporary directory to perform
    analysis of a file created with the given `code_snippet`.

    Args:
        code_snippet: String of source-code to analyze.
        test_filename: Optional custom filename used for the test file.

    Returns:
        A dictionary mapping feature names to
            [`Feature`][codesurvey.analyzers.Feature] results.

    """
    source = TestSource({test_filename: code_snippet})
    repo = next(source.repo_generator())
    code = self.analyze_code(
        repo=repo,
        code_key=test_filename,
        features=self.get_feature_names(),
    )
    return code.features

Built-In Feature Finders

CodeSurvey comes equipped with the following FeatureFinders that can be used with PythonAstAnalyzer:

`has_for_else`

FeatureFinder for else clauses in for loops.

Source code in codesurvey/analyzers/python/features.py

@py_ast_feature_finder_with_transform('for_else', xpath='For/orelse')
def has_for_else(orelse_el):
    """FeatureFinder for else clauses in for loops."""
    if len(orelse_el) == 0:
        return None
    return orelse_el

`has_try_finally`

FeatureFinder for finally clauses in try statements.

Source code in codesurvey/analyzers/python/features.py

@py_ast_feature_finder_with_transform('try_finally', xpath='Try/finalbody')
def has_try_finally(finalbody_el):
    """FeatureFinder for finally clauses in try statements."""
    if len(finalbody_el) == 0:
        return None
    return finalbody_el
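Both transforms above discard empty elements because the `orelse` and `finalbody` fields are present on every `For` and `Try` node, whether or not the clause was written. The sketch below shows this on the plain `ast` nodes rather than on the XML tree the FeatureFinders actually receive.

```python
import ast

source = """
for item in items:
    pass

for item in items:
    if item:
        break
else:
    found = False

try:
    risky()
finally:
    cleanup()
"""

tree = ast.parse(source)
for_loops = [node for node in ast.walk(tree) if isinstance(node, ast.For)]
try_stmts = [node for node in ast.walk(tree) if isinstance(node, ast.Try)]

# Every For node has an orelse list; it is empty when there is no else clause.
for_else_lines = [loop.lineno for loop in for_loops if loop.orelse]
# Likewise, finalbody is an empty list when there is no finally clause.
try_finally_lines = [stmt.lineno for stmt in try_stmts if stmt.finalbody]

print(len(for_loops), for_else_lines)  # 2 [5]
print(try_finally_lines)               # [11]
```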

`has_type_hint`

FeatureFinder for type hints.

Source code in codesurvey/analyzers/python/features.py

@py_ast_feature_finder_with_transform('type_hint', xpath='FunctionDef/args/arguments//annotation')
def has_type_hint(annotation_el):
    """FeatureFinder for type hints."""
    if annotation_el.get('*') is not None:
        return annotation_el
    return None

`has_set_function`

FeatureFinder for the set function.

Source code in codesurvey/analyzers/python/features.py

@py_ast_feature_finder_with_transform('set_function', xpath='Call/func/Name')
def has_set_function(func_name_el):
    """FeatureFinder for the set function."""
    if func_name_el.get('id') == 'set':
        return func_name_el
    return None

`has_set_value = py_ast_feature_finder('set_value', xpath='Set')` `module-attribute`

FeatureFinder for set literals.

`has_set = union_feature_finder('set', [has_set_function, has_set_value])` `module-attribute`

FeatureFinder for sets.

`has_fstring = py_ast_feature_finder('fstring', xpath='FormattedValue')` `module-attribute`

FeatureFinder for f-strings.

`has_ternary = py_ast_feature_finder('ternary', xpath='IfExp')` `module-attribute`

FeatureFinder for ternary expressions.

`has_pattern_matching = py_ast_feature_finder('pattern_matching', xpath='Match')` `module-attribute`

FeatureFinder for pattern matching.

`has_walrus = py_ast_feature_finder('walrus', xpath='NamedExpr')` `module-attribute`

FeatureFinder for the walrus operator.
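Each XPath above names an AST node class, so the same constructs can be spotted with the standard `ast` module. The sketch below counts nodes of those classes in a small snippet; it illustrates what the queries target, not how the FeatureFinders count occurrences. `pattern_matching` is left out so that the sketch also runs on Python versions before 3.10, where `ast.Match` does not exist.

```python
import ast
from collections import Counter

# Feature name -> AST node class that the corresponding XPath targets.
NODE_FOR_FEATURE = {
    "set_value": ast.Set,
    "fstring": ast.FormattedValue,
    "ternary": ast.IfExp,
    "walrus": ast.NamedExpr,
}

source = '''
values = {1, 2, 3}
label = f"{len(values)} items"
size = "big" if len(values) > 2 else "small"
if (n := len(values)) > 1:
    pass
plain = f"no placeholders"
'''

counts = Counter()
for node in ast.walk(ast.parse(source)):
    for feature, node_class in NODE_FOR_FEATURE.items():
        if isinstance(node, node_class):
            counts[feature] += 1

print(dict(counts))
```

Each feature is counted once here. The last line of the snippet does not add to `fstring`, because an f-string without placeholders contains no `FormattedValue` node.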

Custom Python Feature Finders

The following utilities can be used to define simple FeatureFinders that can be used with PythonAstAnalyzer to analyze Python abstract syntax trees:

`codesurvey.analyzers.python.py_ast_feature_finder(name: str, *, xpath: str) -> FeatureFinder[lxml.etree.Element]`

Defines a FeatureFinder that looks for elements in a Python AST matching the given xpath query.

To explore the AST structure of the code constructs you are interested in identifying, consider using a tool like: https://python-ast-explorer.com/

Source code in codesurvey/analyzers/python/features.py

def py_ast_feature_finder(name: str, *, xpath: str) -> FeatureFinder[lxml.etree.Element]:
    """Defines a FeatureFinder that looks for elements in a Python AST
    matching the given xpath query.

    To explore the AST structure of the code constructs you are
    interested in identifying, consider using a tool like:
    https://python-ast-explorer.com/

    """
    return partial_feature_finder(name, _py_ast_feature_finder, xpath=xpath)

`codesurvey.analyzers.python.py_ast_feature_finder_with_transform(name: str, *, xpath: str) -> Callable[[ElementTransform], FeatureFinder[lxml.etree.Element]]`

Decorator for defining a FeatureFinder that looks for elements in a Python AST matching the given xpath query, transforming found elements with the decorated function.

The function should receive and return an lxml.etree.Element, or return None if the element should not be considered an occurrence of the feature.

Example usage to look for function calls where the function name is 'set':

@py_ast_feature_finder_with_transform('set_function', xpath='Call/func/Name')
def has_set_function(func_name_el):
    if func_name_el.get('id') == 'set':
        return func_name_el
    return None

To explore the AST structure of the code constructs you are interested in identifying, consider using a tool like: https://python-ast-explorer.com/
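The match-then-transform flow can be sketched without CodeSurvey installed. In the example below, `find_occurrences` and `only_set_calls` are hypothetical helpers, `xml.etree.ElementTree` stands in for lxml, the XML is a hand-written fragment shaped like an encoded AST, and matching the query anywhere in the tree is an assumption about how the library applies it.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

Transform = Callable[[ET.Element], Optional[ET.Element]]

# Hand-written fragment shaped like an XML-encoded AST for: set(a); list(a)
AST_XML = """
<Module><body>
  <Expr><value><Call><func><Name id="set"/></func></Call></value></Expr>
  <Expr><value><Call><func><Name id="list"/></func></Call></value></Expr>
</body></Module>
"""


def find_occurrences(root: ET.Element, xpath: str,
                     transform: Optional[Transform] = None) -> List[ET.Element]:
    """Hypothetical helper: match xpath anywhere in the tree, then filter."""
    matches = root.findall(f".//{xpath}")
    if transform is None:
        return matches
    transformed = (transform(element) for element in matches)
    # A transform returning None means "not an occurrence of the feature".
    return [element for element in transformed if element is not None]


def only_set_calls(func_name_el: ET.Element) -> Optional[ET.Element]:
    """Keep only Name elements that refer to the set function."""
    return func_name_el if func_name_el.get("id") == "set" else None


root = ET.fromstring(AST_XML)
print(len(find_occurrences(root, "Call/func/Name")))                  # 2
print(len(find_occurrences(root, "Call/func/Name", only_set_calls)))  # 1
```

Without a transform, every matched element counts; with one, the function decides element by element, which is what lets `has_set_function` ignore calls to other functions.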

Source code in codesurvey/analyzers/python/features.py

def py_ast_feature_finder_with_transform(name: str, *, xpath: str) -> Callable[[ElementTransform], FeatureFinder[lxml.etree.Element]]:
    """Decorator for defining a FeatureFinder that looks for elements in a
    Python AST matching the given xpath query, transforming found elements
    with decorated function.

    The function should receive and return an `lxml.etree.Element`, or
    return `None` if the element should not be considered an
    occurrence of the feature.

    Example usage to look for function calls where the function name
    is 'set':

    ```python
    @py_ast_feature_finder_with_transform('set_function', xpath='Call/func/Name')
    def has_set_function(func_name_el):
        if func_name_el.get('id') == 'set':
            return func_name_el
        return None
    ```

    To explore the AST structure of the code constructs you are
    interested in identifying, consider using a tool like:
    https://python-ast-explorer.com/

    """

    def decorator(func: ElementTransform) -> FeatureFinder[lxml.etree.Element]:

        @wraps(func)
        @feature_finder(name)
        def decorated(xml):
            return _py_ast_feature_finder(xml, xpath=xpath, transform=func)

        return decorated

    return decorator

`codesurvey.analyzers.python.py_module_feature_finder(name: str, *, modules: Sequence[str]) -> FeatureFinder`

Defines a FeatureFinder that looks for import statements of one or more target Python modules.

Example usage:

has_dataclasses = py_module_feature_finder('dataclasses_module', modules=['dataclasses'])
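To show what looking for import statements involves, here is a minimal sketch built on the standard `ast` module. `imported_modules` and `imports_any` are hypothetical helpers, and matching on exact module names is an assumption; the library's own matching rules (for submodules, for instance) may differ.

```python
import ast
from typing import Sequence, Set


def imported_modules(source: str) -> Set[str]:
    """Collect the module names that appear in import statements."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module is not None:
            # node.module is None for imports such as "from . import x".
            found.add(node.module)
    return found


def imports_any(source: str, modules: Sequence[str]) -> bool:
    """Hypothetical check: True if any target module is imported by exact name."""
    return not imported_modules(source).isdisjoint(modules)


code = "import os.path\nfrom dataclasses import dataclass\n"
print(sorted(imported_modules(code)))      # ['dataclasses', 'os.path']
print(imports_any(code, ["dataclasses"]))  # True
print(imports_any(code, ["os"]))           # False: only exact names match here
```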

Source code in codesurvey/analyzers/python/features.py

def py_module_feature_finder(name: str, *, modules: Sequence[str]) -> FeatureFinder:
    """Defines a FeatureFinder that looks for import statements of one or
    more target Python `modules`.

    Example usage:

    ```python
    has_dataclasses = py_module_feature_finder('dataclasses_module', modules=['dataclasses'])
    ```

    """
    return partial_feature_finder(name, _py_module_feature_finder, modules=modules)
