Core CodeSurvey

`codesurvey.CodeSurvey`

Primary interface for running surveys and inspecting their results.

A CodeSurvey is instantiated with a set of Sources to be surveyed and a set of Analyzers to count the occurrences of features within them. Each Source may fetch multiple Repos (e.g. a project directory, a git repository), each of which may contain multiple Codes (e.g. a source-code file) to be analyzed. Each Analyzer will be configured to identify a particular set of features within each Code.

Additional arguments can be passed to __init__() to control persistent storage, parallelism, and other options.

The survey can be executed with run(), which accepts options that determine the stopping condition for the survey. Multiple calls to run() will extend the results of the survey.

get_repo_features(), get_code_features(), and get_survey_tree() can be used to inspect the results of the survey.

Previous survey results can be loaded for inspection by specifying the same db_filepath used for previous survey run(s).
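The difference between the default `':memory:'` database and a file path comes from plain SQLite behaviour, which Python's built-in `sqlite3` module can illustrate. This is a minimal sketch of SQLite semantics only. It assumes, as the `db_filepath` documentation states, that survey results live in an SQLite database. The table name `results` is hypothetical and is not the CodeSurvey schema.

```python
import os
import sqlite3
import tempfile


def write_and_reopen(db_filepath: str) -> int:
    """Write one row, close the connection, reopen the same path, and count rows.

    The `results` table is hypothetical; it only stands in for whatever
    schema the survey database actually uses.
    """
    conn = sqlite3.connect(db_filepath)
    conn.execute('CREATE TABLE IF NOT EXISTS results (feature TEXT)')
    conn.execute("INSERT INTO results VALUES ('example')")
    conn.commit()
    conn.close()

    # Reopen the same path, as a later CodeSurvey would with the same db_filepath.
    conn = sqlite3.connect(db_filepath)
    try:
        return conn.execute('SELECT COUNT(*) FROM results').fetchone()[0]
    except sqlite3.OperationalError:
        # A ':memory:' database starts empty on every connect, so the table is gone.
        return 0
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    file_count = write_and_reopen(os.path.join(tmp, 'survey.sqlite3'))
memory_count = write_and_reopen(':memory:')

print(file_count)    # 1: the row survives because the file persists
print(memory_count)  # 0: in-memory data is discarded when the connection closes
```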

Source code in codesurvey/core.py

class CodeSurvey:
    """Primary interface for running surveys and inspecting their results.

    A CodeSurvey is instantiated with a set of
    [Sources](sources/core.md) to be surveyed and a set of
    [Analyzers](analyzers/core.md) to count the occurrences of
    *features* within them. Each Source may fetch multiple *Repos*
    (e.g. a project directory, a git repository), each of which may
    contain multiple *Codes* (e.g. a source-code file) to be analyzed.
    Each Analyzer will be configured to identify a particular set of
    *features* within each Code.

    Additional arguments can be passed to
    [`__init__()`][codesurvey.CodeSurvey.__init__] to control
    persistent storage, parallelism, and other options.

    The survey can be executed with
    [`run()`][codesurvey.CodeSurvey.run], which accepts options that
    determine the stopping condition for the survey. Multiple calls to
    [`run()`][codesurvey.CodeSurvey.run] will extend the results of
    the survey.

    [`get_repo_features()`][codesurvey.CodeSurvey.get_repo_features],
    [`get_code_features()`][codesurvey.CodeSurvey.get_code_features],
    and [`get_survey_tree()`][codesurvey.CodeSurvey.get_survey_tree]
    can be used to inspect the results of the survey.

    Previous survey results can be loaded for inspection by specifying
    the same `db_filepath` used for previous survey run(s).

    """

    def __init__(self, *,
                 sources: Sequence[Source],
                 analyzers: Sequence[Analyzer],
                 db_filepath: str = ':memory:',
                 max_workers: Optional[int] = 1,
                 continue_on_failure: bool = True,
                 save_code_features: bool = True,
                 save_occurrences: bool = True,
                 use_saved_features: bool = True):
        """
        Args:
            sources: Sources from which to fetch Repos of Codes
                to survey. If multiple Sources are provided, Repo fetching
                will cycle through them in a round-robin fashion.
            analyzers: Analyzers to identify features in fetched code.
            db_filepath: Path to an sqlite database file for persisting survey
                results. Creates a new sqlite database if the path does not
                exist. Defaults to a non-persistent in-memory database.
            max_workers: The maximum number of parallel worker processes for
                fetching Repos from Sources and executing Analyzers. Defaults
                to a single worker. If `None`, the number of CPUs reported by
                `os.cpu_count()` is used instead.
            continue_on_failure: If `True`, exceptions raised by Sources and
                Analyzers will be logged, but will not halt the survey.
            save_code_features: If `True`, features of individual Codes will be
                retained in the survey database. Otherwise, Code features will
                be deleted once they have been used to compute aggregate
                features of their respective Repo.
            save_occurrences: If `True`, occurrence objects returned by
                FeatureFinders will be saved in the survey database.
            use_saved_features: If `True`, re-use saved features from an
                Analyzer for a Code when they already exist in the survey
                database. Otherwise, reapply all Analyzers to all Codes.

        Raises:
            ValueError: Invalid survey configuration was specified.

        """
        duplicate_source_names = get_duplicates([source.name for source in sources])
        if duplicate_source_names:
            duplicate_sources_str = ', '.join(duplicate_source_names)
            raise ValueError(('Cannot instantiate CodeSurvey with duplicate '
                              f'source names: {duplicate_sources_str}. '
                              'Please set a unique name for each source.'))
        self.sources = {source.name: source for source in sources}

        duplicate_analyzer_names = get_duplicates([analyzer.name for analyzer in analyzers])
        if duplicate_analyzer_names:
            duplicate_analyzers_str = ', '.join(duplicate_analyzer_names)
            raise ValueError(('Cannot instantiate CodeSurvey with duplicate '
                              f'analyzer names: {duplicate_analyzers_str}. '
                              'Please set a unique name for each analyzer.'))
        self.analyzers = {analyzer.name: analyzer for analyzer in analyzers}

        self.analyzer_features = {analyzer.name: analyzer.get_feature_names()
                                  for analyzer in analyzers}
        self.db_filepath = db_filepath
        self.max_workers = max_workers or os.cpu_count() or 1
        self.continue_on_failure = continue_on_failure
        self.save_code_features = save_code_features
        self.save_occurrences = save_occurrences
        self.use_saved_features = use_saved_features

        self._runner = CodeSurveyRunner(self)

    def get_db(self):
        """Returns the Database that persists survey results."""
        return Database(self.db_filepath)

    def run(self, *,
            max_repos: Optional[int] = None,
            max_codes: Optional[int] = None,
            disable_progress: bool = False,
            progress_analyzer_features: Optional[Mapping[str, Sequence[str]]] = None):
        """Runs the survey by fetching code from sources and applying analyzers.

        If neither the `max_repos` nor the `max_codes` stopping
        condition is specified, the survey will continue running
        until it is interrupted by a `KeyboardInterrupt` exception.

        Args:
            max_repos: If specified, the run will stop after analyzing this
                many Repos.
            max_codes: If specified, the run will stop after analyzing this
                many Codes.
            disable_progress: If `True`, do not display tqdm progress bars
                counting Repos and Codes analyzed.
            progress_analyzer_features: Mapping of analyzer names to sequences
                of feature names for which progress trackers should be
                displayed to count Repos found with those features. Defaults
                to all features, but disables feature progress trackers with
                a warning when there are more than 10 features.

        """
        self._runner.run(
            max_repos=max_repos,
            max_codes=max_codes,
            disable_progress=disable_progress,
            progress_analyzer_features=progress_analyzer_features,
        )

    def get_repo_features(self, *,
                          source_names: Optional[Sequence[str]] = None,
                          analyzer_names: Optional[Sequence[str]] = None,
                          feature_names: Optional[Sequence[str]] = None) -> List[RepoFeature]:
        """Returns RepoFeatures of surveyed Repos.

        Args:
            source_names: If specified, only features from the named Sources
                will be returned.
            analyzer_names: If specified, only features from the named Analyzers
                will be returned.
            feature_names: If specified, only results for the named features
                will be returned.

        """
        return self.get_db().get_repo_features(source_names=source_names,
                                               analyzer_names=analyzer_names,
                                               feature_names=feature_names)

    def get_code_features(self, *,
                          source_names: Optional[Sequence[str]] = None,
                          analyzer_names: Optional[Sequence[str]] = None,
                          feature_names: Optional[Sequence[str]] = None) -> List[CodeFeature]:
        """Returns CodeFeatures of surveyed Codes.

        Only returns results from runs where `save_code_features` was `True`.

        Args:
            source_names: If specified, only features from the named Sources
                will be returned.
            analyzer_names: If specified, only features from the named Analyzers
                will be returned.
            feature_names: If specified, only results for the named features
                will be returned.

        """
        return self.get_db().get_code_features(source_names=source_names,
                                               analyzer_names=analyzer_names,
                                               feature_names=feature_names)

    def get_survey_tree(self, *,
                        source_names: Optional[Sequence[str]] = None,
                        analyzer_names: Optional[Sequence[str]] = None,
                        feature_names: Optional[Sequence[str]] = None) -> Dict:
        """Returns surveyed CodeFeatures and RepoFeatures organized as a
        tree of Sources, Repos, and Analyzers.

        Args:
            source_names: If specified, only features from the named Sources
                will be returned.
            analyzer_names: If specified, only features from the named Analyzers
                will be returned.
            feature_names: If specified, only results for the named features
                will be returned.

        Returns:
            A dictionary with the following structure:
                ```python
                {
                    'sources': {
                        '<source_name>': {
                            'repos': {
                                '<repo_key>': {
                                    'analyzers': {
                                        '<analyzer_name>': {
                                            'features': {
                                                '<feature_name>': {
                                                    'updated': datetime(...),
                                                    'occurrence_count': int(...),
                                                    'code_occurrence_count': int(...),
                                                    'code_total_count': int(...),
                                                },
                                                ...
                                            },
                                            # 'codes' key is only present if
                                            # survey runs are performed with
                                            # `save_code_features=True`
                                            'codes': {
                                                '<code_key>': {
                                                    'features': {
                                                        '<feature_name>': {
                                                            'updated': datetime(...),
                                                            'occurrence_count': int(...),
                                                            'occurrences': ...,
                                                        },
                                                        ...
                                                    }
                                                },
                                                ...
                                            }
                                        },
                                        ...
                                    },
                                    'repo_metadata': {
                                        '<metadata_key>': ...,
                                        ...
                                    }
                                },
                                ...
                            }
                        },
                        ...
                    }
                }
                ```

        """
        tree: Dict = {'sources': {}}
        code_features = self.get_code_features(source_names=source_names,
                                               analyzer_names=analyzer_names,
                                               feature_names=feature_names)
        for c in code_features:
            recursive_update(tree, {
                'sources': {c.source_name: {
                    'repos': {c.repo_key: {
                        'analyzers': {c.analyzer_name: {
                            'codes': {c.code_key: {
                                'features': {c.feature_name: {
                                    'updated': c.updated,
                                    'occurrence_count': c.occurrence_count,
                                    'occurrences': c.occurrences,
                                }}
                            }}
                        }},
                        'repo_metadata': c.repo_metadata,
                    }}
                }}
            })
        repo_features = self.get_repo_features(source_names=source_names,
                                               analyzer_names=analyzer_names,
                                               feature_names=feature_names)
        for r in repo_features:
            recursive_update(tree, {
                'sources': {r.source_name: {
                    'repos': {r.repo_key: {
                        'analyzers': {r.analyzer_name: {
                            'features': {r.feature_name: {
                                'updated': r.updated,
                                'occurrence_count': r.occurrence_count,
                                'code_occurrence_count': r.code_occurrence_count,
                                'code_total_count': r.code_total_count,
                            }},
                        }},
                        'repo_metadata': r.repo_metadata,
                    }}
                }}
            })
        return tree

`__init__(*, sources: Sequence[Source], analyzers: Sequence[Analyzer], db_filepath: str = ':memory:', max_workers: Optional[int] = 1, continue_on_failure: bool = True, save_code_features: bool = True, save_occurrences: bool = True, use_saved_features: bool = True)`

Parameters:

sources (Sequence[Source]) –

Sources from which to fetch Repos of Codes to survey. If multiple Sources are provided, Repo fetching will cycle through them in a round-robin fashion.
analyzers (Sequence[Analyzer]) –

Analyzers to identify features in fetched code.
db_filepath (str, default: ':memory:' ) –

Path to an sqlite database file for persisting survey results. Creates a new sqlite database if the path does not exist. Defaults to a non-persistent in-memory database.
max_workers (Optional[int], default: 1 ) –

The maximum number of parallel worker processes for fetching Repos from Sources and executing Analyzers. Defaults to a single worker. If None, the number of CPUs reported by os.cpu_count() is used instead.
continue_on_failure (bool, default: True ) –

If True, exceptions raised by Sources and Analyzers will be logged, but will not halt the survey.
save_code_features (bool, default: True ) –

If True, features of individual Codes will be retained in the survey database. Otherwise, Code features will be deleted once they have been used to compute aggregate features of their respective Repo.
save_occurrences (bool, default: True ) –

If True, occurrence objects returned by FeatureFinders will be saved in the survey database.
use_saved_features (bool, default: True ) –

If True, re-use saved features from an Analyzer for a Code when they already exist in the survey database. Otherwise, reapply all Analyzers to all Codes.
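Round-robin fetching across several Sources can be pictured with plain iterators. This is a sketch under stated assumptions, not CodeSurvey's runner: each Source is modeled as an iterator of hypothetical repo keys, and an exhausted Source is simply dropped from the rotation, which may differ from how the real runner treats it.

```python
from typing import Dict, Iterator, Tuple


def round_robin(repo_streams: Dict[str, Iterator[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, repo_key) pairs, taking one Repo from each Source in turn.

    Sources whose iterators are exhausted are dropped; the remaining Sources
    keep being cycled until none are left.
    """
    active = dict(repo_streams)
    while active:
        for source_name in list(active):
            try:
                yield source_name, next(active[source_name])
            except StopIteration:
                del active[source_name]


# Illustrative repo keys; 'github' and 'local' are hypothetical Source names.
order = list(round_robin({
    'github': iter(['gh-1', 'gh-2', 'gh-3']),
    'local': iter(['local-1']),
}))
print(order)
# [('github', 'gh-1'), ('local', 'local-1'), ('github', 'gh-2'), ('github', 'gh-3')]
```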

Raises:

ValueError –

Invalid survey configuration was specified.
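`__init__()` raises this ValueError when two Sources or two Analyzers share a name, and it resolves `max_workers` with the expression `max_workers or os.cpu_count() or 1`. The `get_duplicates` helper it calls is not shown on this page, so the version below is a hypothetical stand-in, assumed to return each repeated name once in first-seen order. `check_unique` and `resolve_max_workers` are likewise illustrative names. Because of the `or` chain, `max_workers=0` falls back exactly like `None`.

```python
import os
from collections import Counter
from typing import List, Optional, Sequence


def get_duplicates(names: Sequence[str]) -> List[str]:
    """Hypothetical stand-in: names that appear more than once, in first-seen order."""
    return [name for name, count in Counter(names).items() if count > 1]


def check_unique(kind: str, names: Sequence[str]) -> None:
    """Raise ValueError (mirroring the message in __init__) if any name repeats."""
    duplicates = get_duplicates(names)
    if duplicates:
        raise ValueError(f'Cannot instantiate CodeSurvey with duplicate {kind} names: '
                         f'{", ".join(duplicates)}. Please set a unique name for each {kind}.')


def resolve_max_workers(max_workers: Optional[int]) -> int:
    # Same expression as __init__: None (or 0) falls back to the CPU count,
    # and to 1 if os.cpu_count() cannot determine it.
    return max_workers or os.cpu_count() or 1


try:
    check_unique('source', ['github', 'local', 'github'])
except ValueError as exc:
    message = str(exc)
print(message)
```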

`run(*, max_repos: Optional[int] = None, max_codes: Optional[int] = None, disable_progress: bool = False, progress_analyzer_features: Optional[Mapping[str, Sequence[str]]] = None)`

Runs the survey by fetching code from sources and applying analyzers.

If neither the max_repos nor the max_codes stopping condition is specified, the survey will continue running until it is interrupted by a KeyboardInterrupt exception.

Parameters:

max_repos (Optional[int], default: None ) –

If specified, the run will stop after analyzing this many Repos.
max_codes (Optional[int], default: None ) –

If specified, the run will stop after analyzing this many Codes.
disable_progress (bool, default: False ) –

If True, do not display tqdm progress bars counting Repos and Codes analyzed.
progress_analyzer_features (Optional[Mapping[str, Sequence[str]]], default: None ) –

Mapping of analyzer names to sequences of feature names for which progress trackers should be displayed to count Repos found with those features. Defaults to all features, but disables feature progress trackers with a warning when there are more than 10 features.
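The stopping conditions can be sketched as a loop over Repos and their Codes. This is an illustration under assumptions, not CodeSurvey's runner: Repos are modeled as hypothetical lists of code keys, a Repo counts as analyzed only once all of its Codes are, and `max_codes` is checked after every Code. `run_until` is an illustrative name.

```python
from typing import Iterable, List, Optional, Tuple


def run_until(repos: Iterable[List[str]], *,
              max_repos: Optional[int] = None,
              max_codes: Optional[int] = None) -> Tuple[int, int]:
    """Return (repos_analyzed, codes_analyzed) after applying the stopping conditions.

    With neither limit set, the loop only ends when the input runs out or a
    KeyboardInterrupt arrives.
    """
    repo_count = code_count = 0
    try:
        for codes in repos:
            for _code in codes:
                # Analyzers would be applied to `_code` here.
                code_count += 1
                if max_codes is not None and code_count >= max_codes:
                    return repo_count, code_count
            repo_count += 1
            if max_repos is not None and repo_count >= max_repos:
                break
    except KeyboardInterrupt:
        pass  # Keep the counts gathered before the interrupt.
    return repo_count, code_count


repos = [['a.py', 'b.py'], ['c.py'], ['d.py', 'e.py']]
print(run_until(repos, max_repos=2))  # (2, 3)
print(run_until(repos, max_codes=4))  # (2, 4)
print(run_until(repos))               # (3, 5)
```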

`get_repo_features(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> List[RepoFeature]`

Returns RepoFeatures of surveyed Repos.

Parameters:

source_names (Optional[Sequence[str]], default: None ) –

If specified, only features from the named Sources will be returned.
analyzer_names (Optional[Sequence[str]], default: None ) –

If specified, only features from the named Analyzers will be returned.
feature_names (Optional[Sequence[str]], default: None ) –

If specified, only results for the named features will be returned.
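The three keyword arguments act as optional filters. The sketch below reproduces that behaviour over an in-memory list, assuming the filters combine with AND and that `None` disables a filter. The real query runs inside the survey Database, and `FeatureRow` and `filter_features` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FeatureRow:
    """Hypothetical record holding only the fields the filters act on."""
    source_name: str
    analyzer_name: str
    feature_name: str


def _allowed(value: str, names: Optional[Sequence[str]]) -> bool:
    # None means "no filter"; otherwise the value must be one of the names.
    return names is None or value in names


def filter_features(rows: Sequence[FeatureRow], *,
                    source_names: Optional[Sequence[str]] = None,
                    analyzer_names: Optional[Sequence[str]] = None,
                    feature_names: Optional[Sequence[str]] = None) -> List[FeatureRow]:
    """Keep rows that pass every filter that was given."""
    return [row for row in rows
            if _allowed(row.source_name, source_names)
            and _allowed(row.analyzer_name, analyzer_names)
            and _allowed(row.feature_name, feature_names)]


# Illustrative rows; the Source, Analyzer, and feature names are made up.
rows = [
    FeatureRow('github', 'python', 'async'),
    FeatureRow('local', 'python', 'async'),
    FeatureRow('github', 'python', 'lambda'),
]
print(len(filter_features(rows, source_names=['github'])))                          # 2
print(len(filter_features(rows, source_names=['github'], feature_names=['async'])))  # 1
```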

`get_code_features(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> List[CodeFeature]`

Returns CodeFeatures of surveyed Codes.

Only returns results from runs where save_code_features was True.

Parameters:

source_names (Optional[Sequence[str]], default: None ) –

If specified, only features from the named Sources will be returned.
analyzer_names (Optional[Sequence[str]], default: None ) –

If specified, only features from the named Analyzers will be returned.
feature_names (Optional[Sequence[str]], default: None ) –

If specified, only results for the named features will be returned.

`get_survey_tree(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> Dict`

Returns surveyed CodeFeatures and RepoFeatures organized as a tree of Sources, Repos, and Analyzers.

Parameters:

source_names (Optional[Sequence[str]], default: None ) –

If specified, only features from the named Sources will be returned.
analyzer_names (Optional[Sequence[str]], default: None ) –

If specified, only features from the named Analyzers will be returned.
feature_names (Optional[Sequence[str]], default: None ) –

If specified, only results for the named features will be returned.

Returns:

Dict –

A dictionary with the following structure:

{
    'sources': {
        '<source_name>': {
            'repos': {
                '<repo_key>': {
                    'analyzers': {
                        '<analyzer_name>': {
                            'features': {
                                '<feature_name>': {
                                    'updated': datetime(...),
                                    'occurrence_count': int(...),
                                    'code_occurrence_count': int(...),
                                    'code_total_count': int(...),
                                },
                                ...
                            },
                            # 'codes' key is only present if
                            # survey runs are performed with
                            # `save_code_features=True`
                            'codes': {
                                '<code_key>': {
                                    'features': {
                                        '<feature_name>': {
                                            'updated': datetime(...),
                                            'occurrence_count': int(...),
                                            'occurrences': ...,
                                        },
                                        ...
                                    }
                                },
                                ...
                            }
                        },
                        ...
                    },
                    'repo_metadata': {
                        '<metadata_key>': ...,
                        ...
                    }
                },
                ...
            }
        },
        ...
    }
}
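get_survey_tree() builds this structure by repeatedly passing nested dictionaries to a `recursive_update` helper whose source is not shown on this page. The sketch below is a hypothetical reimplementation of such a deep merge, assuming nested dicts are merged key by key and any other value overwrites. It then nests a few flat rows the same way the method does, trimmed to Code-level features and using made-up values.

```python
from typing import Any, Dict


def recursive_update(target: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical deep merge: merge nested dicts in place, overwrite other values."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            recursive_update(target[key], value)
        else:
            target[key] = value
    return target


# Illustrative flat rows: (source, repo, analyzer, code, feature, occurrence_count).
rows = [
    ('github', 'org/repo', 'python', 'a.py', 'async', 2),
    ('github', 'org/repo', 'python', 'b.py', 'async', 0),
]

tree: Dict[str, Any] = {'sources': {}}
for source, repo, analyzer, code, feature, count in rows:
    recursive_update(tree, {'sources': {source: {'repos': {repo: {
        'analyzers': {analyzer: {
            'codes': {code: {'features': {feature: {'occurrence_count': count}}}},
        }},
    }}}}})

codes = tree['sources']['github']['repos']['org/repo']['analyzers']['python']['codes']
print(sorted(codes))                                            # ['a.py', 'b.py']
print(codes['a.py']['features']['async']['occurrence_count'])  # 2
```

Because each merge only descends into keys that already hold dictionaries, the second row adds `b.py` beside `a.py` instead of replacing the whole `codes` mapping.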

Source code in codesurvey/core.py

def get_survey_tree(self, *,
                    source_names: Optional[Sequence[str]] = None,
                    analyzer_names: Optional[Sequence[str]] = None,
                    feature_names: Optional[Sequence[str]] = None) -> Dict:
    """Returns surveyed CodeFeatures and RepoFeatures structured under a
    tree structure of Sources, Repos, and Analyzers.

    Args:
        source_names: If specified, only features from the named Sources
            will be returned.
        analyzer_names: If specified, only features from the named Analyzers
            will be returned.
        feature_names: If specified, only results for the named features
            will be returned.

    Returns:
        A dictionary with the following structure:
            ```python
            {
                'sources': {
                    '<source_name>': {
                        'repos: {
                            '<repo_key>': {
                                'analyzers': {
                                    '<analyzer_name>': {
                                        'features': {
                                            'updated': datetime(...),
                                            'occurence_count': int(...),
                                            'code_occurrence_count': int(...),
                                            'code_total_count': int(...),
                                        },
                                        # 'codes' key is only present if
                                        # survey runs are performed with
                                        # `save_code_features=True`
                                        'codes': {
                                            '<code_key>': {
                                                'features': {
                                                    '<feature_name>': {
                                                        'updated': datetime(...),
                                                        'occurence_count': int(...),
                                                    },
                                                    ...
                                                }
                                            },
                                            ...
                                        }
                                    },
                                    ...
                                },
                                'repo_metadata': {
                                    '<metadata_key>': ...,
                                    ...
                                }
                            },
                            ...
                        }
                    },
                    ...
                }
            }
            ```

    """
    tree: Dict = {'sources': {}}
    code_features = self.get_code_features(source_names=source_names,
                                           analyzer_names=analyzer_names,
                                           feature_names=feature_names)
    for c in code_features:
        recursive_update(tree, {
            'sources': {c.source_name: {
                'repos': {c.repo_key: {
                    'analyzers': {c.analyzer_name: {
                        'codes': {c.code_key: {
                            'features': {c.feature_name: {
                                'updated': c.updated,
                                'occurrence_count': c.occurrence_count,
                                'occurrences': c.occurrences,
                            }}
                        }}
                    }},
                    'repo_metadata': c.repo_metadata,
                }}
            }}
        })
    repo_features = self.get_repo_features(source_names=source_names,
                                           analyzer_names=analyzer_names)
    for r in repo_features:
        recursive_update(tree, {
            'sources': {r.source_name: {
                'repos': {r.repo_key: {
                    'analyzers': {r.analyzer_name: {
                        'features': {r.feature_name: {
                            'updated': r.updated,
                            'occurrence_count': r.occurrence_count,
                            'code_occurrence_count': r.code_occurrence_count,
                            'code_total_count': r.code_total_count,
                        }},
                    }},
                    'repo_metadata': r.repo_metadata,
                }}
            }}
        })
    return tree
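get_survey_tree() builds its nested result by repeatedly merging per-record dicts with recursive_update(), whose definition is not shown in this section. The sketch below is a hypothetical reimplementation of such a deep merge. It assumes nested dicts are merged and leaf values are overwritten, and it uses made-up source, repo, analyzer, Code, and feature names. It shows why a code-level record and a repo-level record for the same analyzer end up side by side under `codes` and `features`, where a plain `dict.update()` would have replaced the whole `analyzers` entry.

```python
from typing import Any, Dict


def recursive_update(target: Dict[str, Any], updates: Dict[str, Any]) -> None:
    """Hypothetical deep merge: fold `updates` into `target` in place.

    Illustrative only; codesurvey's own helper may behave differently.
    """
    for key, value in updates.items():
        existing = target.get(key)
        if isinstance(existing, dict) and isinstance(value, dict):
            # Both sides are dicts: merge recursively instead of replacing.
            recursive_update(existing, value)
        else:
            # New keys and leaf values are assigned directly.
            target[key] = value


tree: Dict[str, Any] = {'sources': {}}

# Code-level record, shaped like those built in the first loop above.
recursive_update(tree, {'sources': {'local': {'repos': {'proj': {
    'analyzers': {'python': {'codes': {'main.py': {
        'features': {'for_loop': {'occurrence_count': 2}},
    }}}},
    'repo_metadata': {},
}}}}})

# Repo-level record for the same analyzer, shaped like the second loop.
recursive_update(tree, {'sources': {'local': {'repos': {'proj': {
    'analyzers': {'python': {'features': {'for_loop': {
        'occurrence_count': 2,
        'code_occurrence_count': 1,
        'code_total_count': 1,
    }}}},
    'repo_metadata': {},
}}}}})

analyzer_node = tree['sources']['local']['repos']['proj']['analyzers']['python']
print(sorted(analyzer_node))  # ['codes', 'features']
```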

`codesurvey.RepoFeature` `dataclass`

Source code in codesurvey/database.py

@dataclass(frozen=True)
class RepoFeature:
    updated: datetime
    """Timestamp when this analysis was last updated."""

    source_name: str
    """Name of the Source that produced the target Repo."""

    repo_key: str
    """Key identifying the target Repo within the Source."""

    analyzer_name: str
    """Name of the Analyzer that produced this feature."""

    feature_name: str
    """Name of the analyzed feature."""

    occurrence_count: int
    """Number of occurrences of this feature within the Repo."""

    code_occurrence_count: int
    """Number of Codes within the Repo containing this feature."""

    code_total_count: int
    """Total number of Codes analyzed for this feature within the Repo."""

    repo_metadata: Dict[str, Any]
    """Metadata of the Repo provided by the Source."""

`updated: datetime` `instance-attribute`

Timestamp when this analysis was last updated.

`source_name: str` `instance-attribute`

Name of the Source that produced the target Repo.

`repo_key: str` `instance-attribute`

Key identifying the target Repo within the Source.

`analyzer_name: str` `instance-attribute`

Name of the Analyzer that produced this feature.

`feature_name: str` `instance-attribute`

Name of the analyzed feature.

`occurrence_count: int` `instance-attribute`

Number of occurrences of this feature within the Repo.

`code_occurrence_count: int` `instance-attribute`

Number of Codes within the Repo containing this feature.

`code_total_count: int` `instance-attribute`

Total number of Codes analyzed for this feature within the Repo.

`repo_metadata: Dict[str, Any]` `instance-attribute`

Metadata of the Repo provided by the Source.
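Because each RepoFeature carries both `code_occurrence_count` and `code_total_count`, results from get_repo_features() can be summarized without revisiting individual Codes. The sketch below uses a local stand-in dataclass with the same fields as RepoFeature (so it runs without codesurvey installed) and hypothetical repo and feature names. It computes the share of a Repo's Codes that contain a feature, and the share of surveyed Repos in which each (analyzer, feature) pair occurs at least once.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class RepoFeature:
    """Local stand-in with the same fields as codesurvey.RepoFeature."""
    updated: datetime
    source_name: str
    repo_key: str
    analyzer_name: str
    feature_name: str
    occurrence_count: int
    code_occurrence_count: int
    code_total_count: int
    repo_metadata: Dict[str, Any]


def code_share(f: RepoFeature) -> Optional[float]:
    """Fraction of the Repo's analyzed Codes that contain the feature."""
    if f.code_total_count == 0:
        return None  # no Codes were analyzed, so the ratio is undefined
    return f.code_occurrence_count / f.code_total_count


def repo_usage_share(features: List[RepoFeature]) -> Dict[Tuple[str, str], float]:
    """Fraction of surveyed Repos in which each feature occurs at least once."""
    seen: Dict[Tuple[str, str], Set[Tuple[str, str]]] = defaultdict(set)
    using: Dict[Tuple[str, str], Set[Tuple[str, str]]] = defaultdict(set)
    for f in features:
        # Feature names are grouped per Analyzer to avoid cross-analyzer clashes.
        feature_id = (f.analyzer_name, f.feature_name)
        # A Repo is identified by its Source plus its key within that Source.
        repo_id = (f.source_name, f.repo_key)
        seen[feature_id].add(repo_id)
        if f.occurrence_count > 0:
            using[feature_id].add(repo_id)
    return {fid: len(using[fid]) / len(repos) for fid, repos in seen.items()}


now = datetime.now()
results = [
    RepoFeature(now, 'local', 'proj-a', 'python', 'for_loop', 5, 2, 3, {}),
    RepoFeature(now, 'local', 'proj-b', 'python', 'for_loop', 0, 0, 4, {}),
]
print(code_share(results[0]))     # 0.6666666666666666
print(repo_usage_share(results))  # {('python', 'for_loop'): 0.5}
```

With real results, the list returned by get_repo_features() would take the place of `results`.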

`codesurvey.CodeFeature` `dataclass`

Source code in codesurvey/database.py

@dataclass(frozen=True)
class CodeFeature:
    updated: datetime
    """Timestamp when this analysis was last updated."""

    source_name: str
    """Name of the Source that produced the target Repo."""

    repo_key: str
    """Key identifying the target Repo within the Source."""

    analyzer_name: str
    """Name of the Analyzer that produced this feature."""

    code_key: str
    """Key identifying the target Code within the Repo."""

    feature_name: str
    """Name of the analyzed feature."""

    occurrence_count: Optional[int]
    """Number of occurrences of this feature within the Code, or `None` if
    analysis of this Code was skipped."""

    occurrences: Optional[List[Dict[str, Any]]]
    """Original occurrence objects returned by FeatureFinders."""

    repo_metadata: Dict[str, Any]
    """Metadata of the Repo provided by the Source."""

`updated: datetime` `instance-attribute`

Timestamp when this analysis was last updated.

`source_name: str` `instance-attribute`

Name of the Source that produced the target Repo.

`repo_key: str` `instance-attribute`

Key identifying the target Repo within the Source.

`analyzer_name: str` `instance-attribute`

Name of the Analyzer that produced this feature.

`code_key: str` `instance-attribute`

Key identifying the target Code within the Repo.

`feature_name: str` `instance-attribute`

Name of the analyzed feature.

`occurrence_count: Optional[int]` `instance-attribute`

Number of occurrences of this feature within the Code, or `None` if analysis of this Code was skipped.

`occurrences: Optional[List[Dict[str, Any]]]` `instance-attribute`

Original occurrence objects returned by FeatureFinders.

`repo_metadata: Dict[str, Any]` `instance-attribute`

Metadata of the Repo provided by the Source.
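Since `occurrence_count` is `None` when analysis of a Code was skipped, totals computed from get_code_features() should keep skipped Codes separate from Codes in which the feature simply did not occur. The sketch below uses a local stand-in dataclass with the same fields as CodeFeature and hypothetical Code and feature names; `summarize()` is an illustrative helper, not part of codesurvey.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CodeFeature:
    """Local stand-in with the same fields as codesurvey.CodeFeature."""
    updated: datetime
    source_name: str
    repo_key: str
    analyzer_name: str
    code_key: str
    feature_name: str
    occurrence_count: Optional[int]
    occurrences: Optional[List[Dict[str, Any]]]
    repo_metadata: Dict[str, Any]


def summarize(features: List[CodeFeature], feature_name: str) -> Tuple[int, int, int]:
    """Return (total occurrences, Codes analyzed, Codes skipped) for one feature."""
    total = analyzed = skipped = 0
    for f in features:
        if f.feature_name != feature_name:
            continue
        if f.occurrence_count is None:
            # Analysis of this Code was skipped; do not count it as zero.
            skipped += 1
        else:
            analyzed += 1
            total += f.occurrence_count
    return total, analyzed, skipped


now = datetime.now()
# Occurrence objects are omitted (None) in this sketch.
rows = [
    CodeFeature(now, 'local', 'proj', 'python', 'a.py', 'for_loop', 3, None, {}),
    CodeFeature(now, 'local', 'proj', 'python', 'b.py', 'for_loop', 0, None, {}),
    CodeFeature(now, 'local', 'proj', 'python', 'c.py', 'for_loop', None, None, {}),
]
print(summarize(rows, 'for_loop'))  # (3, 2, 1)
```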

`codesurvey.logger = get_logger()` `module-attribute`

The `logging.Logger` object to which codesurvey logs events during survey runs.

It can be used to customize logging, for example to suppress messages below the ERROR level:

import logging
from codesurvey import logger

logger.setLevel(logging.ERROR)
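Besides setting the level, standard `logging` handlers can be attached to route codesurvey's messages elsewhere. The sketch below collects formatted messages in an in-memory list; `ListHandler` is an illustrative helper, not part of codesurvey, and if codesurvey is not installed the sketch falls back to a hypothetical stand-in logger so that it still runs.

```python
import logging
from typing import List

try:
    from codesurvey import logger  # the logger documented above
except ImportError:
    # Hypothetical stand-in so this sketch runs without codesurvey installed.
    logger = logging.getLogger('codesurvey-sketch')


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # drop DEBUG/INFO, keep WARNING and above

logger.info('not recorded')
logger.warning('analysis of a Code was skipped')
print(handler.messages)  # ['WARNING: analysis of a Code was skipped']
```

A `logging.FileHandler('survey.log')` could be attached in the same way to keep a persistent record of a survey run.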
