
Core CodeSurvey

codesurvey.CodeSurvey

Primary interface for running surveys and inspecting their results.

A CodeSurvey is instantiated with a set of Sources to be surveyed and a set of Analyzers to count the occurrences of features within them. Each Source may fetch multiple Repos (e.g. a project directory, a git repository), each of which may contain multiple Codes (e.g. a source-code file) to be analyzed. Each Analyzer will be configured to identify a particular set of features within each Code.

Additional arguments can be passed to __init__() to control persistent storage, parallelism, and other options.

The survey can be executed with run(), which accepts options that determine the stopping condition for the survey. Multiple calls to run() will extend the results of the survey.

get_repo_features(), get_code_features(), and get_survey_tree() can be used to inspect the results of the survey.

Previous survey results can be loaded for inspection by specifying the same db_filepath used for the previous survey run(s).
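The persistence behaviour described above can be illustrated with Python's standard `sqlite3` module alone. This is a minimal sketch, not CodeSurvey's own storage code: the `results` table and the `record_count` helper are hypothetical, and the only point is that a file-backed database keeps its rows between connections while `':memory:'` starts empty each time.

```python
import os
import sqlite3
import tempfile


def record_count(db_filepath: str) -> int:
    """Open (or create) the database, add one row, and return the row count."""
    conn = sqlite3.connect(db_filepath)
    try:
        # Hypothetical table, used only for this illustration.
        conn.execute('CREATE TABLE IF NOT EXISTS results (feature TEXT)')
        conn.execute("INSERT INTO results VALUES ('example')")
        conn.commit()
        return conn.execute('SELECT COUNT(*) FROM results').fetchone()[0]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'survey.sqlite3')
    # The second connection sees the row written by the first.
    file_counts = [record_count(path), record_count(path)]

# Each ':memory:' connection gets a fresh, empty database.
memory_counts = [record_count(':memory:'), record_count(':memory:')]

print(file_counts)    # [1, 2]
print(memory_counts)  # [1, 1]
```

This is why a file path is needed for `db_filepath` whenever results should outlive the current process.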

Source code in codesurvey/core.py
class CodeSurvey:
    """Primary interface for running surveys and inspecting their results.

    A CodeSurvey is instantiated with a set of
    [Sources](sources/core.md) to be surveyed and a set of
    [Analyzers](analyzers/core.md) to count the occurrences of
    *features* within them. Each Source may fetch multiple *Repos*
    (e.g. a project directory, a git repository), each of which may
    contain multiple *Codes* (e.g. a source-code file) to be analyzed.
    Each Analyzer will be configured to identify a particular set of
    *features* within each Code.

    Additional arguments can be passed to
    [`__init__()`][codesurvey.CodeSurvey.__init__] to control
    persistent storage, parallelism, and other options.

    The survey can be executed with
    [`run()`][codesurvey.CodeSurvey.run], which accepts options that
    determine the stopping condition for the survey. Multiple calls to
    [`run()`][codesurvey.CodeSurvey.run] will extend the results of
    the survey.

    [`get_repo_features()`][codesurvey.CodeSurvey.get_repo_features],
    [`get_code_features()`][codesurvey.CodeSurvey.get_code_features],
    and [`get_survey_tree()`][codesurvey.CodeSurvey.get_survey_tree]
    can be used to inspect the results of the survey.

    Previous survey results can be loaded for inspection by specifying
    the same `db_filepath` used for the previous survey run(s).

    """

    def __init__(self, *,
                 sources: Sequence[Source],
                 analyzers: Sequence[Analyzer],
                 db_filepath: str = ':memory:',
                 max_workers: Optional[int] = 1,
                 continue_on_failure: bool = True,
                 save_code_features: bool = True,
                 save_occurrences: bool = True,
                 use_saved_features: bool = True):
        """
        Args:
            sources: Sources from which to fetch Repos of Codes
                to survey. If multiple Sources are provided, Repo fetching
                will cycle through them in a round-robin fashion.
            analyzers: Analyzers to identify features in fetched code.
            db_filepath: Path to an sqlite database file for persisting survey
                results. Creates a new sqlite database if the path does not
                exist. Defaults to a non-persistent in-memory database.
            max_workers: The maximum number of parallel worker processes for
                fetching Repos from Sources and executing Analyzers. Defaults
                to a single worker. If `None` or `0`, the number of CPUs
                reported by `os.cpu_count()` is used instead.
            continue_on_failure: If `True`, exceptions raised by Sources and
                Analyzers will be logged, but will not halt the survey.
            save_code_features: If `True`, features of individual Codes will be
                retained in the survey database. Otherwise, Code features will
                be deleted once they have been used to compute aggregate
                features of their respective Repo.
            save_occurrences: If `True`, occurrence objects returned by
                FeatureFinders will be saved in the survey database.
            use_saved_features: If `True`, re-use saved features from an
                Analyzer for a Code when they already exist in the survey
                database. Otherwise, reapply all Analyzers to all Codes.

        Raises:
            ValueError: Invalid survey configuration was specified.

        """
        duplicate_source_names = get_duplicates([source.name for source in sources])
        if duplicate_source_names:
            duplicate_sources_str = ', '.join(duplicate_source_names)
            raise ValueError(('Cannot instantiate CodeSurvey with duplicate '
                              f'source names: {duplicate_sources_str}. '
                              'Please set a unique name for each source.'))
        self.sources = {source.name: source for source in sources}

        duplicate_analyzer_names = get_duplicates([analyzer.name for analyzer in analyzers])
        if duplicate_analyzer_names:
            duplicate_analyzers_str = ', '.join(duplicate_analyzer_names)
            raise ValueError(('Cannot instantiate CodeSurvey with duplicate '
                              f'analyzer names: {duplicate_analyzers_str}. '
                              'Please set a unique name for each analyzer.'))
        self.analyzers = {analyzer.name: analyzer for analyzer in analyzers}

        self.analyzer_features = {analyzer.name: analyzer.get_feature_names()
                                  for analyzer in analyzers}
        self.db_filepath = db_filepath
        self.max_workers = max_workers or os.cpu_count() or 1
        self.continue_on_failure = continue_on_failure
        self.save_code_features = save_code_features
        self.save_occurrences = save_occurrences
        self.use_saved_features = use_saved_features

        self._runner = CodeSurveyRunner(self)

    def get_db(self):
        """Returns the Database that persists survey results."""
        return Database(self.db_filepath)

    def run(self, *,
            max_repos: Optional[int] = None,
            max_codes: Optional[int] = None,
            disable_progress: bool = False,
            progress_analyzer_features: Optional[Mapping[str, Sequence[str]]] = None):
        """Runs the survey by fetching code from sources and applying analyzers.

        If neither the `max_repos` nor the `max_codes` stopping
        condition is specified, the survey will continue running
        until it is interrupted by a `KeyboardInterrupt` exception.

        Args:
            max_repos: If specified, the run will stop after analyzing this
                many Repos.
            max_codes: If specified, the run will stop after analyzing this
                many Codes.
            disable_progress: If `True`, do not display tqdm progress bars
                counting Repos and Codes analyzed.
            progress_analyzer_features: Mapping of analyzer names to sequences
                of feature names for which progress trackers should be
                displayed to count Repos found with those features. Defaults
                to all features, but disables feature progress trackers with
                a warning when there are more than 10 features.

        """
        self._runner.run(
            max_repos=max_repos,
            max_codes=max_codes,
            disable_progress=disable_progress,
            progress_analyzer_features=progress_analyzer_features,
        )

    def get_repo_features(self, *,
                          source_names: Optional[Sequence[str]] = None,
                          analyzer_names: Optional[Sequence[str]] = None,
                          feature_names: Optional[Sequence[str]] = None) -> List[RepoFeature]:
        """Returns RepoFeatures of surveyed Repos.

        Args:
            source_names: If specified, only features from the named Sources
                will be returned.
            analyzer_names: If specified, only features from the named Analyzers
                will be returned.
            feature_names: If specified, only results for the named features
                will be returned.

        """
        return self.get_db().get_repo_features(source_names=source_names,
                                               analyzer_names=analyzer_names,
                                               feature_names=feature_names)

    def get_code_features(self, *,
                          source_names: Optional[Sequence[str]] = None,
                          analyzer_names: Optional[Sequence[str]] = None,
                          feature_names: Optional[Sequence[str]] = None) -> List[CodeFeature]:
        """Returns CodeFeatures of surveyed Codes.

        Only returns results from runs where `save_code_features` was `True`.

        Args:
            source_names: If specified, only features from the named Sources
                will be returned.
            analyzer_names: If specified, only features from the named Analyzers
                will be returned.
            feature_names: If specified, only results for the named features
                will be returned.

        """
        return self.get_db().get_code_features(source_names=source_names,
                                               analyzer_names=analyzer_names,
                                               feature_names=feature_names)

    def get_survey_tree(self, *,
                        source_names: Optional[Sequence[str]] = None,
                        analyzer_names: Optional[Sequence[str]] = None,
                        feature_names: Optional[Sequence[str]] = None) -> Dict:
        """Returns surveyed CodeFeatures and RepoFeatures structured under a
        tree of Sources, Repos, and Analyzers.

        Args:
            source_names: If specified, only features from the named Sources
                will be returned.
            analyzer_names: If specified, only features from the named Analyzers
                will be returned.
            feature_names: If specified, only results for the named features
                will be returned.

        Returns:
            A dictionary with the following structure:
                ```python
                {
                    'sources': {
                        '<source_name>': {
                            'repos': {
                                '<repo_key>': {
                                    'analyzers': {
                                        '<analyzer_name>': {
                                            'features': {
                                                '<feature_name>': {
                                                    'updated': datetime(...),
                                                    'occurrence_count': int(...),
                                                    'code_occurrence_count': int(...),
                                                    'code_total_count': int(...),
                                                },
                                                ...
                                            },
                                            # 'codes' key is only present if
                                            # survey runs are performed with
                                            # `save_code_features=True`
                                            'codes': {
                                                '<code_key>': {
                                                    'features': {
                                                        '<feature_name>': {
                                                            'updated': datetime(...),
                                                            'occurrence_count': int(...),
                                                        },
                                                        ...
                                                    }
                                                },
                                                ...
                                            }
                                        },
                                        ...
                                    },
                                    'repo_metadata': {
                                        '<metadata_key>': ...,
                                        ...
                                    }
                                },
                                ...
                            }
                        },
                        ...
                    }
                }
                ```

        """
        tree: Dict = {'sources': {}}
        code_features = self.get_code_features(source_names=source_names,
                                               analyzer_names=analyzer_names,
                                               feature_names=feature_names)
        for c in code_features:
            recursive_update(tree, {
                'sources': {c.source_name: {
                    'repos': {c.repo_key: {
                        'analyzers': {c.analyzer_name: {
                            'codes': {c.code_key: {
                                'features': {c.feature_name: {
                                    'updated': c.updated,
                                    'occurrence_count': c.occurrence_count,
                                    'occurrences': c.occurrences,
                                }}
                            }}
                        }},
                        'repo_metadata': c.repo_metadata,
                    }}
                }}
            })
        repo_features = self.get_repo_features(source_names=source_names,
                                               analyzer_names=analyzer_names)
        for r in repo_features:
            recursive_update(tree, {
                'sources': {r.source_name: {
                    'repos': {r.repo_key: {
                        'analyzers': {r.analyzer_name: {
                            'features': {r.feature_name: {
                                'updated': r.updated,
                                'occurrence_count': r.occurrence_count,
                                'code_occurrence_count': r.code_occurrence_count,
                                'code_total_count': r.code_total_count,
                            }},
                        }},
                        'repo_metadata': r.repo_metadata,
                    }}
                }}
            })
        return tree

__init__(*, sources: Sequence[Source], analyzers: Sequence[Analyzer], db_filepath: str = ':memory:', max_workers: Optional[int] = 1, continue_on_failure: bool = True, save_code_features: bool = True, save_occurrences: bool = True, use_saved_features: bool = True)

Parameters:

  • sources (Sequence[Source]) –

    Sources from which to fetch Repos of Codes to survey. If multiple Sources are provided, Repo fetching will cycle through them in a round-robin fashion.

  • analyzers (Sequence[Analyzer]) –

    Analyzers to identify features in fetched code.

  • db_filepath (str, default: ':memory:' ) –

    Path to an sqlite database file for persisting survey results. Creates a new sqlite database if the path does not exist. Defaults to a non-persistent in-memory database.

  • max_workers (Optional[int], default: 1 ) –

    The maximum number of parallel worker processes for fetching Repos from Sources and executing Analyzers. Defaults to a single worker. If None or 0, the number of CPUs reported by os.cpu_count() is used instead.

  • continue_on_failure (bool, default: True ) –

    If True, exceptions raised by Sources and Analyzers will be logged, but will not halt the survey.

  • save_code_features (bool, default: True ) –

    If True, features of individual Codes will be retained in the survey database. Otherwise, Code features will be deleted once they have been used to compute aggregate features of their respective Repo.

  • save_occurrences (bool, default: True ) –

    If True, occurrence objects returned by FeatureFinders will be saved in the survey database.

  • use_saved_features (bool, default: True ) –

    If True, re-use saved features from an Analyzer for a Code when they already exist in the survey database. Otherwise, reapply all Analyzers to all Codes.
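The round-robin cycling mentioned for `sources` can be sketched as follows. This is an illustration under assumptions, not CodeSurvey's runner: the `round_robin_fetch` helper is hypothetical, each Source is modelled as a plain list of Repo keys, and dropping a Source once it has no more Repos is an assumed behaviour.

```python
from typing import Dict, List, Tuple


def round_robin_fetch(source_repos: Dict[str, List[str]],
                      max_repos: int) -> List[Tuple[str, str]]:
    """Take one Repo from each Source in turn until max_repos is reached."""
    iterators = {name: iter(repos) for name, repos in source_repos.items()}
    fetched: List[Tuple[str, str]] = []
    while iterators and len(fetched) < max_repos:
        # Copy the keys so exhausted Sources can be removed while looping.
        for name in list(iterators):
            if len(fetched) >= max_repos:
                break
            try:
                repo = next(iterators[name])
            except StopIteration:
                # Assumption: a Source with no Repos left is simply skipped.
                del iterators[name]
                continue
            fetched.append((name, repo))
    return fetched


order = round_robin_fetch({'local': ['a', 'b', 'c'], 'github': ['x']}, max_repos=4)
print(order)
# [('local', 'a'), ('github', 'x'), ('local', 'b'), ('local', 'c')]
```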

Raises:

  • ValueError

    Invalid survey configuration was specified.
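One configuration error visible in the source listing is a duplicated Source or Analyzer name. The check relies on a `get_duplicates` helper whose implementation is not shown on this page, so the version below is an assumed stand-in, and `check_unique_names` is a hypothetical wrapper written only for this sketch.

```python
from collections import Counter
from typing import List, Sequence


def get_duplicates(items: Sequence[str]) -> List[str]:
    """Return each item that appears more than once, in first-seen order.

    Assumed implementation; the real helper is not shown in this reference.
    """
    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]


def check_unique_names(kind: str, names: Sequence[str]) -> None:
    """Raise ValueError if any name is repeated, mirroring the check above."""
    duplicates = get_duplicates(names)
    if duplicates:
        raise ValueError(f'Cannot instantiate CodeSurvey with duplicate '
                         f'{kind} names: {", ".join(duplicates)}. '
                         f'Please set a unique name for each {kind}.')


check_unique_names('source', ['local', 'github'])  # passes silently

try:
    check_unique_names('analyzer', ['python', 'python'])
except ValueError as error:
    message = str(error)
    print(message)
```

Giving every Source and Analyzer an explicit, unique name avoids this error, since the names are used as dictionary keys for `self.sources` and `self.analyzers`.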

Source code in codesurvey/core.py
def __init__(self, *,
             sources: Sequence[Source],
             analyzers: Sequence[Analyzer],
             db_filepath: str = ':memory:',
             max_workers: Optional[int] = 1,
             continue_on_failure: bool = True,
             save_code_features: bool = True,
             save_occurrences: bool = True,
             use_saved_features: bool = True):
    """
    Args:
        sources: Sources from which to fetch Repos of Codes
            to survey. If multiple Sources are provided, Repo fetching
            will cycle through them in a round-robin fashion.
        analyzers: Analyzers to identify features in fetched code.
        db_filepath: Path to an sqlite database file for persisting survey
            results. Creates a new sqlite database if the path does not
            exist. Defaults to a non-persistent in-memory database.
        max_workers: The maximum number of parallel worker processes for
            fetching Repos from Sources and executing Analyzers. Defaults
            to a single worker. If `None` or `0`, the number of CPUs
            reported by `os.cpu_count()` is used instead.
        continue_on_failure: If `True`, exceptions raised by Sources and
            Analyzers will be logged, but will not halt the survey.
        save_code_features: If `True`, features of individual Codes will be
            retained in the survey database. Otherwise, Code features will
            be deleted once they have been used to compute aggregate
            features of their respective Repo.
        save_occurrences: If `True`, occurrence objects returned by
            FeatureFinders will be saved in the survey database.
        use_saved_features: If `True`, re-use saved features from an
            Analyzer for a Code when they already exist in the survey
            database. Otherwise, reapply all Analyzers to all Codes.

    Raises:
        ValueError: Invalid survey configuration was specified.

    """
    duplicate_source_names = get_duplicates([source.name for source in sources])
    if duplicate_source_names:
        duplicate_sources_str = ', '.join(duplicate_source_names)
        raise ValueError(('Cannot instantiate CodeSurvey with duplicate '
                          f'source names: {duplicate_sources_str}. '
                          'Please set a unique name for each source.'))
    self.sources = {source.name: source for source in sources}

    duplicate_analyzer_names = get_duplicates([analyzer.name for analyzer in analyzers])
    if duplicate_analyzer_names:
        duplicate_analyzers_str = ', '.join(duplicate_analyzer_names)
        raise ValueError(('Cannot instantiate CodeSurvey with duplicate '
                          f'analyzer names: {duplicate_analyzers_str}. '
                          'Please set a unique name for each analyzer.'))
    self.analyzers = {analyzer.name: analyzer for analyzer in analyzers}

    self.analyzer_features = {analyzer.name: analyzer.get_feature_names()
                              for analyzer in analyzers}
    self.db_filepath = db_filepath
    self.max_workers = max_workers or os.cpu_count() or 1
    self.continue_on_failure = continue_on_failure
    self.save_code_features = save_code_features
    self.save_occurrences = save_occurrences
    self.use_saved_features = use_saved_features

    self._runner = CodeSurveyRunner(self)

run(*, max_repos: Optional[int] = None, max_codes: Optional[int] = None, disable_progress: bool = False, progress_analyzer_features: Optional[Mapping[str, Sequence[str]]] = None)

Runs the survey by fetching code from sources and applying analyzers.

If neither the max_repos nor the max_codes stopping condition is specified, the survey will continue running until it is interrupted by a KeyboardInterrupt exception.

Parameters:

  • max_repos (Optional[int], default: None ) –

    If specified, the run will stop after analyzing this many Repos.

  • max_codes (Optional[int], default: None ) –

    If specified, the run will stop after analyzing this many Codes.

  • disable_progress (bool, default: False ) –

    If True, do not display tqdm progress bars counting Repos and Codes analyzed.

  • progress_analyzer_features (Optional[Mapping[str, Sequence[str]]], default: None ) –

    Mapping of analyzer names to sequences of feature names for which progress trackers should be displayed to count Repos found with those features. Defaults to all features, but disables feature progress trackers with a warning when there are more than 10 features.
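The two stopping conditions can be pictured with a small loop over Repos and their Codes. This is a sketch under assumptions rather than the behaviour of the real runner: `survey_until` is a hypothetical function, Repos are modelled as `(repo_key, codes)` pairs, and a Repo is only counted once all of its Codes have been visited.

```python
from typing import Iterable, List, Optional, Tuple


def survey_until(repos: Iterable[Tuple[str, List[str]]], *,
                 max_repos: Optional[int] = None,
                 max_codes: Optional[int] = None) -> Tuple[int, int]:
    """Return (repos_analyzed, codes_analyzed) once a limit is reached."""
    repo_count = 0
    code_count = 0
    for _repo_key, codes in repos:
        for _code in codes:
            # Stop before analyzing a Code that would exceed max_codes.
            if max_codes is not None and code_count >= max_codes:
                return repo_count, code_count
            code_count += 1
        repo_count += 1
        if max_repos is not None and repo_count >= max_repos:
            break
    return repo_count, code_count


sample = [('r1', ['a.py', 'b.py']), ('r2', ['c.py']), ('r3', ['d.py', 'e.py'])]
print(survey_until(sample, max_repos=2))  # (2, 3)
print(survey_until(sample, max_codes=4))  # (2, 4)
print(survey_until(sample))               # (3, 5)
```

The last call only terminates because `sample` is finite. With a Source that keeps yielding Repos and no limit set, a loop like this would run until interrupted, which matches the `KeyboardInterrupt` note above.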

Source code in codesurvey/core.py
def run(self, *,
        max_repos: Optional[int] = None,
        max_codes: Optional[int] = None,
        disable_progress: bool = False,
        progress_analyzer_features: Optional[Mapping[str, Sequence[str]]] = None):
    """Runs the survey by fetching code from sources and applying analyzers.

    If neither the `max_repos` nor the `max_codes` stopping
    condition is specified, the survey will continue running
    until it is interrupted by a `KeyboardInterrupt` exception.

    Args:
        max_repos: If specified, the run will stop after analyzing this
            many Repos.
        max_codes: If specified, the run will stop after analyzing this
            many Codes.
        disable_progress: If `True`, do not display tqdm progress bars
            counting Repos and Codes analyzed.
        progress_analyzer_features: Mapping of analyzer names to sequences
            of feature names for which progress trackers should be
            displayed to count Repos found with those features. Defaults
            to all features, but disables feature progress trackers with
            a warning when there are more than 10 features.

    """
    self._runner.run(
        max_repos=max_repos,
        max_codes=max_codes,
        disable_progress=disable_progress,
        progress_analyzer_features=progress_analyzer_features,
    )

get_repo_features(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> List[RepoFeature]

Returns RepoFeatures of surveyed Repos.

Parameters:

  • source_names (Optional[Sequence[str]], default: None ) –

    If specified, only features from the named Sources will be returned.

  • analyzer_names (Optional[Sequence[str]], default: None ) –

    If specified, only features from the named Analyzers will be returned.

  • feature_names (Optional[Sequence[str]], default: None ) –

    If specified, only results for the named features will be returned.
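The three optional filters follow the same pattern: `None` means "do not filter on this field". The sketch below shows that pattern on an in-memory list. `FeatureRow` is a hypothetical stand-in for `RepoFeature`, `filter_rows` is not part of CodeSurvey, and combining the filters with a logical AND is an assumption.

```python
from typing import List, NamedTuple, Optional, Sequence


class FeatureRow(NamedTuple):
    """Hypothetical stand-in for a RepoFeature record."""
    source_name: str
    analyzer_name: str
    feature_name: str
    occurrence_count: int


def filter_rows(rows: Sequence[FeatureRow], *,
                source_names: Optional[Sequence[str]] = None,
                analyzer_names: Optional[Sequence[str]] = None,
                feature_names: Optional[Sequence[str]] = None) -> List[FeatureRow]:
    """Keep rows that match every filter that was actually specified."""
    def allowed(value: str, names: Optional[Sequence[str]]) -> bool:
        # None disables the filter for that field.
        return names is None or value in names

    return [row for row in rows
            if allowed(row.source_name, source_names)
            and allowed(row.analyzer_name, analyzer_names)
            and allowed(row.feature_name, feature_names)]


rows = [
    FeatureRow('local', 'python', 'for_else', 2),
    FeatureRow('local', 'python', 'try_finally', 5),
    FeatureRow('github', 'python', 'for_else', 1),
]
selected = filter_rows(rows, source_names=['local'], feature_names=['for_else'])
print(selected)
```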

Source code in codesurvey/core.py
def get_repo_features(self, *,
                      source_names: Optional[Sequence[str]] = None,
                      analyzer_names: Optional[Sequence[str]] = None,
                      feature_names: Optional[Sequence[str]] = None) -> List[RepoFeature]:
    """Returns RepoFeatures of surveyed Repos.

    Args:
        source_names: If specified, only features from the named Sources
            will be returned.
        analyzer_names: If specified, only features from the named Analyzers
            will be returned.
        feature_names: If specified, only results for the named features
            will be returned.

    """
    return self.get_db().get_repo_features(source_names=source_names,
                                           analyzer_names=analyzer_names,
                                           feature_names=feature_names)

get_code_features(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> List[CodeFeature]

Returns CodeFeatures of surveyed Codes.

Only returns results from runs where save_code_features was True.

Parameters:

  • source_names (Optional[Sequence[str]], default: None ) –

    If specified, only features from the named Sources will be returned.

  • analyzer_names (Optional[Sequence[str]], default: None ) –

    If specified, only features from the named Analyzers will be returned.

  • feature_names (Optional[Sequence[str]], default: None ) –

    If specified, only results for the named features will be returned.
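Per-Code results are what the Repo-level aggregates are computed from. The sketch below shows one plausible way the three Repo-level counts could be derived for a single feature. The meaning of each field is inferred from its name, not confirmed by this reference, and `aggregate_code_counts` is a hypothetical helper.

```python
from typing import Dict, Sequence


def aggregate_code_counts(code_occurrence_counts: Sequence[int]) -> Dict[str, int]:
    """Summarize per-Code occurrence counts of one feature for one Repo.

    Assumed field meanings:
      occurrence_count      - total occurrences across all Codes
      code_occurrence_count - number of Codes with at least one occurrence
      code_total_count      - number of Codes analyzed
    """
    return {
        'occurrence_count': sum(code_occurrence_counts),
        'code_occurrence_count': sum(1 for count in code_occurrence_counts if count > 0),
        'code_total_count': len(code_occurrence_counts),
    }


summary = aggregate_code_counts([3, 0, 1, 0])
print(summary)
# {'occurrence_count': 4, 'code_occurrence_count': 2, 'code_total_count': 4}
```

Under this reading, `code_occurrence_count / code_total_count` gives the share of a Repo's Codes that use the feature, which remains available even when `save_code_features=False` discards the per-Code rows.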

Source code in codesurvey/core.py
def get_code_features(self, *,
                      source_names: Optional[Sequence[str]] = None,
                      analyzer_names: Optional[Sequence[str]] = None,
                      feature_names: Optional[Sequence[str]] = None) -> List[CodeFeature]:
    """Returns CodeFeatures of surveyed Codes.

    Only returns results from runs where `save_code_features` was `True`.

    Args:
        source_names: If specified, only features from the named Sources
            will be returned.
        analyzer_names: If specified, only features from the named Analyzers
            will be returned.
        feature_names: If specified, only results for the named features
            will be returned.

    """
    return self.get_db().get_code_features(source_names=source_names,
                                           analyzer_names=analyzer_names,
                                           feature_names=feature_names)

get_survey_tree(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> Dict

Returns surveyed CodeFeatures and RepoFeatures structured under a tree of Sources, Repos, and Analyzers.

Parameters:

  • source_names (Optional[Sequence[str]], default: None ) –

    If specified, only features from the named Sources will be returned.

  • analyzer_names (Optional[Sequence[str]], default: None ) –

    If specified, only features from the named Analyzers will be returned.

  • feature_names (Optional[Sequence[str]], default: None ) –

    If specified, only results for the named features will be returned.

Returns:

  • Dict

    A dictionary with the following structure:

    {
        'sources': {
            '<source_name>': {
                'repos': {
                    '<repo_key>': {
                        'analyzers': {
                            '<analyzer_name>': {
                                'features': {
                                    '<feature_name>': {
                                        'updated': datetime(...),
                                        'occurrence_count': int(...),
                                        'code_occurrence_count': int(...),
                                        'code_total_count': int(...),
                                    },
                                    ...
                                },
                                # 'codes' key is only present if
                                # survey runs are performed with
                                # `save_code_features=True`
                                'codes': {
                                    '<code_key>': {
                                        'features': {
                                            '<feature_name>': {
                                                'updated': datetime(...),
                                                'occurrence_count': int(...),
                                            },
                                            ...
                                        }
                                    },
                                    ...
                                }
                            },
                            ...
                        },
                        'repo_metadata': {
                            '<metadata_key>': ...,
                            ...
                        }
                    },
                    ...
                }
            },
            ...
        }
    }
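The tree is assembled by merging one small nested dictionary per feature into a shared result. The source listing uses a `recursive_update` helper for this, but its implementation is not shown on this page, so the version below is an assumed stand-in. The names `local`, `repo-a`, `python`, `main.py`, and `for_else` are made up for the example.

```python
from typing import Dict


def recursive_update(target: Dict, updates: Dict) -> Dict:
    """Merge `updates` into `target` in place, descending into nested dicts.

    Assumed implementation; the real helper is not shown in this reference.
    """
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            recursive_update(target[key], value)
        else:
            target[key] = value
    return target


tree: Dict = {'sources': {}}

# A Code-level result, shaped like the 'codes' branch of the structure above.
recursive_update(tree, {'sources': {'local': {'repos': {'repo-a': {'analyzers': {
    'python': {'codes': {'main.py': {'features': {
        'for_else': {'occurrence_count': 2}}}}}}}}}}})

# A Repo-level result for the same analyzer lands next to the 'codes' key.
recursive_update(tree, {'sources': {'local': {'repos': {'repo-a': {'analyzers': {
    'python': {'features': {
        'for_else': {'occurrence_count': 2,
                     'code_occurrence_count': 1,
                     'code_total_count': 3}}}}}}}}})

analyzer_node = tree['sources']['local']['repos']['repo-a']['analyzers']['python']
print(sorted(analyzer_node))  # ['codes', 'features']
print(analyzer_node['features']['for_else']['code_total_count'])  # 3
```

A plain `dict.update()` would not work here, because the second call would replace the whole `'local'` entry and discard the Code-level branch.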
    

Source code in codesurvey/core.py
def get_survey_tree(self, *,
                    source_names: Optional[Sequence[str]] = None,
                    analyzer_names: Optional[Sequence[str]] = None,
                    feature_names: Optional[Sequence[str]] = None) -> Dict:
    """Returns surveyed CodeFeatures and RepoFeatures structured under a
    tree of Sources, Repos, and Analyzers.

    Args:
        source_names: If specified, only features from the named Sources
            will be returned.
        analyzer_names: If specified, only features from the named Analyzers
            will be returned.
        feature_names: If specified, only results for the named features
            will be returned.

    Returns:
        A dictionary with the following structure:
            ```python
            {
                'sources': {
                    '<source_name>': {
                        'repos': {
                            '<repo_key>': {
                                'analyzers': {
                                    '<analyzer_name>': {
                                        'features': {
                                            '<feature_name>': {
                                                'updated': datetime(...),
                                                'occurrence_count': int(...),
                                                'code_occurrence_count': int(...),
                                                'code_total_count': int(...),
                                            },
                                            ...
                                        },
                                        # 'codes' key is only present if
                                        # survey runs are performed with
                                        # `save_code_features=True`
                                        'codes': {
                                            '<code_key>': {
                                                'features': {
                                                    '<feature_name>': {
                                                        'updated': datetime(...),
                                                        'occurrence_count': int(...),
                                                    },
                                                    ...
                                                }
                                            },
                                            ...
                                        }
                                    },
                                    ...
                                },
                                'repo_metadata': {
                                    '<metadata_key>': ...,
                                    ...
                                }
                            },
                            ...
                        }
                    },
                    ...
                }
            }
            ```

    """
    tree: Dict = {'sources': {}}
    code_features = self.get_code_features(source_names=source_names,
                                           analyzer_names=analyzer_names,
                                           feature_names=feature_names)
    for c in code_features:
        recursive_update(tree, {
            'sources': {c.source_name: {
                'repos': {c.repo_key: {
                    'analyzers': {c.analyzer_name: {
                        'codes': {c.code_key: {
                            'features': {c.feature_name: {
                                'updated': c.updated,
                                'occurrence_count': c.occurrence_count,
                                'occurrences': c.occurrences,
                            }}
                        }}
                    }},
                    'repo_metadata': c.repo_metadata,
                }}
            }}
        })
    repo_features = self.get_repo_features(source_names=source_names,
                                           analyzer_names=analyzer_names)
    for r in repo_features:
        recursive_update(tree, {
            'sources': {r.source_name: {
                'repos': {r.repo_key: {
                    'analyzers': {r.analyzer_name: {
                        'features': {r.feature_name: {
                            'updated': r.updated,
                            'occurrence_count': r.occurrence_count,
                            'code_occurrence_count': r.code_occurrence_count,
                            'code_total_count': r.code_total_count,
                        }},
                    }},
                    'repo_metadata': r.repo_metadata,
                }}
            }}
        })
    return tree
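
The nested dictionary returned by get_survey_tree() can be flattened into rows for reporting. The following is a minimal sketch, assuming the tree shape built by the code above; `iter_repo_feature_rows` and the hand-written `example_tree` are hypothetical and not part of codesurvey.

```python
from datetime import datetime
from typing import Any, Dict, Iterator, Tuple


def iter_repo_feature_rows(
    tree: Dict[str, Any],
) -> Iterator[Tuple[str, str, str, str, int]]:
    """Yield (source, repo, analyzer, feature, occurrence_count) rows."""
    for source_name, source in tree.get('sources', {}).items():
        for repo_key, repo in source.get('repos', {}).items():
            for analyzer_name, analyzer in repo.get('analyzers', {}).items():
                # 'features' may be missing if only code-level results exist.
                for feature_name, feature in analyzer.get('features', {}).items():
                    yield (source_name, repo_key, analyzer_name,
                           feature_name, feature['occurrence_count'])


# Hand-built stand-in for the result of get_survey_tree() (hypothetical data).
example_tree = {
    'sources': {
        'local': {
            'repos': {
                'my_project': {
                    'analyzers': {
                        'python': {
                            'features': {
                                'fstring': {
                                    'updated': datetime(2024, 1, 1),
                                    'occurrence_count': 12,
                                    'code_occurrence_count': 3,
                                    'code_total_count': 5,
                                },
                            },
                        },
                    },
                    'repo_metadata': {},
                },
            },
        },
    },
}

rows = list(iter_repo_feature_rows(example_tree))
```

With a real survey, the same helper could be called on the dictionary returned by get_survey_tree().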

codesurvey.RepoFeature dataclass

Source code in codesurvey/database.py
@dataclass(frozen=True)
class RepoFeature:
    updated: datetime
    """Timestamp when this analysis was last updated."""

    source_name: str
    """Name of the Source that produced the target Repo."""

    repo_key: str
    """Key identifying the target Repo within the Source."""

    analyzer_name: str
    """Name of the Analyzer that produced this feature."""

    feature_name: str
    """Name of the analyzed feature."""

    occurrence_count: int
    """Number of occurrences of this feature within the Repo."""

    code_occurrence_count: int
    """Number of Codes within the Repo containing this feature."""

    code_total_count: int
    """Total number of Codes analyzed for this feature within the Repo."""

    repo_metadata: Dict[str, Any]
    """Metadata of the Repo provided by the Source."""

updated: datetime instance-attribute

Timestamp when this analysis was last updated.

source_name: str instance-attribute

Name of the Source that produced the target Repo.

repo_key: str instance-attribute

Key identifying the target Repo within the Source.

analyzer_name: str instance-attribute

Name of the Analyzer that produced this feature.

feature_name: str instance-attribute

Name of the analyzed feature.

occurrence_count: int instance-attribute

Number of occurrences of this feature within the Repo.

code_occurrence_count: int instance-attribute

Number of Codes within the Repo containing this feature.

code_total_count: int instance-attribute

Total number of Codes analyzed for this feature within the Repo.

repo_metadata: Dict[str, Any] instance-attribute

Metadata of the Repo provided by the Source.
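
Since code_occurrence_count and code_total_count are both reported per Repo, the share of analyzed Codes that contain a feature follows directly. A minimal sketch, using a `SimpleNamespace` as a stand-in for a RepoFeature; the helper name `code_share` and the sample values are hypothetical.

```python
from types import SimpleNamespace
from typing import Optional


def code_share(repo_feature) -> Optional[float]:
    """Fraction of analyzed Codes in the Repo that contain the feature.

    Returns None when no Codes were analyzed, to avoid dividing by zero.
    """
    if repo_feature.code_total_count == 0:
        return None
    return repo_feature.code_occurrence_count / repo_feature.code_total_count


# Stand-in carrying only the RepoFeature attributes used here (hypothetical values).
sample = SimpleNamespace(
    feature_name='fstring',
    occurrence_count=12,
    code_occurrence_count=3,
    code_total_count=5,
)

share = code_share(sample)
```

In practice the same function could be applied to each item returned by get_repo_features().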

codesurvey.CodeFeature dataclass

Source code in codesurvey/database.py
@dataclass(frozen=True)
class CodeFeature:
    updated: datetime
    """Timestamp when this analysis was last updated."""

    source_name: str
    """Name of the Source that produced the target Repo."""

    repo_key: str
    """Key identifying the target Repo within the Source."""

    analyzer_name: str
    """Name of the Analyzer that produced this feature."""

    code_key: str
    """Key identifying the target Code within the Repo."""

    feature_name: str
    """Name of the analyzed feature."""

    occurrence_count: Optional[int]
    """Number of occurrences of this feature within the Code, or `None` if
    analysis of this Code was skipped."""

    occurrences: Optional[List[Dict[str, Any]]]
    """Original occurrence objects returned by FeatureFinders."""

    repo_metadata: Dict[str, Any]
    """Metadata of the Repo provided by the Source."""

updated: datetime instance-attribute

Timestamp when this analysis was last updated.

source_name: str instance-attribute

Name of the Source that produced the target Repo.

repo_key: str instance-attribute

Key identifying the target Repo within the Source.

analyzer_name: str instance-attribute

Name of the Analyzer that produced this feature.

code_key: str instance-attribute

Key identifying the target Code within the Repo.

feature_name: str instance-attribute

Name of the analyzed feature.

occurrence_count: Optional[int] instance-attribute

Number of occurrences of this feature within the Code, or None if analysis of this Code was skipped.

occurrences: Optional[List[Dict[str, Any]]] instance-attribute

Original occurrence objects returned by FeatureFinders.

repo_metadata: Dict[str, Any] instance-attribute

Metadata of the Repo provided by the Source.
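
Because occurrence_count is None when analysis of a Code was skipped, aggregations over CodeFeatures need to treat None separately from 0. A minimal sketch, using `SimpleNamespace` objects as stand-ins for CodeFeature; `summarize_occurrences` and the sample data are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def summarize_occurrences(code_features: Iterable) -> Tuple[int, int]:
    """Return (total occurrences, number of skipped Codes)."""
    total = 0
    skipped = 0
    for code_feature in code_features:
        if code_feature.occurrence_count is None:
            # Analysis of this Code was skipped; do not count it as zero.
            skipped += 1
        else:
            total += code_feature.occurrence_count
    return total, skipped


# Stand-ins carrying only the CodeFeature attributes used here (hypothetical values).
samples = [
    SimpleNamespace(code_key='a.py', occurrence_count=4),
    SimpleNamespace(code_key='b.py', occurrence_count=None),
    SimpleNamespace(code_key='c.py', occurrence_count=0),
]

total, skipped = summarize_occurrences(samples)
```

The same function could be applied to the results of get_code_features().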

codesurvey.logger = get_logger() module-attribute

logging.Logger object that codesurvey logs events to during survey runs.

Can be used to customize logging:

import logging
from codesurvey import logger

logger.setLevel(logging.ERROR)
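
Since the logger is a standard logging.Logger, handlers and formatters from the standard library can also be attached to it. A minimal sketch, using a stand-in logger named `codesurvey_demo` so that it runs without codesurvey installed; the helper `add_stream_handler` is hypothetical, and in real use `codesurvey.logger` would be passed instead.

```python
import io
import logging


def add_stream_handler(logger: logging.Logger, stream,
                       level: int = logging.WARNING) -> logging.Handler:
    """Attach a formatted StreamHandler that only emits records at `level` or above."""
    handler = logging.StreamHandler(stream)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter('%(levelname)s %(name)s: %(message)s'))
    logger.addHandler(handler)
    return handler


# Stand-in logger (assumption); pass codesurvey.logger in real use.
demo_logger = logging.getLogger('codesurvey_demo')
demo_logger.setLevel(logging.DEBUG)

buffer = io.StringIO()
add_stream_handler(demo_logger, buffer)

demo_logger.info('not captured by the handler')
demo_logger.warning('repo skipped')
captured = buffer.getvalue()
```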