Core CodeSurvey
codesurvey.CodeSurvey
Primary interface for running surveys and inspecting their results.
A CodeSurvey is instantiated with a set of Sources to be surveyed and a set of Analyzers to count the occurrences of features within them. Each Source may fetch multiple Repos (e.g. a project directory, a git repository), each of which may contain multiple Codes (e.g. a source-code file) to be analyzed. Each Analyzer will be configured to identify a particular set of features within each Code.
Additional arguments can be passed to __init__() to control persistent storage, parallelism, and other options.
The survey can be executed with run(), which accepts options that determine the stopping condition for the survey. Multiple calls to run() will extend the results of the survey.
get_repo_features(), get_code_features(), and get_survey_tree() can be used to inspect the results of the survey.
Previous survey results can be loaded for inspection by specifying the same db_filepath that was used for the previous survey run(s).
Source code in codesurvey/core.py
__init__(*, sources: Sequence[Source], analyzers: Sequence[Analyzer], db_filepath: str = ':memory:', max_workers: Optional[int] = 1, continue_on_failure: bool = True, save_code_features: bool = True, save_occurrences: bool = True, use_saved_features: bool = True)
Parameters:
- sources (Sequence[Source]) – Sources from which to fetch Repos of Codes to survey. If multiple Sources are provided, Repo fetching will cycle through them in a round-robin fashion.
- analyzers (Sequence[Analyzer]) – Analyzers to identify features in fetched code.
- db_filepath (str, default: ':memory:') – Path to an sqlite database file for persisting survey results. Creates a new sqlite database if the path does not exist. Defaults to a non-persistent in-memory database.
- max_workers (Optional[int], default: 1) – The maximum number of parallel worker processes for fetching Repos from Sources and executing Analyzers. Defaults to a single worker.
- continue_on_failure (bool, default: True) – If True, exceptions raised by Sources and Analyzers will be logged, but will not halt the survey.
- save_code_features (bool, default: True) – If True, features of individual Codes will be retained in the survey database. Otherwise, Code features will be deleted once they have been used to compute the aggregate features of their respective Repo.
- save_occurrences (bool, default: True) – If True, occurrence objects returned by FeatureFinders will be saved in the survey database.
- use_saved_features (bool, default: True) – If True, re-use saved features from an Analyzer for a Code when they already exist in the survey database. Otherwise, reapply all Analyzers to all Codes.
Raises:
- ValueError – Invalid survey configuration was specified.
Source code in codesurvey/core.py
run(*, max_repos: Optional[int] = None, max_codes: Optional[int] = None, disable_progress: bool = False, progress_analyzer_features: Optional[Mapping[str, Sequence[str]]] = None)
Runs the survey by fetching code from sources and applying analyzers.
If neither the max_repos nor the max_codes stopping condition is specified, the survey will continue running until a KeyboardInterrupt exception is raised.
Parameters:
- max_repos (Optional[int], default: None) – If specified, the run will stop after analysing this many Repos.
- max_codes (Optional[int], default: None) – If specified, the run will stop after analysing this many Codes.
- disable_progress (bool, default: False) – If True, do not display tqdm progress bars counting Repos and Codes analyzed.
- progress_analyzer_features (Optional[Mapping[str, Sequence[str]]], default: None) – Mapping of analyzer names to sequences of feature names for which progress trackers should be displayed to count Repos found with those features. Defaults to all features, but disables feature progress trackers with a warning when there are more than 10 features.
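Because repeated calls to run() extend the results of the same survey, a survey can be advanced in small batches and inspected between them. The sketch below is only an illustration of that pattern: run_in_batches() and the FakeSurvey stub are hypothetical names that are not part of codesurvey, and the stub merely imitates the documented keyword-only signature of run() so that the example is runnable without the library.

```python
from typing import Any, List, Optional, Protocol


class SurveyLike(Protocol):
    """Hypothetical protocol covering only the documented calls used below."""

    def run(self, *, max_repos: Optional[int] = None,
            max_codes: Optional[int] = None,
            disable_progress: bool = False) -> Any: ...

    def get_repo_features(self) -> List[Any]: ...


def run_in_batches(survey: SurveyLike, *, batch_size: int, batches: int) -> List[int]:
    """Run the survey in several bounded steps.

    Returns the number of repo feature records available after each step.
    """
    counts = []
    for _ in range(batches):
        # Each call stops after `batch_size` Repos and adds to earlier results.
        survey.run(max_repos=batch_size, disable_progress=True)
        counts.append(len(survey.get_repo_features()))
    return counts


class FakeSurvey:
    """Hypothetical in-memory stub, used only to exercise run_in_batches()."""

    def __init__(self) -> None:
        self._features: List[str] = []

    def run(self, *, max_repos=None, max_codes=None, disable_progress=False):
        # Assumption for the demo: every surveyed Repo yields one record.
        self._features.extend(['feature'] * (max_repos or 0))

    def get_repo_features(self) -> List[str]:
        return list(self._features)


batch_counts = run_in_batches(FakeSurvey(), batch_size=2, batches=3)
print(batch_counts)
```

With a real survey, the same helper would presumably be passed a CodeSurvey instance in place of the stub.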
Source code in codesurvey/core.py
get_repo_features(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> List[RepoFeature]
Returns RepoFeatures of surveyed Repos.
Parameters:
- source_names (Optional[Sequence[str]], default: None) – If specified, only features from the named Sources will be returned.
- analyzer_names (Optional[Sequence[str]], default: None) – If specified, only features from the named Analyzers will be returned.
- feature_names (Optional[Sequence[str]], default: None) – If specified, only results for the named features will be returned.
Source code in codesurvey/core.py
get_code_features(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> List[CodeFeature]
Returns CodeFeatures of surveyed Codes.
Only returns results from runs where save_code_features was True.
Parameters:
- source_names (Optional[Sequence[str]], default: None) – If specified, only features from the named Sources will be returned.
- analyzer_names (Optional[Sequence[str]], default: None) – If specified, only features from the named Analyzers will be returned.
- feature_names (Optional[Sequence[str]], default: None) – If specified, only results for the named features will be returned.
Source code in codesurvey/core.py
get_survey_tree(*, source_names: Optional[Sequence[str]] = None, analyzer_names: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None) -> Dict
Returns surveyed CodeFeatures and RepoFeatures organized as a tree of Sources, Repos, and Analyzers.
Parameters:
- source_names (Optional[Sequence[str]], default: None) – If specified, only features from the named Sources will be returned.
- analyzer_names (Optional[Sequence[str]], default: None) – If specified, only features from the named Analyzers will be returned.
- feature_names (Optional[Sequence[str]], default: None) – If specified, only results for the named features will be returned.
Returns:
- Dict – A dictionary with the following structure:

    {
        'sources': {
            '<source_name>': {
                'repos': {
                    '<repo_key>': {
                        'analyzers': {
                            '<analyzer_name>': {
                                'features': {
                                    'updated': datetime(...),
                                    'occurrence_count': int(...),
                                    'code_occurrence_count': int(...),
                                    'code_total_count': int(...),
                                },
                                # 'codes' key is only present if
                                # survey runs are performed with
                                # `save_code_features=True`
                                'codes': {
                                    '<code_key>': {
                                        'features': {
                                            '<feature_name>': {
                                                'updated': datetime(...),
                                                'occurrence_count': int(...),
                                            },
                                            ...
                                        }
                                    },
                                    ...
                                }
                            },
                            ...
                        },
                        'repo_metadata': {
                            '<metadata_key>': ...,
                            ...
                        }
                    },
                    ...
                }
            },
            ...
        }
    }
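As a rough illustration of how the nested structure above might be traversed, the sketch below flattens the 'codes' branch of a tree into tuples. iter_code_features() is a hypothetical helper, not part of codesurvey, and sample_tree is hand-written data that follows the documented nesting but is trimmed to the keys the helper reads. The 'codes' key is read with .get() because it is documented as being present only when Code features are saved.

```python
from datetime import datetime
from typing import Any, Dict, Iterator, Tuple


def iter_code_features(tree: Dict[str, Any]) -> Iterator[Tuple[str, str, str, str, str]]:
    """Yield (source, repo, analyzer, code, feature) names from a survey tree."""
    for source_name, source in tree['sources'].items():
        for repo_key, repo in source['repos'].items():
            for analyzer_name, analyzer in repo['analyzers'].items():
                # 'codes' may be absent when Code features were not saved.
                for code_key, code in analyzer.get('codes', {}).items():
                    for feature_name in code['features']:
                        yield (source_name, repo_key, analyzer_name,
                               code_key, feature_name)


# Hand-written sample data (an assumption, not real survey output).
sample_tree = {
    'sources': {
        'local': {
            'repos': {
                'my_project': {
                    'analyzers': {
                        'python': {
                            'features': {},
                            'codes': {
                                'main.py': {'features': {'math': {'updated': datetime(2024, 1, 1)}}},
                                'util.py': {'features': {'math': {'updated': datetime(2024, 1, 1)}}},
                            },
                        },
                    },
                    'repo_metadata': {},
                },
                # A Repo surveyed without saved Code features has no 'codes' key.
                'other_project': {
                    'analyzers': {'python': {'features': {}}},
                    'repo_metadata': {},
                },
            },
        },
    },
}

rows = list(iter_code_features(sample_tree))
for row in rows:
    print(row)
```

Flattening the tree this way makes it straightforward to load the results into a table or CSV file, at the cost of dropping the repo-level aggregates.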
Source code in codesurvey/core.py
codesurvey.RepoFeature
dataclass
Source code in codesurvey/database.py
updated: datetime
instance-attribute
Timestamp when this analysis was last updated.
source_name: str
instance-attribute
Name of the Source that produced the target Repo.
repo_key: str
instance-attribute
Key identifying the target Repo within the Source.
analyzer_name: str
instance-attribute
Name of the Analyzer that produced this feature.
feature_name: str
instance-attribute
Name of the analyzed feature.
occurrence_count: int
instance-attribute
Number of occurrences of this feature within the Repo.
code_occurrence_count: int
instance-attribute
Number of Codes within the Repo containing this feature.
code_total_count: int
instance-attribute
Total number of Codes analyzed for this feature within the Repo.
repo_metadata: Dict[str, Any]
instance-attribute
Metadata of the Repo provided by the Source.
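The two per-Code counters can be combined to estimate how widespread a feature is within a Repo. The following is a minimal sketch under stated assumptions: RepoFeatureStandIn is a hypothetical local dataclass that mirrors only a subset of the documented fields so the example runs without codesurvey, and code_coverage() is an illustrative helper rather than a library function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoFeatureStandIn:
    """Hypothetical stand-in with a subset of the documented RepoFeature fields."""
    repo_key: str
    feature_name: str
    occurrence_count: int
    code_occurrence_count: int
    code_total_count: int


def code_coverage(feature: RepoFeatureStandIn) -> Optional[float]:
    """Fraction of analyzed Codes in the Repo that contain the feature.

    Returns None when no Codes were analyzed, to avoid dividing by zero.
    """
    if feature.code_total_count == 0:
        return None
    return feature.code_occurrence_count / feature.code_total_count


example = RepoFeatureStandIn(
    repo_key='my_project',
    feature_name='math',
    occurrence_count=7,
    code_occurrence_count=3,
    code_total_count=12,
)
coverage = code_coverage(example)
print(f'{example.feature_name}: {coverage:.0%} of Codes')
```

Since the helper only reads attributes, it should work equally on objects returned by get_repo_features(), assuming they expose the fields documented above.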
codesurvey.CodeFeature
dataclass
Source code in codesurvey/database.py
updated: datetime
instance-attribute
Timestamp when this analysis was last updated.
source_name: str
instance-attribute
Name of the Source that produced the target Repo.
repo_key: str
instance-attribute
Key identifying the target Repo within the Source.
analyzer_name: str
instance-attribute
Name of the Analyzer that produced this feature.
code_key: str
instance-attribute
Key identifying the target Code within the Repo.
feature_name: str
instance-attribute
Name of the analyzed feature.
occurrence_count: Optional[int]
instance-attribute
Number of occurrences of this feature within the Code, or None if analysis of this Code was skipped.
occurrences: Optional[List[Dict[str, Any]]]
instance-attribute
Original occurrence objects returned by FeatureFinders.
repo_metadata: Dict[str, Any]
instance-attribute
Metadata of the Repo provided by the Source.
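Because occurrence_count is None for skipped Codes, aggregating over CodeFeatures needs to treat None separately from a count of zero. The sketch below shows one way to do that. CodeFeatureStandIn is a hypothetical local dataclass covering only three of the documented fields, and summarize() is an illustrative helper, not a codesurvey function.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class CodeFeatureStandIn:
    """Hypothetical stand-in with a subset of the documented CodeFeature fields."""
    code_key: str
    feature_name: str
    occurrence_count: Optional[int]


def summarize(features: Iterable[CodeFeatureStandIn]) -> Tuple[int, int]:
    """Return (total occurrences, number of skipped Codes)."""
    total = 0
    skipped = 0
    for feature in features:
        if feature.occurrence_count is None:
            # Analysis was skipped; this is different from "found 0 times".
            skipped += 1
        else:
            total += feature.occurrence_count
    return total, skipped


sample = [
    CodeFeatureStandIn('main.py', 'math', 2),
    CodeFeatureStandIn('broken.py', 'math', None),
    CodeFeatureStandIn('util.py', 'math', 0),
]
total_occurrences, skipped_codes = summarize(sample)
print(total_occurrences, skipped_codes)
```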
codesurvey.logger = get_logger()
module-attribute
logging.Logger object that codesurvey logs events to during survey runs.
Can be used to customize logging:
import logging
from codesurvey import logger
logger.setLevel(logging.ERROR)