Core Analyzers
The purpose of an Analyzer is to find units of source-code (typically source-code files) within a Repo provided by a Source. An Analyzer is configured with one or more FeatureFinders for identifying occurrences of features of interest within a unit of source-code.
This document provides details of the core Analyzer classes that can be used to define custom Analyzer types, while usage of built-in language-specific Analyzers is documented in the language-specific sub-pages.
Custom File Analyzers
You can define your own Analyzer to analyze languages not supported by the built-in Analyzers, or to use different approaches to parse or interpret source-code.
Most Analyzers will treat each source-code file within a Repo as the unit of source-code. Such Analyzers should inherit from FileAnalyzer and implement a prepare_file() method that receives a FileInfo and returns an appropriate code representation. The code representation could be a simple string of source-code, or a parsed structure like an abstract syntax tree (AST). The type should be specified as a type argument when inheriting from FileAnalyzer (e.g. class CustomAnalyzer(FileAnalyzer[str])).
Your Analyzer should specify a default_file_glob attribute to find source-code files of interest, and may define a set of default_file_filters to exclude certain files.
Your Analyzer should also specify a default_name class attribute that will be used to identify your Analyzer in logs and results (except where a name is provided for a specific Analyzer instance).
For example, to define a custom Analyzer that receives a custom_arg, looks for .py files, excludes filenames beginning with an underscore, and represents source-code files for FeatureFinders as strings:
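    import os
    from typing import Callable, Optional, Sequence

    # FileAnalyzer and FileInfo are documented under codesurvey.analyzers;
    # the import location of FeatureFinder is assumed here.
    from codesurvey.analyzers import FeatureFinder, FileAnalyzer, FileInfo


    def leading_underscore_file_filter(file_info):
        # Return True to exclude files whose name begins with an underscore.
        return os.path.basename(file_info.rel_path).startswith('_')


    class CustomAnalyzer(FileAnalyzer[str]):
        default_name = 'custom'
        default_file_glob = '**/*.py'
        default_file_filters = [
            leading_underscore_file_filter,
        ]

        def __init__(self, custom_arg, *,
                     feature_finders: Sequence[FeatureFinder],
                     file_glob: Optional[str] = None,
                     file_filters: Optional[Sequence[Callable[[FileInfo], bool]]] = None,
                     name: Optional[str] = None):
            self.custom_arg = custom_arg
            super().__init__(
                feature_finders=feature_finders,
                file_glob=file_glob,
                file_filters=file_filters,
                name=name,
            )

        def prepare_file(self, file_info: FileInfo) -> str:
            # The code representation is the file's contents as a string.
            with open(file_info.abs_path) as code_file:
                return code_file.read()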
When defining a custom Analyzer, you will also need to implement custom FeatureFinders that expect to receive the type of code representation you specify for your Analyzer.
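To illustrate why a FeatureFinder has to match the Analyzer's code representation, the sketch below contrasts two plain Python functions: one that expects a `str` and one that expects a parsed `ast.AST`. These are hypothetical stand-ins rather than codesurvey's FeatureFinder interface, and the function names are assumptions made for illustration only.

```python
import ast


def count_for_loops_in_text(code: str) -> int:
    """Naive finder for a str representation: counts lines starting with 'for '."""
    return sum(1 for line in code.splitlines() if line.strip().startswith('for '))


def count_for_loops_in_ast(tree: ast.AST) -> int:
    """Finder for an AST representation: counts ast.For nodes."""
    return sum(1 for node in ast.walk(tree) if isinstance(node, ast.For))


snippet = (
    "for x in range(3):\n"
    "    print(x)\n"
    "note = 'for later'\n"
)

# Each function only works with the representation it was written for.
text_count = count_for_loops_in_text(snippet)
ast_count = count_for_loops_in_ast(ast.parse(snippet))
```

An Analyzer declared as `FileAnalyzer[str]` would pair with functions like the first one, while an Analyzer producing an AST would pair with functions like the second.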
File Analyzer Classes
codesurvey.analyzers.FileAnalyzer
Bases: Analyzer[CodeReprT]
Base class for Analyzers that analyze each source-code file as the target unit of code within a Repo.
The type argument is the representation of source-code units that will be provided to FeatureFinders used with Analyzer instances.
Source code in codesurvey/analyzers/core.py
default_file_glob: str
instance-attribute
Default glob pattern for finding source-code files. To be assigned to FileAnalyzers of this type if a custom glob is not specified.
default_file_filters: Sequence[Callable[[FileInfo], bool]] = []
class-attribute
instance-attribute
Default filters to identify files to exclude from analysis. To be assigned to FileAnalyzers of this type if custom filters are not specified.
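As a rough illustration of how a glob pattern and exclusion filters can combine, the sketch below finds files under a temporary directory using only the standard library. It is not codesurvey's implementation: `find_files` is a hypothetical helper, and the filter here receives a relative path string rather than a FileInfo to keep the example self-contained.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence


def leading_underscore_filter(rel_path: str) -> bool:
    """Return True when the file should be excluded."""
    return os.path.basename(rel_path).startswith('_')


def find_files(root: str, file_glob: str,
               file_filters: Sequence[Callable[[str], bool]]) -> List[str]:
    """Return sorted relative paths matching file_glob that no filter excludes."""
    root_path = Path(root)
    rel_paths = [
        path.relative_to(root_path).as_posix()
        for path in root_path.glob(file_glob)
        if path.is_file()
    ]
    return sorted(
        rel_path for rel_path in rel_paths
        if not any(file_filter(rel_path) for file_filter in file_filters)
    )


with tempfile.TemporaryDirectory() as repo_dir:
    for rel_path in ['main.py', '_private.py', 'pkg/util.py',
                     'pkg/_hidden.py', 'README.txt']:
        file_path = Path(repo_dir, rel_path)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text('')
    # Only .py files without a leading underscore remain.
    found = find_files(repo_dir, '**/*.py', [leading_underscore_filter])
```

Note that a filter returning `True` means "exclude", matching the convention described for file_filters.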
__init__(feature_finders: Sequence[FeatureFinder], *, file_glob: Optional[str] = None, file_filters: Optional[Sequence[Callable[[FileInfo], bool]]] = None, name: Optional[str] = None)
Parameters:
- feature_finders (Sequence[FeatureFinder]) – The FeatureFinders for analyzing each source-code file.
- file_glob (Optional[str], default: None) – Glob pattern for finding source-code files within the Repo.
- file_filters (Optional[Sequence[Callable[[FileInfo], bool]]], default: None) – Filters to identify files to exclude from analysis. Each filter is a function that takes a FileInfo and returns True if the file should be excluded. file_filters cannot be lambdas, as they need to be pickled when passed to sub-processes.
- name (Optional[str], default: None) – Name to identify the Analyzer. If None, defaults to the Analyzer type's default_name.
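The restriction on lambdas follows from how Python's `pickle` module serializes functions: by reference to an importable name. The following sketch, using only the standard library, shows the difference. The `is_picklable` helper is a hypothetical name introduced for this example.

```python
import os
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda has no importable name, so pickle cannot store a reference to it.
lambda_ok = is_picklable(lambda file_info: True)

# A function importable by name (here one from the standard library) is
# pickled as a reference to that name.
named_ok = is_picklable(os.path.basename)
```

Defining each filter with `def` at module level, as in the `leading_underscore_file_filter` example, keeps it importable by name.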
Source code in codesurvey/analyzers/core.py
prepare_file(file_info: FileInfo) -> CodeReprT
abstractmethod
Given a FileInfo identifying the location of a target source-code file, returns a representation of the code that can be passed to the FeatureFinders of this Analyzer.
Source code in codesurvey/analyzers/core.py
test(code_snippet: str, *, test_filename: str = 'test_file.txt') -> Dict[str, Feature]
Utility for directly analyzing a string of source-code.
A Repo will be created in a temporary directory to perform analysis of a file created with the given code_snippet.
Parameters:
- code_snippet (str) – String of source-code to analyze.
- test_filename (str, default: 'test_file.txt') – Optional custom filename used for the test file.
Returns:
- Dict[str, Feature]
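The temporary-directory approach described for test() can be pictured with the standard-library sketch below. It only shows the file-handling part (writing a snippet to a named file that is cleaned up afterwards). `snippet_file` is a hypothetical helper, not part of codesurvey, and no Repo or analysis is involved.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def snippet_file(code_snippet: str, *,
                 test_filename: str = 'test_file.txt') -> Iterator[str]:
    """Write code_snippet to a file in a temporary directory and yield its path.

    The directory and file are removed when the context exits.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        file_path = os.path.join(temp_dir, test_filename)
        with open(file_path, 'w') as snippet_out:
            snippet_out.write(code_snippet)
        yield file_path


with snippet_file('print("hello")\n', test_filename='example.py') as path:
    with open(path) as snippet_in:
        contents = snippet_in.read()
    existed_inside = os.path.exists(path)

# The temporary file no longer exists once the context has exited.
exists_after = os.path.exists(path)
```

A custom filename matters when the Analyzer's glob only matches certain extensions, which is presumably why test_filename is configurable.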
Source code in codesurvey/analyzers/core.py
codesurvey.analyzers.FileInfo
dataclass
Details identifying a source-code file within a Repo.
Source code in codesurvey/analyzers/core.py
repo: Repo
instance-attribute
Repo that the file belongs to.
rel_path: str
instance-attribute
Relative path to the file from the Repo directory.
abs_path: str
property
Absolute path to the file.
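The relationship between rel_path and abs_path can be sketched with a small stand-in dataclass. This is a hypothetical simplification: `SketchFileInfo` holds a plain `repo_dir` string in place of the real `repo` attribute, and how FileInfo actually derives abs_path from its Repo is not shown in this document.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchFileInfo:
    """Hypothetical stand-in for FileInfo; repo_dir replaces the repo attribute."""
    repo_dir: str
    rel_path: str

    @property
    def abs_path(self) -> str:
        """Absolute path derived from the repo directory and the relative path."""
        return os.path.abspath(os.path.join(self.repo_dir, self.rel_path))


info = SketchFileInfo(repo_dir=os.getcwd(),
                      rel_path=os.path.join('pkg', 'module.py'))
```

Filters that should behave the same wherever a Repo is stored would typically inspect `rel_path`, while `prepare_file()` opens the file through `abs_path`.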
Other Custom Analyzers
For Analyzers that don't treat each file as a unit of source-code, you can define an Analyzer that inherits from Analyzer and defines prepare_code_representation() and code_generator().
prepare_code_representation() returns a representation (such as a simple string, or a parsed structure like an abstract syntax tree) for a specific unit of source-code, while code_generator() returns Code results of analyzing each unit of source-code, or CodeThunks that can be executed in a parallelizable sub-process in order to analyze a unit of source-code.
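The idea of a generator that yields either finished results or deferred tasks can be sketched without codesurvey as follows. Everything here is a stand-in: `Result` plays the role of a Code, a `functools.partial` object plays the role of a CodeThunk, and the function names are hypothetical.

```python
from functools import partial
from typing import Callable, Dict, Iterator, List, Union

# Hypothetical stand-in for a Code result: a mapping of feature name to count.
Result = Dict[str, int]


def analyze_unit(source: str, features: List[str]) -> Result:
    """Count occurrences of each feature keyword in one unit of source-code."""
    return {feature: source.count(feature) for feature in features}


def unit_generator(units: Dict[str, str], features: List[str]
                   ) -> Iterator[Union[Result, Callable[[], Result]]]:
    """Yield a finished result for empty units, and a deferred thunk otherwise."""
    for key in sorted(units):
        source = units[key]
        if not source:
            # Nothing to analyze: yield the result immediately.
            yield {feature: 0 for feature in features}
        else:
            # Defer the work so that a caller can decide where to run it.
            yield partial(analyze_unit, source, features)


units = {'a': 'for x in y: pass', 'b': '', 'c': 'while True: break'}

# The consumer executes thunks and passes finished results through unchanged.
results = [
    item() if callable(item) else item
    for item in unit_generator(units, ['for', 'while'])
]
```

Here the thunks are executed in the same process for simplicity; the point of deferring the work is that a caller could instead hand each thunk to a worker process.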
Core Classes
codesurvey.analyzers.Analyzer
Bases: ABC, Generic[CodeReprT]
Analyzes Repos to produce Code feature analysis results of individual units of source-code (such as source-code files).
The type argument is the representation of source-code units that will be provided to FeatureFinders used with Analyzer instances.
Source code in codesurvey/analyzers/core.py
default_name: str
instance-attribute
Name to be assigned to Analyzers of this type if a custom name is not specified.
__init__(*, feature_finders: Sequence[FeatureFinder[CodeReprT]], name: Optional[str] = None)
Parameters:
- feature_finders (Sequence[FeatureFinder[CodeReprT]]) – The FeatureFinders for analyzing each source-code unit.
- name (Optional[str], default: None) – Name to identify the Analyzer. If None, defaults to the Analyzer type's default_name.
Source code in codesurvey/analyzers/core.py
prepare_code_representation(repo: Repo, code_key: str) -> CodeReprT
abstractmethod
Returns a representation of a source-code unit that can be passed to the FeatureFinders of this Analyzer.
Parameters:
- repo (Repo) – Repo containing the source-code to be analyzed.
- code_key (str) – Unique key of the source-code unit to be analyzed within the Repo.
Source code in codesurvey/analyzers/core.py
code_generator(repo: Repo, *, get_code_features: Callable[[str], Sequence[str]]) -> Iterator[Union[Code, CodeThunk]]
abstractmethod
Generator yielding Code analysis results for source-code units within the given Repo, or CodeThunks that can be executed to perform such analyses.
Parameters:
- repo (Repo) – Repo containing the source-code to be analyzed.
- get_code_features (Callable[[str], Sequence[str]]) – A function that will be called by code_generator() with a Code's key to get the subset of get_feature_names() that should be analyzed for that Code.
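One way to picture a get_code_features callback is as a function from a code key to the feature names still needing analysis. The sketch below assumes, purely for illustration, that the subset is determined by features already analyzed for a given key. That motivation, and the `make_get_code_features` helper, are assumptions rather than documented codesurvey behavior.

```python
from typing import Callable, Dict, List, Sequence, Set


def make_get_code_features(all_features: Sequence[str],
                           already_analyzed: Dict[str, Set[str]]
                           ) -> Callable[[str], List[str]]:
    """Build a callback mapping a code key to the features still to be analyzed."""
    def get_code_features(code_key: str) -> List[str]:
        done = already_analyzed.get(code_key, set())
        return [feature for feature in all_features if feature not in done]
    return get_code_features


get_code_features = make_get_code_features(
    all_features=['for_else', 'walrus'],
    already_analyzed={'pkg/util.py': {'for_else'}},
)

# Keys with prior results get a smaller subset; unseen keys get every feature.
pending_util = get_code_features('pkg/util.py')
pending_main = get_code_features('main.py')
```

A code_generator() implementation would call such a callback once per unit and pass the returned names along as the features to analyze.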
Source code in codesurvey/analyzers/core.py
analyze_code(repo: Repo, code_key: str, features: Sequence[str]) -> Code
Produces a Code analysis for a single unit of source-code within a Repo.
Parameters:
- repo (Repo) – Repo containing the source-code to be analyzed.
- code_key (str) – Unique key of the source-code unit to be analyzed within the Repo.
- features (Sequence[str]) – Names of features to include in the analysis. A subset of the names returned by get_feature_names().
Source code in codesurvey/analyzers/core.py
get_feature_names() -> Sequence[str]
Returns the names of all features analyzed by this Analyzer instance.
Source code in codesurvey/analyzers/core.py
code(**kwargs) -> Code
Internal helper to generate a Code for this Analyzer. Takes the same arguments as Code except for analyzer.
Source code in codesurvey/analyzers/core.py
code_thunk(**kwargs) -> CodeThunk
Internal helper to generate a CodeThunk for this Analyzer. Takes the same arguments as CodeThunk except for analyzer.
Source code in codesurvey/analyzers/core.py
codesurvey.analyzers.Code
dataclass
Results of analyzing a single unit of source-code from a Repo (e.g. a file of source-code) for occurrences of target features.
Source code in codesurvey/analyzers/core.py
analyzer: Analyzer
instance-attribute
The Analyzer that performed the analysis.
repo: Repo
instance-attribute
The Repo that the Code belongs to.
key: str
instance-attribute
The unique key of the Code within its Repo.
features: Dict[str, Feature]
instance-attribute
A mapping of feature names to Feature survey results.
codesurvey.analyzers.CodeThunk
dataclass
An executable task to be run asynchronously to produce a Code analysis.
Source code in codesurvey/analyzers/core.py
analyzer: Analyzer
instance-attribute
The Analyzer that will perform the analysis.
repo: Repo
instance-attribute
The Repo that the Code belongs to.
key: str
instance-attribute
The unique key of the Code within its Repo.
features: Sequence[str]
instance-attribute
The names of features to be analyzed.
thunk: Callable[[], Code]
instance-attribute
Function to be called to perform the analysis.