|
|
1 year ago | |
|---|---|---|
| .. | ||
| commentparser | 1 year ago | |
| internal/sets | 1 year ago | |
| licenses | 1 year ago | |
| serializer | 1 year ago | |
| stringclassifier | 1 year ago | |
| tools | 1 year ago | |
| v2 | 1 year ago | |
| .travis.yml | 1 year ago | |
| CHANGELOG | 1 year ago | |
| CONTRIBUTING.md | 1 year ago | |
| LICENSE | 1 year ago | |
| METADATA | 1 year ago | |
| OWNERS | 1 year ago | |
| README.md | 1 year ago | |
| classifier.go | 1 year ago | |
| classifier_test.go | 1 year ago | |
| file_system_resources.go | 1 year ago | |
| forbidden.go | 1 year ago | |
| go.mod | 1 year ago | |
| go.sum | 1 year ago | |
| license_type.go | 1 year ago | |
README.md
License Classifier
Introduction
The license classifier is a library and set of tools that can analyze text to
determine what type of license it contains. It searches for license texts in a
file and compares them to an archive of known licenses. These files could be,
e.g., LICENSE files with a single or multiple licenses in it, or source code
files with the license text in a comment.
A "confidence level" is associated with each result indicating how close the
match was. A confidence level of 1.0 indicates an exact match, while a
confidence level of 0.0 indicates that no license was able to match the text.
Adding a new license
Adding a new license is straight-forward:
-
Create a file in
licenses/.- The filename should be the name of the license or its abbreviation. If the license is an Open Source license, use the appropriate identifier specified at https://spdx.org/licenses/.
- If the license is the "header" version of the license, append the suffix
"
.header" to it. Seelicenses/README.mdfor more details.
-
Add the license name to the list in
license_type.go. -
Regenerate the
licenses.dbfile by running the license serializer:$ license_serializer -output licenseclassifier/licenses -
Create and run appropriate tests to verify that the license is indeed present.
Tools
Identify license
identify_license is a command line tool that can identify the license(s)
within a file.
$ identify_license LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)
License serializer
The license_serializer tool regenerates the licenses.db archive. The archive
contains preprocessed license texts for quicker comparisons against unknown
texts.
$ license_serializer -output licenseclassifier/licenses
This is not an official Google product (experimental or otherwise), it is just code that happens to be owned by Google.