Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/ast: add func IsGenerated(*File) bool #28089

Closed
dmitshur opened this issue Oct 9, 2018 · 17 comments
Closed

go/ast: add func IsGenerated(*File) bool #28089

dmitshur opened this issue Oct 9, 2018 · 17 comments
Assignees
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted Proposal-FinalCommentPeriod
Milestone

Comments

@dmitshur
Copy link
Contributor

dmitshur commented Oct 9, 2018

Given that #13560 has been accepted, resolved, and by now, widely accepted by the Go community, I think it can be helpful to have a Go parser for it that tools written in Go could use (if desired).

It's relatively easy to write an ad hoc parser using the regexp package, but it's also possible to write a more specialized one that has less overhead.

I already wrote one a while ago, and it currently lives at github.com/shurcooL/go/generated.

I want to move it out of the repository it's currently in, which contains many miscellaneous Go packages of lower utility and quality. I was originally planning to move it out into a standalone repository on my personal site, but then I thought it might be a good fit under x/tools subrepo, specifically, in the x/tools/go directory, since it deals with Go code. The proposed import path would be:

import "golang.org/x/tools/go/generated"

Hence this proposal. If accepted, I'm happy to maintain it/be the owner. The scope is very narrow, so it should be very low volume of work.

Not sure how this intersects with #17244.

If not accepted, I would likely move it here instead:

import "dmitri.shuralyov.com/go/generated"

(The code is currently MIT licensed, but in either case, I'd relicense it under the Go license.)

/cc @andybons @bradfitz @alandonovan @matloob @ianthehat

@dmitshur dmitshur added Proposal NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Oct 9, 2018
@gopherbot gopherbot added this to the Proposal milestone Oct 9, 2018
@mvdan
Copy link
Member

mvdan commented Oct 9, 2018

Is the only concern about the regular expression its speed? If so, I'd rather improve the speed of regexp instead. There are some performance issues for the package already, so one of them may already cover this particular expression: #11646 #26623 #21463

I realise that the regex package may take a while to be fast even for this one case, but I don't think there's a sense of urgency to have a faster implementation of it in x/tools.

@alandonovan
Copy link
Contributor

Can the predicate be expressed as a function of the AST? Very often the tools that need this predicate have already parsed the file.

package astutil

// Generated reports whether the file was generated by a program,
// not handwritten, following the conventions described in #13560.
// The syntax tree must have been invoked with the ParseComment flag.
// Example:
//   f, err := parser.ParseFile(fset, filename, parser.ParseComment|parser.ImportOnly)
//   if err != nil { ... }
//   g := Generated(f)
func Generated(f *ast.File) bool { ... }

It seems like a good fit for the x/tools/go/astutil package since it doesn't add any dependencies.

@alandonovan
Copy link
Contributor

alandonovan commented Oct 9, 2018

Regular expressions are great for experimenting with complicated patterns, but you don't need the regexp package for an expression this trivial:

strings.HasPrefix(line, "// Code generated ") && strings.HasSuffix(line, " DO NOT EDIT.")

@mark-rushakoff
Copy link
Contributor

This POSIX-compliant grep command has worked fine for my needs in shell scripts:

grep -Exq '^// Code generated .* DO NOT EDIT\.$' "$file"
# Exits 0 if $file matches.

In my case, I'm not concerned about a false positive of that line occurring in a block comment or in a raw string literal.

@dmitshur
Copy link
Contributor Author

dmitshur commented Oct 9, 2018

Is the only concern about the regular expression its speed?

No. I wanted to avoid the regexp dependency because it was easy to implement the matching behavior myself.

The regexp package can still be optimized independently of this code.

Can the predicate be expressed as a function of the AST?

Good question. I'd be curious to try. I expect it should be possible, but I'm not sure if it'd be faster.

Note that not only comments would have to be checked in the AST, but raw string literals as well. A .go file with the following code needs to be reported as a positive match for the pattern:

package p

import "fmt"

func Foo() {
	const s = `this is a raw string literal. it happens to contain
some text in column 1, including a string that matches
the "Code generated ... DO NOT EDIT." comment, like so:
// Code generated  DO NOT EDIT.
to product a correct result, it needs to be detected`
	fmt.Println(len(s))
}

Since it does in fact contain a line of text that matches the regular expression ^// Code generated .* DO NOT EDIT\.$.

This POSIX-compliant grep command has worked fine for my needs in shell scripts:

Indeed. It's very quick and easy to implement a parser for https://golang.org/s/generatedcode format in any language that has support for regular expressions. This package is meant to be available for use by tools that are written in Go, and prefer to avoid incurring the cost of spawning a process or importing the regexp package.

In my case, I'm not concerned about a false positive of that line occurring in a block comment or in a raw string literal.

Those are not false positives. According to the spec, a file is considered generated if a matching line of text appears anywhere in the file, which can include block comments and raw string literals.

@mark-rushakoff
Copy link
Contributor

Those are not false positives.

Good to know that grepping this way shouldn't hit any false positives then :)

This package is meant to be available for use by tools that are written in Go

Anecdotally: my most frequent case for needing to check whether a file is auto-generated has been to manually filter through go list output in a shell script, to decide whether a file should be run through go fmt. So for me, it would be more helpful if I could use {{if .Generated}} as part of a template passed to go list -f. But changing go list is not this issue.

@rsc
Copy link
Contributor

rsc commented Oct 10, 2018

This would be a 1-declaration 1-line package. We try to avoid those. Please propose a new method or top-level function in an existing package, like probably go/ast.

@rsc
Copy link
Contributor

rsc commented Oct 17, 2018

Ping @dmitshur; see previous comment.

@rsc rsc added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 17, 2018
@dmitshur
Copy link
Contributor Author

dmitshur commented Oct 18, 2018

I agree with your comment @rsc that this would be a small package, and it's better to avoid that in x/tools.

I've been thinking about where this functionality could fit into existing packages, and I have some initial thoughts I can share. Please note this isn't a final proposal; just an iteration I want to share. I'm not completely happy with it and wouldn't want it implemented as is.

WIP Counter-Proposal Revision 1

Out of the existing packages, this functionality seemed to belong most in go/parser. The API surface of the change would be somewhat large. It would involve a new parser.Mode:

// A Mode value is a set of flags (or 0).
// They control the amount of source code parsed and other optional
// parser functionality.
//
type Mode uint

const (
	PackageClauseOnly Mode             = 1 << iota // stop parsing after package clause
	ImportsOnly                                    // stop parsing after import declarations
	ParseComments                                  // parse comments and add them to AST
+	GeneratedComment                               // determine whether the file contains a "// Code generated … DO NOT EDIT." line comment
	Trace                                          // print a trace of parsed productions
	DeclarationErrors                              // report declaration errors
	SpuriousErrors                                 // same as AllErrors, for backward-compatibility
	AllErrors         = SpuriousErrors             // report all errors (not just the first 10 on different lines)
)

When the mode is turned on, parser.ParseFile would perform additional work to determine if the .go file has a "// Code generated … DO NOT EDIT." line comment.

(There is some precedent to this in the go/build package, specifically the ImportComment import mode.)

In order for parser.ParseFile to report the result, ast.File in go/ast package would also need to be modified:

// A File node represents a Go source file.
//
// [...]
type File struct {
	Doc        *CommentGroup   // associated documentation; or nil
	Package    token.Pos       // position of "package" keyword
	Name       *Ident          // package name
	Decls      []Decl          // top-level declarations; or nil
	Scope      *Scope          // package scope (this file only)
	Imports    []*ImportSpec   // imports in this file
	Unresolved []*Ident        // unresolved identifiers in this file
	Comments   []*CommentGroup // list of all comments in the source file
+	Generated  *Comment        // a "// Code generated … DO NOT EDIT." line comment; or nil
}

At first, this seemed like the most appropriate place in existing packages to add this functionality. However, after considering the details more closely, it doesn't seem like a good fit.

The "// Code generated … DO NOT EDIT." comment lives in parallel to the actual AST structure of a file. According to the spec, such a comment can appear inside a block comment, and it would still count. It can also appear within a raw string literal, and match. The work that must be done to determine if a .go file has such a comment has very little to do with parsing the actual AST of a .go file.

WIP Counter-Proposal Revision 2

An alternative proposal to consider is to add this functionality to go/build package in the following way. Add a GeneratedComment import mode, similar to the existing build.ImportComment:

 // If ImportComment is set, parse import comments on package statements.
 // Import returns an error if it finds a comment it cannot understand
 // or finds conflicting comments in multiple source files.
 // See golang.org/s/go14customimport for more information.
 ImportComment
+
+// If GeneratedComment is set, determine which .go files contain
+// a "// Code generated … DO NOT EDIT." line comment,
+// and populate GeneratedGoFiles with the results.
+// See golang.org/s/generatedcode for more information.
+GeneratedComment

The result would be populated in a new build.Package field:

type Package struct {
	// ...

	// Source files
	GoFiles          []string // .go source files (excluding CgoFiles, TestGoFiles, XTestGoFiles)
	CgoFiles         []string // .go source files that import "C"
	IgnoredGoFiles   []string // .go source files ignored for this build
	InvalidGoFiles   []string // .go source files with detected problems (parse error, wrong package name, and so on)
+	GeneratedGoFiles []string // .go source files containing a contains a "// Code generated … DO NOT EDIT." line comment; populated only if GeneratedComment mode is set
	CFiles           []string // .c source files
	// ...

This approach seems to be cleaner than adding something to go/parser and ast.File, but I'm not sure if it's good either. The information about whether or not a file is generated feels like it should belong at the file level, not the package level.

There's also the factor that golang.org/x/tools/go/packages is meant to be used instead of go/build by now, so adding something new to go/build doesn't seem like a good idea.

A "WIP Counter-Proposal Revision 3" section can/should be added where a similar approach as in revision 2 can be evaluated for the golang.org/x/tools/go/packages package.

Conclusion

I would lean towards either putting this proposal on hold (or waiting to iterate on it further), or outright declining it. I made the proposal because I was planning to move the existing package, and wanted to check if there'd be a more fitting place for it in x/tools. It seems the existing package doesn't meet the criteria for inclusion.

Additionally, given how easy it is to parse the generated comment with just regexp, there isn't a high demand for this functionality as far as I can tell (e.g., github.com/shurcooL/go/generated currently has only 1 known public importer), so it's unclear if the additional work of integrating it into an existing package is justified at this time.

@dmitshur dmitshur removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 18, 2018
@griesemer griesemer self-assigned this Oct 31, 2018
@griesemer
Copy link
Contributor

@dmitshur Let's discuss this in person. Per @golang/proposal-review we're ok with adding a utility function iff we can decide on a good place for it and a good API.

@griesemer griesemer changed the title proposal: x/tools/go/generated: parser for https://golang.org/s/generatedcode format x/tools/go/generated: parser for https://golang.org/s/generatedcode format Oct 31, 2018
@griesemer griesemer modified the milestones: Proposal, Unplanned Oct 31, 2018
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/158817 mentions this issue: cmd/go: copy missing bit of documentation about code generated comment

gopherbot pushed a commit that referenced this issue Jan 22, 2019
This CL attempts to restore the clarity of the original specification
at https://golang.org/s/generatedcode that the line may appear
anywhere. It is preferable (for human readability), and most common
for it to be early in the file, but that is merely a convention, not
a strict well-specified requirement. Document it as so.

Background

Issue #13560 was a proposal define a standard for marking files as
generated, one that is suitable to be recognized both by humans
and machine tools. It was accepted, and the final specification
was documented at https://golang.org/s/generatedcode. Its text,
copied exactly:

	Generated files are marked by a line of text that matches
	the regular expression, in Go syntax:

		^// Code generated .* DO NOT EDIT\.$

	The .* means the tool can put whatever folderol it wants in there,
	but the comment must be a single line and must start with Code generated
	and end with DO NOT EDIT., with a period.

	The text may appear anywhere in the file.

The https://golang.org/s/generatedcode link points to a comment
in a very large GitHub issue. That makes it harder to find.
Issue #25433 was opened about moving that information somewhere else.
It was resolved via CL 118756, which added text to cmd/go documentation
at https://golang.org/cmd/go/#hdr-Generate_Go_files_by_processing_source:

	To convey to humans and machine tools that code is generated,
	generated source should have a line early in the file that
	matches the following regular expression (in Go syntax):

		^// Code generated .* DO NOT EDIT\.$

The CL description noted that "This change merely moves that
information to a more visible place." The intention was to preserve
the specification unmodified.

The original specification was very clear that "The text may appear
anywhere in the file." The new text in cmd/go documentation wasn't
very clear. "A line early in the file" is not a precise enough criteria
to be recognized by a machine tool, because there isn't a precise
definition of what lines are "early in the file".

Updates #13560
Updates #25433
Updates #28089

Change-Id: I4e374163b16c3f972f9591ec2647fd3d5a2dd5ae
Reviewed-on: https://go-review.googlesource.com/c/158817
Reviewed-by: Rob Pike <[email protected]>
@dmitshur dmitshur added Proposal-Accepted NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Proposal-Accepted labels Mar 9, 2019
dmitshur added a commit to shurcooL/go that referenced this issue Jul 4, 2019
This package has been moved to a new home. Its new import path
is dmitri.shuralyov.com/go/generated. Deprecate this old copy
in favor of the new one.

Replace the implementation with a forwarding stub.

Updates golang/go#28089
@rsc
Copy link
Contributor

rsc commented Jan 13, 2021

I'm reluctant to add more fields to structs - it seems like a bit overkill.
And adding GeneratedGoFiles to go/build means removing those files from GoFiles, which would break existing users.

It seems like we could use the current parser with PackageClauseOnly | ParseComments
(or the whole file if you are not concerned about how much gets parsed)
and then just provide a

package ast

func IsGenerated(f *ast.File) bool

What do you think?

@rsc
Copy link
Contributor

rsc commented Jan 13, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@dmitshur
Copy link
Contributor Author

dmitshur commented Jan 20, 2021

The suggestion in #28089 (comment) seems good to me so far. I'll elaborate on some minor details below.

Since there's no error return value, I understand it would be a pre-condition that the input provided to IsGenerated has been computed correctly. That is, IsGenerated will return false even if the original file was actually generated, but the AST parser wasn't configured correctly (e.g., it left out ParseComments). This should be fine as long as it's documented and there's a simple usage example.

I think using *ast.File instead of something like io.Reader or filename string (what the dmitri.shuralyov.com/go/generated API uses) makes more sense after the spec redefinition in #41196, since the entire file no longer needs to be scanned.

I wrote this package primarily as an exercise to test that it would be possible to implement a parser for the spec proposed in #13560, since that was an important goal. I haven't yet used it anywhere. pkg.go.dev reports one known importer in github.com/fatih/faillint/faillint (CC @fatih), see faillint.go#L115-L119. It's an analysis.Pass that has access to parsed AST, so this supports that func IsGenerated(f *ast.File) bool is a good signature.

@rsc
Copy link
Contributor

rsc commented Jan 20, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc changed the title x/tools/go/generated: parser for https://golang.org/s/generatedcode format go/ast: add IsGenerated Jan 27, 2021
@rsc rsc changed the title go/ast: add IsGenerated go/ast: add func IsGenerated(*File) bool Jan 27, 2021
@rsc
Copy link
Contributor

rsc commented Jan 27, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Feb 15, 2021
@rsc rsc moved this to Accepted in Proposals Aug 10, 2022
@rsc rsc added this to Proposals Aug 10, 2022
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/487935 mentions this issue: go/ast: add IsGenerated(*File) predicate

@adonovan adonovan self-assigned this Apr 23, 2023
@dmitshur dmitshur modified the milestones: Unplanned, Go1.21 May 20, 2023
@rsc rsc removed this from Proposals Apr 24, 2024
@golang golang locked and limited conversation to collaborators May 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted Proposal-FinalCommentPeriod
Projects
None yet
Development

No branches or pull requests

8 participants