Syntax summary #11

Open
shelby3 opened this issue Sep 23, 2016 · 1,191 comments

@shelby3

shelby3 commented Sep 23, 2016

I will maintain in this OP a summary of current proposed syntax as I understand it to be. Note this is not authoritative, subject to change, and it may be inaccurate. Please make comments to discuss.

: Type is always optional.

  1. Sum, Recursive, Record, and Product parametrized data type with optional Record member names and optional Record (e.g. Cons) and Product (e.g. MetaData) names:

    data List<T> = MetaData(Nil | Cons{head: T, tail: List<T>}, {meta: Meta<T>})

  2. Typeclass interface and implementation:

    typeclass Name<A<B>, ...>   // or 'pluggable'?; optional <B> for higher-kinds¹?
      method(B): A<B>           // TODO: add default arguments, named arguments?
    
    List<A> implements Name
      method(x) => ...
    
  3. References:

    let x:Type = ...      // final assignment, not re-assignable (same as const in ES5/6)
    var x:Type = ...      // (same as let in ES5/6)
    
  4. Functions:

    Type parameters do not require declaration <A,B>.

    someCallback(x:Type y:Type(:Type) => x + y)     //            also  ():Type => x + y
    var f = x:Type y:Type(:Type) => x + y           // not named, also  ():Type => x + y
    f = x:Type y:Type(:Type) => x + y               // not named, also  ():Type => x + y
    let f(x:Type, y:Type):Type => x + y             // named f,   also f():Type => x + y
    let parametrized(x: A, y: B):A|B => x || y
    let parametrizedWhere(x: A, y: B):A|B where ... => x || y
    

    Note that iterator types can be specified for the return value to return a lazy list as a generalized way of implementing generators. The optional (:Type) is necessitated for generator functions. Note the (x: Type y: Type): Type => x + y form is unavailable.

  5. Assignment-as-expression:

    if ((x = 0))      // force extra parentheses where expected type is Boolean
    

† Not yet included in the syntax, as would be a premature optimization. May or may not be added.
¹ #10
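
For readers coming from mainstream languages, the data declaration in item 1 can be approximated with the following Python sketch. It is only an illustration under assumptions, not ZenScript semantics: the field name items, the dict standing in for Meta<T>, and the helper length are hypothetical and do not appear in the proposal.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")

@dataclass(frozen=True)
class Nil:
    """Empty-list alternative of the sum."""

@dataclass(frozen=True)
class Cons(Generic[T]):
    """Record alternative with named members; 'tail' makes the type recursive."""
    head: T
    tail: Union[Nil, "Cons[T]"]

@dataclass(frozen=True)
class MetaData(Generic[T]):
    """Product of the sum (Nil | Cons) and a record carrying 'meta'."""
    items: Union[Nil, Cons[T]]  # hypothetical field name for the first component
    meta: dict                  # stand-in for Meta<T>, which the OP does not define

def length(items) -> int:
    """Hypothetical helper: count elements by walking the Cons cells."""
    n = 0
    while isinstance(items, Cons):
        n += 1
        items = items.tail
    return n

xs = MetaData(Cons(1, Cons(2, Nil())), {"note": "example"})
```

The single data declaration in the proposal covers what takes three classes here, which is the brevity argument for unifying Sum, Product, Record, and Recursive types.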

@keean
Owner

keean commented Sep 23, 2016

I am not sure I like pluggable for a type-class. If it's going to have that many letters either interface or typeclass would be better.

interface List<A>

I am not sure we want to use | for both sum types and union types.

I prefer having 'implementation' before the type-class; having the type class first for implements seems inconsistent to me. I also prefer to treat all type class parameters equally. The first is not special, so why give it special syntax?

implement List<A>

I am not sure why you put types in the function call syntax? I don't think you need or want them, you only want types in function definitions.

I don't like that the method syntax is different from the function definition syntax. I think we should have a unified record/struct syntax. If we have:

data List<A> = List(
    append : (l1 : A, l2 : A) : A
)

data MyList
let mylist : List<MyList> = List(
    append : (l1 : MyList, l2 : MyList) : MyList =>
        ... definition ...
)

A record above is like a type-class but you can pass it as a first class value.

If we can 'promote' this to implicit use, we can have a single unified definition syntax. Maybe:

let list3 = mylist.append(list1, list2) // use explicitly
use mylist // make mylist implicit
let list6 = append(list4, list5) // use implicitly
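
The idea of a record of functions acting as a type-class instance can be sketched in Python as plain dictionary passing. This is a rough illustration under assumptions: the name ListOps is hypothetical, Python lists stand in for MyList, and the 'use' step is emulated by a manual binding rather than by type-directed implicit resolution.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")

@dataclass(frozen=True)
class ListOps(Generic[A]):
    """A record of functions: a type-class instance as a first-class value."""
    append: Callable[[A, A], A]

# An "implementation" is just a record value; Python lists stand in for MyList.
mylist = ListOps(append=lambda l1, l2: l1 + l2)

# Explicit use: select the method from the record.
list3 = mylist.append([1, 2], [3])

# Emulating 'use mylist': bring the record's members into scope once,
# after which calls no longer mention the record.
append = mylist.append
list6 = append([4], [5, 6])
```

In the proposal the compiler would presumably pick the record from the argument types; the manual binding above only shows that both call styles can share one definition syntax.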

@shelby3
Author

shelby3 commented Sep 23, 2016

@keean wrote:

I am not sure I like pluggable for a type-class. If it's going to have that many letters either interface or typeclass would be better.

Can't be interface because it would be confused with the way interface works in many other OOP languages. To me as a n00b, typeclass means class, so more misunderstandings. pluggable has some meaning to a n00b such as myself. Sorry I am not an academic and they are only something like 0.01 - 0.1% of the population.

Q: "What is a pluggable API?"
A: "It means that you can replace the implementation."

I personally can tolerate typeclass.

I am not sure we want to use | for both sum types and union types.

Why not? Sum types are an "or" relationship. Unions are an "or" relationship.

I prefer having 'implementation' before the type-class; having the type class first for implements seems inconsistent to me.

Inconsistent with what? implementation Thing Nil or implementation Nil Thing are not sentences and it is not clear which one is which. Nil implements Thing is a sentence and very clear which is the typeclass.

I am not sure why you put types in the function call syntax?

Afaik, I didn't. What are you referring to?

@keean
Owner

keean commented Sep 23, 2016

Ah I see:

someCallback(x:Type,y:Type => x + y)

This is ambiguous... is it calling someCallback with 'x' as the first parameter and y => x + y as the second? This would seem less ambiguous:

someCallback((x:Type,y:Type) => x + y)

@shelby3
Author

shelby3 commented Sep 23, 2016

@keean wrote:

This is ambiguous... is it calling someCallback with x as the first parameter and y => x + y as the second?

Good catch. I missed that one. It indeed conflicts with comma delimited groups in general, not just function calls. I will remove after sleep.

You didn't point out that problem to me when I suggested it. Remember I was trying to make the inline syntax shorter, to avoid the _ + __ shorthand problems.

Edit: there is another option (again :Type are optional):

someCallback(x:Type y:Type => x + y)

But that is still NFG! Because it is LL(inf) because without the leading ( it must backtrack from the space, unless we require Type to be a single token in that context (i.e. use type if need to define complex type as one token).

@keean
Owner

keean commented Sep 23, 2016

Personally I would rather have a single syntax for function definitions. If that is (with Type optional):

let f = (x:Type, y:Type) : Type => x + y

Then passing to a callback would be:

someCallback((x:Type, y:Type) : Type => x + y)

and then things are consistent. I think keeping things short is important, but I think consistency is even more important.

@shelby3
Author

shelby3 commented Sep 23, 2016

@keean the only point was to have an optional shorthand syntax (instead of the inconsistent semantics of _ + _ or the obfuscating _ + __) for inline functions and to get rid of those gaudy juxtaposed parenthesis someCallback((....

Thus we don't need : Type for the shorthand syntax, so I propose the following optional shorthand syntax which eliminates the LL(inf) problem as well:

someCallback(x y => x + y)

Which is shorter than and removes the garish symbol soup (( from:

someCallback((x,y) => x + y)

That being generally useful shorthand, enables your request for an optional syntax in the special case of single argument (which I was against because it was only for that one special case):

someCallback(x => x + x)

Instead of:

someCallback((x) => x + x)

However, it isn't that much shorter and the reduction in symbol soup isn't so drastic, so I am wondering if it is violating the guiding principle that I promoted?

Short inline functions might be frequent? If yes, then I vote for having the shorthand alternative since it would be the only way to write a more concise and less garish inline function in general for a frequent use case. Otherwise I vote against.

@keean
Owner

keean commented Sep 23, 2016

Are we optimising too soon? I have implemented the basic function parser for the standard syntax, is that good enough for now? I think maybe we should try writing some programs before coming back to optimise the notation. I would suggest sticking to "only one way to do things" for now, because that means there is only one parser for function definitions, which will keep the implementation simpler for now. What do you think?

@shelby3
Author

shelby3 commented Sep 23, 2016

Thanks for reminding me about when I reminded you about when you reminded others to not do premature optimization.

I agree with not including the shorthand for now. Then we can later decide if we really benefit from it. I'll leave it in the syntax summary with a footnote.

@keean
Owner

keean commented Sep 23, 2016

The compiler can now take a string like this

id = (x) => x
id(42)

compile it to:

id=function (x){return x;};id(42);

Next thing to sort is the block indenting, and then it should be able to compile multi-line function definitions and application.
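
The source-to-JavaScript mapping shown above can be illustrated with a toy translator. This is not the actual compiler (which is built on parser combinators); it is a minimal regex-based sketch that only understands single-line 'name = (params) => expr' definitions and passes every other line through as a statement. The names ARROW, translate_line, and translate are hypothetical.

```python
import re

# Matches only 'name = (params) => expr'; everything else is passed through.
ARROW = re.compile(r"^(\w+)\s*=\s*\(([^)]*)\)\s*=>\s*(.+)$")

def translate_line(line: str) -> str:
    """Translate one source line to a JavaScript statement (toy subset only)."""
    m = ARROW.match(line.strip())
    if m:
        name, params, body = m.groups()
        return f"{name}=function ({params}){{return {body};}};"
    return line.strip() + ";"

def translate(source: str) -> str:
    """Translate each non-blank line and concatenate the statements."""
    return "".join(translate_line(l) for l in source.splitlines() if l.strip())

js = translate("id = (x) => x\nid(42)")
```

For the two-line input from the comment, js is the same string as the compiled output quoted there. Multi-line bodies would need the block indenting work mentioned above and are out of scope for this sketch.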

@keean
Owner

keean commented Sep 23, 2016

I think we should have a provisional section, so we can split the syntax into currently implemented, and under consideration.

@shelby3
Author

shelby3 commented Sep 23, 2016

@keean wrote:

I think we should have a provisional section, so we can split the syntax into currently implemented, and under consideration.

I'll do that if the † instances become numerous enough to justify duplication.

@shelby3
Author

shelby3 commented Sep 23, 2016

Link to discussion about unification of structural lexical scope syntax.

@keean
Owner

keean commented Sep 23, 2016

Are we sure having keywords let and var is the right way to go? If we have keywords for these we might want to have a keyword for functions? I quite like Rust's fn for introducing function definitions?

@shelby3
Author

shelby3 commented Sep 23, 2016

@keean wrote:

Are we sure having keywords let and var is the right way to go? If we have keywords for these we might want to have a keyword for functions? I quite like Rust's fn for introducing function definitions?

Instead I have proposed unified functions around let and var.

What would be the alternative to having let and var? I can't think of one that makes any sense. How would you differentiate re-assignment from initialization? Remember we already decided we can't prefix the type for reference initialization, because types are optionally inferred.

@shelby3
Author

shelby3 commented Sep 24, 2016

@keean wrote:

I think structs/objects would probably start with an upper case letter.

Agreed.

My suggestions on types of name tokens for the lexer to produce:

  • data and pluggable (or typeclass?): [A-Z]+[a-z][a-zA-Z]*

    (start uppercase, at least one lowercase, only alphabet)

  • named function references: [a-z][a-zA-Z]*

  • non-function and unnamed function references: [_a-z]+

  • type parameters: [A-Z]+

The exclusivity for type parameters for all uppercase is so they don't have to be declared with <A,B...>.

Edit: the distinction between named function and non-function references will be useful, because unnamed function references should be rarer. However, I was wrong to think it wouldn't make sense to give function naming to unnamed function references (which are re-assignable): I had incorrectly assumed such a reference could later be reassigned a non-function type, but a reference's type can never change after initial assignment. So I think it would be safe to change the above to:

  • named and unnamed function references: [a-z][a-zA-Z]*
  • non-function references: [_a-z]+

The other advantage of that is the lexer can tell the parser to expect a function, which is more efficient and provides two-factor error checking.

Note the compiler must check that the inferred type of the reference matches the function versus non-function token for the name.

@keean
Owner

keean commented Sep 24, 2016

(Aside: Very few languages have a clean lexer and often you end up with lexer state depending on compiler state (string literals are a classic example). One of the advantages of parser combinators like Parsec is that you can write lexer-less parsers, and that cleans up the spaghetti of having the lexer depend on the state of the parse. )

  • If we do not introduce type-variables, we need to have different cases for type variables and types.
  • I don't like camel case :-( and prefer values and functions to_be_named_like_this.
  • There are not enough cases, as I would like to have something different for variables, types, and type-classes...

Conclusion, nothing is going to be perfect.

My favourite would be:

datatypes and typeclasses : [A-Z][a-zA-Z_0-9][']+
functions and variables : [a-z][a-zA-Z_0-9][']+

This would have both type variables and value variables lower case.

I like the mathematical notation of having a 'prime' variable:

let x' = x + y

@shelby3
Author

shelby3 commented Sep 24, 2016

Comment about function syntax. Edited the OP to reflect this change.

@keean note that the where clause is already documented for functions in the OP.

@shelby3
Author

shelby3 commented Sep 24, 2016

@keean wrote:

  • If we do not introduce type-variables, we need to have different cases for type variables and types.

Agreed.

For readers, by "If we do not introduce type-variables" you mean if we do not prefix <A, B, ...> in front of functions. I am not proposing to remove that when it is a suffix of a type name.

  • There are not enough cases, as I would like to have something different for variables, types, and type-classes...

You have a point, but it is not an unequivocal one. We can require typeclasses begin with a lowercase i or uppercase I followed by a mandatory uppercase letter. If we choose the lowercase variant, we can disallow this pattern for function names.

It not only helps to read the code without syntax highlighting (and even 'with', if you don't want rainbow coloring of everything), it also speeds up the parser (because the lexer provides context).

Very few languages have a clean lexer and often you end up with lexer state depending on compiler state (string literals are a classic example)

If the string literal delimiters are context-free w.r.t. the rest of the grammar, then the lexer can solely determine what is inside a string literal and thus not parse those characters as other grammar tokens (aka terminals). Edit: the proposed paired delimiters will resolve this issue.

I believe if the grammar is context-free (or at least context-free w.r.t. name categories) this will reduce conflation of lexer and parser state machines. That is why I suggested that we must check the grammar in a tool such as SLK, so that we can be sure it has the desirable properties we need for maximum future optimization. I am hoping we can also target JIT and that ZenScript can become the replacement for JavaScript as the language of the world. Perhaps the type checker for our system will be simpler than subclassing and thus performant. Even Google now realizes that sound typing is necessary to solve performance and other issues.

One of the advantages of parser combinators like Parsec is that you can write lexer-less parsers, and that cleans up the spaghetti of having the lexer depend on the state of the parse.

I still need to come up to speed on the library you are using to know what I think about tradeoffs. Obviously I am in favor of sound principles of software engineering, but I really can't comment about the details yet due to lack of sufficient understanding. I will just say I am happy you are working on implementing and I am hoping to catch up and also look at other aspects you may or may not be thinking about.

I don't like camel case :-( and prefer values and functions to_be_named_like_this.

The _ is verbose (also symbol soup) and I try to avoid it wherever I can. I try to use non-function references that are single letters or words. But function references very often can't be single words. Also calling methods with . gets symbol soup noisy when there are also _ symbols in there. I do understand that camel case for values (references) is similar to the camel case that is in type names and only difference being the proposed upper vs. lowercase first letter (and then further overloaded by the i variant of the proposal above for distinguishing typeclasses); but this is irrelevant because function names do not appear in typing declarations (unless we opt for nominal typing of functions which I am not sure what that would mean).

Note I had a logic error in my prior comment, in that single word function and non-function names were indistinguishable in what I proposed. But that doesn't destroy the utility of the cases where function names are camel case.

datatypes and typeclasses : [A-Z][a-zA-Z_0-9][']+

I want to make what I think should be a convincing rational point about proper names.

I don't like _ in type names. For me a type name should read like a proper name, headline or title where each word has its first letter capitalized. We don't put such punctuation in a title normally in English. Simulating spaces with _ is ugly symbol soup. It removes the elegance of a title. It is better to just keep the first-letter capitalization and smash together without the spaces. Instead you prefer to remove the first-letter capitalization and convert spaces to _, which is removing the first-letter capitalization attribute of a title which is the sole attribute that differentiates a proper name from other forms of English. Spaces are not the differentiating attribute of proper names. If you instead proposed to retain first-letter capitalization after each _, you would have a more consistent argument (but I would still argue that the _ is noise symbol soup redundancy since we have the camel case to distinguish words).

So I can objectively conclude your preference is not consistent with types as proper names, headlines, or titles, which is what they are.

<joke>You are British, so you should be more proper than me, lol.</joke> Although my last name is "Moore" and first name was a family name "Shelby" originating from north England meaning "willow manor". And I've got "Hartwick" (German), "Primo" (southern France/Italian) and "Deason" (diluted Cherokee native American) ancestry as well.

I like the mathematical notation of having a 'prime' variable:

let x' = x + y

I don't think I have an objection to this as a suffix only. Why not allow Unicode subscript characters as well? (Edit: we have array indices for this.)

Edit: however one issue with camel case and no underscores is when an entire word which is an acronym is not delimited by the capitalization of the word which follows it, e.g. NLFilter (for NL as an acronym for newline). In that example, I might prefer to name it NL_Filter, i.e. the underscore only allowed when it follows and is followed by a capitalized letter.

@shelby3
Author

shelby3 commented Sep 24, 2016

@keean wrote:

datatypes and typeclasses : [A-Z][a-zA-Z_0-9][']+
functions and variables : [a-z][a-zA-Z_0-9][']+

You didn't differentiate from ALLCAPS type parameters above. Also your regular expression seems incorrect, as + means 1 or more. Perhaps you are employing a syntax that is peculiar to your Parsec library?
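
The point about + can be checked directly. The sketch below assumes a * quantifier on the middle character class (the quoted patterns show none, so that part is a guess) and compares a prime suffix written with + against one written with *.

```python
import re

# Pattern as quoted in the thread, with an assumed '*' on the middle class.
with_plus = re.compile(r"[a-z][a-zA-Z_0-9]*[']+")
# Variant with '*' on the prime suffix, making the primes optional.
with_star = re.compile(r"[a-z][a-zA-Z_0-9]*[']*")

plus_x = with_plus.fullmatch("x") is not None
plus_x_prime = with_plus.fullmatch("x'") is not None
star_x = with_star.fullmatch("x") is not None
star_x_prime = with_star.fullmatch("x'") is not None
```

With +, a plain x is rejected and only primed names such as x' are accepted; with *, both are accepted, which is presumably what was intended.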

Note that JavaScript allows $ in names, so if we want full interoperability then we need to allow it. Perhaps there are other ways we could work around and support interoperability with the $? Note JavaScript also supports some Unicode, but if we support that we are allowing ZenScript source code to resemble Dingbats art. Perhaps we should only allow $ and Unicode in names that have been declared as FFI?

So the ' will be emitted to JavaScript names as $prime same as for PureScript because it (nor the correct  ′ symbol) is not a valid character in identifier names? Or we could convert these to single and double x̿ (x̿) overline characters (or single and double vertical line above) characters which are valid for JavaScript identifiers names. Should we also offer the π, τ, , , (or more correctly gamma γ), 𝑒, and φ symbols or entire Greek alphabet αβγδεζηθικλμνξοπρςτυφχψω as identifier names since they are valid for JavaScript? Ditto double-struck alphanumerics 𝕒𝕓𝕔𝕕𝕖𝕗𝕘𝕙𝕚𝕛𝕜𝕝𝕞𝕟𝕠𝕡𝕢𝕣𝕤𝕥𝕦𝕧𝕨𝕩𝕪𝕫𝔸𝔹ℂ𝔻𝔼𝔽𝔾ℍ𝕀𝕁𝕂𝕃𝕄ℕ𝕆ℙℚℝ𝕊𝕋𝕌𝕍𝕎𝕏𝕐ℤ𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡, mathematical gothic 𝔞𝔟𝔠𝔡𝔢𝔣𝔤𝔥𝔦𝔧𝔨𝔩𝔪𝔫𝔬𝔭𝔮𝔯𝔰𝔱𝔲𝔳𝔴𝔵𝔶𝔷𝔄𝔅ℭ𝔇𝔈𝔉𝔊ℌℑ𝔍𝔎𝔏𝔐𝔑𝔒𝔓𝔔ℜ𝔖𝔗𝔘𝔙𝔚𝔛𝔜ℨ (also 𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅), and mathematical script 𝓪𝓫𝓬𝓭𝓮𝓯𝓰𝓱𝓲𝓳𝓴𝓵𝓶𝓷𝓸𝓹𝓺𝓻𝓼𝓽𝓾𝓿𝔀𝔁𝔂𝔃𝓐𝓑𝓒𝓓𝓔𝓕𝓖𝓗𝓘𝓙𝓚𝓛𝓜𝓝𝓞𝓟𝓠𝓡𝓢𝓣𝓤𝓥𝓦𝓧𝓨𝓩 (also 𝒶𝒷𝒸𝒹ℯ𝒻ℊ𝒽𝒾𝒿𝓀𝓁𝓂𝓃ℴ𝓅𝓆𝓇𝓈𝓉𝓊𝓋𝓌𝓍𝓎𝓏𝒜ℬ𝒞𝒟ℰℱ𝒢ℋℐ𝒥𝒦ℒℳ𝒩𝒪𝒫𝒬ℛ𝒮𝒯𝒰𝒱𝒲𝒳𝒴𝒵)?
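
As a rough illustration of the $prime idea, a name mangler for trailing primes could look like the sketch below. The function name mangle is hypothetical, and repeating $prime once per prime is an assumption of this sketch, not a confirmed description of what PureScript emits.

```python
def mangle(name: str) -> str:
    """Replace each trailing prime with '$prime' so the result is a valid JS identifier."""
    base = name.rstrip("'")           # the identifier without its prime suffix
    primes = len(name) - len(base)    # how many primes were stripped
    return base + "$prime" * primes
```

If $ is also allowed in source names, a user-written x$prime would collide with the mangled x', so a real scheme would need to reserve or escape that suffix.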

Here is what I arrive at now in compromise:

  • type parameter: [A-Z][A-Z0-9]*
  • data: (?:(?:(?:[A-H]|[J-Z])[A-Z]*)|I)[a-z][a-zA-Z0-9]*[']*
  • typeclass: I[A-Z]+[a-z][a-zA-Z0-9]*[']*
  • function references: [a-z_$][a-zA-Z_0-9$]*[']*
  • non-function references: [a-z_$][a-z_0-9$]*[']*

I like the leading I on typeclasses, so we capture the notion they are interfaces without conflating the keyword with the incompatible semantics of interface in other programming languages.

Edit: no need to allow uppercase in non-function references. Who on God's earth is using camel case for variable (i.e. non-function) reference names? 😆
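
The compromise patterns above can be exercised with a small classifier. This is a sketch under one assumption that is not in the comment: because every all-lowercase name matches both reference patterns, the more restrictive non-function pattern is tried before the function pattern. The names CATEGORIES and classify are hypothetical.

```python
import re

# Order matters: the first full match wins. The non-function pattern is
# deliberately tried before the function pattern (an assumption of this
# sketch), since all-lowercase names match both.
CATEGORIES = [
    ("type parameter", r"[A-Z][A-Z0-9]*"),
    ("data", r"(?:(?:(?:[A-H]|[J-Z])[A-Z]*)|I)[a-z][a-zA-Z0-9]*[']*"),
    ("typeclass", r"I[A-Z]+[a-z][a-zA-Z0-9]*[']*"),
    ("non-function reference", r"[a-z_$][a-z_0-9$]*[']*"),
    ("function reference", r"[a-z_$][a-zA-Z_0-9$]*[']*"),
]

def classify(name: str) -> str:
    """Return the first category whose pattern matches the whole name."""
    for label, pattern in CATEGORIES:
        if re.fullmatch(pattern, name):
            return label
    return "invalid"
```

Under these rules Int is still data (the lone I alternative exists for exactly that case), IFunctor is a typeclass, and a single-word function name such as map is indistinguishable from a non-function reference, which is the ambiguity noted earlier in the thread.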

@shelby3
Author

shelby3 commented Sep 24, 2016

@keean wrote:

Please no all caps, it's like shouting in my text editor :-(

👀

Type parameters will nearly always be a single letter. We both must compromise to what is rational. I have compromised above forsaking required camel case on functions. I also compromised (well more like I fell in love once we eliminated need for subclassing syntax) and accepted Haskell's data unification of Sum, Product, Record, and Recursive types.

I don't want the noise of declaring <A, B...> on functions. That is egregiously more DNRY noisy, than any choice between uppercase and lowercase single letters. Function declarations are too cluttered.

Also the lowercase letter choice for type parameters is not idiomatic and it has no visual contrast in the x: a arguments. You can't compare to Haskell, because Haskell puts the function type declaration on a separate line. Sorry the lower case type names don't work once we merge typing into the same line.

Also type parameters are types, thus they should not be lowercase. That would be inconsistent with our uppercase first-letter on all types.

The lowercase type parameters of Haskell (combined with lack of <>) still causes me to not be able to read Haskell code quickly. It took me many attempts at learning Haskell where I failed, because of differences like that from mainstream Java, C++ languages.

If you are making a Haskell language, I don't think it will be popular. I am here to make a popular language, thus I will resist you on this issue.

One of my necessary roles here is to provide the non-Haskell perspective.

Let's do something very cool and eliminate the need to declare <A, B...>. We need advantages to our language in order to attract love and attention. Programmers love DNRY.

ML prefixes type variables with a character (happens to be 'a but could be anything)

😧

I absolutely hate that. First time I saw that, I was totally confused. And I hate Rust's lifetime annotations littering the code with noise. I don't like Haskell and ML syntax. Not only am I lacking familiarity (not second-nature) with their syntax, but I dislike much of the syntax (and even some of the concepts) of those academic languages for logical reasons which I have explained in prior comments. I realize their target market is the 0.01 - 0.1% of the population that are academics (and whatever subset of that which are programmers). If you want to bring in most of the syntax and the obtuseness from those languages, then I think we have a different understanding of what the mainstream wants.

I am not a verbal thinker. I always score higher on IQ tests that are measuring visual mathematical skills, rather than verbal skills. My I/O engine is weaker than my conceptual thought engine (I think this is why I get fatigued with discussions because my I/O engine can't keep up with my thoughts). My reading comprehension of English is 99th percentile, but my articulation and vocabulary are in the high 80s or low 90s. So apparently I dislike complex linguistic computation. I seem to struggle more with sequencing or the flattening out what I "see" in multi-dimensions into a sequential understanding. My math and conceptual engine is higher (more rare) than 99th percentile, but not genius.

So someone with more highly developed linguistic computation than myself, would probably find my desire for linguistic structure to be arbitrary and unnecessary. I've been working on my weakness, but I do find it takes energy away from my thought engine, which is where I feel more happy and efficient.

Also I note you want to get rid of the type parameter list, you do realise sometimes you need to explicitly give the parameters if they cannot be inferred like:

let x = f<Int>(some_list)

Please differentiate between function declaration and function call.

I had written about that 3 days ago:

It is much less noisy and often these will be inferred at that function call site, so we won't often be doing f<Int,Int,Int>(0, 1) so the explicit correspondence to <A, B, C> [on function declaration] probably isn't needed for aiding understanding.


@keean wrote:

This also makes me realise another ambiguity in our syntax, it is hard to tell the difference between assigning the result of a function and assigning the function itself as a value. In the above you might have to pass to the end of the line to see if there is a '=>' at the end. This requires unlimited backtracking which you said you wanted to avoid.

Please catch up with recent corrections to the syntax.

@keean
Owner

keean commented Sep 24, 2016

@shelby3 wrote:

I absolutely hate that. First time I saw that, I was totally confused. And I hate Rust's lifetime annotations littering the code with noise.

So I am totally with you on the above. The problem is without introducing the type variables, how do we distinguish between types and variables, for example:

let x(a : A, b : B) : C

Are they single letter types, or type variables?

We often want to re-use type variables like 'A' a lot consider:

let f(x : A) =>
    let g(y : A) =>
        // is 'y' the same type as 'x' ?

The problem with making type variables all uppercase is it does not distinguish type names. Do we insist that all type names have more than one letter?

@shelby3
Author

shelby3 commented Sep 24, 2016

@keean wrote:

Are they single letter types, or type variables?

Type variable per the regular expressions I proposed.

💭 I see you are preferring "type variables" to the term "type parameters". I suppose this is to distinguish from function parameter (arguments).

Do we insist that all type names have more than one letter?

Yes. However...

I see now our conflict in preferences. I am thinking type names should be informational; single letter proper names are extremely rare and not self documenting, so I thought it was okay to just not allow them. You are apparently thinking of supporting math notation in code. Which is evident by your data R example and your suggestion to allow ' at end of all names.

Mainstream programmers typically don't (or rarely) want to do math notation in code.

In my proposal they can still get math notation with data R' instead of data R.

💡

I think there is another solution which would give you single-letter data, and keep my desire to eliminate the <A, B...> declaration noise. When a data type is intended, then for the first mention of the single-letter, put x: data.R. If the first mention is a product (tuple) or record constructor in code, then data.R(...) or data.R{...}.

🔨

And when there is a single-letter data name in conflict with a type parameter in scope, then I think we should have a compiler warning that must be turned off with a compiler flag. The warning should tell the programmer to use the data. prefix if that is what is intended (which turns off the warning), else use the compiler flag or remove the conflict from scope. Or alternatively, we could not allow single-letter data names in scope, unless a compiler flag is turned on.

Would that solve the problem for you? I don't think the single-letter data will be used by most or often, so those who need it can pay this slight verbosity and special case cost, so that everyone else can enjoy brevity and simplicity more often.

@shelby3
Author

shelby3 commented Sep 24, 2016

@shelby3 wrote:

Also I note you want to get rid of the type parameter list, you do realise sometimes you need to explicitly give the parameters if they cannot be inferred like:

let x = f<Int>(some_list)

Please differentiate between function declaration and function call.

I had written about that 3 days ago:

It is much less noisy and often these will be inferred at that function call site, so we won't often be doing f<Int,Int,Int>(0, 1) so the explicit correspondence to <A, B, C> [on function declaration] probably isn't needed for aiding understanding.

There is a problem remaining. The order of the type parameters in the optional <...> list on function calls may be ambiguous? I think we can adopt the rule that it is the order in which they appear in the function declaration. (Edit: I propose instead alphabetical order so that the programmer has more flexibility to order them so that the implicit ones can be first on function call <…> annotations, and this will also defeat some refactoring bugs.)
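
The two candidate orderings can be compared with a short sketch. It assumes the all-uppercase lexical rule for type parameters proposed earlier and treats the declaration as plain text; the function names and the sample signature are hypothetical.

```python
import re
from typing import List

# All-uppercase tokens are type parameters under the proposed lexical rule.
TYPE_PARAM = re.compile(r"\b[A-Z][A-Z0-9]*\b")

def type_params_in_order(signature: str) -> List[str]:
    """Type parameters in order of first appearance in the declaration."""
    seen: List[str] = []
    for name in TYPE_PARAM.findall(signature):
        if name not in seen:
            seen.append(name)
    return seen

def type_params_alphabetical(signature: str) -> List[str]:
    """Type parameters sorted by name, independent of where they appear."""
    return sorted(type_params_in_order(signature))

sig = "let f(x: B, y: A): B where Subtype<B, C>"
declaration_order = type_params_in_order(sig)
alphabetical_order = type_params_alphabetical(sig)
```

Here declaration order is B, A, C while alphabetical order is A, B, C. Reordering arguments or where-clause constraints changes the first list but never the second, which is the refactoring argument for the alphabetical rule.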

⚠️

Edit: and that leads to a very obscure and probably very rare programmer error: if not all the type variables are specified in the argument list (i.e. some are only in the where clause), and some change is made to the where clause which doesn't change the call site type but changes the order of the type variables, then explicit <…> annotations at existing call sites would no longer line up with the intended type variables. But all programming languages have some sort of rare obscure programmer errors.

Edit#2: and note it should be quite odd and extremely rare that the programmer wants to constrain at the call site, a type variable that is not in the argument list or result value. Also the following function call is much more informational than f<Int,Int,Int>(0, 1):

💡

f(x:Int, y:Int): Int

And allows us to specify only some constraints:

f(x:Int,y):Int

And it is more consistent with the syntax of function declaration.

Or if we prefer:

(Int)f((Int)x,y)

So maybe we can disallow <A,B...> on functions (declaration and call site) except for specifying type variables which don't appear in the argument list or result? Which should be almost never.

@keean
Owner

keean commented Sep 24, 2016

The number and function of the type parameters is not the same as the number and type of the arguments, some type parameters may only occur in the where clause. Consider:

f<A>(x : B) where Subtype<B, A>

Note, Rust would not allow this, as you have to introduce all type parameters, which makes them less useful.

Really we have to have the type parameters if we want to have parametric types (that is types that are monomorphisable). If we are happy to give up monomorphisation we can have universally quantified types instead, and then there is no need to have type parameters at all.

In some regards I would prefer universally quantified types from a purely type system perspective, but it is much easier to implement monomorphisation with parametric types.

If you really want to get rid of the type parameters, then let's switch to universal quantification.

@keean
Owner

keean commented Sep 24, 2016

I would rather say minimum two letters the second of which must be lower case for datatypes, and all caps for type variables.

Also we can use universal quantification to get rid of type parameters (although it does change what types are valid in the type system).

@keean
Owner

keean commented Sep 24, 2016

I would suggest lexical scoping for type variables, so in my example above the A would be the same for both.

I think this satisfies the principle of least surprise.

@shelby3
Author

shelby3 commented Sep 24, 2016

@keean wrote:

The number and function of the type parameters is not the same as the number and type of the arguments, some type parameters may only occur in the where clause. Consider:

f<A>(x : B) where Subtype<B, A>

Did you not read the comment of mine immediately before yours?

I also explained that exact issue and offered a solution.

@keean
Owner

keean commented Sep 24, 2016

Here's an interesting one, we need to write the type of a function, and we agree function definition should be an expression.

let x : (Int, Int) : Int = x y => ...

This should be possible too because I will want to pass functions to other functions:

let f(g : (Int, Int) : Int) : Int =>
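
For comparison, this is how the same two cases look in TypeScript, which is a likely transpile target. It is only an illustration of the target side, not a proposal for our syntax; the names `addInts`, `applyToPair` and `pairTotal` are invented for the example.

```typescript
// A function type used as the annotation on a binding.
const addInts: (a: number, b: number) => number = (x, y) => x + y;

// A function type used as the type of a parameter, so functions can be passed around.
function applyToPair(g: (a: number, b: number) => number): number {
  return g(3, 4);
}

const pairTotal = applyToPair(addInts); // 7
```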

@shelby3
Author

shelby3 commented May 6, 2017

but it's no good for typing at programming speed

Even with it working, it’s no good for typing at programming speed. But I bet we could write a utility which programmers could download to get their system to do easy mnemonic keystrokes.

Also custom "click keys" keyboards are really something a serious programmer should have. These can be configured to dedicate a key for a custom action such as a Unicode character. They even sell customized key tops with the character laser printed on it.

@shelby3
Author

shelby3 commented May 8, 2017

Regarding recent upthread discussion of JSON vs. XML which began far upthread, after further thought I agree it would be better to model for example HTML with the full generality of our programming language such as actual data types (e.g. Div() for <div>); and maybe even we will eventually develop a better DOM (personally I’d like an improved markdown for text and using program language data types for an improved DOM). In terms of a textual serialization format, the advantage of JSON is a simplistic least common denominator with ad hoc semantics in order to simplify usage in varied environments. I would say we not try to improve on JSON for that use case. I think we should however consider having our language itself be a textual serialization format, if we think it is going to be that ubiquitous and if we’ve fixed the script injection attack holes. As for security, the code which is compiling and running code dynamically (i.e. at runtime) need only restrict the imports of libraries to cordon off the access permissions of the dynamically run code.

@shelby3
Author

shelby3 commented May 30, 2017

shelby3/Lucid#1

@shelby3
Author

shelby3 commented Jun 3, 2017

Indented blocks instead of curly brace-delimited blocks is one of (the lesser of) the significant motivations for me to create a transpiler instead of coding in TypeScript. The motivation was to reduce the noisy clutter of curly braces and gain more lucid structuring of the code for maximizing readability in the open source era. I have formed I think a firm opinion that indented (i.e. recursive) blocks within expressions constitutes code that is too much crammed into one conceptual unit and thus difficult to read. Also it creates a dangling terminating ) (see also and also) and the interaction of state machines between the parser and lexer becomes more complex.

  1. I want to back off slightly from another of the significant motivations—the everything as an expression concept—in order to prevent blocks within expressions that should either be all on one line or contain only some line continuations (and no indented blocks) in order to discourage cramming too much, e.g. prevent for function calls. Specifically the block-indented variant of those expressions which also have a non-block-indented variant (including the one line and block-indented variant of function definition expressions), should only be allowed at the left margin of blocks and on the RHS of top-level assignment expressions (and thus not passed as assignment to function arguments including not for operator expressions). By top-level, I mean not contained within any function call or operator expression. In assignment expressions, the left margin of the block for the block-indented variant of else is forced to the column of the if (the parser communicates this mode to the lexer):

    x := if y
           doA(y)
         else
           doB()
    

    The main use case above is for assigning to a const variable. In other cases where one would need for example block-indented if-else as an inline expression, one could either assign it to a variable and use the variable instead where the value remains constant, or wrap the block-indented if-else in a function and call it for each instance (noting a smart compiler could optimize away these as inline functions in the future).

  2. Line continuations could only be allowed on certain tokens and with some structuring. For example, they could be allowed after comma tokens (and before operator tokens) and must align to the start of one of the (comma or operator delimited) operands on the prior line(s) of the expression (the parser communicates this mode to the lexer).

  3. One exception could be made (but not yet sure if I will) for the last argument of a top-level function call, thus which may be a block-indented variant, but then the trailing ) for the function call is not written. Note it is possible that I might choose to only support this for block-indented function definition expressions. The motivation for this exception is to enable for function calls that resemble language syntax (following filter and map example employs the concept that . operator may be used to call the RHS operand with the first argument being the LHS operand and the for is overloaded by its arguments for different operations such as map and filter as an alternative to Python’s list and generator comprehensions):

    mapIt := x →                                       : ValueType → MappedData
      …
      MappedData(…)        // note I am proposing no `new` required for instantiating data
    iter.for(x → x.attribute ≠ true, x → mapIt(x))
    iter.for(x → x.attribute ≠ true, x →               : ValueType → MappedData
      …
      MappedData(…)
    

    I might prefer the syntax:

    iter.for: x → x.attribute ≠ true, x →              : ValueType → MappedData
      …
      MappedData(…)
    

    Tangentially note that if we supported a short-hand lambda syntax (but I think we decided (also) not to), that could be written:

    iter.for(_.attribute ≠ true, x → mapIt(x))
    

    But not:

    iter.for(_.attribute ≠ true, mapIt(_))
    
  4. Because of rules in #​2, a block indent of more than 2 spaces would not be allowed in the case of block-indented if. Thus block indents of more than 2 spaces could not be used consistently in the source code, so I am thinking the block indent will be forced to be consistent at 2 spaces which maximizes readability across a diverse set of programmers (analogous to standardizing on the de facto international language). @keean last year you had expressed (also and here) the desire for programmer flexibility in choosing a block indent. What motivating reason do you have other than programmer obstinance? White-space is useful yet also a limited resource, and any programmer who can not see 2 spaces of indent is also probably incapable of seeing other details in the source code.
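
As a rough idea of what the `x := if y … else …` case from item 1 could look like on the TypeScript side, here is a sketch of the two fallbacks mentioned there (a plain conditional expression, and wrapping the block if-else in a function that is called in place). The names `doA`, `doB` and `choose` are placeholders, and testing `y` against `undefined` is my assumption about what `if y` would mean.

```typescript
// Placeholder functions standing in for doA(y) and doB() from the example in item 1.
function doA(y: number): string {
  return "A(" + y + ")";
}
function doB(): string {
  return "B()";
}

function choose(y: number | undefined): string[] {
  // Variant 1: a one-line conditional expression assigned to a const.
  const viaConditional = y !== undefined ? doA(y) : doB();

  // Variant 2: the block if-else wrapped in a function which is called immediately,
  // so the block form never appears nested inside a larger expression.
  const viaFunction = (() => {
    if (y !== undefined) {
      return doA(y);
    } else {
      return doB();
    }
  })();

  return [viaConditional, viaFunction];
}
```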

Edit: unless for modifies the source, then it must input a second iterator for the destination container, else it would require HKT in order for the input iterator to have an iterator factory API (i.e. refer to the container type parametrised by value type) (Edit#3: apparently a factory can be achieved w/o HKT for the concrete container type, e.g. Array<Int>, but a generic function which maps the ELEM type will depend on the caller to supply a factory for second concrete container type), yet TypeScript (and thus initially our language) does not support HKT. Also afaics, HKT forms of typeclasses are fugly, odd, and may have corner case issues.

Note for could return the destination iterator rewound to support functional chaining, and assuming the intermediate edits are discardable then the same destination iterator can be reused subsequently as both the mutable source and destination so we definitely need both forms of the API else writing generic code in nested function hierarchies without HKT would be problematic due to needing to pass down the function hierarchy so many instances of destination container iterations. Note a filter algorithm can not modify the source unless mutable iterators support delete operations; which may be a violation of the algebraic semantics of iterators which prevent corner case issues?
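
A minimal TypeScript sketch of the "second iterator for the destination" form, to show that no HKT is needed when the caller supplies the destination. Everything here is hypothetical: the `Source` and `Sink` interfaces, `arraySource`, `arraySink` and `filterMapInto` are names I made up for the illustration, and rewinding is not modelled.

```typescript
// Hypothetical minimal input and output iterator interfaces.
interface Source<T> {
  hasNext(): boolean;
  next(): T;
}
interface Sink<T> {
  put(value: T): void;
}

// Input iterator over an array.
function arraySource<T>(items: readonly T[]): Source<T> {
  let pos = 0;
  return {
    hasNext: () => pos < items.length,
    next: () => items[pos++],
  };
}

// Output iterator that appends to an array owned by the caller's chosen container.
function arraySink<T>(): Sink<T> & { readonly items: T[] } {
  const items: T[] = [];
  return { items, put: (value: T) => { items.push(value); } };
}

// Filter and map in one pass. The concrete destination container is chosen by
// the caller, so this function never has to construct a container generically.
function filterMapInto<S, D>(
  source: Source<S>,
  keep: (x: S) => boolean,
  map: (x: S) => D,
  dest: Sink<D>,
): Sink<D> {
  while (source.hasNext()) {
    const x = source.next();
    if (keep(x)) dest.put(map(x));
  }
  return dest; // returned so calls can be chained
}

const lengthSink = arraySink<number>();
filterMapInto(arraySource(["a", "bb", "", "dddd"]), s => s.length > 0, s => s.length, lengthSink);
// lengthSink.items is now [1, 2, 4]
```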

Edit#2: note the option (c.f. the 1 footnote at the linked post) for potentially allowing semicolons.

@keean
Owner

keean commented Jun 3, 2017

One reason to allow flexible indenting is to allow alignment with characters, for example:

let a = x +
        y + z // we want this to line up to the right of the '='

@shelby3
Author

shelby3 commented Jun 3, 2017

@keean that is a line continuation and is supported by the #​2 rules in my prior comment post excepting that I specified the line breaks be before the operators. I was thinking of && and || for the conditional-boolean-expression of a block-indented if. Perhaps for other operators, the line continuation breaks should come after the operator.

The proposed 2 spaces rule (with the contextual exceptions for #​1) for block indenting is for delineating blocks not for line continuations.


Edit: another advantage of forcing operators to be at the end of lines (that are followed by line continuations) is that they can thus never appear at the start of line. Thus we could reuse one of them such as | for identifying comment lines, which I think is superior to a single space of extra indent because (other than perhaps at column position 1) comment lines would not be aligned with block indents and would not be delineated well enough. However, comments at the end of lines containing code on the LHS, would still require // or one of those Unicode suggestions such as ¦. Maybe it is better to be consistent, but the | looks much cleaner:

 my multi-line
 comment
somecode …

| my multi-line
| comment
somecode …

// my multi-line
// comment
somecode …

The ¦ is also clean but difficult to type:

¦ my multi-line
¦ comment
somecode …

@keean
Owner

keean commented Jun 3, 2017

If we specify line breaks after operators, then we always know when the line is continued. If before there are some circumstances when it can be ambiguous. JavaScript has this problem and needs semi-colons in certain circumstances. I think you are okay with && and ||, but it could be confusing to make exceptions for them.
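
A small TypeScript sketch of why the break-after-operator rule is easy to decide: looking only at the end of the current line tells you whether the next line continues it. This is an illustration, not a real lexer. The operator list and the function names are assumptions, and comments, string literals and unary operators are ignored.

```typescript
// Hypothetical set of tokens that may end a line which is continued on the next one.
const TRAILING_OPERATORS = ["&&", "||", "+", "-", "*", "/", "=", ","];

function endsWithOperator(line: string): boolean {
  const trimmed = line.trim();
  return TRAILING_OPERATORS.some(op => trimmed.slice(-op.length) === op);
}

// Join physical lines into logical statements: a line is continued
// exactly when it ends with an operator, with no lookahead needed.
function joinContinuations(lines: string[]): string[] {
  const statements: string[] = [];
  let pending = "";
  for (const raw of lines) {
    const line = raw.trim();
    pending = pending === "" ? line : pending + " " + line;
    if (!endsWithOperator(line)) {
      statements.push(pending);
      pending = "";
    }
  }
  // A dangling operator at the end of input; a real lexer would report an error.
  if (pending !== "") statements.push(pending);
  return statements;
}

const joinedStatements = joinContinuations(["let a = x +", "        y + z", "b()"]);
// ["let a = x + y + z", "b()"]
```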

@shelby3
Author

shelby3 commented Jun 3, 2017

If before there are some circumstances when it can be ambiguous.

Agreed. I also know about that. A context-free grammar prevents those ambiguities (one of the reasons I am writing the LL(k) grammar and checking it with SLK), so any such issue would have come to light when checking the grammar.

In this case however, I think the grammatical ambiguity can not occur regardless whether the line continuation is allowed before or after the operator, because the enforced block indenting differentiates a new expression from a line continuation.

I think you are okay with && and ||, but it could be confusing to make exceptions for them.

Agreed. Good point. In the past with curly brace-delimited blocks, it seemed advantageous to not have them hanging off the far RHS at the end of lines given that conditional expressions can be quite long at times, and having at the start of lines helped to distinguish them as line continuations versus block indents (when the opening curly brace { would be at the end of the line prior to the start of the block). But with a consistent 2 spaces for block indents, perhaps it is not necessary to have them at the start of line (and actually might be more noisy and unaligned/inconsistent with them at the start of the line continuations).

@shelby3
Author

shelby3 commented Jun 3, 2017

@keean made the point to me in private that we will need imperative control over iterators to achieve some more complex algorithms such as a binary search because of the need to treat iterators as points instead of ranges. I agreed and pointed out that I think I prefer my prior example HOF (higher-order function) approach for the range cases because it is less verbose and I think perhaps a smart compiler can in some cases convert those HOF variants to imperatively optimized code (e.g. filter and map algorithms), thus indicating that imperatively expressing them would be boilerplate. In general converting all HOF encoded algorithms to imperative algorithms is not practical, so I am not proposing that.
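
To illustrate the distinction, here is a TypeScript sketch using plain arrays and indices in place of iterators (an assumption made to keep it short; all the function names are invented). The first function is the range-style HOF form, the second is the kind of fused loop a compiler might produce from it, and the third is a binary search, which needs positions it can move independently and so does not fit the range style.

```typescript
// Range style: the traversal is implied by the HOFs.
function evensDoubled(xs: readonly number[]): number[] {
  return xs.filter(n => n % 2 === 0).map(n => n * 2);
}

// What an imperatively optimized version of the same algorithm could look like:
// one pass and no intermediate array.
function evensDoubledFused(xs: readonly number[]): number[] {
  const out: number[] = [];
  for (const n of xs) {
    if (n % 2 === 0) out.push(n * 2);
  }
  return out;
}

// Point style: lower-bound binary search over a sorted array. `lo` and `hi`
// are positions that move independently, which a range-only API cannot express.
function lowerBound(sorted: readonly number[], target: number): number {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = lo + ((hi - lo) >> 1);
    if (sorted[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo; // index of the first element that is >= target
}

const sevenAt = lowerBound([1, 3, 5, 7, 9], 7); // 3
```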

For extensibility of the concept beyond some hard-coded optimization in the compiler for filter and map, I ponder if there is some way a HOF library could teach the compiler how to imperatively optimize the HOF userland (aka non-library) code employing the library. If it is boilerplate, then the transformation should be deterministic and probably can be done with an AST macro. @keean claimed we never need AST macros. Maybe we never want AST macros in the abstraction layer above the language (due to debugging obfuscation, enables unreadable non-standard syntax, etc), but below the language perhaps they are essentially compiler plugins. Hmm.

@shelby3
Author

shelby3 commented Jun 8, 2017

I am modifying my original proposed syntax for import.

Given the proposed simplification of one module per file, then we do not need the hierarchical . syntax I had proposed. So we can use a string to identify the module, which TypeScript’s import also does. This is also compatible with my intent for all references to modules to be repositories similar to Rust Crates and have the file management be handled mostly automatically and as a separate concern. Differences from the TypeScript syntax are the string precedes the list of imports and the . is used as the separator instead of from. No side-effect import modules.

When the non-type annotation uses of the imports are always accessed via a (const aka :=) reference to which the import was assigned as a required qualified namespace, then our compiler should annotate the imported type annotations in the transpiled TypeScript with typeof (how noisy! wish TypeScript had an automated mode that could be toggled) so that TypeScript will not automatically require the module. The module will be loaded asynchronously (via a Promise) to the said reference namespace. To access (even the same as optionality) exports from the same module in a different namespace (or no namespace), then employ multiple import. At least if they are adjacent in the code, the compiler will be smart enough to optimize the transpiled output so that the reference is aliased (copied) instead of asynchronously loaded more than once and that qualified namespace references only used in type annotations do not actually load the module in the transpiled code.
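
A sketch of the "aliased instead of asynchronously loaded more than once" part in TypeScript. The helper `lazyNamespace`, the `GeometryExports` interface and the stand-in loader are all hypothetical; in real transpiled output the loader would be whatever asynchronous module load the target environment provides, which is not shown here.

```typescript
// Wrap an asynchronous loader so that every reference shares one pending load.
function lazyNamespace<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) pending = load();
    return pending;
  };
}

// The statically resolved typing of the imported items (hypothetical module).
interface GeometryExports {
  area(w: number, h: number): number;
}

// Stand-in loader which counts how often the "module" is actually loaded.
let geometryLoadCount = 0;
const geometry = lazyNamespace<GeometryExports>(() => {
  geometryLoadCount++;
  return Promise.resolve({ area: (w: number, h: number) => w * h });
});

// Two references to the same namespace share a single load.
const firstGeometryRef = geometry();
const secondGeometryRef = geometry();
```

Code that only mentions `GeometryExports` in type annotations never calls `geometry()`, so nothing is loaded for it at runtime.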

@keean
Owner

keean commented Jun 8, 2017

Personally I don't want one module per file. I have done that in Java, and I didn't like it. You can end up with a directory full of very small files, which makes the code hard to read. A module is about data hiding and API as well as separate compilation.

I would also not require asynchronous loading, it tends to make applications very slow, and people only tend to use it in development preferring to use tree-shaking tools like rollup to put everything in a single file for production (and minifying it too). I don't even use dynamic loading in development because it makes page load very slow.

@shelby3
Author

shelby3 commented Jun 8, 2017

Personally I don't want one module per file. I have done that in Java, and I didn't like it. You can end up with a directory full of very small files, which makes the code hard to read.

We already discussed your disagreement on this issue. Java’s case was more egregious because as you pointed out that Java requires the filename named according to the class thus forcing a proliferation of files, and I have not proposed such a limitation.

I understand that if modules will be quite small then a proliferation of small files means more files to open instead of just scrolling down. But in theory an IDE can solve this problem by offering to scroll vertically across multiple modules, making them appear as though they are in one file. Did no Java IDE ever offer such an improvement? (if not, what has happened to our profession, that programmers are not able to meet the needs of huge ecosystem markets!)

Adding multiple modules within files requires adding a module keyword and an indented block, and complicates the import syntax. Additionally, if we require that modules are not qualified by file name, then modules can be moved to different files and repositories, which afaics will further complicate matters on many levels (including version control coherence).

A module is about data hiding and API as well as separate compilation.

And thus I see no valid reason to have more than one in the same file. A module is a unit of modularity, and thus separating modular items into files makes for clean repository and version control system.

One module per file is compatible with JavaScript and TypeScript. The least amount of unnecessary discord with the target output language is very desirable.

PL design is about extreme prioritization and leaving as much as possible out of the language and not including every little nuanced option we might want to throw in the kitchen sink.

You recently stated that you are coding a lot in TypeScript. How are you coping with this “major downgrade” to one module per file?

I would also not require asynchronous loading, it tends to make applications very slow

Code should be loaded once and then run many times over the course of application, thus any overhead of a Promise is irrelevant. If the code is on the local disk, then latency and load time should be rather insignificant. I think the overhead you must be referring to is the HTTP overhead of loading many small files over the wire (or even perhaps local file system latency for accessing many small files)? In that case, the compiler can take care of optimizing by merging small files as you noted (and which I was aware of and already contemplated in my design) and handling this automatically, but the syntax of the programming language need (and should) not be specialized:

preferring to use tree-shaking tools like rollup to put everything in a single file for production (and minifying it too)

We do not want specialization in the source code. K.I.S.S. principle (do not eat the complexity budget). I think you are putting design in the wrong abstraction layer:

I don't even use dynamic loading in development because it makes page load very slow.

@keean
Owner

keean commented Jun 8, 2017

You recently stated that you are coding a lot in TypeScript. How are you coping with this “major downgrade” to one module per file?

Typescript allows multiple namespaces in the same file. It all depends on what you specifically mean by a "module".

Code should be loaded once and then run many times over the course of application, thus any overhead of a Promise is irrelevant.

Local disc latency is still a problem. You should have a play with dynamic loading tools like SystemJS, obviously it has to load over HTTP, how else can JavaScript load anything (unless you are talking about running in Node.js rather than in the browser, then you can output 'require' statements and have dynamic loading, but it won't work in the browser).

We do not want specialization in the source code. K.I.S.S. principle (do not eat the complexity budget). I think you are putting design in the wrong abstraction layer:

If you compile to TypeScript, then you can use the output options to choose CommonJS, ES6, SystemJS for loading, so it can work in Node or in the browser with dynamic loading or bundled. You would just use the normal TypeScript 'import' statements.

@shelby3
Author

shelby3 commented Jun 9, 2017

Typescript allows multiple namespaces in the same file. It all depends on what you specifically mean by a "module".

And afaics it is not really recommended to combine them. It is adding complexity and I do not understand the claimed big win of doing so?

I know sometimes perhaps we need to express privacy access inter-module, so if we supported that feature then we might like to have more than one module in a file (even though it still would not be absolutely necessary). But I did not yet propose offering that feature and probably want to avoid it unless we really need it.

If you compile to TypeScript, then you can use the output options to choose CommonJS, ES6, SystemJS for loading, so it can work in Node or in the browser with dynamic loading or bundled. You would just use the normal TypeScript 'import' statements.

Are you just stating that to document in detail your understanding of what I wrote I would do? Or is there some other reason you mention the details of my solution?

The compiler will optimize it. If I have to fork TypeScript, then so be it. But hopefully I can work within the feature set of TypeScript.

@keean
Owner

keean commented Jun 9, 2017

Are you just stating that to document in detail your understanding of what I wrote I would do?

You wrote you wanted to use promises to dynamically load modules. I stated you were focusing on the wrong abstraction layer and should just output static import statements and let typescript handle it (or implement the same kind of compiler options as typescript if going straight to JavaScript).

Edit: I think I see what you mean, you want to represent module loading as a promise in the source language, not the target language. This is problematic because it means modules must be first-class entities. This means that all module dependencies and relationships need to be expressible in the core type system so that modules are typeable values. This is possible, and is something I have been looking at doing, but it requires careful selection and combining of type system features to make sure the type system is sound.

@shelby3
Author

shelby3 commented Jun 9, 2017

@keean wrote (in private):

I believe programs should be readable, and that means statically readable, like when printed on paper. What makes programming languages more powerful than graphical application builders is we can use the language centres of our brains to process them. To facilitate that we must be able to navigate a program like we find sections in a book. This means a static layout, not dynamic and changing. Predictability is important, because we can auto-pilot things that are predictable, and it causes minimal cognitive load.

I was thinking about this also before I read your message. I then thought for those who want these groupings of modules, we could offer a grouping keyword and they could place this in a separate file. A smart editor would know when loading the said file, to also load the referenced modules as a linear text view. With this separation-of-concerns, we K.I.S.S. principle the base format for modules keeping them coherent with ECMAScript modules (one per file) and prevent clutter of module keyword and block-indents in module files. Yet we can also then orthogonally meet the need of those who want a static grouped view of related modules. And this interops better with version control in that the grouping changesets are orthogonal to the module changesets.

Win-win design via separation-of-concerns.

I understand it loses the simplicity of being able to see the linear text grouping with any unspecialized text editor (e.g. Notepad). But it does not lose the power of reasoning about language orthogonal to GUIs. And if we force everything to be literally copied in text, this is less powerful than structure by named reference, e.g. we would prefer import by named reference and not by copying the entire text of an external module into the top of every module that references a module.

@shelby3
Author

shelby3 commented Jun 10, 2017

Are you just stating that to document in detail your understanding of what I wrote I would do?

You wrote you wanted to use promises to dynamically load modules. I stated you were focusing on the wrong abstraction layer and should just output static import statements and let typescript handle it (or implement the same kind of compiler options as typescript if going straight to JavaScript).

Edit: I think I see what you mean, you want to represent module loading as a promise in the source language, not the target language. This is problematic because it means modules must be first-class entities. This means that all module dependencies and relationships need to be expressible in the core type system so that modules are typeable values. This is possible, and is something I have been looking at doing, but it requires careful selection and combining of type system features to make sure the type system is sound.

I understand maybe I did not sufficiently explain my proposal.

Copying TypeScript’s method where all imported typings from the module are resolved statically and if none of the runtime code of the module is accessed then nothing is loaded at runtime. Any runtime code (and state) is loaded via an asynchronous model before it can be accessed (either via an explicit assignment of the import to a reference in our language's source code or otherwise an implicit such reference which appears in the transpiled output code). And runtime loading must be optimized (e.g. for some target environments, the compiler might even have everything statically loaded or load modules in large amalgamated chunks).

I do not understand why you think this introduces type soundness issues? TypeScript does not seem to think so? Perhaps you are thinking that the types in the module would be first-class? I did not intend to propose that. I am proposing that only the object of (non-private) runtime objects (e.g. the constructor functions and other functions and any state) is first-class; and its type is determined by the items imported. The types are statically resolved. Does that clear up the issue? Perhaps you have a more complex model of module dependencies in mind? Essentially my proposal is just runtime linking and afaics should not have any impact on the type system.

@keean
Owner

keean commented Jun 10, 2017

A module is a unit of data hiding like an object. It doesn't necessarily have anything to do with files. See 'ML modules'.

@shelby3
Author

shelby3 commented Jun 10, 2017

A module is a unit of data hiding like an object.

Also type hiding.

It doesn't necessarily have anything to do with files.

Why have files? Let’s just put the entire program in one file. Or better yet, let’s put all programs that were ever created and will ever be created in one file. Obviously modularity has nothing to do with files.

I think the more salient question is not whether we need more than one module per file, rather whether we need more than one file per module? I am approaching modules conceptually as units of reusable repositories in an open source ecosystem similar to Rust Crates or npmjs.

See 'ML modules'.

ML must be one of the most popular languages. Why are we creating a language? There is already an OCaml to JavaScript transpiler. I believe we discussed that typeclasses can be simulated with OCaml’s features. One of the reasons I provided was because we would lose integration with JavaScript such as Promises and the concepts and syntax are too foreign. I basically wanted to home in on a very elegant K.I.S.S. design that would not be too obtuse for a mainstream, popular language. 80/20 prioritization.

So instead of appealing to the authority of ML, could you instead provide examples of features where our modules will require multiple modules per file or multiple files per module? Because I am not really healthy enough to expend the effort to research ML modules and try to figure out myself which use cases dictate such.

Edit: modularity-of-encapsulation might not be equivalent to modularity-of-reuse. A module-of-reuse might need more than one module-of-encapsulation, i.e. each of the modules of encapsulation is incomplete without the others. Yet this is expressed by typing and the consumer of the module-of-reuse has to import all the required modules of encapsulation. Yet maybe we wish to be able to import all as a grouping. This is easy to do. A module can import other modules, which satisfies my prior suggestion for a way to group modules. As I previously suggested, a smart editor can offer an option to display multiple grouped modules in a single vertical scrolling amalgam.

The reason to put each module-of-encapsulation in a separate file is so that it can be referenced orthogonally by another module-of-reuse, which reuses only some of the modules of encapsulation. K.I.S.S. rather than add complexity. If a lot of small files makes it difficult to see the modules of reuse that group these small modules of encapsulation, then put them in a separate directory (folder) in the file system.

@shelby3 shelby3 mentioned this issue Jan 8, 2018
@shelby3
Author

shelby3 commented Jun 24, 2018

This is LL(k=2) thus far. Only 1 token of lookahead. I will update this as I progress to add the rest of the grammar. Eventually this will be added to a new Zer0 repository.
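
As a rough illustration of the layout rules listed in the header comment of the grammar below, here is a TypeScript sketch of the INDENT/OUTDENT/CR part of the lexer. It is a simplification under stated assumptions and not the actual Zer0 lexer: each line's real tokens are replaced by a single `LINE:` stand-in, comments and line continuations are not modelled (a jump of more than one level is simply rejected), and the rule about empty lines following INDENT is not checked. The name `layoutTokens` is made up.

```typescript
const INDENT_WIDTH = 3; // columns per block level, per the rules in the grammar comment

function layoutTokens(source: string): string[] {
  const tokens: string[] = [];
  let depth = 0; // number of currently open INDENTs
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.indexOf("\t") !== -1) {
      throw new Error("line " + (i + 1) + ": tab characters are not allowed");
    }
    if (line === "") continue; // blank lines are discarded
    if (line.trim() === "") {
      throw new Error("line " + (i + 1) + ": line contains only horizontal whitespace");
    }
    const spaces = line.length - line.replace(/^ +/, "").length;
    if (spaces % INDENT_WIDTH !== 0) {
      throw new Error("line " + (i + 1) + ": indent is not a multiple of " + INDENT_WIDTH);
    }
    const level = spaces / INDENT_WIDTH;
    if (level > depth + 1) {
      // Deeper jumps are line continuations in the rules; not modelled in this sketch.
      throw new Error("line " + (i + 1) + ": line continuations are not modelled");
    }
    if (level === depth + 1) {
      tokens.push("INDENT");
      depth++;
    }
    while (level < depth) {
      tokens.push("OUTDENT");
      depth--;
    }
    // One CR before the token that starts a line at the current indent column.
    tokens.push("CR", "LINE:" + line.trim());
  }
  // Close every block still open before EOF.
  while (depth > 0) {
    tokens.push("OUTDENT");
    depth--;
  }
  tokens.push("EOF");
  return tokens;
}

const sampleLayout = layoutTokens("data List\n   Nil\n   Cons\nvalue\n").join(" ");
// "CR LINE:data List INDENT CR LINE:Nil CR LINE:Cons OUTDENT CR LINE:value EOF"
```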

/*
This file can be checked with shell command: ./slk -Ga -k=2 grammar

The lexer MUST:
   •  Not allow tab characters.

   •  Never issue the '<' token if it’s preceded by whitespace, because `<` (i.e. ' < ') is a distinct token for the “less than” operator to distinguish it from the use of the '<' token in other contexts in a context-free grammar.

      This resolves the LL(∞) ambiguity between the “less than” operator and the `type-params` production. This is preferable to wasting the `[` and `]` square bracket pair for type parameters. This also has the benefit of preventing the “less than” operator without whitespace, e.g. `1 < 2` is preferred instead of `1<2`.

      For consistency, never issue the '>' token if it’s preceded by whitespace, because `>` (i.e. ' > ') is a distinct token for the “greater than” operator to distinguish it from the use of the '>' token in other contexts in a context-free grammar. For further consistency, never issue the '<' token if it has trailing whitespace, so that formatting is consistent with the '>' token not allowing leading whitespace. However, do send the '>' token when it has trailing whitespace.

      For consistency, never issue the '(' token if it has trailing whitespace nor the ')' token if it’s preceded by whitespace. It’s extraneous whitespace anyway.

   •  Discard blank lines (including those with comments).

   •  Error if line contains only horizontal whitespace.

   •  Error if empty line (without a comment) follows INDENT or line(s) containing only comments that followed INDENT.

   •  Issue INDENT and OUTDENT tokens only for columns at multiple of 3 spaces, e.g. 6 spaces indenting issues two consecutive INDENT tokens.

   •  Not issue two consecutive INDENT that didn’t have an intervening CR because these are line continuations.

   •  Issue OUTDENT tokens for all in the INDENT stack before EOF.

   •  Issue one CR token before every token that starts a line and is positioned at the column of the current INDENT stack.

   •  Error if the token that starts a line is before the column of the current INDENT stack (and this is in the context of any OUTDENT(s) that have been issued for that line).

   •  Not require the token to be enclosed in spaces if it’s written in the grammar below enclosed in single quotes instead of backticks.

   •  The comma token ',' must have trailing whitespace followed by non-whitespace token (other than CR) or end the line (i.e. followed by CR).
*/

module:
   { CR member } EOF             // TODO: Optionally parameterised modules may require more than just prefixed with “[ pid-list ]”: https://github.com/keean/zenscript/issues/39#issuecomment-412850869

   member:
      data
      export                     // Separately list exports instead of annotating them in declarations. This reduces clutter on declarations and makes the export attribute a separate concern. Some may protest that they forget to check two places in a file. Uncluttered readability is a higher priority.
      value
      variable

      data:                      // Not supporting typeclass interface bounds (aka `requires`) unless someone explains why associated type synonyms can’t suffice without associated data types: https://github.com/keean/zenscript/issues/8#issuecomment-412343352
         'data' TID [ pid-list ] [ struct-or-disjoint-union ]

         struct-or-disjoint-union:
            input-list type
            INDENT CR struct { CR struct }+ OUTDENT   // Disjoint union. I can’t think of any valid reason to have type bounds on the optional PIDs (constructor functions can be employed for that purpose).

            struct:
               TID [ input-list type ]    // If `type` has an output (aka return) type then it’s the type of a variant of the disjoint union which is a GADT:
                                          //    http://lambda-the-ultimate.org/node/1134#comment-44653
                                          //    https://en.wikipedia.org/wiki/Generalized_algebraic_data_type#Overview
                                          // Apparently GADTs are closed analogous to closed type families and thus not anti-modular: https://github.com/keean/zenscript/issues/8#issuecomment-410219702
                                          // Apparently though it’s associated types combined with existential quantification (not open associated type families) that cause anti-modularity: https://github.com/keean/zenscript/issues/8#issuecomment-410513143
                                          // And existential quantification may be less useful in many cases than union types (unions work with multi-parameter typeclasses and open to new typeclass bounds): https://github.com/keean/zenscript/issues/8#issuecomment-412332373
                                          // And because we can also employ explicit cast subsumption and supersumption of union types: https://github.com/keean/zenscript/issues/8#issuecomment-410894784 (c.f. bottom of post)

               input-list:
                  '(' ID [ `=` prefix4 ] { ',' ID [ `=` prefix4 ] } ')'

            pid-list:
               '<' PID { ',' PID } '>'    // PID is a type parameter:                     [A-Z]+

      export:
         'export' instance-or-type { ',' instance-or-type }

         instance-or-type:
            ID                   // ID is for values and variables including functions:   (?:[_a-z][_a-zA-Z0-9]*|[αβγδεζηθικλμνξοπρςτυφχψωℎℏ𝑒])(?:'*|′*)
            TID [ members ]      // TID is for data types and typeclasses:                [A-Z](?:_[A-Z])?(?:[a-zA-Z0-9]|[A-Z]_[A-Z])*[a-z0-9]
                                 // Since IDs/TIDs can’t begin with upper/lowercase respectively, an exported ID/non-exported TID can be compiled with the prefix “G”/“g” respectively in Go: https://tour.golang.org/basics/3

            members:
               '.*'                                                  // Export all member instances or constructors of the disjoint union with all their members.
               '(' instance-or-type { ',' instance-or-type } ')'     // Export specific member instances or constructors of the disjoint union.

statement:
   'break' [ 'if' infix14 [ 'else' expression ] ]
   'loop' [ 'if' infix14 [ 'else' expression ] ]
   'for' [ for-variant ] INDENT block OUTDENT
   'return' expression           // Shouldn’t be employed on the last expression of a function, so it’s consistent with not being used as last expression in the block of a conditional expression. It’s for short-circuiting to avoid writing a more complex conditional expression.
   expression
   value
   variable

   for-variant:                  // https://github.com/keean/zenscript/issues/11#issuecomment-402863461
      expression
      ';' [ expression ] ';' expression
      ID { ',' ID } in-or-opt-cond

      in-or-opt-cond:
         'in' expression [ ':' type ]
         `:=` expression [ ':' type ] ';' [ expression ] ';' [ expression ]

   block:
      { CR statement }+

   value:                        // Prevents re-assignment to LHS after construction. This is analogous to `const` in JavaScript, but not `const` in C++. It’s not the writable attribute of the object assigned from the RHS, which is instead encoded in the `type`.
      ID `::=` typed-expr        // TODO: if ever we want to construct multiple ID values by destructuring the assignment to a `tuple` of ID, then move “ID `::=` typed-expr” to `expr-suffix` and compiler must parse for correct LHS.
      TID `::=` function         // Replaces the default type constructor and is only allowed in same module as declaration of the type.

      function:
         infix14 [ func-suffix ]

   variable:
      ID { ',' ID } `:=` typed-expr    // Optionally construct multiple ID variables with the same value. For destructuring a tuple, the TODO for `value` applies.

      typed-expr:
         expression [ type ]     // Compiler must check for syntax error if redundant type on function declaration expression. Type can also be an explicit cast which is required for subsumption and supersumption: https://github.com/keean/zenscript/issues/8#issuecomment-410499037

expression:
   infix14 [ expr-suffix ]

   expr-suffix:
      `=` infix14                                     // Compiler must check that the LHS is an expression that can receive an assignment.
      [ 'loop' ] 'if' infix14 [ 'else' expression ]   // Optional 'loop' here is for a single-expression, do-while loop.
      func-suffix

      func-suffix:
         `=>` function-decl                           // Pure function
         `->` function-decl                           // Impure function, aka a procedure
         // An additional optional short-hand function declaration is the interpretation of an expression: https://github.com/keean/zenscript/issues/6#issuecomment-248631325
         // For the function declarations above, the compiler must check that the LHS is an equivalent to an `input-list`.
         // The alternative (in the first version of this grammar) to structure the grammar more strictly (to only allow a tuple) was to employ some brackets other than parentheses for the LHS, but this looked unfamiliar (although it optionally allowed for eliminating the arrows but that looked even more unfamiliar). Keeping in mind Eric S. Raymond’s point about inbound learning curve attrition.
         // Note the `tuple` isn’t preceded by a list of PIDs because it would complicate the grammar given we want only one way to write a function declaration with constructed assignment; and because it’s noisy redundancy on already very noisy function declarations.
         // The PIDs can be deterministically identified in the `type` of the function declaration and ordered alphabetically. PIDs aren’t often explicitly written in an ordered `pid-types` list on function calls anyway.

         function-decl:
            expression
            type [ interfaces ] INDENT block OUTDENT  // Lexer issues a CR after last OUTDENT. Thus if misapplied within another expression it’s a syntax error in the containing expression.

            interfaces:
               'requires' interface { ',' interface }

               interface:
                  TID [ '<' pid-or-associated-type-list '>' ]  // TID is a typeclass

                  pid-or-associated-type-list:
                     pid-or-associated-type { ',' pid-or-associated-type }

                     pid-or-associated-type:
                        PID
                        interface '.' TID             // TID is an associated type of a typeclass


   // The grammar is identical for left-to-right and vice versa, and the implicit grouping is interpreted during compilation of the AST.
   // Left-to-right precedence operators have only one implicit `)` on the right side of each operator.
   // Right-to-left precedence operators have only one implicit `(` on the left side of each operator.
// infix15:
//    infix14 { ID infix14 }     // Infix variant of function or method application.

      infix14:
         infix13 { '||' infix13 }

         infix13:
            infix12 { '&&' infix12 }

            infix12:
               infix11 { '|' infix11 }

               infix11:
                  infix10 { '^' infix10 }

                  infix10:
                     infix9 { '&' infix9 }

                     infix9:
                        infix8 { op9 infix8 }

                        op9:
                           `==`     // Don’t need `===` because no implicit conversions.
                           `!=`

                        infix8:
                           infix7 { op8 infix7 }

                           op8:
                              `<`
                              `<=`
                              `>`
                              `>=`

                           infix7:
                              infix6 { op7 infix6 }

                              op7:
                                 '<<'
                                 '>>'

                              infix6:
                                 infix5 { op6 infix5 }

                                 op6:
                                    '+'      // Don’t implicitly convert between types such as strings to numbers or vice versa, e.g. in JavaScript `"string" + 1 + 2 === "string12"`.
                                    '-'      // Implicit conversion would require a specific concatenation operator for each type, e.g. `"string" ++ (1 + 2) === "string3"` and `"string" ++ 1 ++ 2 == "string12"`.
                                             // The standard library should have a `To` typeclass: https://github.com/keean/zenscript/issues/39#issuecomment-399445596.

                                 infix5:
                                    prefix4 { op5 prefix4 }

                                    op5:
                                       '*'
                                       '/'
                                       '%'

                                    prefix4:
                                       { unary4 } instance

                                       unary4:
                                          '!'      // Right-to-left precedence.
                                          '-'      // Right-to-left precedence.
                                          '&'      // Right-to-left precedence; returns a pointer reference.

                                       instance:
                                          literal
                                          postfix3

                                          literal:
                                             INTEGER
                                             FLOAT
                                             STRING
                                             collection

                                             collection:
                                                '[' expression { ',' expression } ']'     // Type of the collection is inferred, an explicit type cast, or an explicit type of a constructed assignment.
                                                                                          // The element type might be inferred from the types of the elements.
                                                                                          // An alternative to explicit type annotation is to use the named constructor, e.g. `List(…)`, or for explicit element type, e.g. `List<…>(…)`.
                                                                                          // Any key names are the LHS of assignments in the element expression(s).
                                                                                          // In the absence of any inferred or explicit constraint, unification should default to `HashMap` if keys exist, else `Array`.

                                          postfix3:
                                             infix2 { [ pid-types ] apply }               // Optionally apply function or constructor. Assignments in the tuple are named arguments.

                                             pid-types:
                                                '<' { ',' } type { { ',' }+ type } '>'    // Optional list of types (allow empty entries) corresponding to the alphabetical order of PIDs in the `type` of the function declaration.
                                                                                          // Necessary for PIDs which can’t be inferred (or when not inferred as intended); the types may contain PIDs from lexical context.
                                                                                          // Also consider optionally allowing Unicode angled brackets ⟨⟩: https://github.com/keean/zenscript/issues/11#issuecomment-412888588

                                             apply:
                                                tuple
                                                ':' INDENT { CR expression }+ OUTDENT     // Indenting in-lieu of parentheses when the nesting is deep: https://github.com/keean/zenscript/issues/18#issuecomment-402733836

                                             infix2:
                                                container { access }

                                                access:
                                                   '[' expression ']'      // Keyed or indexed element of a collection.
                                                   '.' id-or-tid           // TID here is for module and/or typeclass qualifiers. ID is for data type member access not for typeclass members.

                                                container:
                                                   id-or-tid
                                                   tuple

                                                   tuple:
                                                      '(' [ expression [ ':' type ] { ',' expression [ ':' type ] } ] ')'     // Optional explicit type casts only allowed for arguments of function or procedure application.
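The INDENT, OUTDENT and CR rules listed for the lexer at the top of the grammar can be sketched roughly as follows. This is only a minimal illustration in Python, not the Zer0 lexer: the function name `layout_tokens` and the opaque `LINE` token are hypothetical, and it assumes no comments and no line continuations, so every jump of 3 columns is treated as one INDENT.

```python
INDENT_WIDTH = 3  # the lexer rules above use 3-space indentation levels


def layout_tokens(source):
    """Return INDENT, OUTDENT, CR and ('LINE', text) tokens for `source`.

    Assumptions of this sketch: comments and line continuations are not
    handled, and each line's content is kept as one opaque LINE token
    instead of being split into real tokens.
    """
    tokens = []
    depth = 0  # number of currently open indentation levels
    for number, line in enumerate(source.split("\n"), start=1):
        if "\t" in line:
            raise SyntaxError(f"line {number}: tab characters are not allowed")
        if line == "":
            continue  # blank lines are discarded
        text = line.lstrip(" ")
        if text == "":
            raise SyntaxError(f"line {number}: line contains only whitespace")
        column = len(line) - len(text)
        if column % INDENT_WIDTH != 0:
            raise SyntaxError(
                f"line {number}: indentation must be a multiple of {INDENT_WIDTH}"
            )
        level = column // INDENT_WIDTH
        while depth < level:
            tokens.append("INDENT")
            depth += 1
        while depth > level:
            tokens.append("OUTDENT")
            depth -= 1
        tokens.append("CR")  # one CR before the token that starts a line
        tokens.append(("LINE", text.rstrip()))
    tokens.extend(["OUTDENT"] * depth)  # close the INDENT stack before EOF
    tokens.append("EOF")
    return tokens
```

For a `data` declaration with two indented variants followed by a top-level variable, this yields the `INDENT CR struct { CR struct }+ OUTDENT` shape that the `struct-or-disjoint-union` production expects, followed by a CR for the next member.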

@shelby3
Author

shelby3 commented Jul 5, 2018

I wrote in 2016:

Do we support for? Only for iterators, or also in traditional JavaScript form? Do we have a for for enumerables?

Your ideas for syntax?

I think we should retain the ability to construct a var within a for, but I would argue it should have the semantics of ES6's let so the reference doesn't escape the for's scope.

I propose for Zer0 to adopt Go’s decision to only support for loops (c.f. also how to simulate while and do…while), and to expand the syntax options:

for                  // infinite loop; employ `break if condition` to terminate loop
for cond
for init; cond; post // any 1 or 2 of the 3 may be omitted except both `init` and `post`
for value in iterable
for key, value in iterable
for key, _ in iterable

The optional init variables (or constant values) only exist in the lexical scope of the for statement.

We will have iterators implemented in libraries instead of Go’s ranges:

foo :: (array) =>                 : [Int] => Int?
   last := null                   // : Int?; Dunno if will need to annotate this type or can be inferred
   if !array.isEmpty
      iter := iter(array)
      last = iter.value
      for next(iter)
         last = iter.value
   last
foo([1, 2, 3])

If instead we are willing to sacrifice performance by making either iter or iter.value nullable, or allowing exceptions on illegal access for iter.value, then the code could be written more concisely:

foo :: (array) =>                 : [Int] => Int?
   last := null
   for iter := iter(array); iter; next(iter)
      last = iter.value
   last
foo([1, 2, 3])

So I think we should adopt a shorthand syntax sugar which our compiler will expand to the optimally performant encoding for the first example above:

foo :: (array) =>                 : [Int] => Int?
   last := null
   for value in array
      last = value
   last
foo([1, 2, 3])
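To make the intended expansion concrete, here is a rough Python sketch of the sugared loop next to the form the compiler could expand it to, mirroring the first example above. The `Cursor` class and its `advance` method are hypothetical stand-ins for the library iterator (`iter(array)`, `iter.value`, `next(iter)`); the real iterator interface hasn’t been designed yet.

```python
class Cursor:
    """Hypothetical library iterator that is only valid on a non-empty list."""

    def __init__(self, items):
        assert items, "a cursor requires a non-empty collection"
        self._items = items
        self._index = 0

    @property
    def value(self):
        # Never nullable: the cursor always points at an existing element.
        return self._items[self._index]

    def advance(self):
        """Move to the next element; return False if already at the last one."""
        if self._index + 1 < len(self._items):
            self._index += 1
            return True
        return False


def last_sugared(array):
    """What the programmer writes: `for value in array`."""
    last = None
    for value in array:
        last = value
    return last


def last_desugared(array):
    """What the shorthand could expand to, following the first example."""
    last = None
    if array:                      # if !array.isEmpty
        cursor = Cursor(array)     # iter := iter(array)
        last = cursor.value        # last = iter.value
        while cursor.advance():    # for next(iter)
            last = cursor.value    #    last = iter.value
    return last
```

Both functions return the same result for empty and non-empty input, which is the property the syntax sugar needs to preserve.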

Similar to Go, let’s also offer optional syntax for the key (aka index) for iterables that have it:

foo :: (array) =>                 : [Int] => (Int?, Int?)
   last, lastKey := null
   for key, value in array
      last = value
      lastKey = key
   (last, lastKey)
foo([1, 2, 3])

And when only the key is desired:

foo :: (array) =>                 : [Int] => Int?
   lastKey := null
   for key, _ in array
      lastKey = key
   lastKey
foo([1, 2, 3])

See the example which exemplifies how much more readable this is as compared to the traditional for in C, C++, JavaScript, and Java.

@shelby3
Author

shelby3 commented Aug 4, 2018

I wrote on Quora:

Regularity is a more objectivized requirement we can strive for.

Aesthetics and style can be a matter of taste, but there are objective arguments. For example (at least vertical) whitespace is an anti-pattern, because not needing to scroll is more efficient. Another is that symbol (and/or one character names) soup is more difficult to parse. Soup meaning it dominates the context. Some symbols and one letter local variable names can be beneficial. Ditto parentheticals soup. Ditto overly verbose keywords and naming fatigue. Instead balance in context.
So regularity of indent levels is objectively superior. And by the objective that whitespace is an anti-pattern, indent levels should be 2 or 3 spaces.

I wish you would have linked to some of your more specific objective claims about style and other design choices.

I wrote on Quora:

Your answer ties into my comment about regularity:

https://www.quora.com/As-a-programmer-what-is-your-most-controversial-opinion-related-to-programming/answer/Tikhon-Jelvis/comment/69510824

You’re emphasizing readability. Regularity is a significant factor in readability.

@shelby3
Author

shelby3 commented Aug 14, 2018

I have updated the LL(k) grammar for the proposed syntax of Zer0. It’s not yet completed. Currently it’s still k = 2 which is one token of lookahead and no backtracking. I really don’t want backtracking because I want the fastest possible compiler.

I haven’t yet added the syntax for type system and typeclass. That's next to do.

Note one of the main changes in this update to the work-in-progress is making the function declaration syntax not so unfamiliar, which is also reflected in the edit of the examples in my July 6 post.

Note I have chosen angle brackets for type parameters because:

  • Contrary to the complaint about parsing ambiguities, I removed the ambiguity by requiring the “less than” operator to have whitespace on both sides and by not allowing whitespace immediately inside any type of bracket. This enables me to reuse the left angle bracket for paired brackets, so as to not waste one of the three bracket pairs (<>, [], {}) available from standard English QWERTY keyboards. I would also like to have optional Unicode support for those who wish to use the real brackets ⟨⟩.

  • I’d like to use the square brackets for constructing collections and collection keyed (or indexed) access.

  • I don’t currently have any proposed usage for the curly brackets pair. Does anyone have any ideas? Remember Zer0 is to have a Python-like indenting instead of curly bracketed code blocks.
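The whitespace rule for angle brackets can be sketched as a small classifier. This is only an illustration in Python under some assumptions: the names `classify_angle` and `angles` are made up, and multi-character operators such as `<=`, `>=`, `<<`, `>>`, `=>` and `->` are assumed to have been recognized already, so only a lone `<` or `>` reaches this function.

```python
def classify_angle(text, i):
    """Classify the lone '<' or '>' at text[i] by its surrounding whitespace."""
    ch = text[i]
    before = i > 0 and text[i - 1].isspace()
    after = i + 1 < len(text) and text[i + 1].isspace()
    if before and after:
        # Whitespace on both sides: a comparison operator.
        return "LESS_THAN" if ch == "<" else "GREATER_THAN"
    if ch == "<" and not before and not after:
        return "OPEN_ANGLE"       # opening bracket: no whitespace on either side
    if ch == ">" and not before:
        return "CLOSE_ANGLE"      # trailing whitespace is allowed after '>'
    raise SyntaxError(f"inconsistent whitespace around {ch!r} at column {i}")


def angles(text):
    """Classify every '<' and '>' in a line that has no multi-character operators."""
    return [classify_angle(text, i) for i, ch in enumerate(text) if ch in "<>"]
```

With this rule `List<Int>(1 < 2)` lexes as a bracket pair followed by a comparison, whereas `1<2` would produce an opening bracket token and then fail in the parser, and `a <b` is rejected by the lexer itself.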

Does anyone have any opinion or objection about the choice of brackets?

Also I have -> for the impure procedure (but this is pure in Haskell) and => for pure functions (because = is more like a function). Would anyone prefer ~> for impure function? In which case there is also a Unicode equivalent .

P.S. I will probably start a new repository for Zer0 soon and perhaps on Gitlab instead of Github, since Microsoft acquired Github and I have observed Skype becoming more unusable after Microsoft acquired it.

@shelby3
Author

shelby3 commented Aug 17, 2018

I wrote in the Subtyping thread:

@keean wrote:

The last syntax is more reasonable. I would prefer m.Monoid, m:Monoid or m=Monoid to indicate associating distinct identifiers with distinct subtypeclass instances of the same supertypeclass.

I was thinking of emulating 'Go' syntax as there does not seem to be a need for the ':' and it just adds noise if it is in every declaration.

Spaces are noise too. The . doesn’t occupy any more screen real estate and it’s more clear as to the semantics. Space is for separating separate things, not for joining related things. That is why I dislike Haskell’s syntax.

With spaces as the join operator then the reader has to consider the context to know what the implicit grouping is. That requires more mental effort and adds another grammar interpretation activity the brain has to do at the same time as interpreting the semantics of the code. It’s also why I hate LISP and Perl. And I also hate those aspects of the Go syntax (including the []Int which is highly irregular). Regularity and ease of grammar and lexical scope processing is very important to me.

Some of us have lower processing of grammar. I am one of those. I even have a mild dyslexia. I can rarely type something correctly the first time. I have a high level of processing of concepts and anything that is already loaded into my head, but I am only in the 85th to 88th percentile in verbal I/O. My reading comprehension is high because I process concepts at a high level, but my I/O engine is not at a very high level. Thus I relate to an average programmer better when considering syntax choices. I would not relate well to a below average programmer though.

In terms of the PL conceptual paradigms, I am more apt to agree with choices made by those high IQ math gurus who like Haskell. Which is why for example, I have come to accept many of the paradigms in Haskell. But syntax….

I agree the Haskell syntax is less cluttered. But it’s too spartan and doesn’t read as efficiently for someone without such a high level of grammar processing capability. OTOH, I think Ceylon went too far in the other direction with too many overly verbose words.

@shelby3
Author

shelby3 commented Jun 21, 2020

I like camel case at least for type names, and I think the following does make an argument for employing hyphens instead of underscores, at least between uppercase camel humps that are separate words:

https://yosefk.com/blog/ihatecamelcase.html

https://wiki.theory.org/index.php/YourLanguageSucks#Poor_Design

  • Camel case sucks:
    XMLHttpRequest
    HTMLHRElement
    

I think I also prefer camel case but with first letter lowercase for function and procedure names, because camel case employs fewer characters than hyphens or underscores for delimiting words.

When transpiling to languages which don’t support hyphens in names, we need to convert the hyphens to underscores and check for name collisions if underscores have also been allowed.

m_ext = jhbjh
m-ext = jhbjh
TCPIPConnector
TCP_IP_Connector
TCP-IP-Connector
myIP_FunctionName
myIP-FunctionName
myFunctionName
my_function_name
my-function-name
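The collision check for transpiling can be sketched like this. It’s a minimal Python illustration with a made-up function name (`to_target_names`), and it assumes the only mangling needed is replacing hyphens with underscores:

```python
def to_target_names(names):
    """Map each source name to a hyphen-free name, rejecting collisions."""
    mapping = {}
    owners = {}  # target name -> source name that produced it
    for name in names:
        target = name.replace("-", "_")
        if target in owners and owners[target] != name:
            raise ValueError(
                f"{name!r} and {owners[target]!r} both become {target!r}"
            )
        owners[target] = name
        mapping[name] = target
    return mapping
```

So `m-ext` and `m_ext` from the list above can’t coexist in one scope once transpiled, whereas `TCP-IP-Connector` alone simply becomes `TCP_IP_Connector`.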

However the use of dashes instead of underscores doesn’t conserve any horizontal space with a monospace font and it resembles a crunched infix minus operator. So maybe I should just stick with underscores, if at least for type names?

P.S. The author of the aforelinked blog is also the author of the C++ Frequently Questioned Answers satire.


EDIT: another point about verbalization of case:

https://stackoverflow.com/questions/1740116/for-what-reason-do-we-have-the-lower-case-with-underscores-naming-convention/1740152#1740152

I guess you'll understand why relying on case sensitivity is a horrible idea only if and when you have to rely on a screen reader to read code out to you -- most screen readers do a horrible job at pinpointing case issues, and there's no really good way, no strong or easy convention to convert case differences to easy auditory clues (while translating underscores to "clicks", in a good configurable screen reader, makes it a breeze). For people without any visual impairments whatsoever, which is no doubt 90% or more, you don't need to care (unless you want to be inclusive and help people who don't share your gift of perfect vision... naah, who cares about those guys, right?!).

And the fact that he was born in Russia might explain why Yossi Kreinin hates camel case so much:

https://stackoverflow.com/questions/1740116/for-what-reason-do-we-have-the-lower-case-with-underscores-naming-convention/1740131#1740131

I've heard it stated in other contexts that words_with_underscores are easier to separate for non-native English readers than are wordByCamelCase. Visually it requires less effort to parse the separate, foreign words.

But I think much of this is perhaps pointless because names shouldn’t be so long and juxtaposed acronyms should be separated with an underscore:

https://stackoverflow.com/questions/1740116/for-what-reason-do-we-have-the-lower-case-with-underscores-naming-convention/1740364#1740364

Was it Meyers who came up with aLongAndTotallyUnreadableMethodeName vs an_even_longer_but_perfectly_readable_method_name comparison?

https://stackoverflow.com/questions/1740116/for-what-reason-do-we-have-the-lower-case-with-underscores-naming-convention/1744749#1744749

LowerCaseWithUnderScoresAreSuperiorBecauseTheTextYouNormallyReadInABookOrNewsPaperForExampleIsNotWrittenLikeThis.

Yet the following example convinces that underscores can be easier-to-read than camelCase in specific cases, even though a study found generally they’re roughly equivalent:

https://softwareengineering.stackexchange.com/questions/27264/naming-conventions-camelcase-versus-underscore-case-what-are-your-thoughts-ab/27293#27293

isIllicitIgloo

is_illicit_igloo

Camel case has much less symbol soupy noise:

https://whatheco.de/2011/02/10/camelcase-vs-underscores-scientific-showdown/

if ( thisLooksAppealing )
{
    youLikeCamelCase = true;
    votePoll( camelCaseFormatting );
}
else if ( this_looks_appealing )
{
    you_like_underscores = true;
    vote_poll( underscore_formatting );
}
else if ( this-looks-appealing )
{
    you-like-hyphens = true;
    vote-poll( hyphen-formatting );
}

In a block indenting, no curly brace style:

if thisLooksAppealing
    youLikeCamelCase = true
    votePoll( camelCaseFormatting )
elseif this_looks_appealing
    you_like_underscores = true
    vote_poll( underscore_formatting )
elseif this-looks-appealing
    you-like-hyphens = true
    vote-poll( hyphen-formatting )

[…]

  • Classes can be kept camel case, giving a clearer difference between them and identifiers/functions. E.g.: CamelRider.ride_camel() vs CamelRider.rideCamel().

[…]

  • Camel case is easier to type, and underscores are hard to type.
  • Camel case makes paragraphs easier to read. my_first_variable = my_second_variable-my_third_variable vs myFirstVariable = mySecondVariable - myThirdVariable
  • Camel case is shorter.
  • Camel case is used by convention in a lot of major languages and libraries.

Editor considerations:

https://csswizardry.com/2010/12/css-camel-case-seriously-sucks/

Hyphens work better in text editors

This has been true of every text editor I have used.

This is an odd one, but one that definitely, definitely irks me. I can’t Ctrl+Shift+[Arrow key] single words in a camel case string. Take the following screenshot:

Selecting camel case strings

Here I use the Ctrl+Shift+Left arrow keys to select chunks of text–rather than one character–at at time. The problem here is that the camel case string is treated as one word. What if I just wanted to select tweet and change it to facebook? I can’t do that with Ctrl+Shift+[Arrow key]


EDIT#2: here’s the decision I made thus far for the lexer.

Essentially I decided to prioritize hyphens over underscores. Yet I still prefer camel case over hyphens where possible, except that identifiers which are neither functions nor procedures should be all lowercase and, if they comprise multiple words, must employ underscores to distinguish them from identifiers with function and procedure types.

Thus:

XML-HttpRequest
HTML-HR-Element
 *  PID token is a type parameter:                              [A-Z]+
    ID token is for values and variables including functions:   (?:[αβγδεζηθικλμνξοπρςτυφχψωℎℏ𝑒]|(?:[a-z]|_[a-z])(?:[a-zA-Z0-9]|[a-z0-9](?:_|-)[a-z0-9])*)'*
    TID token is for data types, interfaces and constructors:   (?:[A-Z]|_[A-Z])(?:[a-zA-Z0-9]|[A-Z][A-Z]-[A-Z])+[a-z0-9]'*
    https://github.com/keean/zenscript/issues/11#issuecomment-647137141

    Non-exported IDs/TIDs begin with an underscore which is consistent with Pony. An exported ID/non-exported TID can be compiled with the prefix “G”/“g” respectively in Go: https://tour.golang.org/basics/3
    Private fields aren’t useful with type class interfaces.
    TID only allows hyphens between capital letters, at least one side of which comprises an acronym because our convention is that type names contain only capitalized words. Conversely our convention is that IDs may contain non-capitalized words and/or capitalized words except for first word.
    A TID must contain at least one non-capitalized letter to distinguish it from a PID.
    The compiler must enforce that an ID for a function or procedure type may contain capitalization and hyphens but not underscores other than the first character; whereas, an ID not for a function or procedure type may not contain capitalization nor hyphens but may contain underscores. This policy discourages use of underscores and making multiple word IDs which don’t have a function or procedure type, while also making a visual distinction between multiple word IDs that have a function or procedure type and those ID which don’t have the said type.
    Neither TID nor ID may terminate (a word) with a single uppercase letter nor an underscore nor hyphen.
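The three token patterns above can be tried out with a short script. This is only a sketch: it transcribes the patterns as written into Python’s `re` dialect (an assumption, since the lexer’s regex dialect isn’t stated), and the names `GREEK` and `classify` are made up. It checks only the lexical patterns, not the compiler-enforced rules about function and procedure types.

```python
import re

# Single-character identifiers allowed by the ID pattern above.
GREEK = "αβγδεζηθικλμνξοπρςτυφχψωℎℏ𝑒"

PID = re.compile(r"[A-Z]+")
ID = re.compile(
    r"(?:[" + GREEK + r"]"
    r"|(?:[a-z]|_[a-z])(?:[a-zA-Z0-9]|[a-z0-9](?:_|-)[a-z0-9])*)'*"
)
TID = re.compile(r"(?:[A-Z]|_[A-Z])(?:[a-zA-Z0-9]|[A-Z][A-Z]-[A-Z])+[a-z0-9]'*")


def classify(name):
    """Return 'PID', 'TID' or 'ID' for a whole-token match, else None."""
    for kind, pattern in (("PID", PID), ("TID", TID), ("ID", ID)):
        if pattern.fullmatch(name):
            return kind
    return None
```

With these patterns `T` is a PID, `List` and `XML-HttpRequest` are TIDs, and `myFunctionName`, `my_function_name` and `_private` are IDs. As transcribed, though, the TID pattern appears to reject `HTML-HR-Element`: the alternative `[A-Z][A-Z]-[A-Z]` consumes the capital after each hyphen, so two hyphens separated by only two capitals can’t both match. The pattern probably needs adjusting if that name is meant to be valid.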

5 participants
@jdegoes @shelby3 @skaller @keean and others