r/oilshell Feb 12 '22

Comment about type checking

(Reddit is so bad at saving comments I have to test this out in a top level post)

Hm I think you are right in the sense that I could probably just start hacking on the AST and things will start working ... however I don't have that much confidence that I'll be satisfied with the end result :-) This is replacing something on top of MyPy, which is messy and reaching its limits (partly due to my hacks, partly due to MyPy's non-public AST representation). While MyPy is a much more sophisticated system than what I need, I think it suffers from some tech debt that they wish they could go back and fix.

So I kind of want to have a picture up front if possible. I like hacking on someone else's implementation first -- I did that for a Python lexer / parser / bytecode compiler / bytecode VM about ~5 years ago and it was very instructive (Cobbling Together a Python Interpreter). Although I was naive about how fast it would be, and how hard it would be optimize, which is why I'm working on the type checking problem now!

I found a similar thread from 3 years ago, and a couple people mention implementing simple unidirectional type checkers:

https://old.reddit.com/r/ProgrammingLanguages/comments/87c5dw/what_are_the_common_ways_of_performing/


(part 2, working around reddit still)

I believe I can do everything unidirectionally, but as I'm reading TAPL, it seems like there are a lot of subtle tradeoffs and interactions. Languages are very mutually recursive and interdependent, it seems like type systems are even more so (?) . So trying to add one feature could change the whole architecture.

The one issue that seems to come up a lot is the combination of subtyping and List<T>, which gives you the covariance / contravariance issue. And I think it comes up for function argument types and return types, as far as I remember.

And stuff like this: Java Is Unsound: The Industry Perspective (one of these is related to covariance, and there is another one that went unnoticed for a long time. I guess you could argue it doesn't matter if people didn't notice for so long :-/ )

Similar issue with TypeScript: https://news.ycombinator.com/item?id=15660015 . If memory serves, TypeScript actually made the "wrong" choice and corrected it with an optional flag in a later version? I remember there was a blog post from professor Jeremy Siek (who apparently invented gradual typing), and one of his students worked on TypeScript, and there was some controversy about that ... (I could be misremembering this)

I also remember watching a talk about MyPy, and someone asked Guido about this issue ... MyPy seems to give you the choice of covariant or contravariant lists as far as I remember. I might be able to avoid this issue since my code might not rely on it, but I think it actually does use subtyping and arg / return types.

Casting also seems like something that is tricky too.


Bottom line is that I would pay for a type checking addendum to Crafting Interpreters :) How much code do you think it would add to Lox? Lox has subtyping and dictionaries, so I think the covariance issue might come into play.

Even though I had covered a lot of the same material (except GC) through my Python interpreter experiment, it was still very useful to read everything cohesively explained, and the rationale behind the choices you made. As the book was coming out, I thought I knew most of it, but I learned something in every chapter. (I think we followed a similar evolution, as I was reading the Lua papers and reading the code ~10 years ago?)

That is, there's a lot of value in having the engineering approach. In TAPL, ALL the features seemed to be crammed into chapter 11 -- ascription, let bindings, tuples, records, sums, enums, recursion, lists, etc. And then there is also a chapter about mutable references -- so most of the type systems in the book don't even have mutable references! Which is funny because 99% of programs are written in such languages, even the type checkers themselves (OCaml). So I think there is too big a gap between theory and practice.

I think there's something of an analogy to parsing ... In academia they care about grammars and the generative approach, and interesting algorithms like LR parsing. But in engineering recursive descent covers a huge majority of cases, along with Pratt Parsing, which I don't believe is even mentioned in the Dragon book or Appel's books. Even though MOST front ends use pratt parsing, e.g. Clang and Lua, they don't even tell you it exists!

Similarly I think the textbooks are biased towards the "generative" / set-based view of type systems, but what programmers really care about are the algorithms -- checking and error reporting. It's funny that TAPL claims to be implementation focused, and it has been very helpful, but there's still a big jump to type checkers used in industry. It reminds me of the words "front end" and "back end" or "high level" or "low level". Those mean very different things to different people :) That is, what is "implementation focused" in academia is often not useful or not complete in engineering...

There's also some bias towards Hindley-Milner and unification (by Robinson), since those are "interesting" algorithms like LR parsing.


So I think there's still a lot of "implementer lore" that needs to be documented. Another place I've found that to be true is the relationship between types and precise GC, e.g. I had a hard time finding info like this in books: https://blog.mozilla.org/javascript/2013/07/18/clawing-our-way-back-to-precision/ And I have the GC handbook and have read a lot of it.

Overall, implementing languages has emphasized to me that "tinkering precedes theory" (I think Taleb's books talk a lot about this). For example, Java was developed before Cardelli's Object Calculus, and then others studied and formalized Featherweight Java, described in TAPL. I remember Guy Steele saying that he was paid to write the spec for Java, and he would go over to James Gosling's office and try to convince him to change some if statement somewhere in the code, because that would make the spec easier to write ...

So the implementers are actually the bigger force in software, and so in that sense you can argue that writing about implementations is more important or just as important than writing about theory!

3 Upvotes

3 comments sorted by

3

u/oilshell Feb 12 '22

In reply to this:

https://old.reddit.com/r/ProgrammingLanguages/comments/sq3z3u/good_textbook_with_implementations_of_oo_type/hwk4pv0/

Apparently I edited this long comment so many times that Reddit thought it was spam!

3

u/NoahTheDuke Feb 12 '22

I’m excited to read the book you write on this topic after you’re done with all of this bothersome research and implementation!

2

u/oilshell Feb 12 '22

Haha, if only I had infinite time and bandwidth! Still working on catching up the blog ...

It is weird that there seems to be a big hole; languages aren't as well understood as we think