armchair_progamer

joined 1 year ago
MODERATOR OF
 

Background: the authors are developing a static analysis library (or perhaps framework) called Codex and publishing papers on it. This post summarizes their most recent paper, which got accepted to OOPSLA 2024. The full paper and an artifact (Docker container) are both linked, and Codex is on GitHub with a demo.

Excerpt:

One of the main challenges when analyzing C programs is the representation of the memory. The paper proposes a type system, inspired by that of C, as the basis for this abstraction. While initial versions of this type system have been proposed in VMCAI'22 and used in RTAS'21, this paper extends it significantly with new features like support for union, parameterized, and existential types. The paper shows how to combine all these features to encode many complex low-level idioms, such as flexible array members or discriminated unions using a memory tag or bit-stealing. This makes it possible to apply Codex to challenging case studies, such as the unmodified Olden benchmark, or parts of OS kernels or the Emacs Lisp runtime.

 

The language itself: https://crystal-lang.org/. Crystal is heavily inspired by Ruby but with static typing and native compilation (via LLVM). To make up for not being dynamic like Ruby, it has powerful global type inference, meaning you're almost never required to explicitly specify types. The linked "Notes on..." page gives much more details.

 

Abstract:

A computer program describes not only the basic computations to be performed on input data, but also in which order and under which conditions to perform these computations. To express this sequencing of computations, programming language provide mechanisms called control structures. Since the "goto" jumps of early programming languages, many control structures have been deployed: conditionals, loops, procedures and functions, exceptions, iterators, coroutines, continuations… After an overview of these classic control structures and their historical context, the course develops a more modern approach of control viewed as an object that programs can manipulate, enabling programmers to define their own control structures. Started in the last century by early work on continuations and the associated control operators, this approach was recently renewed through the theory of algebraic effects and its applications to user-defined effects and effect handlers in languages such as OCaml 5.

 

Background: What are denotational semantics, and what are they useful for?

Also: Operational and Denotational Semantics

Denotational semantics assign meaning to a program (e.g. in untyped lambda calculus) by mapping the program into a self-contained domain model in some meta language (e.g. Scott domains). Traditionally, what is complicated about denotational semantics is not so much the function that defines them; rather it is to find a sound mathematical definition of the semantic domain, and a general methodology of doing so that scales to recursive types and hence general recursion, global mutable state, exceptions and concurrency^1^^2^.

In this post, I discuss a related issue: I argue that traditional Scott/Strachey denotational semantics are partial (in a precise sense), which means that

  1. It is impossible to give a faithful, executable encoding of such a semantics in a programming language, and
  2. Internal details of the semantic domain inhibit high-level, equational reasonining about programs

After exemplifying the problem, I will discuss total denotational semantics as a viable alternative, and how to define one using guarded recursion.

I do not claim that any of these considerations are novel or indisputable, but I hope that they are helpful to some people who

  • know how to read Haskell
  • like playing around with operational semantics and definitional interpreters
  • wonder how denotational semantics can be executed in a programming language
  • want to get excited about guarded recursion.

I hope that this topic becomes more accessible to people with this background due to a focus on computation.

I also hope that this post finds its way to a few semanticists who might provide a useful angle or have answers to the conjectures in the later parts of this post.

If you are in a rush and just want to see how a total denotational semantics can be defined in Agda, have a look at this gist.

 

This presents a method to reduce the overhead of the garbage collector, in a language with multi-stage programming (specifically two-level type theory) using regions.

[–] armchair_progamer@programming.dev 3 points 1 month ago* (last edited 1 month ago)
 

Key Features

  • Multiple types (number, bool, datetime, string and error)
  • Memory managed by user (no allocs)
  • Iterator based interface
  • Supporting variables
  • Stateless
  • Expressions can be compiled (RPN stack)
  • Fully compile-time checked syntax
  • Documented grammar
  • Standard C11 code
  • No dependencies

Examples

# Numerical calculations
sin((-1 + 2) * PI)

# Dates
datetrunc(now(), "day")

# Strings
"hi " + upper("bob")  + trim("  !  ")

# Conditionals
ifelse(1 < 5 && length($alphabet) > 25, "case1", "case2")

# Find the missing letter
replace($alphabet, substr($alphabet, 25 - random(0, length($alphabet)), 1), "")
 

From homepage:

Hy (or "Hylang" for long) is a multi-paradigm general-purpose programming language in the Lisp family. It's implemented as a kind of alternative syntax for Python. Compared to Python, Hy offers a variety of new features, generalizations, and syntactic simplifications, as would be expected of a Lisp. Compared to other Lisps, Hy provides direct access to Python's built-ins and third-party Python libraries, while allowing you to freely mix imperative, functional, and object-oriented styles of programming. (More on "Why Hy?")

Some examples on the homepage:

Hy:

(defmacro do-while [test #* body]
  `(do
    ~@body
    (while ~test
      ~@body)))

(setv x 0)
(do-while x
  (print "Printed once."))

Python:

x = 0
print("Printed once.")
while x:
    print("Printed once.")

Interestingly programming.dev's Markdown renderer highlights ```hy code blocks. Maybe it knows the language (highlight.js has it). Maybe it's using Hybris (another language that could get its own post, one of its extensions is *.hy).

GitHub

Online REPL

1.0 announcement

 

GitHub (source code for all languages), also linked above.

The GitHub says "50 lines of code" but the largest example is 74 lines excluding whitespace and comments.

22
Dune Shell: bash + lisp (adam-mcdaniel.github.io)
 

Dune is a shell designed for powerful scripting. Think of it as an unholy combination of bash and Lisp.

You can do all the normal shell operations like piping, file redirection, and running programs. But, you also have access to a standard library and functional programming abstractions for various programming and sysadmin tasks!

screenshot

 

Fennel is a programming language that brings together the simplicity, speed, and reach of Lua with the flexibility of a lisp syntax and macro system.

  • Full Lua compatibility: Easily call any Lua function or library from Fennel and vice-versa.
  • Zero overhead: Compiled code should be just as efficient as hand-written Lua.
  • Compile-time macros: Ship compiled code with no runtime dependency on Fennel.
  • Embeddable: Fennel is a one-file library as well as an executable. Embed it in other programs to support runtime extensibility and interactive development.

Anywhere you can run Lua code, you can run Fennel code.

Example:

;; Sample: read the state of the keyboard and move the player accordingly
(local dirs {:up [0 -1] :down [0 1] :left [-1 0] :right [1 0]})

(each [key [dx dy] (pairs dirs)]
  (when (love.keyboard.isDown key)
    (let [[px py] player
          x (+ px (* dx player.speed dt))
          y (+ py (* dy player.speed dt))]
      (world:move player x y))))
 

Abstract:

We say that an imperative data structure is snapshottable or supports snapshots if we can efficiently capture its current state, and restore a previously captured state to become the current state again. This is useful, for example, to implement backtracking search processes that update the data structure during search.

Inspired by a data structure proposed in 1978 by Baker, we present a snapshottable store, a bag of mutable references that supports snapshots. Instead of capturing and restoring an array, we can capture an arbitrary set of references (of any type) and restore all of them at once. This snapshottable store can be used as a building block to support snapshots for arbitrary data structures, by simply replacing all mutable references in the data structure by our store references. We present use-cases of a snapshottable store when implementing type-checkers and automated theorem provers.

Our implementation is designed to provide a very low overhead over normal references, in the common case where the capture/restore operations are infrequent. Read and write in store references are essentially as fast as in plain references in most situations, thanks to a key optimisation we call record elision. In comparison, the common approach of replacing references by integer indices into a persistent map incurs a logarithmic overhead on reads and writes, and sophisticated algorithms typically impose much larger constant factors.

The implementation, which is inspired by Baker's and the OCaml implementation of persistent arrays by Conchon and Filliâtre, is both fairly short and very hard to understand: it relies on shared mutable state in subtle ways. We provide a mechanized proof of correctness of its core using the Iris framework for the Coq proof assistant.

 

Back from break, I'm going to start posting regularly again.

This is an interesting node-based visual programming environment with very cool graphics. It's written on Clojure but the various "cards" (nodes) can include SQL, R, DALL-E API calls, or anything else.

RVBBIT

[–] armchair_progamer@programming.dev 79 points 3 months ago (1 children)

But is it rewritten in Rust?

“I’ve got 10 years of googling experience”.

“Sorry, we only accept candidates with 12 years of googling experience”.

Author's comment on lobste.rs:

Yes it’s embeddable. There’s a C ABI compatible API similar to what lua provides.

[–] armchair_progamer@programming.dev 26 points 6 months ago* (last edited 6 months ago)

C++’s mascot is an obese sick rat with a missing foot*, because it has 1000+ line compiler errors (the stress makes you overeat and damages your immune system) and footguns.

EDIT: Source (I didn't make up the C++ part)

[–] armchair_progamer@programming.dev 5 points 6 months ago* (last edited 6 months ago) (2 children)

I could understand method = associated function whose first parameter is named self, so it can be called like self.foo(…). This would mean functions like Vec::new aren’t methods. But the author’s requirement also excludes functions that take generic arguments like Extend::extend.

However, even the above definition gives old terminology new meaning. In traditionally OOP languages, all functions in a class are considered methods, those only callable from an instance are “instance methods”, while the others are “static methods”. So translating OOP terminology into Rust, all associated functions are still considered methods, and those with/without method call syntax are instance/static methods.

Unfortunately I think that some people misuse “method” to only refer to “instance method”, even in the OOP languages, so to be 100% unambiguous the terms have to be:

  • Associated function: function in an impl block.
  • Static method: associated function whose first argument isn’t self (even if it takes Self under a different name, like Box::leak).
  • Instance method: associated function whose first argument is self, so it can be called like self.foo(…).
  • Object-safe method: a method callable from a trait object.
[–] armchair_progamer@programming.dev 3 points 7 months ago* (last edited 7 months ago) (1 children)

I find writing the parser by hand (recursive descent) to be easiest. Sometimes I use a lexer generator, or if there isn’t a good one (idk for Scheme), write the lexer by hand as well. Define a few helper functions and macros to remove most of the boilerplate (you really benefit from Scheme here), and you almost end up writing the rules out directly.

Yes, you need to manually implement choice and figure out what/when to lookahead. Yes, the final parser won’t be as readable as a BNF specification. But I find writing a hand-rolled parser generator for most grammars, even with the manual choice and lookahead, is surprisingly easy and fast.

The problem with parser generators is that, when they work they work well, but when they don’t work (fail to generate, the generated parser tries to parse the wrong node, the generated parser is very inefficient) it can be really hard to figure out why. A hand-rolled parser is much easier to debug, so when your grammar inevitably has problems, it ends up taking less time in total to go from spec to working hand-rolled vs. spec to working parser-generator-generated.

The hand-rolled rules may look something like (with user-defined macros and functions define-parse, parse, peek, next, and some simple rules like con-id and name-id as individual tokens):

; pattern			::= [ con-id ] factor "begin" expr-list "end"
(define-parse pattern
  (mk-pattern
    (if (con-id? (peek)) (next))
    (parse factor)
    (do (parse “begin”) (parse expr-list) (parse “end”))))

; factor 			::= name-id 
; 			 | symbol-literal
; 			 | long-name-id 
; 			 | numeric-literal 
;	 		 | text-literal 
; 			 | list-literal 
; 			 | function-lambda 
; 			 | tacit-arg
; 			 | '(' expr ')' 
(define-parse factor
  (case (peek)
    [name-id? (if (= “.” (peek2)) (mk-long-name-id …) (next))]
    [literal? (next)]
    …))

Since you’re using Scheme, you can almost certainly optimize the above to reduce even more boilerplate.

Regarding LLMs: if you start to write the parser with the EBNF comments above each rule like above, you can paste the EBNF in and Copilot will generate rules for you. Alternatively, you can feed a couple EBNF/code examples to ChatGPT and ask it to generate more.

In both cases the AI will probably make mistakes on tricky cases, but that’s practically inevitable. An LLM implemented in an error-proof code synthesis engine would be a breakthrough; and while there are techniques like fine-tuning, I suspect they wouldn’t improve the accuracy much, and certainly would amount to more effort in total (in fact most LLM “applications” are just a custom prompt on plain ChatGPT or another baseline model).

[–] armchair_progamer@programming.dev 4 points 7 months ago* (last edited 7 months ago)

My general take:

A codebase with scalable architecture is one that stays malleable even when it grows large and the people working on it change. At least relative to a codebase without scalable architecture, which devolves into "spaghetti code", where nobody knows what the code does or where the behaviors are implemented, and small changes break seemingly-unrelated things.

Programming language isn't the sole determinant of a codebase's scalability, especially if the language has all the general-purpose features one would expect nowadays (e.g. Java, C++, Haskell, Rust, TypeScript). But it's a major contributor. A "non-scalable" language makes spaghetti design decisions too easy and scalable design decisions overly convoluted and verbose. A scalable language does the opposite, nudging developers towards building scalable software automatically, at least relative to a "non-scalable" language and when the software already has a scalable foundation.

[–] armchair_progamer@programming.dev 35 points 7 months ago* (last edited 7 months ago) (3 children)
public class AbstractBeanVisitorStrategyFactoryBuilderIteratorAdapterProviderObserverGeneratorDecorator {
    // boilerplate goes here
}
[–] armchair_progamer@programming.dev 8 points 8 months ago* (last edited 8 months ago) (1 children)

I believe the answer is yes, except that we’re talking about languages with currying, and those can’t represent a zero argument function without the “computation” kind (remember: all functions are Arg -> Ret, and a multi-argument function is just Arg1 -> (Arg2 -> Ret)). In the linked article, all functions are in fact “computations” (the two variants of CompType are Thunk ValType and Fun ValType CompType). The author also describes computations as “a way to add side-effects to values”, and the equivalent in an imperative language to “a value which produces side-effects when read” is either a zero-argument function (getXYZ()), or a “getter” which is just syntax sugar for a zero-argument function.

The other reason may be that it’s easier in an IR to represent computations as intrinsic types vs. zero-argument closures. Except if all functions are computations, then your “computation” type is already your closure type. So the difference is again only if you’re writing an IR for a language with currying: without CBPV you could just represent closures as things that take one argument, but CBPV permits zero-argument closures.

Go as a backend language isn’t super unusual, there’s at least one other project (https://borgo-lang.github.io) which chosen it. And there are many languages which compile to JavaScript or C, but Go strikes a balance between being faster than JavaScript but having memory management vs. C.

I don’t think panics revealing the Go backend are much of an issue, because true “panics” that aren’t handled by the language itself are always bad. If you compile to LLVM, you must implement your own debug symbols to get nice-looking stack traces and line-by-line debugging like C and Rust, otherwise debugging is impossible and crashes show you raw assembly. Even in Java or JavaScript, core dumps are hard to debug, ugly, and leak internal details; the reason these languages have nice exceptions, is because they implement exceptions and detect errors on their own before they become “panics”, so that when a program crashes in java (like tries to dereference null) it doesn’t crash the JVM. Golang’s backtrace will probably be much nicer than the default of C or LLVM, and you may be able to implement a system like Java which catches most errors and gives your own stacktrace beforehand.

Elm’s kernel controversy is also something completely different. The problem with Elm is that the language maintainers explicitly prevented people from writing FFI to/from JavaScript except in the maintainers’ own packages, after allowing this feature for a while, so many old packages broke and were unfixable. And there were more issues: the language itself was very limited (meaning JS FFI was essential) and the maintainers’ responses were concerning (see “Why I’m leaving Elm”). Even Rust has features that are only accessible to the standard library and compiler (“nightly”), but they have a mechanism to let you use them if you really want, and none of them are essential like Elm-to-JS FFI, so most people don’t care. Basically, as long as you don’t become very popular and make a massively inconvenient, backwards-incompatible change for purely design reasons, you won’t have this issue: it’s not even “you have to implement Go FFI”, not even “if you do implement Go FFI, don’t restrict it to your own code”, it’s “don’t implement Go FFI and allow it everywhere, become very popular, then suddenly restrict it to your own code with no decent alternatives”.

view more: next ›