Computer Science Research

1
 
 

cross-posted from: https://lemm.ee/post/5467810

In 1997, a contest began to develop a new encryption algorithm to become the Advanced Encryption Standard. After years of debate, one algorithm was chosen as the AES. But how does AES work? And what makes for a secure encryption algorithm?


Spanning Tree is an educational video series about computer science and mathematics. See more at https://spanningtree.me

To be notified when a new video is released, sign up for the Spanning Tree mailing list at https://spanningtree.substack.com/

Spanning Tree is created by Brian Yu. https://brianyu.me/

Email me at brian@spanningtree.me to suggest a future topic.


  • 0:00 The Contest
  • 1:02 Encryption
  • 3:57 Confusion and Diffusion
  • 5:44 Block Cipher
  • 6:55 KeyExpansion
  • 7:34 AddRoundKey
  • 8:14 Substitution Cipher
  • 8:55 SubBytes
  • 11:30 MixColumns
  • 12:53 ShiftRows
  • 13:21 The Algorithm
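
For a concrete feel of the round operations listed above, here is a minimal Python sketch of two of them, AddRoundKey and ShiftRows, acting on the 4x4 byte state that AES works with. It is only an illustration of the structure, not a usable AES implementation: the S-box lookup of SubBytes and the field arithmetic of MixColumns are omitted, and the state and round-key bytes below are arbitrary.

```python
# Minimal sketch (NOT a real AES implementation) of two AES round operations.
# SubBytes and MixColumns are intentionally omitted.

def add_round_key(state, round_key):
    """XOR every state byte with the corresponding round-key byte."""
    return [[s ^ k for s, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]

def shift_rows(state):
    """Rotate row i of the state left by i positions (i = 0..3)."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]

# Tiny demonstration on an arbitrary 16-byte block and round key.
state     = [[0x00, 0x01, 0x02, 0x03],
             [0x10, 0x11, 0x12, 0x13],
             [0x20, 0x21, 0x22, 0x23],
             [0x30, 0x31, 0x32, 0x33]]
round_key = [[0xA0] * 4, [0xB0] * 4, [0xC0] * 4, [0xD0] * 4]

print(shift_rows(add_round_key(state, round_key)))
```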

Aug 22, 2023

2
3
 
 

Parsing plays an important role in static program analysis: during this step a structural representation of the code is created, upon which further analysis is performed. Parser generator tools, provided with a syntax specification, automate parser development. Language documentation often acts as such a specification. The documentation usually takes the form of an ambiguous grammar in Extended Backus-Naur Form (EBNF), which most parser generators fail to process. Automatic grammar transformation generally leads to a decrease in parsing performance. Some approaches support EBNF grammars natively, but they all fail to handle ambiguous grammars. On the other hand, the Generalized LL (GLL) parsing algorithm admits arbitrary context-free grammars and achieves good performance, but cannot handle EBNF grammars. The main contribution of this paper is a modification of the GLL algorithm which can process grammars in a form closely related to EBNF (Extended Context-Free Grammar). We also show that the modification improves parsing performance compared to the grammar transformation-based approach.

Also contains a full GLL implementation.
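
To make the EBNF issue concrete, here is a small, hypothetical example (the rule names are invented, not taken from the paper) of the kind of rewriting that transformation-based approaches apply and that this work seeks to avoid: the EBNF repetition operator is replaced by a fresh recursive nonterminal, which grows the grammar and tends to hurt parsing performance.

```python
# Hypothetical illustration of EBNF-to-BNF rewriting; rule names are invented.

# EBNF: an argument list is an expression followed by zero or more ", expression".
ebnf_rule = "Args ::= Expr { ',' Expr }"

# A typical transformation introduces a fresh nonterminal for the repetition:
bnf_rules = [
    "Args     ::= Expr ArgsTail",
    "ArgsTail ::= ',' Expr ArgsTail",
    "ArgsTail ::=",                    # empty production
]
```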

4
 
 

This paper introduces sets of binary subtree representations as an alternative to shared packed parse forests as the output of a generalised parser, and shows how these may be generated by Earley's algorithm, by a new GLL-style parser, and by Johnson's continuation-passing combinator-style parsers. The set-based output removes the clerical overhead associated with graph construction, making the parsers simpler.

5
 
 

The Earley parser is a well-known method for parsing context-free grammars. While less efficient in practice than other generalized context-free parsing algorithms, such as GLR, it is also more general, and as such it can be used as a foundation for building more complex parsing algorithms. We present a new, virtual-machine-based approach to parsing, heavily based on the original Earley parser. We show how to translate grammars into virtual machine instruction sequences that are then used by the parsing algorithm. Additionally, we introduce an optimization that merges shared rule prefixes to increase parsing performance. Finally, we present and evaluate an implementation of the Scannerless Earley Virtual Machine, called north.
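
For context, here is a compact textbook Earley recognizer in Python. It is not the paper's virtual-machine formulation (there is no instruction compilation or prefix merging); it only shows the predictor/scanner/completer structure that the Earley Virtual Machine builds on.

```python
# Compact textbook Earley recognizer, shown only to illustrate the
# predictor/scanner/completer structure the paper builds on.
# Grammar: dict mapping a nonterminal to a list of right-hand sides (tuples
# of symbols); any symbol that is not a key of the dict is a terminal.

def earley_recognize(grammar, start, tokens):
    # An Earley item is (head, body, dot, origin).
    chart = [set() for _ in range(len(tokens) + 1)]
    for body in grammar[start]:
        chart[0].add((start, body, 0, 0))

    for i in range(len(tokens) + 1):
        changed = True
        while changed:                                   # fixpoint over chart[i]
            changed = False
            for head, body, dot, origin in list(chart[i]):
                if dot < len(body):
                    sym = body[dot]
                    if sym in grammar:                   # predictor
                        for rhs in grammar[sym]:
                            item = (sym, rhs, 0, i)
                            if item not in chart[i]:
                                chart[i].add(item)
                                changed = True
                    elif i < len(tokens) and tokens[i] == sym:   # scanner
                        chart[i + 1].add((head, body, dot + 1, origin))
                else:                                    # completer
                    for h2, b2, d2, o2 in list(chart[origin]):
                        if d2 < len(b2) and b2[d2] == head:
                            item = (h2, b2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item)
                                changed = True

    return any(h == start and d == len(b) and o == 0
               for h, b, d, o in chart[len(tokens)])

# Ambiguous toy grammar: S -> S '+' S | 'a'
grammar = {"S": [("S", "+", "S"), ("a",)]}
print(earley_recognize(grammar, "S", ["a", "+", "a", "+", "a"]))  # True
```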

6
 
 

Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer's description of a community's design -- goal, rules, and member personas -- and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of "what if?" scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models' training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.
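
The persona-conditioned prompting is the part most readers will want to try; below is a rough, hypothetical sketch of what assembling such a prompt could look like. The function, field names, and template wording are invented for illustration and are not the paper's actual prompts or pipeline.

```python
# Hypothetical sketch of assembling a persona-conditioned prompt for a
# simulated community member; names and wording are invented.

def community_post_prompt(community, persona, topic):
    return (
        f"Community: {community['name']}\n"
        f"Goal: {community['goal']}\n"
        f"Rules: {'; '.join(community['rules'])}\n\n"
        f"{persona['name']} ({persona['bio']}) writes a post about {topic}:\n"
    )

community = {
    "name": "HomeBrewCoffee",
    "goal": "Help beginners improve their coffee at home.",
    "rules": ["Be kind", "No affiliate links"],
}
persona = {"name": "sam_k", "bio": "impatient college student, new to pour-over"}

prompt = community_post_prompt(community, persona, "a first V60 attempt")
print(prompt)  # This prompt would then be sent to a large language model.
```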

7
 
 

Research has demonstrated that integrating software testing into programming courses has both benefits and drawbacks. This work presents Test Informed Learning with Examples (TILE), a proposal to improve the teaching of testing in introductory programming courses. We argue why we think TILE can address most of the known drawbacks.

8
 
 

We propose a method for using a large language model, such as GPT-3, to simulate responses of different humans in a given context. We test our method by attempting to reproduce well-established economic, psycholinguistic, and social experiments. The method requires prompt templates for each experiment. Simulations are run by varying the (hypothetical) subject details, such as name, and analyzing the text generated by the language model. To validate our methodology, we use GPT-3 to simulate the Ultimatum Game, garden path sentences, risk aversion, and the Milgram Shock experiments. In order to address concerns of exposure to these studies in training data, we also evaluate simulations on novel variants of these studies. We show that it is possible to simulate responses of different people and that their responses are largely consistent with prior human studies from the literature. Using large language models as simulators offers advantages but also poses risks. Our use of a language model for simulation is contrasted with anthropomorphic views of a language model as having its own behavior.
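
As a concrete (and entirely hypothetical) illustration of the "vary the subject details in a template" step, the sketch below fills an Ultimatum Game prompt with different participant names and offers; the template wording is invented, not the paper's.

```python
# Hypothetical sketch of varying subject details in an experiment template.
# The template text and names are invented for illustration only.

ULTIMATUM_TEMPLATE = (
    "{proposer} is given $10 and offers ${offer} to {responder}, keeping the "
    "rest. If {responder} accepts, both keep the money; if {responder} "
    "rejects, both get nothing.\n{responder}'s decision (accept or reject):"
)

def build_prompts(names, offers):
    """One prompt per (responder name, offer) pair, to be sent to the LLM."""
    return [
        ULTIMATUM_TEMPLATE.format(proposer="Alex", responder=name, offer=offer)
        for name in names
        for offer in offers
    ]

prompts = build_prompts(["Maria", "John", "Priya"], [1, 3, 5])
print(prompts[0])
# Each generated completion would then be parsed for "accept" or "reject".
```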

9
 
 

Abstract: We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model’s factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task. We also evaluate ROME on a new dataset of difficult counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or the other. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. Website at https://rome.baulab.info/
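
A minimal numerical sketch of the core idea of a rank-one edit is shown below: pick a rank-one update so that the edited weight matrix maps a chosen "key" vector to a chosen "value" vector. ROME itself additionally derives the key and value from the model via causal tracing and weights the update by a key covariance statistic; those details are omitted here, so treat this as an illustration rather than the method.

```python
import numpy as np

# Sketch of a rank-one weight edit in the spirit of ROME: the edited matrix
# maps a chosen key vector k to a chosen value vector v_star. The covariance
# weighting and the extraction of k / v_star from the model are omitted.

rng = np.random.default_rng(0)
d_in, d_out = 8, 6
W = rng.normal(size=(d_out, d_in))        # stand-in for an MLP projection
k = rng.normal(size=d_in)                 # "key" representing the subject
v_star = rng.normal(size=d_out)           # desired output ("new fact")

delta = np.outer(v_star - W @ k, k) / (k @ k)   # rank-one correction
W_edited = W + delta

print(np.allclose(W_edited @ k, v_star))  # True: the edit inserts the mapping
```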

10
 
 

FSE panel on the state of SE research