LING 471: Computational Methods for Linguists
Information
Course Description
The course Computational Methods for Linguists focuses on how computational methods and tools can be applied in linguistics. One of the main goals is to familiarize students with the basics of programming and the general technical versatility needed for programming tasks. Assignments are organized around linguistic or linguistically annotated data, and presentations are graded on how well they connect to linguistic and social concepts. This course assumes no background in computer science or linguistics.
Should I take this class or another computational linguistics class?
You could take this class if
- You want a hands-on, beginner-friendly class about computational linguistics
- You don’t have any coding experience and would like to learn coding
- You already know coding but would like to learn Python
You could take LING 472 if
- You want a class with more theoretical grounding in computational linguistics (formal foundations, earlier statistical/grammar-based approaches, etc.)
- You can already code in Python
You could take LING 570 if
- You have already taken a range of computer science, statistics, and math classes
- You are comfortable with coding and want a systematic path to learn about computational linguistics
Learning outcomes
By the end of the course, you will be able to:
- Write computer programs in Python
- Discuss what counts as data in computational linguistics
- Connect linguistic theory to computational method choice
- Reflect on the ethical and social implications of data use
- Use a command line interface effectively
- Apply version control (Git)
- Perform data cleaning, vectorization, modeling, training, interpretation, and visualization
Meeting Times & Format
This course is taught in person. Recordings will not be made. In-class activities require a computer, so it is strongly recommended to bring your laptop with you to every class.
Days | Time | Location |
---|---|---|
Tue & Thur | 3:30-5:20 PM | TBD |
Teaching Staff
Role | Name | Contact | Office | Office Hours |
---|---|---|---|---|
Instructor | Siyu Liang | liangsy at uw.edu | GUG 407 and Zoom | TBD |
Texts
All readings for this class are available at no cost to you, either through open-access material or through UW’s library licensing of academic content. Readings will come from the following books (among other resources):
- Downey (2015). Think Python (2nd ed.)
- Jurafsky & Martin (2025). Speech and Language Processing (3rd ed.)
Recommended text (for those who have not taken LING 200/400): Language Files 13 (or Essentials of Linguistics) for linguistics review.
Policies
Assessment & Grading
The class is organized around a series of programming assignments due roughly every two weeks. These assignments target different concepts and skills, but all involve working with linguistic data or corpora (e.g., the IMDB reviews dataset). There are no exams.
Additionally, students will:
- Give a presentation related to their assignment work (15%).
- Maintain a reading dossier (10%) with questions and reflections on assigned readings.
Significant in-class participation can add an adjustment of up to 2%.
Here is a breakdown of the grading components:
Component | Weight |
---|---|
Programming Assignments | 75% |
Presentation | 15% |
Reading Dossier | 10% |
Participation | 2% |
Total | 102% |
Grading scale: ≥95% = 4.0; 94% = 3.9; 93% = 3.8; and so on.
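To make the weighting concrete, here is a minimal Python sketch of how the components above might combine into a final percentage and a grade point. It assumes each component score is a percentage (0-100), that the participation adjustment is added directly as 0-2 percentage points (which is why the table totals 102%), that the percentage is truncated to a whole number before the scale is applied, and that the 0.1-per-point pattern stops at 0.7 with anything lower recorded as 0.0. The function names are hypothetical; this is an illustration, not an official grading tool.

```python
import math

# Syllabus weights, in percent. Participation is a bonus of up to 2 points.
WEIGHTS = {"assignments": 75, "presentation": 15, "dossier": 10}
MAX_PARTICIPATION = 2.0


def final_percentage(assignments, presentation, dossier, participation=0.0):
    """Combine component scores (each 0-100) into a course percentage.

    Assumption: `participation` is given directly in percentage points (0-2).
    """
    if not 0.0 <= participation <= MAX_PARTICIPATION:
        raise ValueError("participation adjustment must be between 0 and 2")
    weighted = (
        WEIGHTS["assignments"] * assignments
        + WEIGHTS["presentation"] * presentation
        + WEIGHTS["dossier"] * dossier
    ) / 100
    return weighted + participation


def grade_point(percent):
    """Map a percentage to the 4.0 scale: >=95 -> 4.0, 94 -> 3.9, 93 -> 3.8, ...

    Assumptions (not stated in the syllabus): the percentage is truncated to a
    whole number, and the scale stops at 0.7, with anything lower mapped to 0.0.
    """
    whole = math.floor(percent)
    if whole >= 95:
        return 4.0
    tenths = 40 - (95 - whole)  # each point below 95 costs 0.1
    return tenths / 10 if tenths >= 7 else 0.0


score = final_percentage(assignments=90, presentation=100, dossier=100, participation=2)
print(score, grade_point(score))  # 94.5 3.9
```

Computing in tenths with integers keeps the scale steps exact instead of accumulating floating-point error from repeated subtraction of 0.1.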
Assignments
The assignments will be due roughly every two weeks. The goal is to give you some breathing space in between and enough time to work on each one at a reasonable pace. However, keep in mind that technical assignments may take a long time because of technical issues (this is very normal, and learning to deal with such issues is one of the main goals of this class). If you delay starting an assignment, you are very likely not to finish it by the time it is due.
I highly recommend the following algorithm for the technical assignments:
- Start on the day the assignment is published (or the next day).
- Get a feel for how fast you are progressing. Take note of the first block you face and post about it on the Discussion Board. Then go do something else.
- Come back to it the next day and see whether you get unblocked. If not, go to the next office hour and see if that helps you get unblocked.
- Repeat from the second step.
Once you feel you have made no progress for 20-30 minutes, stop and come back to it later (e.g., the next day).
Having two weeks or so for each assignment means you can take nice breaks from the class, but only if you start early. When progress feels slow, take frequent but short breaks (e.g., leave the assignment and come back to it the next day). Once you are almost done, understand almost everything, and can tell you will finish quickly, it becomes possible to take a break of a few days. But not before that point, or you will not succeed in earning a good grade.
Most assignments will be submitted using a combination of GitHub and Canvas. Assignments will not be considered fully submitted until there is a working link on Canvas to the appropriate submission on GitHub.
Readings
There will be assigned readings for most lectures. Some of them are blog posts and websites for beginner programmers; these are just as good as books for learning this material. Other readings include scholarly papers, which are more difficult to read, so try to identify specific goals as you read. For example: “I am reading this to understand what ‘Data Statements’ are, and I want to form an opinion about whether they are useful in a particular context.”
Reading dossier
For each group of readings, you need to formulate two questions and one comment. The questions can be about basic understanding of technical concepts, but they can also be conceptual questions or questions connecting the concepts to language, previous readings, or other knowledge you have. The comment should be an expository remark that captures your thoughts about the readings; perhaps you want to critique the methods, you find them particularly impressive, or you learned about something you didn’t know existed. The comment must be at least 3 sentences long.
You only need to write the questions and comment once for each group of readings, not for each individual reading. For example, for the three readings due Jan 15, you only need 2 questions and 1 comment in total, not 2 questions and 1 comment per reading.
The reading dossier will be due at the end of the course. It will be your responsibility to keep it up to date or get caught up if you fall behind.
Late Homework Policy
Assignments are due at 11:59 PM on the posted date. For late submissions, the following penalty applies:
- ≤1 day late → maximum credit: 80%
- ≤2 days late → maximum credit: 70%
- >2 days late → no credit
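The penalty tiers above can be read as a cap on the score. Here is a minimal Python sketch of that reading. It assumes the raw score is a percentage, that “maximum credit” means the score is capped rather than scaled, that the deadline is 11:59 PM (23:59:00) on the posted date, and that any partial day counts toward the next tier (so a submission 25 hours late falls in the “≤2 days” tier). The function names and the example due date are hypothetical, not a course tool.

```python
from datetime import date, datetime, time, timedelta


def late_cap(submitted, due_date):
    """Return the maximum credit (in percent) allowed for a submission.

    Assumption: the deadline is 23:59:00 on `due_date`, and lateness is
    measured as the time elapsed since that moment.
    """
    deadline = datetime.combine(due_date, time(23, 59))
    late_by = submitted - deadline
    if late_by <= timedelta(0):
        return 100.0  # on time: no cap
    if late_by <= timedelta(days=1):
        return 80.0   # up to 1 day late
    if late_by <= timedelta(days=2):
        return 70.0   # up to 2 days late
    return 0.0        # more than 2 days late: no credit


def capped_score(raw_percent, submitted, due_date):
    """Assumption: 'maximum credit' caps the score rather than multiplying it."""
    return min(raw_percent, late_cap(submitted, due_date))


due = date(2026, 1, 20)  # hypothetical due date for illustration
print(capped_score(92, datetime(2026, 1, 21, 10, 0), due))  # 80.0
```

Under this reading, a submission that would have earned less than the cap is unaffected, e.g., a 75% submitted half a day late still receives 75%.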
To request an extension, you must contact the instructor before the deadline (ideally the day before). For unexpected emergencies that prevent timely communication, exceptions may be considered on a case-by-case basis.
Because the assignments are technical, you should expect them to take longer than anticipated due to normal debugging issues. It is recommended to start the day the assignment is released. If you get stuck, use the discussion board, office hours, or peer collaboration (where permitted) instead of waiting until the last moment.
Submissions & Work Practices
- All assignments must be submitted via GitHub + Canvas. A submission is not complete until there is a working Canvas link to the correct GitHub repository.
- You are encouraged to use the following workflow for technical assignments:
  - Start on the day the homework is posted.
  - Track your first “blocker” and post about it on the discussion board.
  - Return the next day and attempt to unblock; if unsuccessful, bring it to office hours.
  - Repeat until resolved.
If you find yourself spending 20–30 minutes stuck, stop and return later—this is a normal part of the learning process.
Communication & Discussion Boards
- Each assignment has a dedicated Canvas discussion board. Use these for technical and logistics questions so that others can benefit from answers.
- Email should be reserved for private matters such as grades or personal circumstances.
- The instructor will respond to Canvas posts or emails within 48 hours, excluding weekends.
Note on using LLMs
While there is no way to control your use of Large Language Models (LLMs) such as ChatGPT, extensive reliance on these tools will be detrimental to your learning in this course. This class is fundamentally about learning the basics of programming and developing the problem-solving skills, debugging abilities, and technical intuition that can only come through direct engagement with coding challenges. The struggle of working through technical problems is an essential part of the learning process, not an obstacle to avoid. When stuck, use the discussion boards, office hours, or websites such as official package documentation or Stack Overflow as your primary resources.
Accessibility and Disability Accommodations
Your experience in this class is important. It is the policy and practice of the University of Washington to create inclusive and accessible learning environments consistent with federal and state law. If you have already established accommodations with Disability Resources for Students (DRS), please activate your accommodations via myDRS so we can discuss how they will be implemented in this course.
If you have not yet established services through DRS but have a temporary health condition or permanent disability that requires accommodations (conditions include but are not limited to mental health, attention-related, learning, vision, hearing, physical, or health impacts), contact DRS directly to set up an Access Plan. DRS facilitates the interactive process that establishes reasonable accommodations. Contact DRS at www.disability.uw.edu.
Safety
Call SafeCampus at 206-685-7233 anytime – no matter where you work or study – to anonymously discuss safety and well-being concerns for yourself or others. SafeCampus’s team of caring professionals will provide individualized support, while discussing short- and long-term solutions and connecting you with additional resources when requested.
Religious Accommodations
Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW’s policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy (https://registrar.washington.edu/staffandfaculty/religious-accommodations-policy/). Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form (https://registrar.washington.edu/students/religious-accommodations-request/).
Schedule
Class schedule (subject to change)
Items in the “Reading” column are to be read before the class meeting they are associated with. Items that are due on a particular day are due at 11:59 PM on that day.
Week | Date | Topic | Reading | Due |
---|---|---|---|---|
1 | Jan 6 | Introduction, course structure | | |
1 | Jan 8 | Conceptual overview, data science (guest lecture) | What is data science? | Online survey (on Canvas); request an account on the Patas cluster |
2 | Jan 13 | Basic system and programming concepts | Think Python Ch. 1: The Way of the Program | |
2 | Jan 15 | VSCode basics, version control | The IMDB reviews dataset paper; Data statements for NLP; Version control (read conceptually and ignore the RStudio material) | |
3 | Jan 20 | Variables, scope, control flow, FizzBuzz | This looks like a lot, but many of these are only about 4 pages long: Think Python Ch. 2: Variables, Expressions, and Statements; Think Python Ch. 5: Conditionals and Recursion (5.2-5.7); Think Python Ch. 8: Strings (8.1-8.2, 8.4-8.5); Think Python Ch. 10: Lists (10.1-10.5); De Morgan’s law | Assignment 1 |
3 | Jan 22 | Loops, dictionaries, input/output | Think Python Ch. 4: Case Study: Interface Design (4.1-4.2); Think Python Ch. 7: Iteration (7.1-7.4); Think Python Ch. 11: Dicts (11.1-11.3, 11.5); Input/output (7.1-7.2.1) | |
4 | Jan 27 | Text processing | Regular expressions; Tokenization (I would not recommend installing Keras or Gensim just to follow along, though we will probably use Keras later); Unicode; Modules (6.1-6.4.1) | |
4 | Jan 29 | Text processing, Unicode, evaluation, PyCharm settings | Think Python Ch. 19: Goodies (19.2-19.3) | |
5 | Feb 3 | Metrics, precision, recall | Precision and recall (you can stop when you get to the section headed “In binary classification settings”); Precision and recall 2 | Assignment 2 |
5 | Feb 5 | Data science, probability, maximum likelihood estimation | Speech and Language Processing Ch. 3: N-Gram Language Models (read for conceptual, not technical, understanding) | |
6 | Feb 10 | Statistics, distributions, Gaussian distribution | Start the stats tutorial and read through “measures of spread” | |
6 | Feb 12 | Bayes’ theorem, data frames | Finish the tutorial. Skip the section about R, but make sure to read about Bayes’ theorem. You can also skip Entropy and Information Gain and Inferential statistics. Read those if you like (they are generally important), but we probably don’t have time for them. | |
7 | Feb 17 | Machine learning and matrices, linear regression | Regression and classification | Assignment 3 |
7 | Feb 19 | Machine learning, logistic regression, Naive Bayes | Logistic regression; Naive Bayes | |
8 | Feb 24 | Language models, nonlinearity, neural networks | Deep learning for NLP; Nonlinear problems; Testing NLP models | |
8 | Feb 26 | Deep learning, linguistic knowledge in NLP | Ettinger et al. (2017) | |
9 | Mar 3 | Working with linguistic corpora | Aijmer (2021) or Stange (2021) (both in Canvas → Files → papers) | Assignment 4 |
9 | Mar 5 | Visualization, communication | To dissect an octopus; Keras word embedding tutorial (a working version of this is part of your HW5 skeleton); Visualization with Pandas | |
10 | Mar 10 | Presentations | | |
10 | Mar 12 | Presentations | | Assignment 5, Reading dossier |
Last updated: 2025-09-28