Intermediate · Evening
NLP for Japanese Text
Covers morphological complexity, encoding issues, and baseline models for Japanese text. Includes bilingual annotation guidelines.
Features
- MeCab and fugashi tokenization labs
- TF-IDF and embedding baselines
- Hugging Face transformers intro
- Annotation quality rubric
- Bias and representation discussion session
- Small domain adaptation project
Outcomes
- Build a JP text classifier with documented metrics
- Compare tokenizer choices on the same dataset
- Present annotation guidelines for a sample corpus
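As a rough illustration of the tokenizer comparison outcome above, the sketch below contrasts whitespace splitting with character bigrams on two short Japanese sentences. The sentences, function names, and the choice of character bigrams are assumptions made for illustration and are not taken from the course materials. A morphological analyzer such as MeCab (via fugashi) would normally take the place of the bigram function. It is left out here so the sketch runs with the standard library only.

```python
def whitespace_tokens(text):
    """Split on whitespace, as a naive bag-of-words pipeline would."""
    return text.split()


def char_bigrams(text):
    """Overlapping character bigrams: a dictionary-free stand-in for a real tokenizer."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def shared_features(tokenize, a, b):
    """Return the tokens that two texts have in common under one tokenizer."""
    return set(tokenize(a)) & set(tokenize(b))


# Hypothetical product-review sentences (written without spaces, as is usual in Japanese).
review_a = "この商品はとても良いです"
review_b = "とても良い商品でした"

ws_overlap = shared_features(whitespace_tokens, review_a, review_b)
bg_overlap = shared_features(char_bigrams, review_a, review_b)

print("whitespace tokens per review:", len(whitespace_tokens(review_a)))
print("shared whitespace tokens:", len(ws_overlap))
print("shared character bigrams:", sorted(bg_overlap))
```

Because neither sentence contains a space, whitespace splitting treats each review as a single token, so the two reviews share no features even though both contain 商品 and とても良い. This is one plausible reason a whitespace-based bag-of-words model can underperform on Japanese text.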
Naomi Fujita
Computational linguist; JP-EN corpus experience.
FAQ
Japanese reading ability is required for the course materials; instruction is primarily in English.
Reviews
"Tokenizer comparison lab clarified why our old bag-of-words model failed on product reviews." (Osaka)