Index Symbols *Book Topics About the DSC (DSC 1) AI & text generation - GPT-2 era (DSC 9) AntConc (DSC 4) Archives (DSC 17) Collaboration (DSC 3) Copyright & fair use (DSC 7) Corpora (DSC 19) DH is people (DSC 13) DMCA text & data mining exemption (DSC 14) DSC audience and goals (DSCSS 1) Environments (DSC 21) Finding textual differences (DSC 3 Dear Reader) HathiTrust (DSC 18) Learning to code (DSC 12) Named Entity Recognition (DSC M2) Principal component analysis (DSC 10) Scanning and OCR (DSC M2) Searching for book metadata (DSC M1) Sentiment analysis (DSC 11) Software studies (DSC 16) TEI (DSC 5) Text classification (DSC 15) Text comparison (DSC 8) Topic modeling (DSC 20) Training NER, translation alignment & WordNet (DSC M4) Understanding NLP models and their training data Vibe coding with AI (DSC 23) Voyant (DSC 6) Web scraping & OpenRefine (DSC M3) Zines (DSC 22) A AI social impact AI ethics Ann M. Martin papers authorship team evidence for gay Ducky faxes finding aid lack of correspondence synopses temporality evident in synopses Annie Swafford Annotation detail in images forcing engagement with texts TEI why annotation decisions matter AntConc checking for problems with data installing on HathiTrust data capsule, [1] KWIC running queries in a HathiTrust data capsule word frequency lists Archives *respect des fonds* collection organization digital data emails fiction vs reality finding aids first exposure to archives fragility of digital data importance for multilingual mysteries loss of email records myth of "discovery" paperwork and restrictions printing digital data rarity of definitive conclusions research questions skimming talking to archivists what's absent Audience convincing technophobes potential audiences who is the DSC audience? B Baby-Sitters Club Friendship Kit, [1] BSC journal BSC journal as documented by Amy A.
Cowan casting choices difficulty of games file names in the stationery kit games journal written in cursive kids review the game letters as Mad Libs played by modern children portrayal of Mallory precursor to web-based communication sense of space Sidewalk Studios stationery kit Back to the Future Biplot analysis interpretation interpretation using TF-IDF zeta score Book re-releases content changes Brands brand-name foods in BSC Brokenness Brothers Grimm BSC Bible BSC slang dibble introduction BY Times acquiring the books childhood reading experience depicting "out-of-town" communities depicting a different small world Ilana the California Girl interview with the author Los Angeles Orthodox community mashup of BSC and Sweet Valley New York as hub of Orthodox Jewish community use of California stereotype C California Diaries evidence for gay Ducky in Ann M. Martin papers Chapter 2 phenomenon clustering by author clustering by narrator evidence in Ann M. Martin papers network of similarity occurrence over time top documents in topic model visibility in topic modeling visualization with cosine distance Charlie and the Chocolate Factory Children working while children nap Children's literature cultural adaptations race-bending social norms Clothes depiction in youth literature Code applying other people's code choosing a programming language DH Twitter survey magic numbers making different choices when adapting moving between Python and R in a notebook packages, libraries, programming languages reading R as a Python user running R in Jupyter notebooks Collaboration awkward conversations domain expertise, [1] domain expertise vs computation getting stuck language knowledge new collaborators to get out of a rut with easily-distracted collaborators yes-and Comparing text comparison using distance measures Computer science humanities engagement Conferences CSDH-SCHN 2021 DH 2023 Copyright as set of exclusive rights Building LLDTM workshop duration evolving facts aren't
copyrightable implications for people with disabilities licensing promoting progress see also fair use Sonny Bono Copyright Extension Act specific rights for creators Coreference resolution BookNLP Corpora challenges of mid-20th century corpora choosing a cut-off Corpus of Contemporary American English (COCA) finding one for your research question how much is enough how to choose what to include importance of domain expertise importance of reading interaction with research questions no bad corpora thinking about comparison sets vs lived experience when you can't get everything Corpus linguistics reference corpora Cost expense of DH work Covid burn-out with no end in sight canceling the DSC road trip super-special data-wrangling as a locus of control fall 2020 Katia leaves isolation for BSC books March 2020 Omicron wave post-vaccine optimism time warp from 2020 to mid-2021 D Data analyzing how to clean data caution when slicing up data too much complexity data cleaning for NER decision-making around cleaning difficulty of extracting dialogue inevitability of mess messy metadata sheet things uncaptured by extant data ubiquity of data wrangling DH is people as a source of support collective trauma remembering & sharing stories sometimes things take time DH Twitter after acquisition by Elon Musk as a source of memories code survey for meeting people Python help Digital humanities as a discipline second-generation digital humanists Disability as depicted in Jessi's Secret Language Distance measures cosine distance across BSC book chapters cosine similarity Euclidean distance heatmap visualization importance of word frequencies insight using distance measures for text comparison word frequencies using scikit-learn Distinctive words Divorce in BSC books DMCA compromise as part of exemption process conditions for breaking encryption cracking ebooks emotional roller-coaster of the exemption exemption granted in 2021 exemption petition process extreme security requirements 
how the exemption process works limitations on use of decrypted materials security requirements for decrypted data see also Copyright types of works where you can break encryption DRM cracking encryption DSC Multilingual Mystery series E Emotions angst about PCA burn-out Environments Anaconda and Miniconda as self-care changing yours Conda vs Pip definition digital and physical spaces exporting and importing with Conda for JupyterBook for Python for R in Jupyter notebooks Python environments like flossing Python environments like the Pike family Quinn's office trauma troubleshooting Using Conda working in a Conda environment Expectations operationalizing through DH F Fair use acting boldly amount and substantiality of the portion used as a standard as an open list computational text analysis legitimate DSC case study with TEI annotations effect of use on potential market effect on the market factors fair use vs. fair dealing implications of licensing leeway for education legitimacy of parody nature of copyrighted work, [1] purpose and character of use reasonableness of amount used transformative uses Feminism patience as feminist imperative File size as a warning Finding aids see also Archives Finding textual differences Food as a topic of interest in translation contributing to the vibrancy of the BSC books corpora annotated for food inconsistency in translation Formalism compatibility with DH French dialectal variation translation French translations grammar changes between editions identifying Belgian translators lack of localization for Twinky in Quebec similarity of Belgian and French G Goodreads "classics" as a source of lists GPT-2 anti-Semitism for generating BSC fan-fiction generation based on word frequencies Holocaust denial racism Twitter bot GPT-3 GPUs vs CPUs Graph quadrants Graphic novels image changes in translations shareability of TEI annotations Graphs basic explanation X-axis and Y-axis Guest Data-Sitters Anastasia Salter Annie Lamar Dainy 
Bernstein Elisa Beshero-Bondar Erik Stallman Heather Froehlich Isabelle Gribomont Jeff Tharsen Matthew Sag Rachael Samberg Sathvika Anand Shelley Staples Xanda Schofield H Handwriting cursive HathiTrust as future home for DSC corpus as legal benchmark for text analysis as metaphorical New York City corpus size corpus sources creating a workset data capsule data capsule maintenance mode data capsule secure mode downloading texts to the data capsule Emergency Temporary Access Service extracted feature algorithms extracted features lack of youth literature launching the data capsule memories of its founding pronunciation reception for HTRC at DH 2011 releasing results research capsule application form use by Ted Underwood High performance computing for searching large corpora Highlighting in paper books I Images annotating details methods for analysis Interdisciplinary work what humanists bring Internet Archive what's there and what isn't Interpretation corpus analysis J Junior Officers Cadence Cordell Paul Dombrowski Sam Dombrowski Jupyter Notebooks dangers of not resetting them difficulty of adapting existing scripts usefulness as publishing platform K Karen as a Karen Kristy topic in the topic model KWIC with AntConc L LaTeX Learning to code bad Python classes coding opens possibilities Danger Noodle Club Humanities Data in R in high school via Business Computer Programming like learning a human language messing around with operating systems Miriam Posner's blog post on coding problem with learn-to-code books Python at DHSI R feels different than Python reaching conversational fluency similarities & differences among languages Text Analysis with R for Students of Literature the need for a good reason via BASIC via CS 101 via Jekyll via XML and TEI via XSLT Licenses possibility of EU collaboration Licensing limitations on use potential legislative fixes Literary criticism justification for DH methods M Machine Learning ChatGPT Machine learning comparing GPT-2 vs Ch 
2 excerpts danger as a decision-making tool deep learning evaluation fine-tuning GPT-2 on Chapter 2 generating comparably-sized excerpts to GPT-2 Google Colab GPUs it's just math learning rates limits on transformer text production loss of human control overfitting role for humanities scholars training as a guessing game training on small corpora Mark Algee-Hewitt R-based workflows Markup see also Annotation Martha Tolles Matt Jockers Metadata converting from UNIMARC created by people national libraries, [1] Methods confusion with similarly-named methods method options for different questions Mistakes checking machine learning rates recognizing when things have gone wrong when copying and pasting code when separating code and prose Multilingual DH horror at how sentiment analysis is implemented problems exporting accents from national libraries Mysteries BSC vs Nancy Drew N N-grams Named Entity Recognition challenges with model training explanation for English for English organizations for English people for English places for English works of art for French, [1] for French misc entities for French organizations for French places future steps improvement to French model model training resembles GBBO challenge spaCy vs Stanford NLP use of context Networks visualization Newbery Medal Criticism pizza runners-up Newcomers to DH experience NLP comparison on French & English training data dependency parse discrepancy in result quality displaCy finding subjects of verbs impact of character names on finding the subject name localization training data for French spaCy model Xanda Schofield on "The Possibilities and Limitations of NLP for the Humanities" Normal distribution see also Statistics Notes from Underground O OCR errors evaluation handwriting noise software tips OpenRefine random sample OpenRefine cell cross, [1] cell cross example example cleaning workflow for processing NLP output text facet P Parallel universes Part-of-speech tagging CLAWS tagger errors Pedagogy
bad pedagogical approaches communities of practice insecurity Voyant assignment Phrases a little a little, but Pizza occurrences in BSC series occurrences in youth literature possible research questions Play as research method Prescriptivism in DH vs literary studies Principal Component Analysis grappling with dimensionality interpreting PCA of top 1k nouns purpose see also Typicality Spreadsheet of Doom to cluster pets understanding features using M&M colors Principal component analysis looking for distinctions between quotes struggle with interpretation Programming Historian Jupyter Notebook lesson lesson on similarity measures lesson on TF-IDF Projects A Distant Reading of Empire AI Weirdness Digital Dostoevsky Viral Texts Visualizing English Print Propp relation to DH methods Prosecraft Python vs R according to DH Twitter Q QQplot see also Statistics Quebec translations found in a used bookstore Questions ability to answer using different workflows addressable by Voyant DH without a research question does DH have unique answers? interaction with corpora method options more than confirming what's known? necessity of making compromises pizza as a starting point reframing as needed trial and error vs messing around with data vs play what are the questions of the DSC?
R Race as depicted in Jessi's Secret Language diversity in crowd scenes Reading with your eyeballs Rebecca Munson at Oxford at Princeton CDH feminism Fuck the Patriarchy jewelry hair dye as memorial ritual Lurlene McDaniel dialogue picking up the purple project management and student support self-esteem this is what a feminist looks like Regular expressions use with TEI Ronald Reagan S Scanning see also Text acquisition Scott Enderle A Distant Reading of Empire as developer of usable & understandable things plans to revisit DSC 8 reappearing to offer a solution ultimate bug report Scott Enderle's 6-gram code aggregating results data preparation running the code Searching for lists of words Sentiment analysis application to modernist literature as applied to literary text challenge of interpretation difficulty in scoring sentences horror at multilingual implementation manually plotting sentiment materials it was designed for modeling sentiment null hypothesis quantifying characters' emotional landscape TextBlob and VADER transparency of word-list method wariness about it word lists vs word vectors Similarity multiple definitions Software emulation EaaSI VirtualBox Software Studies 90's game nostalgia Barbie Fashion Designer gendered dynamics in computer stores Mattel buy-out of girl games Purple Moon similarity to signed languages Theresa Duncan Software studies difficulty of building corpora spaCy downloading models English named entity recognition English named entity recognition for organizations English named entity recognition for people English named entity recognition for places English named entity recognition for works of art French named entity recognition, [1] French named entity recognition for misc entities French named entity recognition for organizations French named entity recognition for places getting started Spanish translations Standards lack of standard for machine learning annotations Stanford's Literary Lab building on projects Microgenres
Star Wars character verbs example Statistics as applied philosophy checking for normal distribution finding a good baseline independent vs dependent variables like a bra Mann-Whitney U test QQplot t-test Wilcoxon rank-sum test Stepwise variable selection Stéfan Sinclair advocating for a bigger tent as a husband and father as ACH president as co-organizer of DH 2017 as hypothetical DH pin-up calendar star at DH 2010 in London helping OCR a French book leadership and building relationships listening to critiques of Voyant memorial playful realizing an expansive vision of DH the nicest person in DH, ever working on Spyral Stopwords in Voyant with Mallet Stuck winter and spring 2023 Syuzhet 'just say no' dubious connection between sentiment and plot graphs invite interpretation methods for navigating too much data running the simple visualization The Syuzhet Incident of 2015 word lists for scoring texts T Tableau Gantt visualization use with Typicality PCA output Tacos occurrences in youth literature TEI annotation is personal Atomic TEI background CBML compared to concordances creating custom schemas div element Elisa Beshero-Bondar handling character lists header highlighting how to explain it to an alien internationalization is it a cult? ODD personography queering TEI rend attribute said element semantic markup subsets tag abuse vs. customization transcribing an image with CBML value vs XML who makes up the rules Temporality in the Baby-Sitters Club Text acquisition downloading scanning Text alignment Bleualign machine translation limits running Bleualign use in Lee's dissertation Text classification classification by a reader distinctive words limitations PCA stepwise variable selection three-character sequences Text comparison what are we comparing?
Text Data Mining Text distance variation in text length Text length counting words implications for Euclidean distance importance of checking for length length of BSC books Textual differences CollateX Diff Match Patch Juxta TF-IDF top terms vs human-memorable words Theorizing emotional labor how we answer questions modeling mitigation of emotional distress Tools AntConc Atom text editor Atomic TEI Bleualign CLAWS POS-tagger CollateX Comic Book Markup Language (CBML) Cytoscape Diff Match Patch Distant Viewing Toolkit Gephi Jupyter Notebooks Juxta Mallet OpenRefine Oxygen XML Editor spaCy StanfordNLP TEI textreuse R package tools have histories Topic Modeling Tool Transkribus Webscraper.io WordNet XML Topic modeling adding more topics at the start of the DSC authorless topic models chapter 2 character names character-specific topics corpus knowledge is useful definition of a computational topic document length downsides of more topics DSC misunderstanding of "document" how it works installing and running Mallet klutzy like Jackie Rodowsky Kristy looking at top documents for a topic Mallet output files meaning of "document" meaning through context recommended packages specific to a text collection spiky topic stopwords text pre-processing text splitter code topic proportions understanding keywords visualizing with heat maps worth trying Transkribus layout analysis Translation lack of localization for France localization for Europe localization for Quebec localization of Jessi's origins localization of where Jessi learned Spanish localized names name inconsistency within a book non-Belgium localization of Belgian books Quebec market soccer vs football thoughtfulness of food adaptation Translations handling sign language name adaptation Typicality finding the closest points to 0,0 how it works interpretation noun extraction PCA function project overview running Typicality code visualization V Visualization graph comprehensibility of Typicality PCA output Voyant as a 
manifestation of Stéfan Sinclair Cirrus word cloud tool classroom assignment Contexts tool critiques export overview questions it can answer shifting between tools stopwords Summary panel Terms tool Trends tool use as interpretive tool whimsy of the word knot W Web scraping with Webscraper.io Webscraper.io creating a sitemap simple use of selectors Websites maintenance Wizard of Oz Word frequencies vs word counts WordNet challenges with plant matter explanation looking for a label at any level running WordNet on texts use of weird definitions Workflows creating one that works for you for identifying and aligning food translations WorldCat X XML creating reasonable schemas syntax well-formed vs. valid why use TEI Y YA in Quebec Youth literature Goosebumps series books undervalued by academic institutions YRDL database not corpus identifying series books origin of the name PCA across series books problems with identifying words at scale