# 🗄️ NOTES - LIBRARY & INFO SCIENCE

Information-seeking behavior patterns are purposeful, often non-linear actions taken to bridge knowledge gaps, involving activities like searching, browsing, and evaluating information. Common patterns include berrypicking (iteratively gathering small, relevant nuggets), chaining (following citations), and monitoring developments, with behaviors often driven by the Principle of Least Effort to find the easiest, most relevant information.

**Principle of Least Effort**: The tendency to choose the most accessible, convenient, or easiest source of information rather than the highest-quality one.

* **Active, not passive search** - people actively construct meaning through a dynamic, iterative search process...
* **Context matters** - personal, professional, social factors
* **Information overload** - too much info hinders decisions

["File Not Found" Series](https://apps.sciencefriday.com/data/ghosts.html) - sci short story

Labeling theory is the theory of how the self-identity and behavior of individuals may be determined or influenced by the terms used to describe or classify them. It is associated with the concepts of self-fulfilling prophecy and stereotyping.

An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it. (Mooers 1959, p.1)

## Classification

* Basics of Classification
* Examples of Classification
* Why have a Taxonomy?
* How to build a Taxonomy?
* Building a Taxonomy from Scratch (simple)
* Building a Taxonomy from Scratch (long)
* How to Develop Terms
* How to Organize Terms
* Characteristics of a Good Taxonomy
* Taxonomy Pro Tips
* Challenges and Limitations
* Exercise - Taxonomy
* References

* **Classification** - Grouping together of similar, related things, the separation of dissimilar and unrelated things, and the arrangement of the resulting groups in a logical and helpful sequence [BS 8723-1:2005]
* **Controlled vocabulary** - A list of predefined, authorized terms that can be consistently applied to content
* **Classification scheme** - A method of organization according to a set of pre-established principles, usually characterized by an alpha-numeric notation system and a hierarchical structure of relationships among the entities. [ANSI/NISO Z39.19-2005]
* **Taxonomy** - A controlled vocabulary whose terms are all connected in a hierarchy [ANSI/NISO Z39.19-2005]

Taxonomy comes from Greek ταξινομία (taxinomia), specifically the words taxis = order and nomos = law. Taxonomy can refer to either a hierarchical classification of things, or the principles underlying the classification.

Classification scheme = Taxonomy

A taxonomy/classification scheme is a set of mutually exclusive and non-overlapping classes arranged within a hierarchical structure and reflecting a predetermined ordering of reality. It is a knowledge organization system. The process of classification is simply the assignment of something to the correct taxonomic location.

Grouping “related things” together might seem like a simple thing to do, but it's not. Most things can be classified in more than one way. Most classification systems do not handle this well.

### Examples of Classification - Taxonomies are Everywhere!

* Libraries - Dewey Decimal Classification System (DDC) or Library of Congress Classification (LCC)
* Biology - Linnaean Taxonomy, e.g. Kingdom, Phylum, Class, Order, Family, Genus, Species.
* Dictionaries and Thesauri
* Government - AIRS/211 LA County Taxonomy of Human Services

### Why Have a Taxonomy?

* Helps you understand the structure of your knowledge domain at one easy glance
* Captures the key vocabulary of your domain (semantic function)
* Reduces ambiguity
* Promotes consistency and predictability
* Improves precision and recall
* Makes content more usable

### How Do You Build a Taxonomy?

- From scratch?
  - Organization/Community
  - Users and Experts
  - Existing data and databases
- Adoption from
  - Taxonomy Warehouse
  - Other taxonomy/thesauri resources

Always research to determine if an appropriate classification scheme already exists that could be modified or repurposed to meet the current needs. Don’t reinvent the entire wheel if possible.

### Building a Taxonomy from Scratch (simple steps)

* Identify and define a subject field
* Collect terms
* Organize terms
* Fill gaps
* Flesh out and interrelate terms

### How to Develop Terms

* Terms should represent simple, unitary concepts, i.e. a single concept (or unit of thought)
* Only one term can be used to represent a given concept or entity
* Most terms are nouns or simple noun phrases

**Concept** - A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them. [ANSI/NISO Z39.19-2005]

### Three main types of terms

* **abstract concepts** - ex. properties of things; abstract entities
* **concrete entities** - ex. things and their physical parts
* **proper nouns** - ex. Ohio

Tap into all relevant stakeholder communities and subject experts to ensure your list of terms is complete and accurately describes the collections. You should also note potential areas where you may encounter resistance or obstacles.
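The rule above that only one term can represent a given concept can be sketched as a small lookup from variant wordings to a single preferred term. This is only an illustration: the terms and the helper name `preferred_term` are made up, and a real vocabulary would come from stakeholders and subject experts.

```python
# Minimal sketch of a controlled vocabulary: each concept has exactly one
# preferred term, and variant wordings are mapped onto it.
# All terms and the helper name `preferred_term` are invented for illustration.
from typing import Optional

# variant wording (lowercased) -> the single authorized term
VARIANTS = {
    "car": "Automobiles",
    "cars": "Automobiles",
    "autos": "Automobiles",
    "automobiles": "Automobiles",
    "ohio": "Ohio",
    "state of ohio": "Ohio",
}


def preferred_term(raw: str) -> Optional[str]:
    """Return the authorized term for a raw label, or None if it is unknown."""
    return VARIANTS.get(raw.strip().lower())


print(preferred_term("Cars"))            # Automobiles
print(preferred_term(" State of Ohio"))  # Ohio
print(preferred_term("trucks"))          # None
```

Unknown labels come back as `None` rather than being silently accepted, which gives a natural point to collect candidate terms for review.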
### How to Organize Terms

* Identify core areas and peripheral topics
* Once the core areas are established, continue to work through the next level until the entire scheme is developed.
* Different Types of Taxonomy Structures - List, Tree, Hierarchy, Polyhierarchy, Faceted, Matrix, etc.
  * Tree - A tree structure is a grouped list subdivided at the top level with sub-categories underneath, e.g. like a folder structure on your network share drives.
  * Hierarchy - A hierarchy is a tree structure that follows very strict rules about how it is subdivided. The same principle of subdivision must be consistently applied at every level. Think parent-child. Each node in the hierarchy can have only one parent. Inheritance or inclusion applies: what is true for the parent is true for the child.

### How to Develop, Deploy, and Maintain a Taxonomy (detailed steps)

* Step 1: Identify Stakeholders
* Step 2: Determine the Purpose
* Step 3: Determine the Approach - top down vs bottom up; incremental or all at once
* Step 4: Collect Information - conduct interviews and surveys
* Step 5: Develop and Test Taxonomy
* Step 6: Pilot Taxonomy
* Step 7: Train Users and Deploy
* Step 8: Gather User Feedback and Iterate

### Characteristics of a Good Taxonomy

* **Intuitive** - reflects natural working or usage habits, assumptions or well-known structures (such as organizational structure, workflow)
* **Unambiguous** - does not offer alternates
* **Hospitable** - accommodates new content w/o revision
* **Parsimonious** - no redundancy or repetition; no more than what is needed
* **Durable** - does not require frequent changes or radical reorganization
* **Balanced** - evenly distributed across the structure
* ...Consistent, Predictable, Meaningful and Relevant

See http://community.aiim.org/blogs/beth-mayhew/2014/08/06/do-you-know-the-9-principles-of-classification

### Taxonomy Pro Tips

A taxonomy should be as clear and intuitive as possible, with enough detail and structure to cover the entire scope of information to be
categorized, but with nothing extra or irrelevant that could lead to ambiguity, unusable branches, or dead ends. The goal is for the scheme to be easy and understood well enough that the organization will actually use it. Wherever possible, automate the process and validate.

A taxonomy should be structured in a way that reflects natural working or usage habits, assumptions or well-known structures, e.g. organization, workflow. Think “taskonomy” or "mise en place", the French culinary phrase which means "putting in place" or "everything in its place": the arrangement of tools and resources around the most frequent and important tasks they serve.

Maintaining the scheme is not a trivial task. It requires a good, robust governance system, including policies, procedures and processes.

### Exercise - Taxonomy

Dewey Decimal System

The following shows how the number “636.8” is assigned to the classification of cats in the section of Animal Husbandry in the division of Agricultural and Related Technologies in the class of Technology:

* 600 - Number from (first) summary of ten classes (i.e., the class named Technology)
* 630 - Number from (second) summary of ten divisions (i.e., the division under class Technology named Agriculture)
* 636 - Number from (third) summary of ten sections (i.e., the section named Animal Husbandry)
* 636.7 - Number representing dogs
* 636.8 - Number representing cats

Consider how a listing of American cities might appear when categorized according to a hierarchy of regions and states in the United States.
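A rough sketch of the exercise in Python: the Dewey labels are the ones listed above, while the helper name `ddc_path`, the shortened labels "Dogs" and "Cats", and the choice of sample cities are assumptions made for illustration. Deriving the parent codes by truncating digits is a simplification that only covers the class, division, and section levels shown here.

```python
# Sketch: walk a Dewey number from its class down to the full number, then
# show a region > state > city hierarchy. `ddc_path` is a made-up helper and
# the city sample is an illustrative assumption, not a complete listing.
from typing import List

LABELS = {
    "600": "Technology",
    "630": "Agriculture",
    "636": "Animal Husbandry",
    "636.7": "Dogs",
    "636.8": "Cats",
}


def ddc_path(number: str) -> List[str]:
    """Return the known labels from the broadest class down to `number`."""
    whole = number.split(".")[0]      # "636.8" -> "636"
    candidates = [
        whole[0] + "00",              # class:    600
        whole[:2] + "0",              # division: 630
        whole,                        # section:  636
    ]
    if "." in number:
        candidates.append(number)     # full number: 636.8
    path = []
    for code in candidates:
        label = LABELS.get(code)
        if label is not None and label not in path:
            path.append(label)
    return path


print(" > ".join(ddc_path("636.8")))
# Technology > Agriculture > Animal Husbandry > Cats

# One possible region > state > city arrangement (a small sample only).
CITIES = {
    "Midwest": {"Ohio": ["Cleveland", "Columbus"], "Illinois": ["Chicago"]},
    "West": {"California": ["Los Angeles", "San Francisco"]},
}

for region, states in CITIES.items():
    print(region)
    for state, cities in states.items():
        print("  " + state)
        for city in cities:
            print("    " + city)
```

In both structures each node has exactly one parent, which is what makes them hierarchies in the strict sense described earlier.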
### Challenges and Limitations

* Initial Costs and Upkeep
* Governance and Ownership
* Too complex
* Users may resist/misfile

### References

* The Organization of Information
* The Accidental Taxonomist
* Sorting Things Out
* [Association for Information and Image Management](https://www.aiim.org)
* [Taxonomy Warehouse](http://www.taxonomywarehouse.com/)
* [Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies (ANSI/NISO Z39.19-2005)](https://groups.niso.org/apps/group_public/download.php/12591/z39-19-2005r2010.pdf)

Pitfalls and rollout notes:

* **“force-fitting”** - pushing new concepts into unsuitable categories
* **“big bang” approach** - decide whether to attempt to install major portions of the classification immediately, or whether a more incremental delivery approach makes more sense for the organization

Types of relationships

* equivalence (ex aliases)
* associative (ex cousins)
* hierarchical
* reciprocal (ex broader/narrower; whole-part)
* homographic (ex same spelling, diff meaning)

## Classification vs Categorization

[Classification and Categorization: A Difference that Makes a Difference](https://pdfs.semanticscholar.org/774e/ab27b22aa92dfaa9aeeeafbe845058e85f58.pdf) - Structural and semantic differences between classification and categorization are differences that make a difference in the information environment by influencing the functional activities of an information system and by contributing to its constitution as an information environment.

**Classification** divides a universe of entities into an arbitrary system of mutually exclusive and non-overlapping classes that are arranged within the conceptual context established by a set of established principles.

**Categorization** divides the world of experience into groups or categories whose members bear some immediate similarity within a given context. Basically, categorization is the process of dividing the world into groups of entities whose members are in some way similar to each other.
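One way to read the two definitions above in code: classification behaves like a function that returns exactly one class per entity, while categorization returns however many context-dependent groups happen to apply. The entity, the rules, and the group names below are invented for illustration only.

```python
# Sketch contrasting the two definitions above. The attributes, rules, and
# group names are hypothetical examples, not taken from any real scheme.

def classify(animal: dict) -> str:
    """Classification: every entity lands in one and only one class."""
    if animal["has_feathers"]:
        return "Birds"
    if animal["nurses_young"]:
        return "Mammals"
    return "Other"


def categorize(animal: dict) -> set:
    """Categorization: an entity may fall into several similarity-based groups."""
    groups = set()
    if animal["can_fly"]:
        groups.add("things that fly")
    if animal["kept_as_pet"]:
        groups.add("pets")
    if animal["active_at_night"]:
        groups.add("nocturnal animals")
    return groups


bat = {
    "has_feathers": False,
    "nurses_young": True,
    "can_fly": True,
    "kept_as_pet": False,
    "active_at_night": True,
}

print(classify(bat))            # Mammals
print(sorted(categorize(bat)))  # ['nocturnal animals', 'things that fly']
```

The classes in `classify` are mutually exclusive because the rules are checked in a fixed order and the function returns at the first match; the groups in `categorize` overlap freely.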
Classification as process involves the orderly and systematic assignment of each entity to one and only one class within a system of mutually exclusive and non-overlapping classes.

**Classification scheme** - A logical scheme for the arrangement of knowledge, usually by subject. Classification schemes are alpha and/or numeric. [DCMI Glossary]

Classification schemes

* rarely get changed (due to the pre-established numeric code hierarchy)
* are based on alpha-numeric codes
* involve assigning an item only one classification code

Terms

* unambiguous terms, clear to the user or group
* distinguish between terms that appear similar
* ambiguity occurs in natural language when a word or phrase (a homograph or polyseme) has more than one meaning

Another way to think of the comparison:

* Classification is for: where to put things / where does this document or item go.
* Taxonomy is for: how to describe content / what is this text, image, or other media about.

conceptual classes

Lists -> Taxonomy -> Ontology

Increasing complexity and control: Controlled Vocabulary -> Taxonomy -> Thesaurus -> Ontology -> Semantic Network

* Buckland, M. (1991). Information as Thing. Journal of the American Society for Information Science, 42(5), 351-360.
* Furnas, G. W., Landauer, T. K., Gomez, L. M., & Dumais, S. T. (1987). The vocabulary problem in human-system communication. Communications of the ACM, 30(11), 964-971. https://www.uio.no/studier/emner/matnat/ifi/INF3280/v14/pensumliste/additionalliterature/furnasetal1987vocabularyproblem.pdf
* Belkin, N. J., Oddy, R. M., & Brooks, H. M. (1982). ASK for information retrieval: Part I. Background and theory. Journal of Documentation, 38(2), 61-71.
* Dewdney, P. & Michell, G. (1996). Oranges and peaches: Understanding communication accidents in the reference interview. RQ, 35(4), 520-523 & 526-536.
## Information Architecture

[Building an Information Architecture Checklist](https://pdfs.semanticscholar.org/0061/e2210fa2e9b0202e500d7556c452df5cd0dc.pdf)

Several definitions in the IA field generally focus on organizing information via mechanisms such as labeling, structuring, chunking, and categorizing in order to support navigation, findability and usefulness. Bailey’s (2002) definition of IA is perhaps the simplest and most straightforward: IA is the art and science of organizing information so that it is findable, manageable and useful.

There is also a perspective from enterprise architecture that views information architecture as an enterprise-wide activity that includes such aspects as data architecture, metadata management and knowledge management (Stiglich, 2007).

Another definitional approach is that of Big Architect (strategic), or Big IA, and Little Architect (tactical), or Little IA (Morville, 2000; Dillon, 2002; Dillon & Turnbull, 2005). Dillon (2002) and Dillon & Turnbull (2005) discuss these competing views. Little IA and Big IA both focus on information organization, with Little IA being done from the ground up and Big IA being approached from the top down. A major difference is user experience. Little IA does not focus on formal user experience but more on metadata and controlled vocabulary. Big IA, as the name implies, takes a wider view and includes user and organizational aspects, with an emphasis on information being useful, usable and acceptable.

E-government is underpinned by information, and its effective management is a necessary prerequisite for service delivery. Information management is concerned with information quality, security, business processes and metadata, and all of these need to be addressed to deliver good e-government.
Good information management implies understanding what information assets are in place and what part they play in a particular business process. The first step in that regard is usually the compilation of an information audit, which details the size and scope of the information available and its lifecycle.

Enterprise Architecture or Enterprise Information Architecture and was more concerned with infrastructure and applications than information per se.

* Information Architecture and e-Government www.researchgate.net/profile/John-Akeroyd-2/publication/238766830_Information_Architecture_and_e-Government/links/591e000545851540595d937a/Information-Architecture-and-e-Government.pdf
* A resurgence of interest in Information Architecture https://www.researchgate.net/profile/John-Akeroyd-2/publication/221954673_A_resurgence_of_interest_in_Information_Architecture/links/6291cd6c88c32b037b56f65e/A-resurgence-of-interest-in-Information-Architecture.pdf?_sg%5B0%5D=started_experiment_milestone&origin=journalDetail&_rtd=e30%3D

## Misc

* Tech Ethics Curriculum - a Google sheet of tech ethics courses, with pointers to syllabi. https://docs.google.com/spreadsheets/d/1jWIrA8jHz5fYAW4h9CkUD8gKS5V98PDJDymRf8d9vKI/edit#gid=0
* Ethics for the Information Age, 7th Ed., by Michael Quinn
* Baase, S. (2013). Chapter 1. A gift of fire: social, legal, and ethical issues for computing technology. Upper Saddle River, NJ: Pearson.
* Question Negotiation and Information Seeking http://choo.ischool.utoronto.ca/fis/courses/lis1325/QuestionNego.pdf
* Conceptual Approaches for Defining Data, Information, and Knowledge http://www.success.co.il/is/zins_definitions_dik.pdf
* Yarger (Kvasny). “Let the sisters speak: understanding information technology from the standpoint of the ‘other.’” The DATA BASE for Advances in Information Systems 37:4 pp. 13-25. https://faculty.ist.psu.edu/lyarger/DataBase-Kvasny-Forthcoming.pdf
* Wajcman. “Feminist theories of technology.” Cambridge Journal of Economics, 34:1 pp. 143–152.
http://dx.doi.org/10.1093/cje/ben057
* Castilla and Benard. “The paradox of meritocracy in organizations.” Administrative Science Quarterly 55:4. http://dx.doi.org/10.2189/asqu.2010.55.4.543
* Freeman. “The tyranny of structurelessness.” http://www.jofreeman.com/joreen/tyranny.htm
* Ehmke. “The dehumanizing myth of the meritocracy.” Model View Culture https://modelviewculture.com/pieces/the-dehumanizing-myth-of-the-meritocracy
* Cooper. “The false promise of meritocracy.” https://www.theatlantic.com/business/archive/2015/12/meritocracy/418074/
* Max Weber, "Bureaucracy," From Max Weber (New York: Oxford University Press, 1978) 196-244
* Federal freedom of information policy: Highlights of recent developments https://www.sciencedirect.com/science/article/pii/S0740624X08001494 Government Information Quarterly Volume 26, Issue 2, April 2009, Pages 314-320
* Library & Information Science Research Volume 30, Issue 1, March 2008, Pages 2-21
* Government Information Quarterly Volume 4, Issue 2, 1987, Pages 189-196
* Boyd and Crawford, “Critical Questions for Big Data”
* Zarsky, “The Trouble with Algorithmic Decisions”
* Gitelman and Jackson, Raw Data is an Oxymoron [Introduction]
* Agre, “Surveillance and Capture: Two Models of Privacy”
* Bowker and Star, Sorting Things Out
* Auerbach, “The Stupidity of Computers”
* Moor, “What is Computer Ethics?”
* Hand, “Deconstructing Statistical Questions” https://www.jstor.org/stable/2983526
* O’Neil, On Being a Data Skeptic
* Domingos, “A Few Useful Things to Know About Machine Learning”
* Bertrand and Mullainathan, “Are Emily and Greg more Employable than Lakisha and Jamal?”
* Kroll, Huey, Barocas, Felten, Reidenberg, Robinson, and Yu, “Accountable Algorithms”
* Robinson and Yu, Knowing the Score
* Buolamwini, “Algorithms Aren’t Racist.
Your Skin Is just too Dark”

## Folksonomy / Collaborative Tagging

Folksonomy, aka collaborative tagging or social classification.

In a system where users are encouraged to tag multiple times, even within the same facet, ambiguity is detected by measuring the broad consistency of tagging decisions. The more diverse the tagging decisions, the more ambiguity exists.

Folksonomy can be problematic due to its potential for poorly applied tags, ambiguity, and subjectivity, which can lead to confusion and inconsistent categorization of content. Additionally, the lack of a controlled vocabulary can result in difficulties in finding relevant information.

Folksonomy and tagging often feel smart/good to neophytes because they are "accepting", "accommodating", and "democratizing", but they're pretty much garbage most of the time for information management and retrieval.

## Misc

* **Metadata & Classification**: Metadata (data about data), controlled vocabularies, and classification schemes (e.g., Dewey Decimal, Library of Congress) enable structured access.
* **Information Architecture (IA)**: The structural design of shared information environments, emphasizing usability, navigation, and experience.
* **Taxonomy & Indexing**: Defining hierarchical relationships between information components and assigning tags for easier retrieval.

* Step 1: Identify and isolate what you want to organize (e.g., books).
* Step 2: Identify the attributes of the thing you want to organize (e.g., for books, think title, author, publication date, theme, genre, format, description) and standardize these attributes with some structure (e.g., controlled vocabulary, metadata).
* Step 3: Identify how you want to arrange and organize things to enable structured access (e.g., for books that would be Dewey Decimal Classification (DDC), Library of Congress Classification (LCC), or Book Industry Standards and Communications (BISAC)).
* Step 4: Re-organize everything into that structure and maintain that structure.
* Step 5: Weave and reinforce that structure throughout the environment.
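Steps 1 through 4 above can be sketched for a handful of books. Everything here is an illustrative assumption: the field names, the tiny genre list, the titles and authors, and the helper `shelve` are made up, and sorting call numbers as plain strings is a simplification that a real shelving order would not rely on.

```python
# Sketch of Steps 1-4 for books. Field names, the genre vocabulary, the sample
# records, and `shelve` are hypothetical; they only illustrate the workflow.
from dataclasses import dataclass

# Step 2: a (very small) controlled vocabulary for one attribute.
GENRES = {"Fiction", "Reference", "Pets"}


@dataclass
class Book:                  # Step 1: the thing being organized
    title: str               # Step 2: its standardized attributes
    author: str
    genre: str
    call_number: str         # Step 3: classification code (DDC-style here)

    def __post_init__(self):
        # Reject values outside the controlled vocabulary.
        if self.genre not in GENRES:
            raise ValueError(f"{self.genre!r} is not an authorized genre")


def shelve(books):
    """Step 4: arrange records by classification code (simple string sort)."""
    return sorted(books, key=lambda b: b.call_number)


books = [
    Book("A Book About Cats", "Example Author", "Pets", "636.8"),
    Book("A Book About Dogs", "Another Author", "Pets", "636.7"),
]

for book in shelve(books):
    print(book.call_number, book.title)
# 636.7 A Book About Dogs
# 636.8 A Book About Cats
```

Validating at creation time is one way to keep the structure maintained (Step 4) instead of cleaning up inconsistent values later.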