Congratulations, Doctors: PhD Students Successfully Defend their Dissertations

Please join us in congratulating our iSchool students on their successful dissertation defenses: Dr. Rebecca Follman, Dr. Ning Gao, Dr. Chi Young Oh, Dr. Jyothi Vinjumur, and Dr. Amanda Waugh! Read about their research below:

Dr. Rebecca Follman

Dissertation Title: Describing the Ineffable: A Mixed-Methods Study of Faculty Mentoring Information Practices
Formal mentoring programs are a valuable tool for the professional development and socialization of new employees, and for the mentor. However, formal mentoring is often difficult to institutionalize. What are the indications that mentor and mentee should be split up? How often should mentoring partners meet? These questions and others highlight the problem: without a clear definition of mentoring itself, we are challenged to identify the characteristics of good mentoring. Mentoring is so contextual, and generally so private, that it is difficult to define. However, there is one element that is central to all mentoring relationships, and that can be used to describe mentoring explicitly – the exchange of information. The study described here consists of a longitudinal, mixed-method investigation of mentoring attitudes and practices among higher education faculty, with the goal of gathering data about the information practices – information seeking and sharing in a social context – of faculty engaged in mentoring. The study identifies the information practices of faculty who are engaged in mentoring, as well as how those information practices change across time. Faculty were surveyed about their attitudes toward mentoring, using an online instrument. The respondents provided data about their experiences with mentoring, including aspects such as the frequency of their meetings with mentoring partners, the topics they often discussed, the number of years they had worked with mentoring partners, their expectations of their mentoring partners, and their personal philosophy of mentoring. Faculty mentoring participants also completed an online diary of their mentoring information practices. The information diary provided an opportunity for faculty mentoring participants to share their information practices in real time, without requiring a prohibitive amount of effort. 
Data analysis shows that faculty mentoring participants do engage in information practices, such as seeking or sharing information regarding the specifics of the work environment, with the goal of transmitting culture (e.g., the requirements to achieve tenure). Both mentors and mentees value honest and open communication with their mentoring partners. Examination of the information exchanged between mentoring participants gives us a sense of what topics are most likely to be addressed, and also recommendations for new mentors and mentees. 
Examination Committee:
Beth St. Jean (Chair), iSchool
Brian Butler, iSchool
Paul Jaeger, iSchool
Ann Weeks, iSchool
Steve Marcus (Dean's Representative), Electrical & Computer Engineering

Dr. Ning Gao

Dissertation Title: Towards Population of Knowledge Bases from Conversational Sources

With an increasing amount of data created daily, it is challenging for users to organize and discover information from massive digital documents (e.g., text and speech). The population of knowledge bases refers to the task of extracting information from unstructured sources (e.g., news articles and web pages) into structured external knowledge bases (e.g., Wikipedia), which has the potential to advance information archiving and access, and to support knowledge discovery and reasoning. Because of the complexity of this task, knowledge base population is composed of multiple sub-tasks, including the entity linking task, defined as linking mentions of entities (e.g., persons, organizations, and locations) found in documents to their referents in external knowledge bases, and the event task, defined as extracting related information for events that should be entered in the knowledge base.

Most prior work on tasks related to knowledge base population has focused on dissemination-oriented sources written in the third person (e.g., news articles) that benefit from two characteristics: the content is written in formal language and is to some degree self-contextualized, and the entities mentioned (e.g., persons) are more likely to be widely known to the public so that rich information can be found from existing general knowledge bases (e.g., Wikipedia and DBpedia). The work proposed in this thesis focuses on tasks related to knowledge base population for conversational sources written in the first person (e.g., emails and phone recordings), which offers new challenges. One challenge is that most conversations (e.g., 68% of the person names and 53% of the organization names in Enron emails) refer to entities that are known to the conversational participants but not widely known. Thus, existing entity linking techniques relying on general knowledge bases are not appropriate. Another challenge is that some of the shared context between participants in first-person conversations may be implicit and thus challenging to model, increasing the difficulty, even for human annotators, of identifying the true referents.

This thesis focuses on several tasks relating to the population of knowledge bases for conversational content: the population of collection-specific knowledge bases for organization entities and meetings from email collections; the entity linking task that resolves the mention of three types of entities (person, organization, and location) found in both conversational text (emails) and speech (phone recordings) sources to multiple knowledge bases, including a general knowledge base built from Wikipedia and collection-specific knowledge bases; the meeting linking task that links meeting-related email messages to the referenced meeting entries in the collection-specific meeting knowledge base; and the speaker identification technologies to improve the entity linking task for phone recordings without known speakers. Following the model-based evaluation paradigm, three collections (namely, Enron emails, Avocado emails, and Enron phone recordings) are used as the representations of conversational sources, new test collections are created for each task, and experiments are conducted for each task to evaluate the efficacy of the proposed methods and to provide a comparison to existing state-of-the-art systems. The proposed work has implications in the research fields of e-discovery, scientific collaboration, speaker identification, speech retrieval, and privacy protection.

Examining Committee:
Associate Professor Mark Dredze (Johns Hopkins)
Assistant Professor Vanessa Frias-Martinez
Associate Professor Jennifer Golbeck
Associate Professor David A. Kirsch (Dean’s representative)
Professor Douglas W. Oard (Chair/Advisor)

Dr. Chi Young Oh

Dissertation Title: Who was a Neighbor to Those from the Other Side of the Globe?: International Newcomer Students’ Local Information Behaviors in Unfamiliar Environments

Although international education has been increasingly popular around the globe, international students face various challenges in their new countries. Previous studies have addressed their challenges in adjusting to different cultures, languages, societal systems, and social networks. However, little is known about the role and nature of information behaviors involved during their adjustment. Specifically, although a few studies considered international students’ needs for local information, such as information about local areas, housing, places, services, and transportation in new environments, there remain significant unknowns about the local information behaviors (LIBs) of international students during adjustment to new environments. Open questions include: 1) How do international newcomer students need, seek, and use local information during adjustment to new environments?, 2) What are the factors and contexts that shape international newcomer students’ LIBs?, 3) How do LIBs vary among international newcomer students from different countries of origin?, 4) How do international newcomer students’ LIBs change over time as they adjust to new environments? To address these questions and develop a holistic understanding of international students’ LIBs, I conducted a series of three studies using a mixed-method approach with surveys, interviews, and cognitive mapping. 
The first study of 20 international graduate students suggested that international students’ LIBs might vary depending on their co-national social environment in local areas. International students who had many co-nationals in their new environment tended to perceive co-nationals as their main information source while those who had fewer co-nationals in their new environment did not. In the second study, the degree to which individuals from the same countries of origin are available in local environments was defined as socio-national context. The second study of 149 international and domestic graduate students finds that socio-national context is a key factor in shaping new international students’ LIBs. “International-common” students who are from the top 3 most common countries of origin—China, India, and Korea—used co-nationals as their major social source of local information both offline and online, augmented by their active use of online and mobile social technologies. In contrast, “International-less-common” students, who are from other countries of origin, significantly less frequently engaged in such social information practices. Instead, "International-less-common" students relied primarily on non-social, online and mobile sources. The findings of the third study, conducted one year later with a subset of the students from the second study, show that the use of co-national sources by “International-common” students declines and that online and mobile sources are used as the main sources of information by most participants. 
Overall, the findings indicate that international students are not a monolithic group in terms of their information behaviors and that socio-national context plays a key role in shaping international newcomer students’ LIBs. In addition, the interplay between socio-national and socio-technical contexts allows "International-common" students to better connect with and effectively share information among their co-nationals. More attention and support are needed for "International-less-common" students who may experience more information challenges and difficulties in host countries, especially during early adjustment. Also, the findings suggest that temporal context affects international students’ LIBs during adjustment to new environments. Based on the findings, this dissertation concludes that information behavior models and theories need to account for socio-national context and its interactions with other contexts, such as socio-technical and temporal contexts, which can in turn influence international students’ LIBs in new countries. This theoretical consideration is specifically crucial if those theories and models are to be more relevant in global and migration contexts and to provide helpful insights for the design of systems and services for internationally mobile students around the globe.
Examining Committee:
Dr. Brian Butler
Dr. Paul Jaeger
Dr. Beth St. Jean
Dr. Jessica Vitak 
Dean's Rep: Dr. Kent Norman

Dr. Jyothi Vinjumur

Dissertation Title: Use of Predictive Coding Techniques with Manual Review to Identify Privileged Documents in E-Discovery

In twenty-first century civil litigation, discovery focuses on the retrieval of electronically stored information. Lawsuits may be won or lost because of incorrect production of electronic evidence. As organizations generate fewer paper documents, the volume of electronic documents has increased many fold. Litigants face the task of searching millions of electronic records for responsive, non-privileged documents, making the e-discovery process burdensome and expensive. In order to ensure that the material that has to be withheld is not inadvertently revealed, the electronic evidence that is found to be responsive to a production request is typically subjected to an exhaustive manual review for privilege. Although the budgetary constraints on review for responsiveness can be met using automation to some degree, attorneys have been hesitant to adopt similar technology to support the privilege review process. This dissertation draws attention to the potential for adopting predictive coding technology for the privilege review phase during the discovery process.
Two main questions that are central to building a privilege classifier are addressed.  The first question seeks to determine which set of annotations can serve as a reliable basis for evaluation. The second question seeks to determine which of the remaining annotations, when used for training classifiers, produce the best results. As an answer, binary classifiers are trained on labeled annotations from both junior and senior reviewers. Issues related to training bias and sample variance due to the reviewer's expertise are thoroughly discussed. Results show that the annotations that were randomly drawn and annotated by senior reviewers are useful for evaluation. The remaining annotations can be used for classifier training.
A research prototype is built to perform a user study. Privilege judgments are gathered from multiple lawyers using two types of user interface. One of the two interfaces includes automatically generated features to aid the review process. The goal is to help lawyers make faster and more accurate privilege judgments. A significant improvement in recall was observed when users reviewed documents with the automated annotations. Classifier features related to the people involved in privileged communications were found to be particularly important for the privilege review task. Results show that there was no measurable change in review time.
Because cost is proportional to time during review, the final step of this work introduces a semi-automated framework that aims to optimize the cost of the manual review process. The framework calls for litigants to make some rational choices about what to manually review. The documents are first automatically classified for responsiveness and privilege, and then some of the automatically classified documents are reviewed by human reviewers for responsiveness and/or for privilege with the overall goal of minimizing the expected cost of the entire process, including costs that arise from incorrect decisions. A risk-based ranking algorithm is used to determine which documents need to be manually reviewed. Multiple baselines are used to characterize the cost savings achieved by this approach. Although the work in this dissertation is applied to e-discovery, similar approaches could be applied to any case in which retrieval systems have to withhold a set of confidential documents despite their relevance to the request.
Examining Committee:
Chair: Dr. Douglas W. Oard
Dean's Rep: Dr. Hal Daumé III
Dr. Beth St. Jean
Dr. Vanessa Frias-Martinez
Dr. Fabrizio Sebastiani (CNR, Italy)

Dr. Amanda Waugh

Dissertation Title: A Nice Place on the Internet: An Exploratory Case Study of Teen Information Practices in an Online Fan Community

This dissertation focuses on the everyday life information practices of teens in the Nerdfighter online fan community known as Nerdfighteria. Nerdfighteria is the community of fans of vloggers John and Hank Green. This study examines aspects of everyday life information seeking (ELIS) by 1) focusing on an understudied population, teens between the ages of 13 and 17; 2) focusing on a fan community, Nerdfighteria, which has many members but has rarely been studied in the academic literature; and 3) investigating everyday life information practices using a single community that utilizes multiple online platforms (i.e., Facebook, Twitter, Discord, and YouTube), rather than centering on a single platform. This dissertation is a case study incorporating a survey of 241 teens and semi-structured interviews with 15 teens about their experiences in Nerdfighteria, followed by month-long diary activities. The study also included observations of public communities and a review of documents related to the Nerdfighter community. Data analysis was iterative and incorporated grounded theory techniques. This study finds that teen Nerdfighters use their fan community to engage in a wide variety of everyday life information seeking around topics that are related to their personal development. Social, cognitive, emotional, and fan topics were predominant. Teen Nerdfighters engaged across platforms and were likely to switch platforms to find the optimal technical affordances while staying in Nerdfighteria. The teens viewed these changes as staying within the community rather than changing from one platform to another, illustrating the primacy of the community to the teens in meeting their information needs. Teens were drawn to Nerdfighteria because they believed it to be a unique place on the Internet, which valued intellectualism, positivity, and kindness.
In many cases, teens preferred to observe others’ interactions in order to gain the information they needed or wanted, and waited to engage via posting or responding until certain criteria were met. These findings describe the complicated interplay of the ELIS topics sought, the preferred practices for meeting an information need, and the reasons for choosing one community over another.

Examination Committee:
Dr. Mega Subramaniam (Chair), iSchool
Dr. Brian Butler, iSchool
Dr. Tamara Clegg, iSchool
Dr. Kari Kraus, iSchool
Dr. Ronald A. Yaros (Dean’s Representative), Philip Merrill College of Journalism