Research Guides: Artificial Intelligence: AI in Research

AI Tools for Research

Large Language Models: An Overview

Large Language Models (LLMs) can significantly enhance academic research.

Literature Review and Summarization: LLMs can quickly sift through large amounts of academic papers, extracting key points, summarizing content, and helping researchers understand the landscape of a particular field. This reduces the time spent on manual literature review.
Data Analysis and Interpretation: LLMs can analyze data, generate hypotheses, and provide interpretations. They can also help researchers make sense of complex datasets by finding patterns and relationships within the data.
Writing Assistance: LLMs can assist in drafting and editing research papers by generating coherent text based on inputs, suggesting improvements, and ensuring consistency. They can help researchers articulate their findings more effectively.
Code Generation and Debugging: In fields requiring programming, LLMs can write code snippets, help debug errors, and optimize algorithms, thus speeding up computational research.
Translation and Accessibility: LLMs can translate research papers into different languages, making them accessible to a broader audience. They can also simplify technical content, making it easier for non-experts to understand.
Idea Generation and Collaboration: LLMs can help brainstorm new research ideas by providing insights based on existing literature and suggesting novel research directions. They can also facilitate collaboration by summarizing discussions or drafting project proposals.
Ethics and Bias Identification: LLMs can assist in identifying potential ethical concerns or biases in research methods or data analysis, helping to improve the integrity of the research.

Challenges with LLMs and Reproducibility

While the benefits of using generative AI models in research can be significant, there are several pitfalls to avoid. Reproducibility in scientific research has become a significant problem, with some studies showing that more than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments. Reproducibility becomes a greater challenge when using LLMs due to factors like model complexity, large-scale datasets, and proprietary algorithms. Here's a quick look at the most significant issues.

Proprietary Models: Many advanced LLMs (e.g., GPT-4) are owned by private companies, meaning their architecture, training data, and fine-tuning processes are often inaccessible. This lack of transparency makes it hard for researchers to replicate or verify the results.
Resource Intensive: LLMs often require vast amounts of computational resources to train and fine-tune, making it difficult for smaller research teams to recreate the same conditions.
Data Availability: The datasets used to train LLMs are often proprietary or unavailable, preventing others from reproducing the exact training process or results.
Version Control and Model Drift: With frequent updates to LLMs, results can change depending on which version of a model is used. Without version control, it’s challenging to ensure reproducibility over time.

How Open-Source LLMs Solve Reproducibility Issues

Open-source LLMs enhance reproducibility by providing transparency in architecture, accessible code, and shared datasets, while proprietary LLMs often face challenges in this area due to closed systems and limited data availability.

Transparency in Architecture: Open-source LLMs like GPT-Neo, Bloom, or Falcon share their architectures, allowing researchers to understand how the models are built and replicate them in their own environments.
Accessible Code and Models: Researchers can directly use and modify open-source LLMs, ensuring they can replicate experiments or fine-tune models with clear visibility into the process. This transparency also makes it easier to reproduce research results.
Standardized Training Data: Open-source LLMs often share their training datasets or provide instructions for recreating a similar dataset, which allows other researchers to reproduce results using the same data.
Reproducible Environments: Tools like Hugging Face’s model hub provide standardized environments where LLMs can be easily shared, reproduced, and evaluated, maintaining consistent results across different users.
Collaborative Improvements: The open-source community contributes to improving LLMs, fixing bugs, and enhancing reproducibility by collaboratively working on shared codebases, ensuring that future research based on these models remains reliable and transparent.

Additional reading on reproducibility in research and LLMs

Projects and Project Consultations

KSL librarians are always available for consultations with faculty, students, and staff at any point in the research life cycle. This includes projects that use, or hope to use, generative AI tools. Please don't hesitate to reach out to us. If we can't help you directly, we will know someone who can!

Open Source Generative AI Tools

Hugging Face

The first stop on any open source LLM journey should be huggingface.co. Hugging Face is a popular platform and community that provides tools and resources for working with machine learning models, particularly in natural language processing (NLP). It is widely known for its Transformers library, which allows users to easily access, fine-tune, and deploy state-of-the-art deep learning models for tasks such as text generation, translation, classification, and more.

Model Hub

Hugging Face hosts a massive Model Hub that contains thousands of pre-trained models contributed by the community. These models cover various domains like NLP, computer vision, and speech processing. Users can:

Download and use pre-trained models.
Fine-tune models on specific datasets.
Upload and share their own models with others.

Models on the platform include those based on architectures like BERT, GPT, T5, and more, and are suited for tasks like sentiment analysis, translation, summarization, and text generation.

Transformers Library

Hugging Face's Transformers library is a widely-used Python package that simplifies the use of pre-trained deep learning models. It provides:

Easy-to-use interfaces for working with models in NLP tasks such as text classification, named entity recognition (NER), machine translation, and question answering.
Support for multiple frameworks like PyTorch and TensorFlow.
Extensive documentation and tutorials to help researchers and developers get started quickly.

Datasets Library

Hugging Face also offers a Datasets library, which provides access to a wide range of datasets for NLP and other machine learning tasks. The datasets can be easily loaded, preprocessed, and integrated into machine learning pipelines.

Spaces

Hugging Face Spaces is a feature that allows users to create and share machine learning demos and applications using web technologies like Streamlit or Gradio. It enables the community to easily showcase models and interact with them in real-time.

Hugging Face Community

The platform fosters a vibrant community of researchers, developers, and data scientists who contribute models, datasets, and knowledge. The community aspect encourages collaboration and knowledge sharing, accelerating advancements in machine learning.

Hugging Face Hub

The Hugging Face Hub is a collaborative platform where users can manage machine learning experiments, monitor model training, track performance, and version control their models and datasets. It integrates with the Transformers and Datasets libraries, making it a central place for managing the machine learning lifecycle.

Inference API

Hugging Face offers an Inference API that allows developers to easily deploy models and integrate them into applications without needing to manage complex infrastructure. Users can send requests to models hosted by Hugging Face via simple API calls.

Open-Source and Commercial Offerings

While Hugging Face provides powerful open-source tools and resources, they also offer commercial services, including managed hosting and enterprise solutions for deploying machine learning models at scale.

Hugging Face is a versatile platform that provides tools for accessing and fine-tuning pre-trained models, managing datasets, and sharing machine learning applications. It plays a significant role in democratizing access to cutting-edge AI models and fostering collaboration in the machine learning community.

LM Studio
LM Studio is an open-source application that allows users to run large language models (LLMs) locally on their laptops or desktops without needing cloud-based services. LMStudio is ideal for tasks like text generation, research, content creation, and more, providing a practical, private, and customizable LLM experience. Key features include:
- Local Deployment: Run LLMs directly on your hardware.
- User-Friendly Interface: No-code options for easy interaction.
- Model Flexibility: Support for various models and customization.
- Privacy: Full control over data with no external servers.
All LLM Directory
All LLM Directory is a comprehensive directory that lists LLMs, both commercial and open-source. The platform provides detailed comparisons, allowing users to explore a wide range of LLMs based on different categories like text-based, multilingual, or domain-specific models. It includes both well-known commercial models such as ChatGPT, Bard, and Claude, and popular open-source alternatives like Alpaca and BLOOM. The directory is aimed at helping developers find suitable models for their projects while also offering exposure to those who have built LLMs and wish to share them with the community

There are many excellent guides regarding the various Open Source LLMs available, such as this one from Open CV University. You can find LLMs to support your research by searching the web for Open LLMs and including your research area in the search, or by reaching out to a librarian at KSL for help finding LLMs that are already fine tuned for work in your area of interest.

Proprietary AI Tools and Chatbots

ChatGPT
ChatGPT is a language model developed by OpenAI, based on the GPT architecture, designed to generate human-like text and engage in natural conversations. It can perform a wide variety of tasks, including answering questions, writing content, coding, and more, by leveraging its extensive training on diverse text data. ChatGPT is widely used for chatbots, content creation, and personal or professional assistance across different fields. There are many tutorials for using ChatGPT available online, making it a great option for beginning to explore LLMs.
Claude from Anthropic
Claude AI is a conversational artificial intelligence developed by Anthropic, designed to assist users through natural language processing. It emphasizes safety and ethical considerations in its interactions, aiming to provide helpful and reliable responses. With a focus on user-friendly dialogue, Claude AI can handle a wide range of queries while prioritizing clarity and understanding. There are also significant data visualization and project management capabilities.
Phind.com
With very powerful coding and tech project management capabilities, Phind is an AI-powered search engine and chatbot that provides instant answers to questions instead of having to dig through hours of search results. It features multiple AI models including unlimited free uses of Phind’s own natural language model. Phind also has access to the internet for receiving up-to-date information, and when using the chatbot mode, you can provide additional context with your message (useful for adding code, snippets, data, etc).
Copilot
A tool for supporting research as it offers real-time assistance with writing, editing, and summarizing academic content. Its advanced AI capabilities can help researchers synthesize information, generate new ideas, and even translate content across various languages. Use your CWRU single sign-on user name and password to access your secure CoPilot account.
Research Rabbit
ResearchRabbit is a free online “citation-based literature mapping tool." It is a visual literature review software mapping tool that is similar to Spotify. The tool connects your research interests to related articles and authors.

Dimensions
A data analytics platform that provides access to various research-related data, including publications, grants, patents, and clinical trials. It uses AI to offer insights and analytics that can inform research strategies. CWRU users need to create an account with their network ID email address.

AI-Enhanced Library Resources

Research databases and other subscription-based and library-licensed tools are increasingly adding AI components and features to their platforms. We'll compile new developments that may aid in your research as they become available. If you are interested in an AI-enhanced tool the library does not have, please contact us to let us know!

scite.ai
Scite.ai is a tool designed to help researchers, faculty, staff, and students evaluate the credibility and impact of research articles. It uses artificial intelligence to analyze citation contexts, identifying whether a paper has been supported, contradicted, or simply mentioned by subsequent research. This helps users quickly understand the quality and relevance of research literature, facilitating more informed decision-making in research and academic writing.
Statista
Statista's Research AI delivers concise, written responses tailored to user queries, integrating statistics, infographics, diverse topics, and key insights from market research.
EBook Central
EBook Central is a collection of ebooks in a range of subjects. Ebook Central Research Assistant provides AI-enhanced capabilities while reading titles on the EBook Central platform to guide users to quickly assess the relevance of each book and its chapters, helping to review, analyze and explore new ideas.

Dimensions Research GPT
This scholar GPT brings Dimensions’ scientific evidence and ChatGPT's powerful generative AI capabilities together - get summaries, insights and citations for research topics evidenced in academic papers in seconds.
Generative AI on JSTOR (Beta)
JSTOR, a nonprofit service from ITHAKA, invites you to explore the beta release of a generative AI-powered research tool. Developed in collaboration with our community, this interactive tool leverages innovative technology and JSTOR’s trusted corpus to empower people to deepen and expand their research.
Financial Times - Ask FT
Financial Times offers Ask FT, which generates automated answers summarized from Financial Times articles. Create a FT account with your CWRU credentials for full access.
PolicyView: AI
PolicyView: AI is a new twice-monthly intelligence report that monitors and summarizes developments with artificial intelligence legislation, regulation, and conversations at the state and federal levels.

Guidelines for Using Library-Licensed Content in GenAI Models

The use of library-acquired content and materials (journal articles, eBooks, chapters, etc.) in Generative AI models, LLMs, RAGs, etc. may be restricted by licensing agreements and copyright laws. Pending court cases and legislation will continue to shape how copyright law and AI interact. There may be instances where certain material can be used in closed Case environments only, and CWRU libraries and the OhioLink consortium are investigating the potential to add language in licensing agreements to allow certain content to be used in closed AI environments, but that work is in development.

We recommend contacting the library before using any library-licensed content in Generative AI Models, LLMs, RAGs, etc.

AI & Copyright
US Copyright AI Information Page
Copyright Registration Guidance
Guidance for works containing material generated by Artificial Intelligence
Fair Use: Training Generative AI