Overview

Moonlighting proteins are single polypeptide chains that perform multiple, unrelated biological functions without changes in amino acid sequence.

MoonLitDB is a database of protein functions extracted from the literature on moonlighting proteins. Each function entry is predicted to be canonical (primary) or moonlighting (secondary), carries a confidence score, and is linked to its source sentences in publications. Protein identifiers are normalized using PubTator³. Annotations include species, localization, switching mechanisms, and GO Slim terms, the last predicted by Gemini from the function descriptions. GO Slim prediction relies on Gemini's internal semantic knowledge rather than the simple text matching or cosine similarity used by existing tools such as Text2Term and BioPortal Annotator, which is intended to capture biological context more accurately.

Tutorials

Quick, step-by-step guides for common tasks in MoonLitDB.

  1. Open the home page and choose By Gene.
  2. Enter a gene symbol or Gene ID.
  3. Click Search and review the results card and function table.
  4. If the gene/protein is not present in MoonLitDB, check the cross-database presence/absence summary for other moonlighting protein databases.
  5. Check the evidence text and PMIDs to validate each moonlighting annotation.

Search Methods

By Gene Search

Search for individual genes by NCBI Gene ID or gene symbol (e.g., GAPDH, TP53). Filter by function type (all, moonlighting, canonical), species, and GO Slim confidence threshold.

Results: Gene cards with aggregated functions, word clouds, GO Slim terms (Gemini-predicted), and detailed function tables.

Batch Search

Query multiple genes simultaneously by entering a list of gene names or IDs. Supports bulk data retrieval with optional species and function type filters.

Results: Combined results for all genes with summary statistics and CSV export options.
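
When preparing a long gene list for Batch Search, it can help to clean it up first. The following is a minimal sketch, not part of MoonLitDB itself: the function name parse_gene_list is hypothetical, the accepted delimiters of the search box are not documented here, and treating all-digit tokens as NCBI Gene IDs and de-duplicating symbols case-insensitively are assumptions.

```python
import re


def parse_gene_list(text: str) -> dict[str, list[str]]:
    """Split free-form input into Gene IDs and gene symbols (hypothetical helper)."""
    # Assumption: entries are separated by whitespace, commas, or semicolons.
    tokens = [t for t in re.split(r"[\s,;]+", text.strip()) if t]
    gene_ids: list[str] = []
    symbols: list[str] = []
    seen: set[str] = set()
    for token in tokens:
        key = token.upper()  # assumption: case-insensitive de-duplication
        if key in seen:
            continue
        seen.add(key)
        # Assumption: an all-digit token is an NCBI Gene ID, anything else a symbol.
        (gene_ids if token.isdigit() else symbols).append(token)
    return {"gene_ids": gene_ids, "symbols": symbols}


batch = parse_gene_list("GAPDH, TP53\n2597 gapdh")
print("\n".join(batch["symbols"] + batch["gene_ids"]))
```

The one-entry-per-line output is only a convenient form for pasting; adjust it to whatever the input box accepts.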

Advanced Search

Build complex queries using rule-based conditions. Search across function names, types, species, localization, switching mechanisms, notes, source sentences, protein/gene names, IDs, and UniProt IDs. Combine conditions with AND/OR logic.

Results: Individual function entries matching your query criteria.
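
To illustrate how AND/OR conditions combine, here is a small sketch that evaluates a rule tree against function entries. It is not how MoonLitDB implements Advanced Search; the helpers contains, all_of, and any_of are hypothetical, the entries are placeholders rather than real records, and "contains, case-insensitive" is an assumed matching rule. Field names follow the export columns.

```python
from typing import Callable

Entry = dict[str, str]
Condition = Callable[[Entry], bool]


def contains(field: str, text: str) -> Condition:
    """Condition: the field contains the text (assumed case-insensitive)."""
    return lambda entry: text.lower() in entry.get(field, "").lower()


def all_of(*conditions: Condition) -> Condition:
    """AND: every condition must hold."""
    return lambda entry: all(c(entry) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """OR: at least one condition must hold."""
    return lambda entry: any(c(entry) for c in conditions)


# Placeholder entries for illustration only.
entries = [
    {"function_name": "function A", "function_type": "canonical", "localization": "cytoplasm"},
    {"function_name": "function B", "function_type": "moonlighting", "localization": "nucleus"},
    {"function_name": "function C", "function_type": "moonlighting", "localization": "cell surface"},
    {"function_name": "function D", "function_type": "moonlighting", "localization": "cytoplasm"},
]

# function_type is moonlighting AND (localization is nucleus OR cell surface)
query = all_of(
    contains("function_type", "moonlighting"),
    any_of(contains("localization", "nucleus"), contains("localization", "cell surface")),
)
matches = [e["function_name"] for e in entries if query(e)]
print(matches)
```

Nesting any_of inside all_of mirrors grouping an OR block under an AND rule; each match corresponds to one individual function entry.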

Data Modes

The home page offers two viewing modes, toggled via buttons at the top:

Normalized Data (Default)

Proteins grouped by NCBI Gene IDs using PubTator³ normalization.

Use when: You want standardized gene identifiers and cross-database links.

Raw Data

Proteins displayed with original names as extracted from literature.

Use when: You need to see exactly how proteins were mentioned in source papers.

Note: Raw data is retained so that it can be re-normalized as improved gene normalizers and entity recognition tools become available. PubTator³ represents the current state of the art.
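
The difference between the two modes can be sketched as a grouping step: raw mentions stay as written, while normalized data collects them under one identifier. The snippet below is only an illustration. The NORMALIZER lookup is a hypothetical stand-in for a gene normalizer such as PubTator³, the IDs are placeholders rather than real NCBI Gene IDs, and the function name group_by_gene is invented for this example.

```python
from collections import defaultdict

# Hypothetical stand-in for a gene normalizer: lowercased raw mention -> placeholder ID.
NORMALIZER = {
    "gapdh": "ID-A",
    "glyceraldehyde-3-phosphate dehydrogenase": "ID-A",
    "p53": "ID-B",
}


def group_by_gene(raw_rows):
    """Group raw rows by normalized ID; keep rows that cannot be resolved."""
    grouped = defaultdict(list)
    unresolved = []
    for row in raw_rows:
        gene_id = NORMALIZER.get(row["protein_name"].strip().lower())
        if gene_id is None:
            unresolved.append(row)  # retained so it can be re-normalized later
        else:
            grouped[gene_id].append(row)
    return dict(grouped), unresolved


raw_rows = [
    {"protein_name": "GAPDH", "function_name": "function A"},
    {"protein_name": "Glyceraldehyde-3-phosphate dehydrogenase", "function_name": "function B"},
    {"protein_name": "p53", "function_name": "function C"},
    {"protein_name": "unnamed protein X", "function_name": "function D"},
]
grouped, unresolved = group_by_gene(raw_rows)
for gene_id, rows in grouped.items():
    print(gene_id, [r["protein_name"] for r in rows])
print("unresolved:", [r["protein_name"] for r in unresolved])
```

Keeping the unresolved rows alongside the grouped ones reflects the reason raw data is retained: a better normalizer can be applied later without re-extracting from the literature.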

Export Formats

All search results can be exported using the Download Results button. Click the dropdown arrow to choose your preferred format:

Excel (.xlsx)

Multi-sheet workbook with summary and detailed data

CSV

Comma-separated values for spreadsheets

TSV

Tab-separated values for data processing

JSON

Structured data for programmatic use

Export by Search Type

Search Type        Export Content
By Gene            Per-gene summary sheet + all function entries (Excel includes both sheets)
Batch Search       Combined statistics for all queried genes
Advanced Search    All matching function entries from query

Column Definitions

Common columns:

  • protein_name - Protein identifier
  • function_name - Function description
  • function_type - "moonlighting" or "canonical"
  • species - Organism(s)
  • localization - Cellular location(s)
  • switching_mechanism - Condition(s) triggering function switch
  • go_terms - GO Slim term IDs (Gemini-predicted)
  • source_sentences - Supporting evidence from papers
  • pmids - PubMed IDs

Normalized data adds:

  • gene_id - NCBI Gene ID
  • gene_name - PubTator gene symbol
  • uniprot_id - UniProt identifier
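
As an example of working with an export programmatically, the sketch below reads a CSV with the columns listed above and counts moonlighting functions per protein. The sample rows are placeholders, not real records, and the ";" separator for multi-valued cells such as pmids is an assumption; check a real export before relying on it.

```python
import csv
import io
from collections import Counter

# Stand-in for a downloaded CSV export; the rows are placeholders, not real records.
SAMPLE_EXPORT = """protein_name,function_name,function_type,species,pmids
PROT1,function A,canonical,Homo sapiens,PMID-1
PROT1,function B,moonlighting,Homo sapiens,PMID-2;PMID-3
PROT2,function C,moonlighting,Mus musculus,PMID-4
"""


def load_functions(handle):
    """Read export rows and split the pmids cell into a list."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        # Assumption: multi-valued cells are separated by ";".
        row["pmids"] = [p.strip() for p in row["pmids"].split(";") if p.strip()]
    return rows


rows = load_functions(io.StringIO(SAMPLE_EXPORT))
moonlighting = [r for r in rows if r["function_type"] == "moonlighting"]
per_protein = Counter(r["protein_name"] for r in moonlighting)
for protein, count in per_protein.items():
    print(protein, count)
```

For a real file, pass open("export.csv", newline="", encoding="utf-8") instead of the in-memory sample; the file name is hypothetical. A TSV export could be read the same way by passing delimiter="\t" to csv.DictReader.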

Disclaimer

LLM-generated and automatically extracted content

  • Annotations are derived using large language models (LLMs) and automated text-mining of PubMed abstracts and full-text articles.
  • The database may contain errors, omissions, or mis-annotations (including incorrect protein names, functions, species, or PMIDs).
  • MoonLitDB is provided for research and exploratory use only and does not constitute medical, diagnostic, or treatment advice.

Always cross-check critical findings against the original publications (via PMIDs) and authoritative resources such as UniProt, NCBI Gene, and primary literature before drawing conclusions or using the data in downstream analyses.

Citations

PubTator³: Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian, Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, Zhiyong Lu, PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge, Nucleic Acids Research, Volume 52, Issue W1, 5 July 2024, Pages W540–W546, https://doi.org/10.1093/nar/gkae235.

AmiGO: Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, AmiGO Hub, Web Presence Working Group, AmiGO: online access to ontology and annotation data, Bioinformatics, Volume 25, Issue 2, January 2009, Pages 288–289, https://doi.org/10.1093/bioinformatics/btn615.