Overview
Moonlighting proteins are single polypeptide chains that perform multiple, unrelated biological functions without changes in amino acid sequence.
MoonLitDB is a database of protein functions extracted from the literature on moonlighting proteins. Each function entry is predicted to be canonical (primary) or moonlighting (secondary), carries a confidence score, and is linked to its source sentences in publications. Protein identifiers are normalized using PubTator³. Annotations include species, localization, switching mechanisms, and GO Slim terms predicted by Gemini from the function descriptions. GO Slim prediction relies on Gemini's internal semantic knowledge rather than the text matching or cosine-similarity approaches used by existing tools such as Text2Term and BioPortal Annotator, with the aim of capturing biological context more accurately.
Tutorials
Quick, step-by-step guides for common tasks in MoonLitDB.
Searching by gene:
- Open the home page and choose By Gene.
- Enter a gene symbol or Gene ID.
- Click Search and review the results card and function table.
- If the gene/protein is not present in MoonLitDB, check the cross-database presence/absence summary for other moonlighting protein databases.
- Check the evidence text and PMIDs to validate each moonlighting annotation.
Search Methods
By Gene
Search for individual genes by NCBI Gene ID or gene symbol (e.g., GAPDH, TP53). Filter by function type (all, moonlighting, canonical), species, and GO Slim confidence threshold.
Results: Gene cards with aggregated functions, wordclouds, GO Slim terms (Gemini-predicted), and detailed function tables.
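The three filters described above can be pictured as a simple conjunction: an entry is shown only if it passes every active filter. The following is a minimal sketch of that idea, not MoonLitDB's actual implementation. The `FunctionEntry` class, the `go_slim_confidence` field name, and the sample values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class FunctionEntry:
    # Hypothetical record layout, for illustration only.
    gene: str
    function_type: str          # "moonlighting" or "canonical"
    species: str
    go_slim_confidence: float   # assumed to lie in [0, 1]


def filter_entries(entries, function_type="all", species=None, min_go_confidence=0.0):
    """Keep only the entries that pass all active filters."""
    kept = []
    for entry in entries:
        if function_type != "all" and entry.function_type != function_type:
            continue
        if species is not None and entry.species != species:
            continue
        if entry.go_slim_confidence < min_go_confidence:
            continue
        kept.append(entry)
    return kept


# Toy placeholder values, not real database content.
entries = [
    FunctionEntry("GAPDH", "canonical", "Homo sapiens", 0.95),
    FunctionEntry("GAPDH", "moonlighting", "Homo sapiens", 0.80),
    FunctionEntry("GAPDH", "moonlighting", "Mus musculus", 0.40),
]
hits = filter_entries(entries, function_type="moonlighting", min_go_confidence=0.5)
```

With these toy values, only the second entry passes both the function-type filter and the confidence threshold.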
Batch Search
Query multiple genes simultaneously by entering a list of gene names or IDs. Supports bulk data retrieval with optional species and function type filters.
Results: Combined results for all genes with summary statistics and CSV export options.
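When preparing a gene list for a batch query, it can help to clean it up first. The sketch below splits free-form text into numeric Gene IDs and gene symbols and drops exact duplicates. It is an assumption about how one might pre-process input locally; the accepted separators and the `parse_gene_list` helper are hypothetical and not part of MoonLitDB.

```python
import re


def parse_gene_list(text):
    """Split free-form input into (gene_ids, symbols), preserving order.

    Assumes tokens are separated by commas, semicolons, or whitespace.
    Purely numeric tokens are treated as NCBI Gene IDs.
    """
    tokens = [t for t in re.split(r"[,;\s]+", text.strip()) if t]
    gene_ids, symbols = [], []
    seen = set()
    for token in tokens:
        if token in seen:       # exact-match de-duplication only
            continue
        seen.add(token)
        if token.isdigit():
            gene_ids.append(int(token))
        else:
            symbols.append(token)
    return gene_ids, symbols


gene_ids, symbols = parse_gene_list("GAPDH, TP53\n2597 GAPDH")
```

De-duplication is deliberately case-sensitive here, since symbol capitalization can differ between species.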
Advanced Search
Build complex queries using rule-based conditions. Search across function names, types, species, localization, switching mechanisms, notes, source sentences, protein/gene names, IDs, and UniProt IDs. Combine conditions with AND/OR logic.
Results: Individual function entries matching your query criteria.
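To make the AND/OR combination concrete, here is a minimal sketch of rule-based matching over function entries. It assumes each rule is a case-insensitive substring match on a single field; the rule format, the `matches` and `evaluate` helpers, and the sample entries are hypothetical, and the real query builder may support other operators.

```python
def matches(entry, rule):
    """Case-insensitive substring match of rule['value'] in one field."""
    field_value = str(entry.get(rule["field"], ""))
    return rule["value"].lower() in field_value.lower()


def evaluate(entry, rules, combine="AND"):
    """Combine rule results with AND (all must match) or OR (any may match)."""
    results = [matches(entry, rule) for rule in rules]
    return all(results) if combine == "AND" else any(results)


# Toy placeholder entries, not real database content.
entries = [
    {"function_name": "function 1", "function_type": "canonical", "localization": "cytoplasm"},
    {"function_name": "function 2", "function_type": "moonlighting", "localization": "nucleus"},
    {"function_name": "function 3", "function_type": "canonical", "localization": "nucleus"},
]
rules = [
    {"field": "function_type", "value": "moonlighting"},
    {"field": "localization", "value": "nucleus"},
]

and_hits = [e["function_name"] for e in entries if evaluate(e, rules, "AND")]
or_hits = [e["function_name"] for e in entries if evaluate(e, rules, "OR")]
```

With AND, only the entry that is both moonlighting and nuclear is returned; with OR, the nuclear canonical entry is returned as well.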
Data Modes
The home page offers two viewing modes, toggled via buttons at the top:
Normalized Data (Default)
Proteins grouped by NCBI Gene IDs using PubTator³ normalization.
Use when: You want standardized gene identifiers and cross-database links.
Raw Data
Proteins displayed with original names as extracted from literature.
Use when: You need to see exactly how proteins were mentioned in source papers.
Note: Raw data is retained so that entries can be re-normalized as improved gene normalizers and entity recognition tools become available; PubTator³ represents the current state of the art.
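The relationship between the two modes can be sketched as a grouping step: raw entries keep the name found in the paper, and the normalized view collects them under a shared Gene ID. The example below assumes a precomputed name-to-ID lookup table (`name_to_gene_id`, hypothetical) in place of an actual normalizer call; names and IDs are placeholders.

```python
from collections import defaultdict


def group_by_gene_id(raw_entries, name_to_gene_id):
    """Group raw entries under normalized Gene IDs.

    Entries whose name has no mapping are returned separately, so they
    remain available for re-normalization later.
    """
    grouped = defaultdict(list)
    unmapped = []
    for entry in raw_entries:
        gene_id = name_to_gene_id.get(entry["protein_name"])
        if gene_id is None:
            unmapped.append(entry)
        else:
            grouped[gene_id].append(entry)
    return dict(grouped), unmapped


# Placeholder names and IDs, for illustration only.
raw_entries = [
    {"protein_name": "protein A", "function_name": "function 1"},
    {"protein_name": "ProtA", "function_name": "function 2"},
    {"protein_name": "unknown factor", "function_name": "function 3"},
]
name_to_gene_id = {"protein A": 1001, "ProtA": 1001}

grouped, unmapped = group_by_gene_id(raw_entries, name_to_gene_id)
```

Here the two spellings of the same protein end up under one ID, while the unmapped entry stays in its raw form.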
Export Formats
All search results can be exported using the Download Results button. Click the dropdown arrow to choose your preferred format:
- Excel (.xlsx): Multi-sheet workbook with summary and detailed data
- CSV: Comma-separated values for spreadsheets
- TSV: Tab-separated values for data processing
- JSON: Structured data for programmatic use
Export by Search Type
| Search Type | Export Content |
|---|---|
| By Gene | Per-gene summary sheet + all function entries (Excel includes both sheets) |
| Batch Search | Combined statistics for all queried genes |
| Advanced Search | All matching function entries from query |
Column Definitions
Common columns:
- protein_name: Protein identifier
- function_name: Function description
- function_type: "moonlighting" or "canonical"
- species: Organism(s)
- localization: Cellular location(s)
- switching_mechanism: Condition(s) triggering function switch
- go_terms: GO Slim term IDs (Gemini-predicted)
- source_sentences: Supporting evidence from papers
- pmids: PubMed IDs
Normalized data adds:
- gene_id: NCBI Gene ID
- gene_name: PubTator gene symbol
- uniprot_id: UniProt identifier
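Given the column names above, an exported CSV can be read with Python's standard `csv` module. The sketch below keeps only moonlighting rows and splits the `pmids` cell into a list. It assumes that multiple PMIDs in one cell are separated by semicolons, which should be checked against an actual export; the sample rows are placeholders.

```python
import csv
import io


def load_moonlighting_rows(csv_file):
    """Return moonlighting rows from an exported CSV, with pmids as a list."""
    rows = []
    for row in csv.DictReader(csv_file):
        if row.get("function_type", "").strip().lower() != "moonlighting":
            continue
        # Assumption: PMIDs within one cell are separated by ";".
        row["pmids"] = [p.strip() for p in row.get("pmids", "").split(";") if p.strip()]
        rows.append(row)
    return rows


# In-memory stand-in for an exported file; values are placeholders.
sample = io.StringIO(
    "protein_name,function_name,function_type,pmids\n"
    "protein A,function 1,canonical,111\n"
    "protein A,function 2,moonlighting,222; 333\n"
)
rows = load_moonlighting_rows(sample)
```

For a real export, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample, where `path` is wherever the download was saved.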
Disclaimer
LLM-generated and automatically extracted content
- Annotations are derived using large language models (LLMs) and automated text-mining of PubMed abstracts and full-text articles.
- The database may contain errors, omissions, or mis-annotations (including incorrect protein names, functions, species, or PMIDs).
- MoonLitDB is provided for research and exploratory use only and does not constitute medical, diagnostic, or treatment advice.
Always cross-check critical findings against the original publications (via PMIDs) and authoritative resources such as UniProt, NCBI Gene, and primary literature before drawing conclusions or using the data in downstream analyses.
Citations
PubTator³: Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian, Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, Zhiyong Lu, PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge, Nucleic Acids Research, Volume 52, Issue W1, 5 July 2024, Pages W540–W546, https://doi.org/10.1093/nar/gkae235.
AmiGO: Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, AmiGO Hub, Web Presence Working Group. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009 Jan;25(2):288-289. https://doi.org/10.1093/bioinformatics/btn615.