PROTEIN→RNA→DNA: INVERTING THE CENTRAL DOGMA
A Novel Synthetic Biology System Based on DRT3 Protein-Templated DNA Synthesis
Enhanced v2.0 — With Expanded Protein Database & Web Visualization Platform
For current setup and usage instructions, see the project README.
EXECUTIVE SUMMARY
The DRT3 bacterial defense system (Deng et al., Science 2026) reveals an unprecedented mechanism where the Drt3b reverse transcriptase synthesizes sequence-specific DNA (poly(AC)) without any nucleic acid template, using instead a network of protein residues (Glu26, Arg253, Tyr650) that directly template base selection through hydrogen bonding, cation-π interactions, and steric gating. This discovery enables, for the first time, the conceptual inversion of the central dogma: protein information can be directly encoded into DNA sequence without transcription or translation intermediates.
This document proposes a complete three-tier architecture—Protein Sensing → RNA Transduction → DNA Writing—that does not exist in current synthetic biology but is buildable from validated component technologies.
v2.0 Enhancements
Version 2.0 introduces major expansions to the simulator platform:
- 24 curated proteins across 9 categories (up from a single p53 target)
- Interactive 3D molecular visualization via NGL Viewer with PDB and AlphaFold DB integration
- DRT3b active site engineering simulator for exploring base selection variants
- Next.js 16 web application with real-time simulation, responsive design, and advanced charting
- Database API integration with RCSB PDB (200,000+ structures) and AlphaFold DB (200M+ predicted models)
- Comprehensive protein annotations including clinical significance, therapeutic targeting, and mutation hotspots
PART 1: THE DRT3 BREAKTHROUGH AND ITS IMPLICATIONS
1.1 Key Findings from the DRT3 Paper
The DRT3 anti-phage system from E. coli comprises:
- Drt3a: A class 2 "unknown group" (UG) reverse transcriptase that synthesizes poly(GT) ssDNA using the ACACAC motif of a noncoding RNA (ncRNA) as template, functionally analogous to telomerase.
- Drt3b: A class 1 UG reverse transcriptase that synthesizes the complementary poly(AC) strand without any nucleic acid template, a mechanism unprecedented among polymerases.
- ncRNA: A ~130 nt RNA with four stem-loops (SL1-4) that wraps around Drt3a, positioning the ACACAC template motif.
The DRT3 complex assembles as a D3-symmetric 6:6:6 hexamer of Drt3a:Drt3b:ncRNA, producing double-stranded poly(GT/AC) DNA up to several kilobases in length.
1.2 The Revolutionary Aspect: Protein-Templated DNA Synthesis
Drt3b achieves template-independent, sequence-specific polymerization through five active-site residues, of which Glu26, Arg253, and Tyr650 are the most critical:
| Residue | Function | Mechanism |
|---|---|---|
| Glu26 | dA selection | Side chain projects into nucleotide-binding pocket; forms two H-bonds with N6 amine of dATP, discriminating against dGTP/dTTP |
| Arg253 | dC selection / purine discrimination | Guanidinium group forms cation-π interaction with dA; H-bonds to Watson-Crick edge of preceding dC17, distinguishing dC from dT |
| Tyr650 | Priming nucleophile | C-terminal tyrosine hydroxyl initiates covalent protein-DNA linkage; conserved across DRT3 homologs |
| Gly248 | Steric gate | Excludes dG at dC positions through steric clash with N2 exocyclic amine |
| Thr335 | Purine pocket | Maintains register of poly(AC) interactions after translocation |
Critical insight: The nascent cDNA adopts an unusual backbone kink compressing three bases into a space normally occupied by two, enabling extensive protein-DNA base interactions that replace the structural role of a nucleic acid template.
This is fundamentally different from:
- Template-dependent polymerases (DNA pol, RNA pol, telomerase): Require base-pairing with a template strand.
- Template-independent polymerases (TdT, poly(A) polymerase): Add random or homopolymeric sequences without sequence control.
- RDE-3 (the closest known precedent): Adds poly(UG) repeats to RNA, but it extends an RNA substrate and produces RNA, not DNA.
1.3 DRT3b Active Site Engineering (v2.0 New)
The native DRT3b produces only poly(AC). To encode arbitrary protein information, the active site must be engineered. Our simulation platform models six enzyme variants with distinct base selection specificities:
| Variant | Mutation | Product Pattern | dA% | dC% | dG% | dT% | Processivity | Fidelity |
|---|---|---|---|---|---|---|---|---|
| Wildtype | — | poly(AC) | 50 | 50 | 0 | 0 | 1000 nt | 99.99% |
| E26Q | Glu26→Gln | poly(GC) | 5 | 50 | 45 | 0 | 600 nt | 99.70% |
| R253K | Arg253→Lys | poly(AT) | 45 | 5 | 0 | 50 | 700 nt | 99.80% |
| G248A | Gly248→Ala | poly(ACG) | 35 | 35 | 25 | 5 | 800 nt | 99.50% |
| T335S | Thr335→Ser | poly(AG) | 50 | 10 | 35 | 5 | 500 nt | 99.00% |
| E26Q+R253K | Double | poly(GT) | 5 | 10 | 45 | 40 | 400 nt | 98.50% |
By creating orthogonal DRT3b variants, multiplexed protein encoding becomes theoretically possible: each variant would write a distinct repeat pattern corresponding to a specific sensed protein.
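One way to put a number on "orthogonal" is to compare the base compositions of variant products. The sketch below is a minimal illustration, not part of the simulator: it copies four rows from the table above and computes the total variation distance between each pair (0 means identical composition, 1 means no shared bases). The choice of metric and the names `COMPOSITION`, `total_variation`, and `pairwise_distances` are assumptions made here for illustration.

```python
from itertools import combinations

# Base incorporation percentages (dA, dC, dG, dT) for four rows of the variant table.
COMPOSITION = {
    "Wildtype": (50, 50, 0, 0),
    "E26Q": (5, 50, 45, 0),
    "R253K": (45, 5, 0, 50),
    "E26Q+R253K": (5, 10, 45, 40),
}


def total_variation(p, q):
    """Total variation distance between two percentage vectors, scaled to 0..1."""
    # Percentages sum to 100, so dividing by 200 applies both the 1/2 factor
    # and the conversion from percent to probability.
    return sum(abs(a - b) for a, b in zip(p, q)) / 200.0


def pairwise_distances(table):
    """Distance for every unordered pair of variants in the table."""
    return {
        (a, b): total_variation(table[a], table[b])
        for a, b in combinations(table, 2)
    }


if __name__ == "__main__":
    ranked = sorted(pairwise_distances(COMPOSITION).items(), key=lambda kv: kv[1])
    for (a, b), dist in ranked:
        print(f"{a:>10} vs {b:<10} {dist:.2f}")
```

Pairs with a small distance would be hard to tell apart from composition alone and would need to be distinguished by repeat pattern instead.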
PART 2: THE PROPOSED SYSTEM — PROTEIN→RNA→DNA
2.1 System Architecture Overview
┌───────────────────────────────────────────────────────────────────────────────┐
│                          PROTEIN → RNA → DNA SYSTEM                           │
├───────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  TIER 1: PROTEIN SENSING             TIER 2: RNA TRANSDUCTION  TIER 3: DNA    │
│  (Split-DRT3b Proximity)             (Signal Amplification)    WRITING        │
│                                                                               │
│  Target Protein ──► N-DRT3b          Hammerhead Ribozyme ───►  DRT3b-TdT      │
│        │            + Affibody A     (self-cleaves upon        + Prime Editor │
│        │            (inactive)       DRT3b activation)         Fusion         │
│        │                  │                   │                     │         │
│        │            C-DRT3b          Barcode RNA               Genomic DNA    │
│        │            + Affibody B     (protein ID +             (permanent     │
│        │            (inactive)       timestamp +               record)        │
│        │                  │          intensity)                               │
│        └──────────► Reconstituted             │                               │
│                     Active DRT3b ──► Toehold Switch                           │
│                                      Cascade                                  │
│                                      (10⁶× amplification)                     │
│                                                                               │
└───────────────────────────────────────────────────────────────────────────────┘
2.2 Tier 1: Protein Sensing via Split-DRT3b Proximity Reconstitution
Concept: Split the DRT3b enzyme into N-terminal (residues 1-400) and C-terminal (residues 401-650) fragments at the boundary between the thumb and palm domains. Fuse each fragment to a protein-binding domain (affibody, scFv, or nanobody) targeting different epitopes on the protein of interest. Only when the target protein is present do the fragments come into proximity, reconstituting the active DRT3b enzyme.
Why this is novel: Current protein sensing methods (FRET, BiFC, split luciferase) produce optical or enzymatic readouts. No existing method uses protein proximity to reconstitute a DNA polymerase that writes sequence information. This is a fundamentally new modality.
Design parameters:
- Split point: Between the thumb domain (residues ~323-338 β-hairpin) and the palm domain. Structural analysis of the DRT3b cryo-EM map (PDB 9Z6Y) suggests residues 380-420 as the optimal split window, avoiding secondary structure elements.
- Linkers: Flexible (G4S)₃ or (G4S)₄ linkers to allow fragment reconstitution without steric constraints.
- Binding domains:
  - Affibodies (6 kDa, Kd < 10 nM) for intracellular targets
  - scFvs (25-30 kDa) for surface proteins or extracellular sensing
  - Nanobodies (12-15 kDa) for conformation-specific sensing
- Orthogonality: Multiple split-DRT3b pairs with distinct split points and binding domains enable multiplexed sensing of several proteins simultaneously.
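The proximity requirement can be illustrated with a simple equilibrium estimate. The sketch below assumes the two binding domains recognize independent epitopes, that both fragments are in excess over the target, and that the fragments have no intrinsic affinity for each other. Under those assumptions, the fraction of target molecules carrying both fragments is the product of two single-site occupancies. The 10 nM default reflects the affibody Kd bound quoted above; the function names are hypothetical.

```python
def occupancy(fragment_nM: float, kd_nM: float) -> float:
    """Single-site fractional occupancy at equilibrium: [F] / ([F] + Kd)."""
    if fragment_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return fragment_nM / (fragment_nM + kd_nM)


def reconstituted_fraction(
    n_fragment_nM: float,
    c_fragment_nM: float,
    kd_n_nM: float = 10.0,
    kd_c_nM: float = 10.0,
) -> float:
    """Fraction of target bound by both fragments, assuming independent sites."""
    return occupancy(n_fragment_nM, kd_n_nM) * occupancy(c_fragment_nM, kd_c_nM)


if __name__ == "__main__":
    for conc in (1.0, 10.0, 100.0):
        frac = reconstituted_fraction(conc, conc)
        print(f"{conc:>6.1f} nM of each fragment -> {frac:.3f} of target doubly bound")
```

Because the two occupancies multiply, reconstitution falls off faster than either single binding event as fragment concentration drops, which is the behavior a proximity sensor relies on to suppress background.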
2.3 Tier 2: RNA Transduction via Ribozyme-Coupled Signal Amplification
Concept: The reconstituted DRT3b triggers a conformational change that activates a self-cleaving hammerhead ribozyme, releasing a barcoded RNA molecule. This barcode RNA then triggers a toehold switch cascade for exponential amplification before DNA writing.
Barcode RNA structure:
5'-ACACAC-[6-nt protein ID]-[8-nt timestamp]-[4-nt intensity]-[6-nt UMI]-3'
- ACACAC: DRT3a recognition motif (enables downstream dsDNA formation)
- Protein ID: Unique 6-nt barcode identifying the sensed protein
- Timestamp: 8-nt sequence encoding detection time (using a rolling clock mechanism)
- Intensity: 4-nt sequence proportional to protein concentration
- UMI: 6-nt unique molecular identifier for quantification
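The field layout above adds up to a 30-nt barcode (6 + 6 + 8 + 4 + 6). The sketch below shows one way to pack and unpack it. It assumes, purely for illustration, that every variable field stores an unsigned integer in base 4 with A=0, C=1, G=2, U=3; the rolling-clock and intensity encodings themselves are not specified in this document, and `Barcode`, `encode`, and `decode` are hypothetical names.

```python
from typing import NamedTuple

ALPHABET = "ACGU"   # assumed digit order: A=0, C=1, G=2, U=3
MOTIF = "ACACAC"    # DRT3a recognition motif from the layout above
FIELD_WIDTHS = (6, 8, 4, 6)  # protein ID, timestamp, intensity, UMI (nt)


class Barcode(NamedTuple):
    protein_id: int
    timestamp: int
    intensity: int
    umi: int


def int_to_rna(value: int, length: int) -> str:
    """Write value as a fixed-width base-4 string over ACGU."""
    if not 0 <= value < 4 ** length:
        raise ValueError(f"{value} does not fit in {length} nt")
    digits = []
    for _ in range(length):
        value, rem = divmod(value, 4)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def rna_to_int(seq: str) -> int:
    """Inverse of int_to_rna; raises ValueError on non-ACGU characters."""
    value = 0
    for base in seq:
        value = value * 4 + ALPHABET.index(base)
    return value


def encode(barcode: Barcode) -> str:
    fields = (int_to_rna(v, n) for v, n in zip(barcode, FIELD_WIDTHS))
    return MOTIF + "".join(fields)


def decode(rna: str) -> Barcode:
    expected = len(MOTIF) + sum(FIELD_WIDTHS)
    if len(rna) != expected or not rna.startswith(MOTIF):
        raise ValueError("not a 30-nt barcode starting with ACACAC")
    pos, values = len(MOTIF), []
    for width in FIELD_WIDTHS:
        values.append(rna_to_int(rna[pos:pos + width]))
        pos += width
    return Barcode(*values)
```

With this packing, a 6-nt protein ID distinguishes up to 4⁶ = 4096 targets and a 4-nt intensity field resolves 256 levels.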
2.4 Tier 3: DNA Writing via Humanized DRT3b-Prime Editor Fusion
Concept: Fuse the DRT3b polymerase domain to a prime editor (nCas9-M-MLV RT) and terminal deoxynucleotidyl transferase (TdT). The DRT3b active site writes the protein-encoded DNA sequence, while the prime editor directs integration to a specific genomic locus, and TdT adds template-independent nucleotides for flexibility.
Fusion architecture:
N-term → nCas9(H840A) → M-MLV RT      → DRT3b(palm+fingers) → TdT → C-term
         (nickase)      (reverse        (protein-templated    (template-
                        transcriptase)  DNA synthesis)        independent
                                                              nucleotide addition)
PART 3: EXPANDED PROTEIN DATABASE (v2.0 New)
3.1 Protein Categories
The v2.0 simulator includes 24 curated proteins organized into 9 functional categories:
Tumor Suppressors
- p53 (TP53): Guardian of the genome. Most frequently mutated gene in cancer. Key residues: R175H, R248Q, R273H (mutation hotspots), Ser15, Ser20 (phosphorylation sites). PDB: 1TUP.
- RB1: Cell cycle regulator at G1/S transition. Phosphorylation at Ser780, Ser795 controls activity. PDB: 1AO9.
- PTEN: Dual-specificity phosphatase that antagonizes PI3K/AKT signaling. Catalytic cysteine Cys124. PDB: 1D5R.
Receptor Tyrosine Kinases
- EGFR (HER1): Binds EGF/TGF-alpha. Mutations L858R (activating) and T790M (resistance). Targeted by erlotinib, osimertinib. PDB: 1NQL.
- HER2 (ErbB2): Ligandless RTK, amplified in 20-30% of breast cancers. Targeted by trastuzumab. PDB: 3PP0.
- VEGFR2 (KDR): Primary mediator of VEGF-induced angiogenesis. Targeted by ramucirumab. PDB: 2XIR.
Signaling Kinases
- KRAS: Small GTPase molecular switch downstream of RTKs. G12C, G12D, G12V mutations lock GTP-bound state. PDB: 5USJ.
- BRAF: MAPK cascade kinase. V600E mutation causes constitutive activity in melanoma. PDB: 4MNF.
- AKT1: Central PI3K/AKT/mTOR pathway node. E17K activating mutation. PDB: 3CQW.
- MEK1 (MAP2K1): Dual-specificity kinase upstream of ERK. PDB: 3V0S.
- JAK2: Cytokine receptor-associated kinase. V617F mutation drives myeloproliferative neoplasms. PDB: 3KRR.
Transcription Factors
- MYC: Master transcription regulator binding E-box sequences. Deregulated in an estimated 50-70% of cancers. PDB: 1NKP.
- HIF1A: Hypoxia response master regulator. Prolyl hydroxylation targets for VHL degradation. PDB: 1LQB.
- STAT3: JAK-activated transcription factor. Constitutively active in many cancers. PDB: 6NUQ.
DNA Repair Proteins
- BRCA1: Mediates homologous recombination repair and forms an E3 ubiquitin ligase with BARD1. BRCA1 loss is synthetically lethal with PARP inhibition. PDB: 1JNX.
- PARP1: DNA damage sensor and poly(ADP-ribosyl)ator. Target of olaparib, rucaparib. PDB: 4DQY.
Polymerases (DRT3 System)
- DRT3b: Template-independent reverse transcriptase synthesizing poly(AC) DNA. Active site: Glu26 (dA), Arg253 (dC), Tyr650 (priming). PDB: 9Z6Y.
- DRT3a: Template-dependent reverse transcriptase synthesizing poly(GT) from ncRNA ACACAC motif. PDB: 9Z6Y.
Epigenetic Regulators
- DNMT1: Maintenance DNA methyltransferase preserving CpG patterns. Catalytic Cys1226. PDB: 4DA4.
- HDAC1: Class I zinc-dependent histone deacetylase. Targeted by vorinostat. PDB: 4BKX.
Apoptosis Regulators
- BCL2: Anti-apoptotic mitochondrial membrane protein. Sequesters BIM, BID, BAD, BAX. Targeted by venetoclax. PDB: 2XA9.
- BAX: Pro-apoptotic effector that oligomerizes into pores, driving mitochondrial outer membrane permeabilization (MOMP). PDB: 1F16.
Immune Checkpoint Proteins
- PD-L1 (CD274): Immune checkpoint ligand suppressing T-cell activation. Overexpressed in many cancers. Targeted by atezolizumab. PDB: 5J8O.
- CTLA4 (CD152): T-cell checkpoint receptor outcompeting CD28 for CD80/CD86. Targeted by ipilimumab. PDB: 1I85.
3.2 Database Integration
The v2.0 platform integrates with three major structural biology databases:
- RCSB PDB: Experimentally determined structures (X-ray crystallography, cryo-EM, NMR). Over 200,000 entries covering all major protein families. Provides atomic coordinate files in PDB and mmCIF formats.
- AlphaFold DB: DeepMind/EMBL-EBI predicted structures with per-residue confidence scores (pLDDT). Covers over 200 million UniProt sequences, which is most catalogued proteins. Structures are available in PDB format with predicted aligned error (PAE) data.
- UniProt: Comprehensive protein sequence and annotation database. Provides gene-to-protein mapping, functional annotations, post-translational modifications, and cross-references to PDB and other databases.
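As an illustration of how a client might locate coordinate files, the sketch below builds download URLs for an RCSB entry and an AlphaFold DB model. The URL patterns are the public file endpoints as commonly used; the AlphaFold model version suffix (`v4` here) changes between database releases and is an assumption, as are the helper names. The code only builds strings and makes no network calls.

```python
def rcsb_pdb_url(pdb_id: str) -> str:
    """URL of the PDB-format coordinate file for a 4-character RCSB entry ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"


def alphafold_pdb_url(uniprot_accession: str, version: int = 4) -> str:
    """URL of the AlphaFold DB model for a UniProt accession.

    The version number is an assumption; check the current database release.
    """
    accession = uniprot_accession.strip().upper()
    if not accession.isalnum():
        raise ValueError(f"unexpected accession: {uniprot_accession!r}")
    return f"https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-model_v{version}.pdb"


if __name__ == "__main__":
    # p53 core domain entry from Section 3.1, and the human p53 UniProt accession.
    print(rcsb_pdb_url("1TUP"))
    print(alphafold_pdb_url("P04637"))
```

Keeping URL construction separate from fetching makes it easy to swap in a cached or mocked download layer in the web application.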
PART 4: WEB VISUALIZATION PLATFORM (v2.0 New)
4.1 3D Molecular Viewer
The platform includes a WebGL-based molecular structure viewer powered by NGL Viewer, supporting:
Representations:
- Cartoon: Ribbon diagram showing protein secondary structure (alpha helices, beta sheets, loops)
- Ball and Stick: Atomic-level visualization with spheres (atoms) and cylinders (bonds)
- Spacefill: Van der Waals surface representation
- Licorice: Thin bond representation
- Surface: Solvent-accessible molecular surface
- Backbone: Polypeptide backbone trace
Coloring Schemes:
- Chain: Color by protein chain (multi-chain complexes)
- Element: Color by atomic element (C=gray, O=red, N=blue, S=yellow)
- Residue: Color by amino acid type (hydrophobic, polar, charged)
- Secondary Structure: Alpha helices (blue), beta sheets (yellow), loops (white)
- pLDDT: AlphaFold confidence score coloring (blue=high, yellow=medium, orange=low)
- Charge: Electrostatic surface coloring (red=negative, blue=positive)
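The pLDDT scheme reduces to a threshold lookup. The sketch below shows the idea with three bands matching the colors listed above. The cut-offs (70 and 50) are assumptions borrowed from the commonly used AlphaFold confidence bands, not values taken from the platform, and `plddt_color` is a hypothetical helper.

```python
# Assumed band boundaries; the platform's actual thresholds are not stated here.
HIGH_CUTOFF = 70.0
MEDIUM_CUTOFF = 50.0


def plddt_color(score: float) -> str:
    """Map a per-residue pLDDT score (0-100) to a display color name."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score >= HIGH_CUTOFF:
        return "blue"      # high confidence
    if score >= MEDIUM_CUTOFF:
        return "yellow"    # medium confidence
    return "orange"        # low confidence


if __name__ == "__main__":
    for s in (95.2, 63.0, 31.5):
        print(s, plddt_color(s))
```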
4.2 Interactive Simulation Dashboard
The simulation dashboard provides:
- Real-time parameter adjustment: Sliders for concentration, Kd, amplification time, polymerization time; toggles for phosphorylation state
- Tier 1 kinetics: Split-DRT3b reconstitution kinetics with equilibrium line
- Tier 2 amplification: Logarithmic-scale toehold switch amplification kinetics
- Tier 3 DNA visualization: Color-coded DNA sequence display, Monte Carlo repair outcomes, active site interaction details, genomic integration records
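For the Tier 2 panel, a saturating growth curve is one plausible way to model amplification that starts exponentially and levels off near the 10⁶× ceiling shown in the architecture diagram. The sketch below uses a logistic function for this. The logistic form, the 5-minute doubling time, and the function names are illustrative assumptions, not the dashboard's actual kinetics.

```python
import math


def fold_amplification(
    t_min: float, doubling_min: float = 5.0, max_fold: float = 1e6
) -> float:
    """Logistic growth from 1x at t = 0 toward a plateau of max_fold."""
    rate = math.log(2) / doubling_min
    return max_fold / (1.0 + (max_fold - 1.0) * math.exp(-rate * t_min))


def time_to_half_max(doubling_min: float = 5.0, max_fold: float = 1e6) -> float:
    """Time at which the signal reaches max_fold / 2 under the same model."""
    rate = math.log(2) / doubling_min
    return math.log(max_fold - 1.0) / rate


if __name__ == "__main__":
    for t in (0, 30, 60, 90, 120, 180):
        # log10 matches the logarithmic axis described for the Tier 2 chart.
        print(f"t = {t:>3} min  log10(fold) = {math.log10(fold_amplification(t)):.2f}")
```

On a logarithmic axis this curve is close to a straight line early on and bends toward the plateau as the ceiling is approached.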
4.3 DRT3b Engineering Lab
Interactive tool for exploring active site engineering:
- Variant simulation: Six enzyme variants with distinct base selection properties
- Base preference analysis: Bar charts comparing A/C/G/T incorporation percentages
- Trade-off radar plots: Visualize processivity-fidelity trade-offs for each variant
- Summary tables: Complete variant characteristics for comparison
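The processivity-fidelity trade-off can be collapsed into a single figure: the expected number of misincorporations in one full-length product. The sketch below computes it from the Processivity and Fidelity columns of the variant table in Section 1.3, assuming fidelity is a per-nucleotide accuracy and that errors are independent. Both are assumptions made for illustration, and the names are hypothetical.

```python
# (processivity in nt, fidelity in percent) from the variant table in Section 1.3.
VARIANTS = {
    "Wildtype": (1000, 99.99),
    "E26Q": (600, 99.70),
    "R253K": (700, 99.80),
    "G248A": (800, 99.50),
    "T335S": (500, 99.00),
    "E26Q+R253K": (400, 98.50),
}


def expected_errors(processivity_nt: int, fidelity_pct: float) -> float:
    """Expected misincorporations per product: length x per-base error rate."""
    return processivity_nt * (100.0 - fidelity_pct) / 100.0


if __name__ == "__main__":
    for name, (length, fidelity) in VARIANTS.items():
        errors = expected_errors(length, fidelity)
        print(f"{name:<11} {length:>5} nt  {errors:.1f} expected errors")
```

Under these assumptions the engineered variants produce shorter products that nonetheless carry more expected errors than wildtype, which is the trade-off the radar plots are meant to show.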
PART 5: TECHNICAL CHALLENGES & MITIGATION STRATEGIES
5.1 Challenge: DRT3b Active Site Engineering for Arbitrary Sequences
Problem: Native DRT3b produces only poly(AC). Encoding arbitrary protein information requires programmable base selection.
Solution:
- Directed evolution: Create DRT3b libraries with randomized Glu26/Arg253/Gly248 residues; select for new base specificities using compartmentalized partnered replication (CPR) or phage display.
- Computational design: Use AlphaFold3/RoseTTAFold to model active site mutations and predict base specificity changes.
- Expanded genetic alphabet: Incorporate non-natural dNTPs (dZ/dP from Hachimoji DNA) to increase encoding capacity.
5.2 Challenge: Genomic Integration Fidelity
Problem: DRT3b-TdT may add random nucleotides, corrupting the encoded information.
Solution:
- Proofreading fusion: Add T7 DNA polymerase exonuclease domain or archaeal proofreading polymerase to correct errors.
- Redundant encoding: Use error-correcting codes (Hamming, Reed-Solomon) in barcode design.
- UMI-based filtering: Use unique molecular identifiers to distinguish true signals from PCR/sequencing errors.
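To make the redundant-encoding idea concrete, the sketch below implements the classic binary Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit. It is a generic textbook construction, not the barcode scheme of this system; how bits would map onto nucleotides is left open, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as 7 bits laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position 0 means no error detected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # syndrome read as a 1-based bit index
    if position:
        c[position - 1] ^= 1          # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    corrupted = word.copy()
    corrupted[4] ^= 1                 # simulate one misread bit
    print(word, corrupted, hamming74_decode(corrupted))
```

A code of this kind corrects one error per block; burst errors or longer fields would call for something like the Reed-Solomon option mentioned above.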
5.3 Challenge: Cellular Toxicity
Problem: Constitutive DRT3b activity may disrupt DNA repair, replication, or transcription.
Solution:
- Inducible expression: Use split-intein systems or chemically inducible promoters (rapamycin, doxycycline).
- Compartmentalization: Target DRT3b to specific nuclear bodies or membraneless organelles.
- Kill switches: Include iCasp9 or HSV-Tk for rapid elimination of cells with runaway DRT3b activity.
5.4 Challenge: Delivery to Primary Cells and In Vivo
Problem: Lentiviral vectors have packaging limits (~8 kb); the full system may exceed this.
Solution:
- Split-vector delivery: Deliver Tier 1 (split-DRT3b) and Tier 2/3 (RNA circuit + PE fusion) on separate vectors.
- AAV vectors: Use dual AAV vectors with split-intein trans-splicing when a component exceeds the ~4.7 kb capacity of a single AAV.
- mRNA delivery: Use modified mRNA with optimized UTRs for transient expression.
- LNP delivery: Lipid nanoparticles for in vivo liver, muscle, or CNS targeting.
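A back-of-the-envelope size check shows why the packaging limit matters for the Tier 3 fusion alone. The sketch below sums approximate domain lengths and converts them to coding sequence length. The lengths for SpCas9, M-MLV RT, and TdT are approximate published values assumed here for illustration. The DRT3b figure is the full 650 residues implied by the C-terminal Tyr650, so it is an upper bound for a palm+fingers construct. The 15-residue (G4S)₃ linker between domains is also an assumption.

```python
# Approximate domain lengths in amino acids (assumptions, see lead-in).
DOMAIN_LENGTH_AA = {
    "nCas9(H840A)": 1368,
    "M-MLV RT": 671,
    "DRT3b": 650,
    "TdT": 509,
}
LINKER_AA = 15                 # one (G4S)3 linker between adjacent domains
LENTIVIRAL_LIMIT_BP = 8000     # ~8 kb packaging limit quoted above


def fusion_coding_bp(domains=DOMAIN_LENGTH_AA, linker_aa=LINKER_AA) -> int:
    """Coding length in bp: all domains, linkers between them, one stop codon."""
    n_linkers = max(len(domains) - 1, 0)
    total_aa = sum(domains.values()) + n_linkers * linker_aa
    return total_aa * 3 + 3


def exceeds_limit(coding_bp: int, limit_bp: int = LENTIVIRAL_LIMIT_BP) -> bool:
    return coding_bp > limit_bp


if __name__ == "__main__":
    size = fusion_coding_bp()
    print(f"fusion ORF ~ {size} bp; exceeds {LENTIVIRAL_LIMIT_BP} bp limit: {exceeds_limit(size)}")
```

The estimate excludes promoters, nuclear localization signals, and the Tier 1 and Tier 2 components, so the real cargo would be larger still.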
PART 6: COMPARISON WITH EXISTING GENOMIC RECORDING SYSTEMS
| System | Recording Mechanism | Information Type | Temporal Resolution | Readout | Multiplexing |
|---|---|---|---|---|---|
| MEMOIR (Frieda et al., 2017) | CRISPR-Cas9 indels | Cell lineage | Single timepoint | In situ sequencing | Limited |
| scGESTALT (Raj et al., 2018) | CRISPR-Cas9 indels | Lineage + editing history | Cumulative | scRNA-seq | Moderate |
| PEAR (Tang et al., 2024) | Prime editing | Lineage + base edits | Cumulative | scRNA-seq | Moderate |
| DNA Typewriter (Choi et al., 2022) | Prime editing + sequential writing | Ordered events | Sequential | Nanopore | High |
| CAMERA (Roquet et al., 2016) | Recombinase + CRISPR | Analog signals | Single timepoint | Sequencing | Low |
| Proposed System | DRT3b protein-templated synthesis | Protein identity + PTM + dynamics | Continuous (design goal) | Nanopore | High in principle (limited by the number of orthogonal variants) |
Key differentiator: All existing systems record DNA sequence changes (indels, base edits). The proposed system records protein state information by translating protein conformation into DNA sequence—a fundamentally new information modality.
PART 7: APPLICATIONS AND IMPLICATIONS
7.1 Basic Science
- Protein history recording: Reconstruct the complete protein expression trajectory of a single cell throughout development, differentiation, or disease progression.
- PTM dynamics: Record kinase activity, phosphorylation states, and signaling pathway activation in real time.
- Synthetic developmental biology: Engineer cells that record their own developmental decisions for later analysis.
7.2 Clinical Applications
- Cancer monitoring: Record tumor protein biomarker dynamics (p53, KRAS, EGFR) in circulating tumor cells.
- Immunotherapy tracking: Record T-cell activation markers (CD69, PD-1, IFN-γ) in CAR-T cells after infusion.
- Neurodegeneration: Record protein aggregation (tau, α-synuclein, huntingtin) in neurons over time.
- Drug response profiling: Record target engagement and pathway modulation in patient-derived cells.
7.3 Synthetic Biology
- Cellular state machines: Build cells that sense protein inputs, compute via RNA logic, and write decisions to DNA.
- Living diagnostics: Engineered probiotics that record gut biomarker proteins and report via fecal DNA sequencing.
- Biomanufacturing: Record protein quality and stress markers during bioproduction for process optimization.
7.4 Philosophical Implications
The central dogma (DNA→RNA→Protein) has been the foundational paradigm of molecular biology for more than 65 years. The proposed system represents a true inversion: protein information flows backward to DNA, creating a closed information loop:
DNA → RNA → Protein
 ↑             │
 └─────────────┘
 (DRT3b-mediated)
This creates, for the first time, a molecular memory system where cellular experiences (protein states) are permanently encoded in the genome and heritable through cell division—essentially a molecular form of Lamarckian inheritance at the protein level.
PART 8: IMPLEMENTATION ROADMAP
Phase 1: DRT3b Humanization & Validation (Months 1-6)
Express functional DRT3b in human cells with codon optimization, nuclear localization, and toxicity assessment.
Phase 2: Split-DRT3b Engineering (Months 4-9)
Create protein-responsive split-DRT3b sensors with systematic split point testing and affibody fusion optimization.
Phase 3: RNA Transduction Circuit (Months 7-12)
Build ribozyme-coupled amplification circuit with hammerhead ribozyme and toehold switch optimization.
Phase 4: Prime Editor Fusion & Genomic Integration (Months 10-18)
Achieve targeted genomic writing with DRT3b-TdT-PE fusion and AAVS1 safe harbor targeting.
Phase 5: System Integration & Applications (Months 16-24)
Demonstrate end-to-end protein→DNA recording with multi-protein sensing and nanopore readout.
REFERENCES
- Deng, P., Lee, H., Armijo, C., Wang, H., & Gao, A. (2026). Protein-templated synthesis of dinucleotide repeat DNA by an antiphage reverse transcriptase. Science, aed1656.
- Anzalone, A.V., et al. (2019). Search-and-replace genome editing without double-strand breaks or donor DNA. Nature, 576, 149-157.
- Chen, P.J., et al. (2021). Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell, 184, 5635-5652.
- Frieda, K.L., et al. (2017). Synthetic recording and in situ readout of lineage information in single cells. Nature, 541, 107-111.
- Roquet, N., et al. (2016). Synthetic recombinase-based state machines in living cells. Science, 353, aad8559.
- Green, A.A., et al. (2014). Toehold switches: De-novo-designed regulators of gene expression. Cell, 159, 925-939.
- Palluk, S., et al. (2018). De novo DNA synthesis using polymerase-nucleotide conjugates. Nature Biotechnology, 36, 645-650.
- Anzalone, A.V., et al. (2022). Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nature Biotechnology, 40, 731-740.
- Jumper, J., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583-589.
- Varadi, M., et al. (2022). AlphaFold Protein Structure Database: massive structural prediction for biomedical research. Nucleic Acids Research, 50, D439-D444.
Based on: Deng et al., Science 2026 (DRT3 protein-templated DNA synthesis)
Enhanced v2.0 with Expanded Protein Database & Web Visualization Platform