PROTEIN→RNA→DNA: INVERTING THE CENTRAL DOGMA

A Novel Synthetic Biology System Based on DRT3 Protein-Templated DNA Synthesis

Enhanced v2.0 — With Expanded Protein Database & Web Visualization Platform

For current setup and usage instructions, see the project README.

EXECUTIVE SUMMARY

The DRT3 bacterial defense system (Deng et al., Science 2026) reveals an unprecedented mechanism where the Drt3b reverse transcriptase synthesizes sequence-specific DNA (poly(AC)) without any nucleic acid template, using instead a network of protein residues (Glu26, Arg253, Tyr650) that directly template base selection through hydrogen bonding, cation-π interactions, and steric gating. This discovery enables, for the first time, the conceptual inversion of the central dogma: protein information can be directly encoded into DNA sequence without transcription or translation intermediates.

This document proposes a complete three-tier architecture—Protein Sensing → RNA Transduction → DNA Writing—that does not exist in current synthetic biology but is buildable from validated component technologies.

v2.0 Enhancements

Version 2.0 introduces major expansions to the simulator platform:

22 curated proteins across 10 categories (up from a single p53 target)
Interactive 3D molecular visualization via NGL Viewer with PDB and AlphaFold DB integration
DRT3b active site engineering simulator for exploring base selection variants
Next.js 16 web application with real-time simulation, responsive design, and advanced charting
Database API integration with RCSB PDB (200,000+ structures) and AlphaFold DB (200M+ predicted models)
Comprehensive protein annotations including clinical significance, therapeutic targeting, and mutation hotspots

PART 1: THE DRT3 BREAKTHROUGH AND ITS IMPLICATIONS

1.1 Key Findings from the DRT3 Paper

The DRT3 anti-phage system from E. coli comprises:

Drt3a: A class 2 "unknown group" (UG) reverse transcriptase that synthesizes poly(GT) ssDNA using the ACACAC motif of a noncoding RNA (ncRNA) as template, functionally analogous to telomerase.
Drt3b: A class 1 UG reverse transcriptase that synthesizes the complementary poly(AC) strand without any nucleic acid template, a mechanism unprecedented among polymerases.
ncRNA: A ~130 nt RNA with four stem-loops (SL1-4) that wraps around Drt3a, positioning the ACACAC template motif.

The DRT3 complex assembles as a D3-symmetric 6:6:6 hexamer of Drt3a:Drt3b:ncRNA, producing double-stranded poly(GT/AC) DNA up to several kilobases in length.

1.2 The Revolutionary Aspect: Protein-Templated DNA Synthesis

Drt3b achieves template-independent, sequence-specific polymerization through five key active-site residues:

Residue	Function	Mechanism
Glu26	dA selection	Side chain projects into nucleotide-binding pocket; forms two H-bonds with N6 amine of dATP, discriminating against dGTP/dTTP
Arg253	dC selection / purine discrimination	Guanidinium group forms cation-π interaction with dA; H-bonds to Watson-Crick edge of preceding dC17, distinguishing dC from dT
Tyr650	Priming nucleophile	C-terminal tyrosine hydroxyl initiates covalent protein-DNA linkage; conserved across DRT3 homologs
Gly248	Steric gate	Excludes dG at dC positions through steric clash with N2 exocyclic amine
Thr335	Purine pocket	Maintains register of poly(AC) interactions after translocation

Critical insight: The nascent cDNA adopts an unusual backbone kink compressing three bases into a space normally occupied by two, enabling extensive protein-DNA base interactions that replace the structural role of a nucleic acid template.

This is fundamentally different from:

Template-dependent polymerases (DNA pol, RNA pol, telomerase): Require base-pairing with a template strand.
Template-independent polymerases (TdT, poly(A) polymerase): Add random or homopolymeric sequences without sequence control.
RDE-3 (the only known exception): Adds alternating poly(UG) tails without a template, but acts on RNA substrates and produces RNA, not DNA.

1.3 DRT3b Active Site Engineering (v2.0 New)

The native DRT3b produces only poly(AC). To encode arbitrary protein information, the active site must be engineered. Our simulation platform models six enzyme variants with distinct base selection specificities:

Variant	Mutation	Product Pattern	dA%	dC%	dG%	dT%	Processivity	Fidelity
Wildtype	—	poly(AC)	50	50	0	0	1000 nt	99.99%
E26Q	Glu26→Gln	poly(GC)	5	50	45	0	600 nt	99.70%
R253K	Arg253→Lys	poly(AT)	45	5	0	50	700 nt	99.80%
G248A	Gly248→Ala	poly(ACG)	35	35	25	5	800 nt	99.50%
T335S	Thr335→Ser	poly(AG)	50	10	5	35	500 nt	99.00%
E26Q+R253K	Double	poly(GT)	5	10	45	40	400 nt	98.50%

Orthogonal DRT3b variants would make multiplexed protein encoding theoretically possible: each variant writes a distinct repeat pattern corresponding to a specific sensed protein.
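As a rough illustration of what the simulated values in the variant table imply, the sketch below multiplies each variant's processivity by its per-base error rate (1 minus fidelity) to estimate misincorporations per product. It assumes errors are independent per base and that the processivity column approximates product length; both are simplifying assumptions made here, and the names `Variant` and `expected_errors` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    base_pct: dict          # simulated incorporation percentages (dA, dC, dG, dT)
    processivity_nt: int    # simulated product length in nucleotides
    fidelity: float         # fraction of incorporations matching the pattern


# Values copied from the variant table; they are simulator outputs, not measurements.
VARIANTS = [
    Variant("Wildtype",   {"A": 50, "C": 50, "G": 0,  "T": 0},  1000, 0.9999),
    Variant("E26Q",       {"A": 5,  "C": 50, "G": 45, "T": 0},  600,  0.9970),
    Variant("R253K",      {"A": 45, "C": 5,  "G": 0,  "T": 50}, 700,  0.9980),
    Variant("G248A",      {"A": 35, "C": 35, "G": 25, "T": 5},  800,  0.9950),
    Variant("T335S",      {"A": 50, "C": 10, "G": 5,  "T": 35}, 500,  0.9900),
    Variant("E26Q+R253K", {"A": 5,  "C": 10, "G": 45, "T": 40}, 400,  0.9850),
]


def expected_errors(variant: Variant) -> float:
    """Expected misincorporations in one full-length product."""
    return variant.processivity_nt * (1.0 - variant.fidelity)


for v in VARIANTS:
    # Each row of base percentages should account for all four nucleotides.
    assert sum(v.base_pct.values()) == 100
    print(f"{v.name:12s} ~{expected_errors(v):.1f} errors per product")
```

Under these assumptions the wildtype row corresponds to roughly 0.1 expected errors per product and the double mutant to roughly 6, which is the kind of trade-off the error-correction strategies in Part 5 would need to absorb.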

PART 2: THE PROPOSED SYSTEM — PROTEIN→RNA→DNA

2.1 System Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PROTEIN → RNA → DNA SYSTEM                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  TIER 1: PROTEIN SENSING          TIER 2: RNA TRANSDUCTION    TIER 3: DNA   │
│  (Split-DRT3b Proximity)          (Signal Amplification)      WRITING       │
│                                                                             │
│  Target Protein ──► N-DRT3b       Hammerhead Ribozyme ──►    DRT3b-TdT      │
│       │              + Affibody A   (self-cleaves upon       + Prime Editor │
│       │              (inactive)      DRT3b activation)       Fusion         │
│       │                   │                    │                │           │
│       │              C-DRT3b            Barcode RNA         Genomic DNA     │
│       │              + Affibody B       (protein ID +       (permanent      │
│       │              (inactive)          timestamp +          record)       │
│       │                   │              intensity)                         │
│       └──────────────► Reconstituted         │                              │
│                         Active DRT3b  ──► Toehold Switch                    │
│                                           Cascade                           │
│                                           (10⁶× amplification)              │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

2.2 Tier 1: Protein Sensing via Split-DRT3b Proximity Reconstitution

Concept: Split the DRT3b enzyme into N-terminal (residues 1-400) and C-terminal (residues 401-650) fragments at the boundary between the thumb and palm domains. Fuse each fragment to a protein-binding domain (affibody, scFv, or nanobody) targeting different epitopes on the protein of interest. Only when the target protein is present do the fragments come into proximity, reconstituting the active DRT3b enzyme.

Why this is novel: Current protein sensing methods (FRET, BiFC, split luciferase) produce optical or enzymatic readouts. No existing method uses protein proximity to reconstitute a DNA polymerase that writes sequence information. This is a fundamentally new modality.

Design parameters:

Split point: Between the thumb domain (residues ~323-338 β-hairpin) and palm domain. Structural analysis of the DRT3b cryo-EM map (PDB 9Z6Y) suggests residues 380-420 as the optimal split region, avoiding secondary structure elements.
Linkers: Flexible (G4S)₃ or (G4S)₄ linkers to allow fragment reconstitution without steric constraints.
Binding domains: Affibodies (6 kDa, Kd < 10 nM) for intracellular targets; scFvs (25-30 kDa) for surface proteins or extracellular sensing; nanobodies (12-15 kDa) for conformation-specific sensing.
Orthogonality: Multiple split-DRT3b pairs with distinct split points and binding domains enable multiplexed sensing of many proteins simultaneously.
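The split-point and linker parameters above can be enumerated programmatically when planning a screen. The following is a minimal sketch, assuming a 650-residue DRT3b (Tyr650 is described as the C-terminal residue) and a 10-residue scan step chosen here purely for illustration; the function names are hypothetical.

```python
DRT3B_LENGTH = 650  # assumed full length; Tyr650 is described as C-terminal


def gs_linker(repeats: int) -> str:
    """Amino acid sequence of a flexible (G4S)n linker."""
    return "GGGGS" * repeats


def split_fragments(split_after: int, length: int = DRT3B_LENGTH):
    """Residue ranges (1-based, inclusive) of the N- and C-terminal fragments."""
    if not 1 <= split_after < length:
        raise ValueError("split point must fall inside the protein")
    return (1, split_after), (split_after + 1, length)


def candidate_splits(start: int = 380, stop: int = 420, step: int = 10):
    """Fragment pairs for split points scanned across the proposed window."""
    return [split_fragments(pos) for pos in range(start, stop + 1, step)]


for n_frag, c_frag in candidate_splits():
    print(f"N-fragment {n_frag[0]}-{n_frag[1]}, C-fragment {c_frag[0]}-{c_frag[1]}")
print("(G4S)3 linker:", gs_linker(3))
```

A split after residue 400 reproduces the 1-400 / 401-650 fragments named in the concept paragraph; the other candidates cover the rest of the 380-420 window.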

2.3 Tier 2: RNA Transduction via Ribozyme-Coupled Signal Amplification

Concept: The reconstituted DRT3b triggers a conformational change that activates a self-cleaving hammerhead ribozyme, releasing a barcoded RNA molecule. This barcode RNA then triggers a toehold switch cascade for exponential amplification before DNA writing.

Barcode RNA structure:

5'-ACACAC-[6-nt protein ID]-[8-nt timestamp]-[4-nt intensity]-[6-nt UMI]-3'

ACACAC: DRT3a recognition motif (enables downstream dsDNA formation)
Protein ID: Unique 6-nt barcode identifying the sensed protein
Timestamp: 8-nt sequence encoding detection time (using a rolling clock mechanism)
Intensity: 4-nt sequence proportional to protein concentration
UMI: 6-nt unique molecular identifier for quantification
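The barcode layout above adds up to a fixed 30-nt RNA (6 + 6 + 8 + 4 + 6). A minimal sketch of how such a barcode could be assembled and parsed is shown below; the field order follows the structure given above, while the function names and the example field values are hypothetical and carry no biological meaning.

```python
import re

DRT3A_MOTIF = "ACACAC"
# (field name, width in nt), in 5'->3' order after the motif
FIELDS = (("protein_id", 6), ("timestamp", 8), ("intensity", 4), ("umi", 6))
BARCODE_LENGTH = len(DRT3A_MOTIF) + sum(width for _, width in FIELDS)  # 30 nt


def build_barcode(protein_id: str, timestamp: str, intensity: str, umi: str) -> str:
    """Concatenate the motif and the four fields after validating each one."""
    values = {"protein_id": protein_id, "timestamp": timestamp,
              "intensity": intensity, "umi": umi}
    parts = [DRT3A_MOTIF]
    for name, width in FIELDS:
        seq = values[name].upper()
        if len(seq) != width or re.fullmatch("[ACGU]+", seq) is None:
            raise ValueError(f"{name} must be {width} nt of A/C/G/U, got {seq!r}")
        parts.append(seq)
    return "".join(parts)


def parse_barcode(barcode: str) -> dict:
    """Split a 30-nt barcode back into its named fields."""
    barcode = barcode.upper()
    if len(barcode) != BARCODE_LENGTH or not barcode.startswith(DRT3A_MOTIF):
        raise ValueError("barcode must be 30 nt and start with ACACAC")
    fields, pos = {}, len(DRT3A_MOTIF)
    for name, width in FIELDS:
        fields[name] = barcode[pos:pos + width]
        pos += width
    return fields


example = build_barcode("GAUUAC", "AACCGGUU", "ACGU", "UGCAUG")
print(example, len(example))
print(parse_barcode(example))
```

With 6 nt for the protein ID there are 4^6 = 4096 possible identifiers before any error-correction constraints are applied.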

2.4 Tier 3: DNA Writing via Humanized DRT3b-Prime Editor Fusion

Concept: Fuse the DRT3b polymerase domain to a prime editor (nCas9-M-MLV RT) and terminal deoxynucleotidyl transferase (TdT). The DRT3b active site writes the protein-encoded DNA sequence, while the prime editor directs integration to a specific genomic locus, and TdT adds template-independent nucleotides for flexibility.

Fusion architecture:

N-term → nCas9(H840A) → M-MLV RT → DRT3b(palm+fingers) → TdT → C-term
         (nickase)       (reverse        (protein-templated       (template-
                         transcriptase)  DNA synthesis)          independent
                                                          nucleotide addition)

PART 3: EXPANDED PROTEIN DATABASE (v2.0 New)

3.1 Protein Categories

The v2.0 simulator includes 22 curated proteins organized into 10 functional categories:

Tumor Suppressors

p53 (TP53): Guardian of the genome. Most frequently mutated gene in cancer. Key residues: R175H, R248Q, R273H (mutation hotspots), Ser15, Ser20 (phosphorylation sites). PDB: 1TUP.
RB1: Cell cycle regulator at G1/S transition. Phosphorylation at Ser780, Ser795 controls activity. PDB: 1AO9.
PTEN: Dual-specificity phosphatase that antagonizes PI3K/AKT signaling. Catalytic cysteine Cys124. PDB: 1D5R.

Receptor Tyrosine Kinases

EGFR (HER1): Binds EGF/TGF-alpha. Mutations L858R (activating) and T790M (resistance). Targeted by erlotinib, osimertinib. PDB: 1NQL.
HER2 (ErbB2): Ligandless RTK, amplified in 20-30% of breast cancers. Targeted by trastuzumab. PDB: 3PP0.
VEGFR2 (KDR): Primary mediator of VEGF-induced angiogenesis. Targeted by ramucirumab. PDB: 2XIR.

Signaling Kinases

KRAS: Small GTPase molecular switch downstream of RTKs. G12C, G12D, G12V mutations lock GTP-bound state. PDB: 5USJ.
BRAF: MAPK cascade kinase. V600E mutation causes constitutive activity in melanoma. PDB: 4MNF.
AKT1: Central PI3K/AKT/mTOR pathway node. E17K activating mutation. PDB: 3CQW.
MEK1 (MAP2K1): Dual-specificity kinase upstream of ERK. PDB: 3V0S.
JAK2: Cytokine receptor-associated kinase. V617F mutation drives myeloproliferative neoplasms. PDB: 3KRR.

Transcription Factors

MYC: Master transcription regulator binding E-box sequences. Deregulated in an estimated 50-70% of cancers. PDB: 1NKP.
HIF1A: Hypoxia response master regulator. Prolyl hydroxylation targets it for VHL-mediated degradation. PDB: 1LQB.
STAT3: JAK-activated transcription factor. Constitutively active in many cancers. PDB: 6NUQ.

DNA Repair Proteins

BRCA1: Homologous recombination repair via E3 ubiquitin ligase activity with BARD1. Synthetic lethality with PARP inhibitors. PDB: 1JNX.
PARP1: DNA damage sensor and poly(ADP-ribosyl)ator. Target of olaparib, rucaparib. PDB: 4DQY.

Polymerases (DRT3 System)

DRT3b: Template-independent reverse transcriptase synthesizing poly(AC) DNA. Active site: Glu26 (dA), Arg253 (dC), Tyr650 (priming). PDB: 9Z6Y.
DRT3a: Template-dependent reverse transcriptase synthesizing poly(GT) from ncRNA ACACAC motif. PDB: 9Z6Y.

Epigenetic Regulators

DNMT1: Maintenance DNA methyltransferase preserving CpG patterns. Catalytic Cys1226. PDB: 4DA4.
HDAC1: Class I zinc-dependent histone deacetylase. Targeted by vorinostat. PDB: 4BKX.

Apoptosis Regulators

BCL2: Anti-apoptotic mitochondrial membrane protein. Sequesters BIM, BID, BAD, BAX. Targeted by venetoclax. PDB: 2XA9.
BAX: Pro-apoptotic effector that oligomerizes to form pores in the mitochondrial outer membrane, causing mitochondrial outer membrane permeabilization (MOMP). PDB: 1F16.

Immune Checkpoint Proteins

PD-L1 (CD274): Immune checkpoint ligand suppressing T-cell activation. Overexpressed in many cancers. Targeted by atezolizumab. PDB: 5J8O.
CTLA4 (CD152): T-cell checkpoint receptor outcompeting CD28 for CD80/CD86. Targeted by ipilimumab. PDB: 1I85.

3.2 Database Integration

The v2.0 platform integrates with three major protein databases, two structural and one sequence-based:

RCSB PDB: Experimentally determined structures (X-ray crystallography, cryo-EM, NMR). Over 200,000 entries covering all major protein families. Provides atomic-resolution coordinate files in PDB and mmCIF formats.
AlphaFold DB: DeepMind/EBI predicted structures with per-residue confidence scores (pLDDT). Covers over 200 million predicted structures for UniProt sequences, spanning nearly all catalogued proteins. Structures available in PDB format with predicted aligned error (PAE) data.
UniProt: Comprehensive protein sequence and annotation database. Provides gene-to-protein mapping, functional annotations, post-translational modifications, and cross-references to PDB and other databases.

PART 4: WEB VISUALIZATION PLATFORM (v2.0 New)

4.1 3D Molecular Viewer

The platform includes a WebGL-based molecular structure viewer powered by NGL Viewer, supporting:

Representations:

Cartoon: Ribbon diagram showing protein secondary structure (alpha helices, beta sheets, loops)
Ball and Stick: Atomic-level visualization with spheres (atoms) and cylinders (bonds)
Spacefill: Van der Waals surface representation
Licorice: Thin bond representation
Surface: Solvent-accessible molecular surface
Backbone: Polypeptide backbone trace

Coloring Schemes:

Chain: Color by protein chain (multi-chain complexes)
Element: Color by atomic element (C=gray, O=red, N=blue, S=yellow)
Residue: Color by amino acid type (hydrophobic, polar, charged)
Secondary Structure: Alpha helices (blue), beta sheets (yellow), loops (white)
pLDDT: AlphaFold confidence score coloring (blue=high, yellow=medium, orange=low)
Charge: Electrostatic surface coloring (red=negative, blue=positive)

4.2 Interactive Simulation Dashboard

The simulation dashboard provides:

Real-time parameter adjustment: Sliders for concentration, Kd, amplification time, polymerization time; toggles for phosphorylation state
Tier 1 kinetics: Split-DRT3b reconstitution kinetics with equilibrium line
Tier 2 amplification: Logarithmic-scale toehold switch amplification kinetics
Tier 3 DNA visualization: Color-coded DNA sequence display, Monte Carlo repair outcomes, active site interaction details, genomic integration records
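The Tier 1 kinetics panel and its equilibrium line can be approximated with a simple two-state binding model. The sketch below is an illustration under stated assumptions, not the simulator's actual model: it treats reconstitution as a single reversible binding step with the target in excess (pseudo-first-order), and the association rate constant of 1e-3 per nM per second is an assumed typical value. Function names are hypothetical.

```python
import math


def equilibrium_fraction(target_nM: float, kd_nM: float) -> float:
    """Fraction of sensor reconstituted at equilibrium: [P] / ([P] + Kd)."""
    return target_nM / (target_nM + kd_nM)


def reconstituted_fraction(t_s: float, target_nM: float, kd_nM: float,
                           kon_per_nM_s: float = 1e-3) -> float:
    """Fraction reconstituted at time t, assuming the target is in excess.

    k_off follows from Kd = k_off / k_on, and the approach to equilibrium
    is exponential with rate k_obs = k_on * [P] + k_off.
    """
    koff_per_s = kd_nM * kon_per_nM_s
    k_obs = kon_per_nM_s * target_nM + koff_per_s
    return equilibrium_fraction(target_nM, kd_nM) * (1.0 - math.exp(-k_obs * t_s))


# Example: 10 nM target and a 10 nM Kd, sampled over ten minutes.
for t in (0, 30, 60, 120, 300, 600):
    print(f"t = {t:4d} s  fraction = {reconstituted_fraction(t, 10.0, 10.0):.3f}")
print("equilibrium line:", equilibrium_fraction(10.0, 10.0))
```

When the target concentration equals Kd, the equilibrium line sits at 0.5; a real split-enzyme sensor involves two binding events and fragment reassembly, so this curve should be read as an upper-level approximation.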

4.3 DRT3b Engineering Lab

Interactive tool for exploring active site engineering:

Variant simulation: Six enzyme variants with distinct base selection properties
Base preference analysis: Bar charts comparing A/C/G/T incorporation percentages
Trade-off radar plots: Visualize processivity-fidelity trade-offs for each variant
Summary tables: Complete variant characteristics for comparison

PART 5: TECHNICAL CHALLENGES & MITIGATION STRATEGIES

5.1 Challenge: DRT3b Active Site Engineering for Arbitrary Sequences

Problem: Native DRT3b produces only poly(AC). Encoding arbitrary protein information requires programmable base selection.

Solution:

Directed evolution: Create DRT3b libraries with randomized Glu26/Arg253/Gly248 residues; select for new base specificities using compartmentalized partnered replication (CPR) or phage display.
Computational design: Use AlphaFold3/RoseTTAFold to model active site mutations and predict base specificity changes.
Expanded genetic alphabet: Incorporate non-natural dNTPs (dZ/dP from Hachimoji DNA) to increase encoding capacity.

5.2 Challenge: Genomic Integration Fidelity

Problem: DRT3b-TdT may add random nucleotides, corrupting the encoded information.

Solution:

Proofreading fusion: Add the T7 DNA polymerase exonuclease domain or an archaeal proofreading polymerase to correct errors.
Redundant encoding: Use error-correcting codes (Hamming, Reed-Solomon) in barcode design.
UMI-based filtering: Use unique molecular identifiers to distinguish true signals from PCR/sequencing errors.
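One simple form of redundant encoding is to choose the 6-nt protein ID barcodes so that any two differ in at least three positions, which lets a single substitution be corrected by nearest-neighbour lookup. The sketch below uses a greedy minimum-distance codebook, which is simpler than the Hamming or Reed-Solomon codes named above and is offered only as an illustration; the codebook size of 22 mirrors the number of proteins cited for the simulator, and all function names are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_codebook(length: int = 6, min_distance: int = 3,
                   size: int = 22, alphabet: str = "ACGU") -> list:
    """Greedily collect barcodes that are mutually >= min_distance apart."""
    codebook = []
    for candidate in product(alphabet, repeat=length):
        word = "".join(candidate)
        if all(hamming(word, kept) >= min_distance for kept in codebook):
            codebook.append(word)
            if len(codebook) == size:
                break
    return codebook


def decode(read: str, codebook: list, max_errors: int = 1):
    """Return the closest barcode, or None if it is more than max_errors away."""
    best = min(codebook, key=lambda word: hamming(read, word))
    return best if hamming(read, best) <= max_errors else None


codebook = build_codebook()
original = codebook[5]
# Introduce one substitution at the first position.
corrupted = ("C" if original[0] != "C" else "G") + original[1:]
print(original, "->", corrupted, "->", decode(corrupted, codebook))
```

Because every pair of barcodes is at least three substitutions apart, a read with one error is still strictly closer to its true barcode than to any other; reads with two or more errors are rejected rather than miscalled in most cases.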

5.3 Challenge: Cellular Toxicity

Problem: Constitutive DRT3b activity may disrupt DNA repair, replication, or transcription.

Solution:

Inducible expression: Use split-intein systems or chemically inducible promoters (rapamycin, doxycycline).
Compartmentalization: Target DRT3b to specific nuclear bodies or membraneless organelles.
Kill switches: Include iCasp9 or HSV-Tk for rapid elimination of cells with runaway DRT3b activity.

5.4 Challenge: Delivery to Primary Cells and In Vivo

Problem: Lentiviral vectors have packaging limits (~8 kb); the full system may exceed this.

Solution:

Split-vector delivery: Deliver Tier 1 (split-DRT3b) and Tier 2/3 (RNA circuit + PE fusion) on separate vectors.
AAV vectors: Use dual AAV vectors with split-intein trans-splicing to divide the cargo across two vectors, each within the ~4.7 kb AAV packaging limit.
mRNA delivery: Use modified mRNA with optimized UTRs for transient expression.
LNP delivery: Lipid nanoparticles for in vivo liver, muscle, or CNS targeting.
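A back-of-the-envelope check shows why the Tier 3 fusion alone is likely to strain the ~8 kb lentiviral limit. The sketch below sums coding lengths for the four fusion components; the amino acid lengths for nCas9, M-MLV RT, and TdT are approximate values assumed here for illustration, DRT3b is taken at its full 650 residues as an upper bound (the fusion uses only the palm and fingers domains), and promoters, linkers, and regulatory elements are ignored.

```python
LENTIVIRAL_LIMIT_BP = 8000  # "~8 kb" packaging limit quoted above

# Approximate protein lengths in amino acids (assumptions for illustration).
APPROX_LENGTH_AA = {
    "nCas9 (SpCas9 H840A)": 1368,
    "M-MLV RT": 671,
    "DRT3b (full length, upper bound)": 650,
    "TdT (human)": 509,
}


def coding_bp(length_aa: int) -> int:
    """Coding sequence length in base pairs, excluding the stop codon."""
    return 3 * length_aa


def fusion_cargo_bp(parts: dict = APPROX_LENGTH_AA) -> int:
    """Total coding length of the fusion, ignoring linkers and regulatory DNA."""
    return sum(coding_bp(n) for n in parts.values())


cargo = fusion_cargo_bp()
print(f"fusion coding sequence: ~{cargo} bp")
print("exceeds lentiviral limit:", cargo > LENTIVIRAL_LIMIT_BP)
```

Even before adding promoters or the Tier 1 and Tier 2 components, this estimate lands above 8 kb, which is consistent with the case for split-vector or mRNA delivery made above.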

PART 6: COMPARISON WITH EXISTING GENOMIC RECORDING SYSTEMS

System	Recording Mechanism	Information Type	Temporal Resolution	Readout	Multiplexing
MEMOIR (Frieda et al., 2017)	CRISPR-Cas9 indels	Cell lineage	Single timepoint	In situ imaging (seqFISH)	Limited
scGESTALT (Raj et al., 2018)	CRISPR-Cas9 indels	Lineage + editing history	Cumulative	scRNA-seq	Moderate
PEAR (Tang et al., 2024)	Prime editing	Lineage + base edits	Cumulative	scRNA-seq	Moderate
DNA Typewriter (Choi et al., 2022)	Prime editing + sequential writing	Ordered events	Sequential	Nanopore	High
CAMERA (Tang & Liu, 2018)	CRISPR nucleases + base editors	Analog signals	Single timepoint	Sequencing	Low
Proposed System	DRT3b protein-templated synthesis	Protein identity + PTM + dynamics	Continuous (design goal)	Nanopore	High (in principle)

Key differentiator: Existing systems store lineage or signaling events as DNA sequence changes (indels, base edits) introduced by nucleases or editors. The proposed system would instead record protein state information by converting protein presence and conformation into newly synthesized DNA sequence, a fundamentally new information modality.

PART 7: APPLICATIONS AND IMPLICATIONS

7.1 Basic Science

Protein history recording: Reconstruct the complete protein expression trajectory of a single cell throughout development, differentiation, or disease progression.
PTM dynamics: Record kinase activity, phosphorylation states, and signaling pathway activation in real time.
Synthetic developmental biology: Engineer cells that record their own developmental decisions for later analysis.

7.2 Clinical Applications

Cancer monitoring: Record tumor protein biomarker dynamics (p53, KRAS, EGFR) in circulating tumor cells.
Immunotherapy tracking: Record T-cell activation markers (CD69, PD-1, IFN-γ) in CAR-T cells after infusion.
Neurodegeneration: Record protein aggregation (tau, α-synuclein, huntingtin) in neurons over time.
Drug response profiling: Record target engagement and pathway modulation in patient-derived cells.

7.3 Synthetic Biology

Cellular state machines: Build cells that sense protein inputs, compute via RNA logic, and write decisions to DNA.
Living diagnostics: Engineered probiotics that record gut biomarker proteins and report via fecal DNA sequencing.
Biomanufacturing: Record protein quality and stress markers during bioproduction for process optimization.

7.4 Philosophical Implications

The central dogma (DNA→RNA→Protein) has been the foundational paradigm of molecular biology for more than 65 years. The proposed system represents a true inversion: protein information flows backward to DNA, creating a closed information loop:

        DNA → RNA → Protein
         ↑                │
         └────────────────┘
              (DRT3b-mediated)

This creates, for the first time, a molecular memory system where cellular experiences (protein states) are permanently encoded in the genome and heritable through cell division—essentially a molecular form of Lamarckian inheritance at the protein level.

PART 8: IMPLEMENTATION ROADMAP

Phase 1: DRT3b Humanization & Validation (Months 1-6)

Express functional DRT3b in human cells with codon optimization, nuclear localization, and toxicity assessment.

Phase 2: Split-DRT3b Engineering (Months 4-9)

Create protein-responsive split-DRT3b sensors with systematic split point testing and affibody fusion optimization.

Phase 3: RNA Transduction Circuit (Months 7-12)

Build ribozyme-coupled amplification circuit with hammerhead ribozyme and toehold switch optimization.

Phase 4: Prime Editor Fusion & Genomic Integration (Months 10-18)

Achieve targeted genomic writing with DRT3b-TdT-PE fusion and AAVS1 safe harbor targeting.

Phase 5: System Integration & Applications (Months 16-24)

Demonstrate end-to-end protein→DNA recording with multi-protein sensing and nanopore readout.

REFERENCES

Deng, P., Lee, H., Armijo, C., Wang, H., & Gao, A. (2026). Protein-templated synthesis of dinucleotide repeat DNA by an antiphage reverse transcriptase. Science, aed1656.
Anzalone, A.V., et al. (2019). Search-and-replace genome editing without double-strand breaks or donor DNA. Nature, 576, 149-157.
Chen, P.J., et al. (2021). Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell, 184, 5635-5652.
Frieda, K.L., et al. (2017). Synthetic recording and in situ readout of lineage information in single cells. Nature, 541, 107-111.
Roquet, N., et al. (2016). Synthetic recombinase-based state machines in living cells. Science, 353, aad8559.
Green, A.A., et al. (2014). Toehold switches: De-novo-designed regulators of gene expression. Cell, 159, 925-939.
Palluk, S., et al. (2018). De novo DNA synthesis using polymerase-nucleotide conjugates. Nature Biotechnology, 36, 645-650.
Anzalone, A.V., et al. (2022). Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nature Biotechnology, 40, 731-740.
Jumper, J., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583-589.
Varadi, M., et al. (2022). AlphaFold Protein Structure Database: massive structural prediction for biomedical research. Nucleic Acids Research, 50, D439-D444.

Based on: Deng et al., Science 2026 (DRT3 protein-templated DNA synthesis)