Biology has a language. It's written in nucleotide sequences, expressed through protein structures, and manifested in the transcriptomic profile of every living cell. At NovaGenAI, we build custom AI models that make biology machine-readable.
Reading Biology Like a Language
Large language models are sequence-processing engines. They ingest ordered tokens, learn statistical relationships, and generate predictions. This architecture was designed for human language — but biology can be expressed as meaningful sequences too.
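To make that concrete, here is a minimal sketch of the idea in Python: a DNA sequence becomes a sequence of overlapping k-mer tokens, the same kind of ordered input a language model consumes. Real biological tokenizers are typically learned rather than fixed; this is purely illustrative.

```python
# Minimal sketch: treating a DNA sequence as ordered tokens, the way an LLM
# treats words. Overlapping k-mers serve as a toy vocabulary here.

def kmer_tokenize(sequence: str, k: int = 3) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mer tokens."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

tokens = kmer_tokenize("ATGGCCATTGTA")
print(tokens)  # ['ATG', 'TGG', 'GGC', 'GCC', 'CCA', 'CAT', 'ATT', 'TTG', 'TGT', 'GTA']
```

Once biology is tokenized this way, the rest of the language-modelling toolkit applies unchanged.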
Pioneering research like Cell2Sentence, developed at Yale and scaled up in collaboration with Google, demonstrated that multi-dimensional biological data can be converted into structured sequences that LLMs can ingest and reason about. We're building on these foundations, training custom models on proprietary biological datasets to solve specific enterprise problems.
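The core trick is simple enough to sketch. Cell2Sentence turns a cell's expression profile into a "sentence" of gene names, ordered from most to least expressed. The snippet below is a simplified illustration of that rank-ordering idea; the gene names and counts are made up, and the published method includes details (normalization, vocabulary handling) omitted here.

```python
# Simplified sketch of the Cell2Sentence idea: rank genes by expression and
# emit the top names as a text sentence an LLM can read.
# Gene names and counts are illustrative, not real measurements.

def cell_to_sentence(expression: dict[str, float], top_n: int = 5) -> str:
    """Rank genes by expression, descending, and join the top names."""
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    return " ".join(gene for gene, _count in ranked[:top_n])

cell = {"MALAT1": 412.0, "ACTB": 230.0, "GAPDH": 198.0,
        "CD3E": 55.0, "IL7R": 31.0, "FOXP3": 2.0}
print(cell_to_sentence(cell))  # "MALAT1 ACTB GAPDH CD3E IL7R"
```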
Multi-Omics: The Full Picture
Our models integrate multiple biological data layers: transcriptomics (gene expression), genomics (DNA sequences), epigenomics (gene regulation), proteomics (protein expression), and metabolomics (small molecule profiles).
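As a rough illustration, a single sample in such a dataset might be represented as one record with a field per layer. The structure below is a hypothetical schema for exposition, not our production data model; field names and shapes are assumptions.

```python
# Hypothetical multi-omics record, one field per data layer.
# Names and shapes are illustrative assumptions.

from dataclasses import dataclass
import numpy as np

@dataclass
class MultiOmicsSample:
    transcriptome: np.ndarray   # gene expression counts, shape (n_genes,)
    genome_variants: list[str]  # observed DNA variants, e.g. "chr1:12345A>G"
    methylation: np.ndarray     # epigenomic CpG methylation fractions in [0, 1]
    proteome: np.ndarray        # protein abundances, shape (n_proteins,)
    metabolome: np.ndarray      # small-molecule concentrations, shape (n_metabolites,)
```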
The relationships between these layers — how epigenetic state influences transcription, how transcription drives protein expression — are precisely the kind of complex dependencies that transformer architectures excel at modelling.
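One way to see why: cross-attention lets tokens from one layer query another directly. The minimal PyTorch sketch below shows epigenomic tokens attending over transcriptomic tokens. It is one plausible design under illustrative dimensions, not a description of our production architecture.

```python
# Minimal sketch of cross-layer modelling with a transformer primitive:
# epigenomic tokens (queries) attend over transcriptomic tokens (keys/values).
# All dimensions are illustrative.

import torch
import torch.nn as nn

d_model = 64
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

epigenome_tokens = torch.randn(1, 128, d_model)       # queries: epigenetic state
transcriptome_tokens = torch.randn(1, 2000, d_model)  # keys/values: expressed genes

# Each epigenomic token learns which transcripts its regulatory state influences.
fused, attn_weights = cross_attn(epigenome_tokens, transcriptome_tokens, transcriptome_tokens)
print(fused.shape)  # torch.Size([1, 128, 64])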
Why Stem Cells Are Ideal Training Data
Stem cells are uniquely information-rich. They exist in dynamic states — proliferating, differentiating, responding to signals, making fate decisions. A single culture can produce thousands of distinct cellular states, creating an extraordinarily rich training corpus.
Where a static tissue sample gives you a snapshot, stem cell data gives you narratives — trajectories of cellular change the model can learn to predict.
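In training terms, a trajectory becomes a next-state prediction task, directly analogous to next-token prediction in language models. A toy illustration, with cellular states simplified to labels:

```python
# Sketch of trajectories as training data: an ordered differentiation path
# yields (current state, next state) pairs, like next-token prediction.
# States and the transition path are simplified for illustration.

trajectory = ["stem", "progenitor", "precursor", "neuron"]

pairs = list(zip(trajectory[:-1], trajectory[1:]))
print(pairs)  # [('stem', 'progenitor'), ('progenitor', 'precursor'), ('precursor', 'neuron')]
```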
Accelerating Drug Discovery
By one widely cited estimate, bringing a new drug to market costs roughly $2.6 billion and takes 10–15 years. Most of that cost is failure: the figure is dominated by candidates that consume years of work before dying in preclinical or clinical testing. Our models attack the failure rate at its root.
A model that has learned cellular biology can predict likely drug responses before any physical experiment is run: screening millions of candidates in hours, flagging off-target effects early, and letting teams enter the lab with higher-confidence leads. Even a modest reduction in preclinical failure rates would mean hundreds of millions in savings.
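In practice, that screening step looks like large-scale batch inference: score a candidate library with a trained response model and keep only the top hits for wet-lab validation. The sketch below shows the shape of such a pipeline; `predict_response` is a hypothetical placeholder standing in for a real trained model, and the candidate identifiers are synthetic.

```python
# Hedged sketch of in-silico screening: score a large candidate library and
# keep the top hits for lab validation. `predict_response` is a placeholder.

import heapq

def predict_response(candidate: str) -> float:
    """Placeholder score; a real pipeline would call a trained model here."""
    return float(sum(ord(c) for c in candidate) % 100) / 100.0

candidates = (f"CMPD-{i:07d}" for i in range(1_000_000))  # streamed, not held in memory
top_hits = heapq.nlargest(10, candidates, key=predict_response)
print(top_hits[:3])  # highest-scoring candidates advance to wet-lab testing
```

Only the short list of top-ranked candidates ever reaches a bench, which is where the cost savings come from.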
What We're Not Claiming
Our models are not diagnostic tools. They don't make clinical decisions. They're research platforms — new ways of representing and reasoning about biological data. Their outputs are hypotheses to be tested, not conclusions to act on without validation.
What's Next
Over the next 12 months, we will scale training to larger multi-omics datasets, publish benchmark results, deploy inference to partner laboratories, and extend the framework to additional biological domains. The future is computational — and it's closer than you think.