A protein biodiversity viewer with Deep Learning embeddings

This application will build upon bespokin; however, instead of using AI-generated images, we will use real-world proteins. Given the sheer volume of published proteins, we will focus the app and its data visualization on a single pathway with known proteins: the nicotine biosynthesis pathway.

Genetic engineering in plants is very complicated. As an example, nicotine is a very valuable chemical to both humans and plants. Nicotine is produced in root tissue and transported to the green tissues above ground. Why wouldn’t the plant simply synthesize nicotine everywhere instead of going to the trouble of transporting the molecule? There are a couple of good answers. First, the raw materials, or precursors, needed to make nicotine are found in greater abundance in the cytosol and vacuoles of root cells, which are designed as storage units for the plant’s resources. Second, the proteins involved in nicotine biosynthesis have evolved to function properly in root cells, which have different ionomic conditions; in the leaves, for example, the proteins would not take on the correct shape (conformation) and therefore would not work nearly as efficiently.

Therefore, if a genetic engineer wanted to produce nicotine in other cells, they would have to increase the presence of the sugars and amino acids that act as building blocks for the relevant proteins and nicotine precursors. They would also have to redesign the structures of the proteins to behave in their new locations, a redesign that could take many generations, amounting to thousands of years of evolution.

This app will show the protein space around these root-cell nicotine enzymes, drawing on known proteins from across the tree of life. We will use protein BLAST (blastp) to find similar proteins according to local pairwise sequence alignment. BLAST is a traditional sequence-similarity algorithm, so it will not explicitly find proteins that have the same functions as the proteins that catalyze the reactions forming nicotine.
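As a rough illustration of the BLAST step, the sketch below parses BLAST+ tabular output (the 12 default columns produced by `-outfmt 6`) and keeps the strongest hits. The sequence identifiers, the sample rows, the `BlastHit` class, the function names, and the 1e-5 E-value cutoff are all assumptions made for illustration; they are not taken from the project's actual pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Column order of BLAST+ default tabular output (-outfmt 6).
FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


@dataclass(frozen=True)
class BlastHit:
    """Subset of one tabular BLAST row (hypothetical record type)."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    evalue: float
    bitscore: float


def parse_blast_tabular(lines: Iterable[str]) -> List[BlastHit]:
    """Parse tab-separated BLAST rows, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        row = dict(zip(FIELDS, cols))
        hits.append(BlastHit(
            query_id=row["qseqid"],
            subject_id=row["sseqid"],
            percent_identity=float(row["pident"]),
            alignment_length=int(row["length"]),
            evalue=float(row["evalue"]),
            bitscore=float(row["bitscore"]),
        ))
    return hits


def filter_hits(hits: Iterable[BlastHit], max_evalue: float = 1e-5) -> List[BlastHit]:
    """Keep hits at or below the E-value cutoff, best bit score first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.bitscore, reverse=True)


# Made-up rows standing in for real blastp output.
SAMPLE = "\n".join([
    "# hypothetical blastp results",
    "query_1\tsubject_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400.0",
    "query_1\tsubject_B\t45.0\t180\t99\t2\t5\t184\t10\t189\t2e-20\t95.5",
    "query_1\tsubject_C\t30.0\t60\t42\t1\t10\t69\t3\t62\t0.5\t25.0",
])

if __name__ == "__main__":
    for hit in filter_hits(parse_blast_tabular(SAMPLE.splitlines())):
        print(hit.subject_id, hit.percent_identity, hit.evalue)
```

Note that the filter ranks hits purely by alignment statistics, which reflects the limitation described above: a high-scoring hit is similar in sequence, not necessarily in function.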

Alternatively, this pipeline could be modified to BLAST only the functional domains (the parts of a protein that actually interact with chemicals to modify them). Doing so would change the nature of the app: it would allow us to create a tobacco plant that produces something other than nicotine. Our current approach will instead help engineers make these chemical factories functional in other parts of the plant. Additionally, we could ignore the proteins and focus on the regulatory genomics of protein expression levels, examining the similar promoters, enhancers, and silencers that surround the physical gene. This would allow us to move the factories, rather than make those factories functional, which is what we are interested in.


We will use a set of tools similar to that of bespokin.com: Svelte for the frontend, D3 for the interactive visualizations, and PostgreSQL to store our BLAST results.

Currently in development