The Problem of Language in Biology and Evolution

In many ways, I think biology’s use of metaphor, analogy, and verbal descriptions of concepts makes it more comprehensible than physical concepts based in abstract math. However, verbal language is susceptible to connotations and preconceived judgments. My hypothesis, shall we say, is that because much of biology, unlike physics, is not mathematically based, many biological concepts are vague, ill-defined, ambiguous and imprecise. I believe this is a problem that should be fixed. Let me explain.

(Note: I know that much of biology is actually based in mathematics. As I discuss later, I am currently making my way through population genetics which is heavily based in mathematics and statistics. I am focusing on the areas that are not mathematically defined.)

Take the concept of force, for example. In (Newtonian) physics, force is mathematically defined as F = dp/dt, where p is momentum and t is time. Precise. Verbally, as Wikipedia puts it, “the net force acting upon an object is equal to the rate at which its momentum changes.” Unambiguous.

Compare this to biology. Oftentimes the evolutionary processes that change allele frequencies – mutation, natural selection, drift, and migration – are described as the “forces of evolution.” Force in this context does bear some resemblance to physical force – natural selection “pushes” allele frequencies in a particular “direction,” for example – but does it actually constitute a force? I know this is a hot debate – whether natural selection is a creative force, a filter, etc. – and I won’t touch it here. I’ll just say I think the language of natural selection could be a bit better – I found the “selection-for” and “selection-of” debate a bit tiring (but I admit ignorance here). Natural selection’s conception as an analogy to artificial selection makes it easily understandable, but I also think it leads to vagueness. Also, does drift constitute a force? I find it hard to conceive of sampling error as an active force.
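To make the contrast concrete, here is a minimal sketch (my own illustration, not taken from any of the sources cited here) of how the two "forces" look when written out. It assumes the standard one-locus, two-allele viability selection model and a Wright-Fisher model of drift; the function names, the fitness values, and the population size are all hypothetical choices for illustration. Alleles are numbered 1 and 2 rather than written as A and a.

```python
import random


def select(p, w11, w12, w22):
    """One generation of viability selection (deterministic).

    p is the frequency of allele 1; w11, w12, w22 are the relative
    fitnesses of the three genotypes (hypothetical values).
    """
    q = 1.0 - p
    mean_w = p * p * w11 + 2 * p * q * w12 + q * q * w22
    # Frequency of allele 1 among survivors
    return (p * p * w11 + p * q * w12) / mean_w


def drift(p, n_individuals, rng):
    """One generation of Wright-Fisher drift (pure sampling error).

    Draws 2N gene copies with replacement from the parental pool.
    """
    copies = 2 * n_individuals
    count = sum(1 for _ in range(copies) if rng.random() < p)
    return count / copies


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed only so the run is repeatable
    p_sel = p_drift = 0.5
    for _ in range(50):
        p_sel = select(p_sel, 1.0, 0.95, 0.9)
        p_drift = drift(p_drift, 20, rng)
    print("after selection:", round(p_sel, 3))
    print("after drift:", round(p_drift, 3))
```

Under these assumptions, selection has a sign: given the fitnesses, you know which way the frequency will move before running anything. Drift has no direction built in; the expected change is zero and only the spread of outcomes depends on the population size. Whether something with no direction deserves the word "force" is exactly the question.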

I’m pretty sure I have stumbled across papers discussing the issue of “force” in evolution but I can’t remember what they were. Edit 2 hours before post: I found the article I was thinking of, “Selection, Drift, and the ‘Forces’ of Evolution” by Christopher Stephens (2004). It’s too late to read its 20 pages so I’ll let my point here stand and I will write a brief update in the upcoming days discussing it.

My next two examples will discuss the language of dominance and the language of trait evolution.

I found the article I discuss next through a 2005 post by Razib (am I late to the issue?) on this very subject. In it, Douglas Allchin (2002) argues (successfully, I believe) that biologists should “dissolve the concept of dominance in genetics.” He writes,

The concept [of dominance] is a vestige of history, a frozen accident that may have aided Mendel’s important discovery but is hardly essential as a basic principle of genetics. Moreover, the concept of dominance is ill-framed and often misleading in terms of heredity, natural selection, and molecular and cellular processes. More direct language is available to refer to the key relevant principles in inheritance and the phenotypic expression of genetic states.

I recall learning about dominance and recessiveness in high school and not questioning what those ideas really meant. Perhaps embarrassingly, I still didn’t question them in my first few years of college. Dominance and recessiveness were just facts of genetics. Learning about gene regulation, I may have assumed dominant alleles suppressed recessive alleles. It wasn’t until my genetics course, when my professor explained to the class how the smooth (dominant)/rough (recessive) phenotypes actually work, that I began to question them. As Allchin (2002) explains:

The wrinkled (versus smooth) trait in pea seeds has now been isolated to a transposon in the exon of a gene for a starch-branching enzyme (SBE1). In one form (“smooth”) the protein acts enzymatically to convert amylose to amylopectin. As a result, starch accumulates in the developing seed. In another genetic form (“wrinkled”) the gene is presumably transcribed, but it is either not translated (due to the sizable DNA insertion) or the resultant protein does not fold into a similar shape. Consequently, the same reaction is not catalyzed. Instead, unpolymerized amylose and sucrose molecules accumulate in the developing seed, which osmotically imbibes considerably more water, producing a temporarily larger seed. When the seed matures and dries, however, the endosperm contracts and the now-enlarged seed coat wrinkles.

While not as convenient as simple dominant/recessive language, this in-depth coverage of pea shape tells a much more accurate and interesting story. I think everyone could do themselves some good by reading all of Allchin’s paper (you can find it here for free) – he gives a history of dominance and explains how we would be better off ridding genetics of the term. “Dissolving” dominance removes the ambiguity of its loaded language and its imprecision (imprecise language often disguises the fact that we don’t know something) as well as the litany of exceptions to “Mendelian rules” like blood type (codominance for A/B, recessive for O).

Relatedly, I have come to wonder about gene notations. While I never had a problem with the uppercase and lowercase letters, I think they may imply dominance and recessiveness (A and a). I have found p and q or numbered subscripts to be superior in this regard.

I began reading Dan Hartl’s A Primer of Population Genetics (3rd ed, 2000) a few days ago and he provides a magnificent example of how we can use a more precise language for alternative alleles (perhaps unknowingly). Hartl writes,

Consider for example, a 32-base-pair indel found in the human chemokine receptor gene CCR5. This gene encodes a major macrophage coreceptor for HIV-1, which is the causative agent of AIDS. Genotypes that are homozygous for the CCR5-Δ32 deletion are strongly resistant to infection by HIV-1.

CCR5-Δ32! Instead of A and a, or + and -, we get + and Δ32! The precision used here is excellent – Δ32 not only avoids dominance language but also tells you specifically what the allele is – a deletion of 32 base pairs relative to the ancestral allele. This is probably the best allele notation I have encountered so far. Yes, I’m aware of fly gene names that describe what the mutation does, like eyeless, but those are not molecularly precise. I also recognize that not all alleles may be as easy to describe in a limited number of characters.

I can see there perhaps being problems with purging dominance language from genetics. Genetics becomes more complicated and every allele requires a more elaborate explanation. Ironically, the language actually becomes more complicated as you need several paragraphs to describe what was once just dominant or recessive. However, the “dissolving of dominance” would bring out the molecular interactions which really do provide interesting explanations – I think how Mendel’s smooth and rough peas actually work is absolutely fascinating.

Moving on… (for lack of a better transition)…

A pet peeve of mine is language that implies one species is “higher,” “more advanced,” “more evolved,” or, conversely, “primitive” relative to another. Of course, the favored species is usually human. What is particularly infuriating about this is that such language has no place in evolutionary biology – I feel evolutionary biology teaches the exact opposite, in fact.

T Ryan Gregory highlighted this problem just a few days ago, here and here. When we have terms such as “derived” and “ancestral” to describe trait evolution, using the aforementioned terms like “more evolved” is just useless. (“Trait” is another word I have a problem with, by the way. Let’s mention Dawkins’ use of the language of design while we’re at it too.)

This is an example of what I think the problem is – “derived” and “ancestral” are not mathematical descriptions but verbal ones, and so are prone to interpretation (often egoistic in this case). Force and momentum are not anywhere near as susceptible to cultural biases. (Perhaps the problem of language in biology is related to its similarities to history (which often faces the same issues)?) In light of this, I think biologists should make a concerted effort to construct and/or apply a more precise language.

What does everyone think? Is my comparison of physics and biology invalid? Am I making a mountain out of a molehill? Am I just naive? I would love to hear your thoughts.


Allchin, Douglas. 2002. “Dissolving Dominance.” Pp. 43-61 in Lisa Parker and Rachel Ankeny (eds.), Mutating Concepts, Evolving Disciplines: Genetics, Medicine, and Society. Dordrecht: Kluwer.

Hartl, Daniel. A Primer of Population Genetics, 3rd edition. Sinauer, 2000. ISBN 0-87893-304-2

11 thoughts on “The Problem of Language in Biology and Evolution”

  1. My observation is that all academic fields, such as Physics, Math, CSci, Statistics, or Biology, use confusing nomenclature that accidentally creates a linguistic barrier between everyone involved. Problems in the language are propagated by the less talented teachers (who only repeat such things out of tradition). Being able to explain the field and the language is a rare talent. Basically my point is, Biology (and many of the other sciences) still needs its Feynman.


  2. I don’t think the use of symbols detracts from the message. If A = CCR5-d32, then using A instead of CCR5-d32 provides the same information without the associated clutter, and definitely enhances readability (at least for me).

    I don’t agree with the loss of precision either. Does using pi instead of 3.14… decrease the precision?


  3. Another source of trouble is that most people — especially biologists — think that they understand evolutionary concepts. In fact, most do not grasp them well at all. Data are very often misinterpreted because of the use of terms like “primitive” or “basal” in the scientific literature when people think they grasp phylogenies but don’t.


  4. It would seem each field contains a “proprietary language” of sorts. In medicine many illnesses are labeled in Latin by exactly what they do. For example, Amyotrophic Lateral Sclerosis (Lou Gehrig’s disease) is a beautiful example of both sides of your point. On one side you have the descriptive scientific language, and on the other… you have an awesome baseball player who happened to have the illness. Two names for the same illness, only one actually tells you anything useful.

    In response to Jake’s comment:
    The use of variables wouldn’t detract from the message if they were well defined, numerous enough to clearly differentiate multiple genes, or used only temporarily. However, not only are there not nearly enough variables to account for the millions of available genes, but each variable would have to be fixed to that gene permanently. Pi only works in the manner it does due to being fixed at 3.14.. for hundreds of years. If fixed to a gene temporarily for the sake of the problem (as in most math), that would suffice…but why not use the real gene name?


  5. I like your thesis statement very much and I totally agree. The language used in Biology lacks precision, but how can you compare a law, such as force, that is defined by the equation above, to a process, such as natural selection? The problem is that you can’t create an equation that defines it.
    And, in response to Rev.Frost’s comment:
    I like how you pointed out that each field has its own sort of language. Physics has mathematical equations. Medicine, more or less, has Latin. But what “language” does Biology have?


  6. I agree with a lot of what’s being said here. It seems that in many experts’ zeal for efficiency and parsimony, there is an ocean of jargon that’s developed in any field. While jargon helps researchers inform each other succinctly, it serves to exaggerate the learning curve for the various sciences. The universe simply isn’t simple, so we shouldn’t expect to sufficiently describe it by simple means.


  7. Hi Kele, I’ve been browsing through your blog and it’s really refreshing to see an undergraduate as interested as you are in the nuances of biology. I hope you are planning on graduate school (and hopefully in the sciences!)

    As for the softer edges of biology compared to physics, I agree completely. I am wondering, not having read the specific work you mention above, how exactly dissolving dominance would work? I agree that the nomenclature used as a teaching tool in one-locus two-allele models (A and a, for example) can sometimes be problematic and is definitely a vast oversimplification, but are you suggesting that alleles being dominant, recessive, or somewhere in between isn’t a helpful distinction? I’d love to hear more about the proposed alternatives if you have the time.

