Data is or data are? This is the dilemma. Whether you write for business, study, or research, you cannot avoid this word, and the question is always the same: should you treat it as a plural or a singular? You have already Googled it twenty times, but when it’s time to write it down, you are still haunted by doubts. Well, I promise that this is the last time you’ll waste your time on this word. Here's everything you need to know about it.

Etymology

The word “data” comes from Latin. To be precise, it is the plural of datum, which meant “something that is given”.

Gaius Julius Caesar according to René Goscinny

The current English meaning is:

A measurement of something on a scale understood by both the recorder (a person or device) and the reader (another person or device).

Although the singular “datum” is attested, its use is now considered obsolete in English. The plural, by contrast, is very common, especially in academic and technical fields. Etymologically speaking, then, “data” is a plural, which means that the following examples are correct:

Other data are equally intriguing
The data clearly show that developing organisms are less sensitive than adults are to atrazine
The data are really promising.

Current use

So, why is “data” used as a singular noun? Bear in mind that languages are like living beings: they are born, they die and, above all, they are constantly changing. Therefore, if most English speakers use “data” as a singular… well, it means that it can be used as a singular.

If we check Google Trends (a website that analyses the popularity of top search queries on Google), we can see that “data is” is far more popular in day-to-day usage.

A comparison between “data is” and “data are” in Google searches in the US since 2004 (source: Google Trends).

At this point, you might be wondering whether day-to-day usage differs from printed English. Surprisingly, our VS operator gave similar results: across the 250 million sentences from reliable English sources selected by Ludwig, “data is” appears to be more widespread than “data are”.

Data is VS data are

If we broadened our research to a larger number of printed books in English over a longer period of time, we would find that “data is” is gaining popularity, even though it has not yet overtaken “data are”.

A comparison between “data is” and “data are” on Google Ngram Viewer.

It is therefore clear that, while “data is” seems to be the most common usage in everyday language, the situation is slightly different in print. In formal contexts, “data are” is still preferred, but it is slowly declining. The Ludwig survey is significant in this regard: it also covers “less formal” printed sources, such as daily newspapers, which are more inclined to adopt everyday language, and the results indicate that “data are” is slowly but surely losing ground.

The topic has been covered in an article in the Wall Street Journal, and The Guardian has addressed it too. The Guardian also updated its style guide, stating that: “Data takes a singular verb (like agenda), though strictly a plural; no one ever uses ‘agendum’ or ‘datum’.”

It follows that using “data” as a singular noun is not wrong. In fact, some would say that sentences like the following are the most correct choice:

Your data is mud
The data clearly shows that substituting natural gas for coal will have a substantial greenhouse benefit
This data highlights that the risk of failure is considerable and that fixed retention does not guarantee prolonged stability.

Ludwig’s wrap-up

So, what should you do? “Data is” or “data are”? The issue is often divisive, and Wikipedia has even dedicated a page to the controversy.


When The Guardian suggested that “data is” should be considered the correct form, a storm of conflicting opinions broke out on Twitter. Some people regard knowledge of Latin as a mark of their cultural status and therefore prefer “data are”, because it is etymologically correct. Others see “data is” as more in line with current English usage and, as many point out, English is not Latin but a different language with its own rules.

We can conclude that it is a matter of personal taste. Our tip is to consider what you are writing and, above all, for whom. Also keep in mind that, while some prominent newspapers like The Guardian strongly suggest using “data is”, other institutions, such as the British Office for National Statistics, still recommend “data are”. So, the best thing to do is to understand the preferences of your audience.

A twist for very nerdy people

The Merriam-Webster dictionary has also weighed in, noting that the word “data” may occur in two constructions:

· As a plural noun (like earnings), taking a plural verb and plural modifiers (such as these, many, a few) but not cardinal numbers, and serving as a referent for plural pronouns (such as they, them);
· As an abstract mass noun (like information), taking a singular verb and singular modifiers (such as this, much, little), and being referred to by a singular pronoun (it).

We should totally just stab Caesar, or maybe not?

As mentioned above, “data” derives from the Latin datum, which was a neuter noun (Latin nouns could be masculine, feminine, or neuter). In several Indo-European languages, the plurals of neuter nouns were often employed as abstract mass nouns and could serve as subjects of singular verbs. This usage was common in Ancient Greek and, although less widespread, is also attested in Latin sources. So perhaps the ancient Romans would not have turned up their noses at “data is” after all.