Why do journals insist that data ‘are’?

Given the controversy over this grammatical point, I argue that journal style guides should allow both ‘data is’ and ‘data are’.

I was recently directed (via @blefurgy and @deb_lavoy on Twitter) to an old blog post on something that frequently bugs me: the question of whether the word ‘data’ is singular or plural. The post, by Norman Gray, an astronomical data management researcher at Glasgow University, UK, dates from 2005 but I haven’t seen a better one on the topic. Gray argues that:

…the word ‘data’, in english, is a singular mass noun. It is thus a grammatical and stylistic error to use it as a plural.

Plural use is barbaric: amongst other crimes, it is a deliberate archaism, and thus a symptom of bad writing.

Strong stuff.

An alternative view is given by Peter Coles (@telescoper), another astronomer, at Cardiff University, UK, who also explains the issue clearly:

For those of you who aren’t up with such things, English nouns can be of two forms: “count” and “non-count” (or “mass”). Count nouns are those that can be enumerated and therefore have both plural and singular forms: one eye, two eyes, etc. Non-count nouns (which is a better term than “mass nouns”) are those which describe something which is not enumerable, such as “furniture” or “cutlery”. Such things can’t be counted and they don’t have a different singular and plural forms. You can have two chairs (count noun) but can’t have two furnitures (non-count noun)…

…Norman Gray asserts that (a) “data” is a non-count noun and that (b) it should therefore be singular.

I tend to look and listen out for instances of ‘data’, and I have very rarely heard someone say ‘the data are’ in natural speech. As Gray says:

The majority of writers who would dutifully pluralise ‘data’ in writing naturally and consistently use it as a mass noun in conversation: they ask how much data an instrument produces, not how many; they talk of how data is archived, not how they are archived; they talk of less data rather than fewer; and they always talk of data with units, saying they have a megabyte of data, or 10 CDs, or three nights, and never saying ‘I have 1000 data’ and expecting to be understood.

You may wonder why this matters at all. Well, practically every scientific paper contains the word ‘data’ somewhere, and all the journals I edit for insist that it is made plural every time. I spend a ridiculous amount of my editing time looking out for instances of ‘the data is’ and similar. And they can’t be found automatically using a macro, either, because the subject and verb can be separated by other words, or the verb may be something else like ‘shows’ or ‘illustrates’ rather than ‘is’. This is a ‘mistake’ that a lot of authors make.
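To illustrate why a simple search falls short, here is a minimal Python sketch. It is not any journal’s or editor’s actual macro: the pattern and the sample sentences are assumptions invented for this example. A search for ‘data’ immediately followed by a singular verb catches the obvious case but passes over sentences where other words intervene or where the verb is something like ‘shows’.

```python
import re

# Naive pattern (an assumption for illustration): 'data' directly
# followed by a singular form of a common verb.
NAIVE_PATTERN = re.compile(r"\bdata\s+(is|was|has)\b", re.IGNORECASE)

# Hypothetical sample sentences, made up for this example.
SENTENCES = [
    "The data is shown in Figure 1.",
    "The data from the second experiment is shown in Figure 2.",
    "The data clearly shows an increase.",
    "The data are shown in Figure 3.",
]


def naive_hits(sentences):
    """Return the sentences that the naive pattern flags."""
    return [s for s in sentences if NAIVE_PATTERN.search(s)]


if __name__ == "__main__":
    for sentence in SENTENCES:
        flagged = bool(NAIVE_PATTERN.search(sentence))
        print(f"{'flagged' if flagged else 'not flagged'}: {sentence}")
```

Only the first sentence is flagged. The second and third also treat ‘data’ as singular, but the pattern cannot see that, so a copyeditor still has to read for them by eye.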

So is ‘data is’ really a mistake? Are the journals right to insist on this change?

The argument from etymology

The main argument used for ‘data are’ is that the word is derived from a plural Latin word. Gray dismantles this thoroughly by showing that it never was a simple plural in Latin. It is:

…the neuter plural past participle of the first conjugation verb dare, ‘to give’ (it’s actually also the feminine singular past participle, but that really, really, doesn’t matter).

…there was almost certainly no latin word for the concept that we now identify by the english word ‘data’….

…Put another way, that means that the word ‘data’, as a technical term referring to the ore of observations, which can be painstakingly reduced to extract knowledge, is not a latin word at all. It’s a native english word with a latin past, which means, bluntly, that we get to choose how to use it, and if its meaning changes over time – as it has – then its grammatical analysis can reasonably and properly migrate also.

I find this a convincing argument. It reminds me of the pedants who don’t like split infinitives (‘to boldly go’) because Latin infinitive verbs couldn’t be split, which is pretty irrelevant to how we should treat them in English (see Wikipedia for current views on this issue).

Gray goes on to compare ‘data’ with other similar Latin-derived words, such as ‘agenda’, ‘stamina’, ‘media’ and ‘phenomena’. ‘Stamina’ is at one end of a spectrum: it is never used in the singular (‘stamen’) except in a specialist botanical sense, and it is a singular noun. ‘Phenomena’ is at the other end – the singular ‘phenomenon’ is frequently used and ‘phenomena’ is a plural noun. ‘Agenda’ is almost the same as ‘stamina’ but the singular ‘agendum’ just about makes sense (although ‘agenda item’ would be more usual). ‘Media’ is moving from being a plural of ‘medium’ to being a separate singular noun in its own right. Gray says:

In this spectrum (not ‘spectra’, of course), ‘data’ is clearly located near ‘agenda’.

I would agree with this assessment on the whole, though I disagree with Gray that ‘datum’ is ‘certainly not one of the things that makes up data’. But like ‘agenda item’, a more commonly used term would be ‘data point’.

In fact, there is a technical use of the word ‘datum’, which Gray has dug out: it is a surveying term. But the plural of this usage of ‘datum’ is ‘datums’, not ‘data’.

Peter Coles doesn’t in fact completely agree with the journal publishers’ stipulation that ‘data’ is never singular – rather, he argues that there are contexts in which the plural use makes sense, and others in which singular use is better:

“If I had less data my disk would have more free space on it.” (Non-count)

“If I had fewer data I would not be able to obtain an astrometric solution.” (Count).

I’m fine with this distinction if people want to use it. But why, then, should journals insist that the singular use is incorrect?

A proposal: stop being prescriptive about data

You may or may not agree with Norman Gray (and me) that ‘data are’ is incorrect. But you can surely agree that there is controversy about the issue. The reasons to insist on plural data are hotly contested, to say the least.

So I propose that publishers remove the stipulation in their style guides that ‘data is’ is incorrect and should be changed to ‘data are’. In fact there is no need to be prescriptive on the issue at all: if the author writes ‘data are’, it can stay, but if they write ‘data is’, that can stay too. This would save a not insignificant amount of time for copyeditors, in searching and replacing ‘data is’ and in arguing the point with authors. It would probably save authors some time and annoyance too. And it would also make journals look more modern in this age of terabytes of data.

Who is going to be the first publisher to take a leap into the unknown? You have nothing to lose but your fuddy-duddy reputation.

Your opinions

Grammatical issues like this usually generate more heat than light, so I expect there will be comments on this post. I would particularly like to hear from journal editors who have been involved in discussions about this issue for their style guides, and from authors who have railed against the ‘data are’ rule imposed by a journal. I reserve the right to remove comments that simply rehash old arguments or only say that one or other construction is ‘ugly’ or ‘just wrong’.

About sharmanedit
Owner of Cofactor, a company helping scientists to publish their research.

10 Responses to Why do journals insist that data ‘are’?

  1. Mike Taylor says:

    No argument. Insisting on “data are” is stupid; insisting on “data is” would be pointless.

  2. Jan Andersen says:

    As someone who helps folks prepare their manuscripts, I’ve learned to follow the particular journal’s preferred style guide no matter what I think about this (or any other) issue. The one time I missed a “data is” construction, a peer reviewer actually pointed it out in his/her comments! Those of us in the trenches can’t change this because we must serve the best interests of our clients; it has to come from the journals. I agree with Mike — stupid and pointless!

    • Mike Taylor says:

      “the particular journal’s preferred style guide”. Ugh, my six most hated words, and the cause of more truly pointless busy-work than perhaps any other aspect of academic life.

  3. I don’t think the issue is terribly important, but I’ve written about it anyway (link below). If we tend to use ‘data are’ in writing but ‘data is’ when we speak, it’s because our brains have decided the issue for us. Data is no longer a collection of countable items, but rather more like a quantity of liquid: water is.

    If the issue is controversial, it’s only because the word is in a state of transition. Our brains are steadily being rewired to think of “data” as a singular noun, so it’s only a matter of time. I agree with those who say that common usage doesn’t make something correct. But any usage that goes against the grain of our conceptual grasp of reality is not going to survive.

    “Data” isn’t plural (anymore):

  4. Richi says:

    All the text-books I read as a student had “data is”, until I started reading books from the US. I thought it was just an American thing, like using nouns instead of adjectives (Mexico president, count nouns).
    On another note: I am surprised to read that there are proofreaders for journals – I have seen so many stupid grammatical mistakes in journals. Maybe they should forget about “data” and concentrate on more important things.

  5. Sarah Stokes says:

    I wonder if publishers currently insist on plural ‘data’ because it is simply easier to have a blanket rule? I agree that there are contexts in which the plural use makes sense, and others in which singular use is better, but it would require a more sensitive, alert reading to use the word correctly and time sometimes does not allow for this. It is easier to say in a style guide ‘make it plural’ than to say ‘make it right’!

    I have no particular brief to support publishers in this, though, and often find the required plural clunky.

  6. sharmanedit says:

    Thanks for all the comments.
    Mike: I’m afraid all publishers have their style guides, but it should generally not be the author’s job to comply with all the details in them – that is the copyeditor’s job. I know some journals ask authors to do this, but I bet none of them realistically expect all authors to get it right.
    Kevin: Thanks for the link to your related post. I agree that the word is in transition. The Google ngrams graph of ‘data is’ and ‘data are’ is very interesting, showing how ‘data are’ has decreased since about 1985. And you provide a good further explanation of count and mass nouns, with some nice examples.
    Richi: Interesting, I hadn’t thought of it as a US/UK (or US/elsewhere) difference. Many of the style guides I am given specify US spelling, so at least in journals it isn’t this. And I agree that this isn’t very important – which is why I say style guides should stop being so strict about it.
    Sarah: I don’t think publishers need blanket rules, and they are generally flexible when the sense of the sentence makes following the rule clumsy. I do think, however, that it is difficult to get style guides changed in large companies because changes need to be approved by several people. Having the rule doesn’t save time over letting the author have their own preference, though I agree it would take longer to think about when it is being used deliberately in two different senses. I would think that most articles should consistently use one form or the other, and only a few authors would mean two different things by ‘data is’ and ‘data are’.

  7. sharmanedit says:

    There have been several discussions of this issue online since this post was published:

    • The Wall Street Journal now allow ‘data is’ in certain circumstances.
    • The Guardian have reiterated their use of data in the singular in response to the WSJ’s post (the post also includes a Storify of tweets on the issue).
    • The Economist responded to the WSJ post, mentioning that they say ‘data are’.
    • Theo Bloom in PLoS blogs quotes the Guardian post to explain her use of ‘data is’.

  8. Dave Konkel says:

    I’m a scientific editor for other folks’ work (mainly grants), and I still always insist on “the data are.” The reason is simple: while the usage is indeed in transition, there are still many reviewers who think that using “data is” shows a lack of good scientific education and may at least subconsciously “ding you” for the usage, while there are few who would have the same reaction to “data are,” even among those who prefer “data is.” Prudence thus dictates that there’s likely more to be gained from using “data are” than “data is.”
