How to query gender in a multiple-choice poll/survey?

I have previously seen this done in the following simple way:

  • Female
  • Male
  • Other (fill in blank) ________
  • Decline to Answer

This makes the common case (Male/Female) easy to code, allows unbounded expression for people who don't fall into that category, and also allows people to opt out of providing the information.

Your IRB, of course, will need to review this just as they do with all other parts of your survey.


One option is not to have checkboxes at all. Just give people a single free-text field, let them write whatever describes their gender best, and then use manual or automated methods to code those responses into categories. Optionally include a prompt along the lines of "e.g. male, female, genderqueer" to clarify that non-binary answers are welcome.

Advantages

This approach guarantees that everybody can give the answer that fits them best, and it's unlikely to offend anybody except those who resent being asked for their gender at all. You don't have to work through a long list of gender variations trying to anticipate every option that somebody might want, and it puts everybody on an even footing without the problem of "othering" (see below).

Disadvantages

It does require a bit more work on the experimenter's side, since you need to set up a coding system. But this is less work than it might seem, because you don't need to decide in advance how to handle every possible response; you only need to determine rules for the answers that people actually gave. Most likely you will end up with 90%+ that can unambiguously be classified as either "male" or "female", and a small percentage of non-binary responses.

Some people may use a free-text field to write nonsense answers, but this is actually useful: these people are likely to give bad answers to checkbox questions too, so being able to spot them and filter them out of the results is a Good Thing.
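As a rough sketch of what such a coding system could look like, the Python below maps answers to categories with a lookup table written after seeing the responses. The table contents, the category labels, and the "uncoded" bucket are all illustrative assumptions, not a recommended scheme.

```python
# Hypothetical coding table, written AFTER looking at the answers received.
# The entries and category labels are illustrative assumptions only.
CODING_RULES = {
    "male": "male", "m": "male", "man": "male",
    "female": "female", "f": "female", "woman": "female",
    "genderqueer": "non-binary", "nonbinary": "non-binary",
    "non-binary": "non-binary", "agender": "non-binary",
}

def code_response(raw):
    """Return a category, or 'uncoded' for blank or unrecognised answers."""
    # Lower-case and collapse whitespace so " Male " and "male" match.
    key = " ".join(raw.strip().lower().split())
    return CODING_RULES.get(key, "uncoded")

responses = ["Male", " female ", "Genderqueer", "F", "asdf"]
coded = [code_response(r) for r in responses]
```

Anything that lands in "uncoded" can then be read by a person, who either adds a new rule to the table or discards the response as nonsense.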

Analysing non-binary responses

In theory, you might then end up with some complicated decisions about which non-binary responses should be categorised together for data analysis. In practice, this question will probably be largely answered by issues of sample size. Unless you're running an exceptionally large survey or targeting populations with an unusually high percentage of non-binary people, you probably won't have enough non-binary responses to get meaningful findings for any categorisation finer than "non-binary", if even that.
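To make the sample-size point concrete, here is a minimal Python sketch that folds any non-binary category with too few respondents into a single "non-binary" group. The threshold of 30 and the function name are assumptions for illustration; pick a cut-off that suits your own analysis, and remove nonsense answers before running something like this.

```python
from collections import Counter

MIN_CELL_SIZE = 30  # hypothetical threshold, not a statistical rule

def collapse_small_groups(coded, binary=("male", "female"), min_size=MIN_CELL_SIZE):
    """Merge non-binary categories with fewer than min_size responses."""
    counts = Counter(coded)
    collapsed = []
    for label in coded:
        if label in binary or counts[label] >= min_size:
            collapsed.append(label)          # large enough to keep as-is
        else:
            collapsed.append("non-binary")   # too small to analyse separately
    return collapsed

sample = ["male"] * 50 + ["female"] * 45 + ["genderqueer"] * 3 + ["agender"] * 2
summary = Counter(collapse_small_groups(sample))
```

With this made-up sample, the five non-binary responses end up in one group, which may still be too small to say anything meaningful about.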

This doesn't mean that you shouldn't try to make the design respectful of non-binary people; it just means that your main reason for doing so is "treating people with respect", rather than data collection per se.

Side points

Stealing some good points from other people's answers: it's worth considering the purpose for which this data will be used, and checking your institution's policies on human subjects research for guidance.

What is "othering"?

For those not familiar with the concept of othering: many classification systems emphasise the characteristics of the people who designed them, or those considered more important, with "other" used as a catch-all for many different things whose only common characteristic might be that they're unfamiliar/unimportant to the designer of the classification. This excellent series from Robot Hugs discusses the issue in detail, giving a couple of examples from the Dewey Decimal System: its religion section has seven two-digit categories for various topics from Christianity, whereas all non-Christian religions are lumped together in a single "Other religions" group, and its language section splits European vs. non-European languages in the same ratio.

This sort of framing can be disrespectful to "others", since it can be read as saying "the most important thing about you is that you're Not One Of Us". It also presents the possibility of bias; if researchers from different cultures make different decisions about what gets "othered" in their forms, it's hard to compare results. (Even if both forms have text fields that allow people to enter custom answers, respondents may be more willing to tick a box than to write it in.)


The honest, if slightly cop-out, answer is that it depends on what you need the data for.

I'll break it down further:

Section One: Gender

Let's say you're doing a sociological study in which you're looking for gendered trends. You have a few options here, and I'll outline some of the pros and cons of each:

Option One: Free Text

What is your gender? [Free text input field here]

However, this runs into a few issues.

  1. People might answer in bad faith (e.g., "apache helicopter" from above). There's a valid argument to be made that this can be used to just disqualify their data, but depending on how much you're manually reviewing this data vs. automatically parsing it, it might result in artificial inflation of the nonbinary data if your data parser treats anything but "male" or "female" as nonbinary.
  2. Typos happen. If you're manually reviewing the data, it's not a huge deal if someone screws up and enters "fmeale", but if your data will largely be handled algorithmically, you're going to need some extensive typo handling.
  3. Even without typos, you're going to need some extensive 'synonym' handling if you're algorithmically parsing your data. For example, "m" and "male" will definitely map to the same category. But what if someone enters "trans man"? Are you separating cis vs. trans binary identities in your analysis, or not? Either is fine, depending on what you're planning on doing with the data, but it's a decision you'll have to make. Also, if you do want to analyze them separately, you may want to break the options down explicitly rather than rely on free text.

Generally, this is a good option if you're mostly going to be manually reviewing your data, but if you're planning to let a computer analyze most of it, this probably isn't your best option.
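If you do go the algorithmic route anyway, the three issues above might be handled along these lines. This is only a sketch using Python's standard difflib module; the category labels, the synonym table, the 0.8 cutoff, and the "needs review" bucket are my own assumptions. The key design choice is that unrecognised answers are flagged for a human rather than counted as nonbinary.

```python
import difflib

# Hypothetical category labels and synonym table, for illustration only.
CANONICAL = ["male", "female", "nonbinary"]
SYNONYMS = {
    "m": "male", "man": "male",
    "f": "female", "woman": "female",
    "non-binary": "nonbinary", "nb": "nonbinary",
}

def fuzzy_code(raw, cutoff=0.8):
    """Code one free-text answer, tolerating small typos."""
    text = raw.strip().lower()
    if text in SYNONYMS:                      # issue 3: synonyms
        return SYNONYMS[text]
    # issue 2: typos. Take the closest canonical label, if close enough.
    matches = difflib.get_close_matches(text, CANONICAL, n=1, cutoff=cutoff)
    if matches:
        return matches[0]
    # issue 1: bad-faith or unknown answers are NOT treated as nonbinary.
    return "needs review"

results = [fuzzy_code(r) for r in ["Male", "fmeale", "f", "apache helicopter"]]
```

Note that "trans man" would fall into "needs review" here, which is exactly the decision point from item 3: you still have to choose how to code it.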

Option Two: Radio Buttons

Let's be honest. Unless you're specifically surveying nonbinary/LGBPTQIA+ audiences, you're probably not distinguishing between "genderqueer", "agender", "genderfluid", etc. If you're only looking for broader trends, you might consider the most basic possible situation, which is just a static select list of:

  • Male
  • Female
  • Nonbinary

This avoids the "othering" of the literal "other", while sparing you from having to parse extra data or sanitize potentially invalid responses. (That having been said, you're free to tack on a free text box next to "Nonbinary" to let people enter their gender, which you can either save for the extra data or quietly discard.) This is also a lot less likely to attract bad-faith entries, as people tend to answer more honestly if you don't let them make "apache helicopter" jokes.

Generally, this approach is good if you're going to be letting a computer process most of the data and you don't need to differentiate between cis vs. binary trans individuals in your results.

Option Three: More Radio Buttons

But what if you're going more in depth into trends based on cis vs. trans identities and you do need to distinguish in more detail than "male", "female" and "nonbinary"? Then my suggestion is just to add based on the categories you're tracking. For example:

What is your gender?

Terminology note: "cis" means that you identify as the same gender you were assigned at birth. E.g., if you were raised as a girl, and still feel like a girl, you are cis. Intersex individuals, please choose based on the category you feel best describes you.

  • Cis Man
  • Cis Woman
  • Trans Man
  • Trans Woman
  • Nonbinary (You can specify "Nonbinary, Assigned Male at Birth" and "Nonbinary, Assigned Female at Birth" if you need the data, but it can be considered a bit invasive, so I suggest not including it unless you really do need it.)

Generally, this approach is good if you're going to be letting a computer process most of the data and you DO need to differentiate between cis vs. binary trans individuals in your results.

Section Two: Sex

Let's say you're doing something medical, where physical sex is the relevant feature here. You might think that this would be easier. Unfortunately, you would be wrong, for two reasons:

  1. Trans people may be on hormones which will skew the results for their sex
  2. Intersex people exist.

So with that in mind, let's consider a few scenarios here, and what their advantages/disadvantages are.

Option One: Just the Y Chromosome, please!

Some people have already linked the XKCD blog post on the matter in here. For those who haven't checked that link out yet, the basic idea is that XKCD was doing a survey based on how people see color. Since colorblindness rates are closely linked to someone's sex chromosomes (as the colorblindness gene is carried on the X chromosome and is recessive), the survey made a decision to ask if someone had a Y chromosome or not. I appreciate how Randall handled it. It was a sensible question for his usage.

I also think it was the wrong question to ask. Not because of any sensitivity concerns, but because I think it'll skew his data for XXY individuals. Granted, XXY people with colorblindness genes are... a very small subset of the population... but if it's data you're interested in, this will skew it. I'd say that unless it's something actually carried on the Y chromosome, don't ask it like this.
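To show where the skew comes from, here is a deliberately simplified back-of-the-envelope model in Python. It assumes a single recessive X-linked allele with frequency q, so that a person is affected only if every X they carry has the allele. The value q = 0.08 is a made-up number for illustration, and real genetics is messier than this.

```python
Q = 0.08  # hypothetical allele frequency, for illustration only

def p_affected(n_x_chromosomes, q=Q):
    """Chance of being affected under a simple recessive X-linked model:
    every X chromosome carried must have the allele."""
    return q ** n_x_chromosomes

p_xy = p_affected(1)    # one X: a single copy is enough
p_xx = p_affected(2)    # two X: both copies needed
p_xxy = p_affected(2)   # two X plus a Y: same odds as XX in this model
```

Under these assumptions an XXY respondent answers "yes" to "do you have a Y chromosome?" but has the XX-like probability, so a Y-based question files them in the wrong group.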

Option Two: Just the Chromosomes, Please

Say you're doing something like the case above, where you're tracking colorblindness based on chromosomes. I'd honestly suggest something like this:

Please select your sex chromosome configuration here. We need to know this because colorblindness is related to sex chromosomes. If you do not know your chromosomes for sure, please take your best guess based on your sex. Men will typically have XY chromosomes, and women will typically have XX chromosomes. Trans individuals, please answer based on the sex you were assigned at birth.

  • XX
  • XY
  • X
  • XXX
  • XYY
  • XXXX
  • XXXY
  • XXYY
  • XXXXY
  • XXXXX

(You may also go more in depth and just ask for all options listed under Sex chromosome disorders, as well as the most common XX/XY.)

This, obviously, is a very long list, but here's the thing: if you need chromosome information, you need chromosome information. Don't shortcut it just because it's rare. It might not be a lot of data, but it's still data.

Section Three: It's More Complicated, Actually

But what if you need more data? What if you're studying something correlated with hormone levels, or with socialization, etc.? This is far and away the hardest scenario here, because there are so many possible options. People can complain all they want about PC culture and having too many genders, but the fact is that you're looking for data on lived experiences, and you don't need to approve of their gender identity to acknowledge that it's going to affect your data. In this case, you might consider doing one giant kitchen sink list that includes all options. I really don't suggest this. I even tried doing a list of options just for that case, and this is as far as I got:

What is your gender? Terminology note: "cis" means that you identify as the same gender you were assigned at birth. E.g., if you were raised as a girl, and still feel like a girl, you are cis. Because we are also considering the impacts of hormones, we need to know information such as intersex configurations and hormone therapy. Please choose the best fit of the following:

  • Cis man, non-intersex
  • Cis woman, non-intersex
  • Trans man, non-intersex, no HRT
  • Trans man, non-intersex, HRT
  • Trans woman, non-intersex, no HRT
  • Trans woman, non-intersex, HRT
  • Nonbinary, assigned male at birth, non-intersex, no HRT
  • Nonbinary, assigned male at birth, non-intersex, HRT
  • Nonbinary, assigned female at birth, non-intersex, no HRT
  • Nonbinary, assigned female at birth, non-intersex, HRT

And I didn't even finish! I got to the point where I had to try to break down intersex options and I gave up. This is a terrible list and no one wants to deal with it, on either end.
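To put a number on how quickly this blows up, here is a small Python sketch that crosses just three dimensions. The dimension lists are my own simplification (for instance, they offer the HRT split to every identity, which the list above does not), so treat the counts as illustrative.

```python
from itertools import product

# Simplified, hypothetical dimensions for illustration only.
identities = [
    "cis man", "cis woman", "trans man", "trans woman",
    "nonbinary AMAB", "nonbinary AFAB",
]
intersex = ["non-intersex", "intersex"]
hrt = ["no HRT", "HRT"]

# One giant question: every combination becomes its own radio button.
combined = [", ".join(combo) for combo in product(identities, intersex, hrt)]
n_combined = len(combined)   # 6 * 2 * 2 = 24 options in a single list

# Three short questions: options add up instead of multiplying.
n_separate = len(identities) + len(intersex) + len(hrt)   # 6 + 2 + 2 = 10
```

Every extra dimension multiplies the length of the single list but only adds a few options when it is asked as a separate question.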

So instead, do it better: just ask multiple questions. My personal suggestion would be something like:

Question One: What is your gender? [Choose one of the questions from section one, as your data requires]

Question Two: What is your sex? [Choose one of the questions from section two, as your data requires]

Question Three: If you have a uterus, have you started or undergone menopause?

Question Four: If you are trans/nonbinary, have you undergone hormone replacement therapy?

[Continue to ask questions as you need]

Section Four: The Conclusion

Again, this is a complicated question, and you can get either fairly simple or fairly complicated implementations as your data requires. In general, however, you can keep the following guiding principles in mind:

  1. What data do I need?
  2. What are all of the possible cases for the data I need?
  3. What is the best way to account for all of those cases in a way that my survey responders will understand?
  4. Don't cheap out just because it will cover "most" of the cases you need. No one likes to be excluded, and excluding data, even outlier data, will impact your results.
  5. Don't be a jerk. Just because it isn't your experience, doesn't mean it isn't someone's. Be respectful, and be open to feedback from participants.