Matching single column against multiple values without self-joining table in MySQL

I have found a clever way to do this query without a self join.

I ran these commands in MySQL 5.5.8 for Windows and got the following results:

use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');

SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;

+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John    |           1 | Pooch         |
| John    |           2 | Duke,Duck     |
| Sally   |           1 | Pouch,Pooch   |
| Sally   |           2 | Peach         |
+---------+-------------+---------------+

This display reveals that John gave two different answers to question 2 and Sally gave two different answers to question 1.

To catch which questions were answered differently by all users, just place the above query in a subquery and check for a comma in the list of given answers to get the count of distinct answers as follows:

SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;

I got this:

+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John    |           1 | Pooch         |                 1 |
| John    |           2 | Duke,Duck     |                 2 |
| Sally   |           1 | Pouch,Pooch   |                 2 |
| Sally   |           2 | Peach         |                 1 |
+---------+-------------+---------------+-------------------+

Now just filter out rows where multianswer_count = 1 using another subquery:

SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;

This is what I got:

+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John    |           2 | Duke,Duck     |                 2 |
| Sally   |           1 | Pouch,Pooch   |                 2 |
+---------+-------------+---------------+-------------------+

Essentially, I performed three table scans: 1 on the main table, 2 on the small subqueries. NO JOINS !!!

Give it a Try !!!


I like join method, myself:

SELECT a.user_id FROM answers a
INNER JOIN answers a1 ON a1.question_id=1 AND a1.answer_value='Pooch'
INNER JOIN answers a2 ON a2.question_id=2 AND a2.answer_value='Peach'
GROUP BY a.user_id

Update After testing with a larger table (~1 million rows), this method took significantly longer than the simple OR method mentioned in the original question.


We were joining the user_id from the answers table in a chain of joins to get data from other tables, but isolating the answer table SQL and writing it in such simple terms helped me spot the solution:

SELECT user_id, COUNT(question_id) 
FROM answers 
WHERE
  (question_id = 2 AND answer_value = 'Peach') 
  OR (question_id = 1 AND answer_value = 'Pooch')
GROUP by user_id 
HAVING COUNT(question_id) > 1

We were unnecessarily using a second sub-query.