So I was looking into ways of improving the load times of a project, and knowing SQL is one of my biggest weaknesses (in my own humble opinion), I figured there might be a fair bit to gain by researching a bit around it, which lead to a friend linking me to http://www.chrislondon.co/joins-vs-subqueries/
Judging by his benchmarks on a low amount of data points, he discovered a massive leap between a subquery and a joined subquery. I’ve been using join statements all along (and in some cases multiple queries like the schmuck I am) so I figured it was worth giving this a shot on the data I was working with. I mention the data points as it was put up in the comments that the amount of data he queried against was very low, I hit mine against ~50 000 rows of data
I made a dump of the live data from the system in production and started running queries.
The numbers I got out of it were quite amazing (I’ll leave the query I ran at the bottom for public scrutiny and humiliation).
The default view is to return 30 entries per page, with my old approach this took 0.6 seconds, which is a frighteningly long time in my opinion, even if it isn’t a front end system.
Then came the query I have below, the same 30 rows were returned in 0.02 seconds, a very substantial leap, so I got experimental, and I kept increasing the rows I pulled with the new query, wanting to see just how big of a leap I could make.
The final number? 5 000 rows of data to just barely break 0.6 seconds! That’s quite the improvement on the meager 30 rows it managed before, that’s for sure.
Now as you’ll notice, I only took the heavy parts of the query and put into subqueries for now, there are still a few regular join statements left in the query, I might experiment some more but didn’t want to mess up the entire system in one go.
SELECT
cd.triangel_course_id AS course_id,
cd.triangel_course_courseid,
cd.triangel_course_coordinator,
cd.triangel_course_coordinator_2,
cd.triangel_course_seats AS course_seats,
cd.triangel_course_time_start AS course_start,
cd.triangel_course_time_end AS course_end,
cd.triangel_course_note AS course_note,
cd.triangel_course_room,
cd.triangel_course_price AS course_price,
cd.triangel_course_price_private AS course_price_private,
cd.triangel_course_email_note AS course_email_note,
cd.triangel_course_english AS course_english,
cd.triangel_course_web AS course_web,
cd.triangel_course_complete_user AS course_complete,
first_coordinator.triangel_instructor_name AS primary_coordinator,
second_coordinator.triangel_instructor_name AS secondary_coordinator,
c.courseid AS course_code,
c.name AS course_name,
c.name_en AS course_name_en,
room.triangel_room_name AS classroom,
tc.attendee_count,
tc.invoiced_count,
tr.reserved_count
FROM
triangel_course cd
LEFT JOIN
triangel_instructor first_coordinator
ON ( first_coordinator.triangel_instructor_id = cd.triangel_course_coordinator )
LEFT JOIN
triangel_instructor second_coordinator
ON ( second_coordinator.triangel_instructor_id = cd.triangel_course_coordinator_2 )
LEFT JOIN
courses c
ON ( c.id = cd.triangel_course_courseid )
LEFT JOIN
triangel_rooms AS room
ON ( room.triangel_room_id = cd.triangel_course_room )
LEFT JOIN (
SELECT
COUNT(DISTINCT triangel_courses_id) AS attendee_count,
COUNT(DISTINCT triangel_courses_faktura) AS invoiced_count,
triangel_courses_course
FROM
triangel_courses
WHERE
triangel_courses_delisted <= 0
AND
triangel_courses_noshow = 0
GROUP BY
triangel_courses_course
) tc
ON tc.triangel_courses_course = cd.triangel_course_id
LEFT JOIN (
SELECT
SUM(triangel_courses_reserved_seats) AS reserved_count,
triangel_courses_reserved_course
FROM
triangel_courses_reserved
GROUP BY
triangel_courses_reserved_course
) tr
ON tr.triangel_courses_reserved_course = cd.triangel_course_id
WHERE
1 = 1
And yes, it’s terrible, SQL is far from my strong suit, and I’ll gladly be told better ways of approaching things in the query so do let me know!
Leave a Reply