0 votes
asked May 31, 2017 by ascottg (120 points) | 1,075 views

2 Answers

+2 votes
Here are some general tips for improving query performance.  Your query may already be near "optimal" and requires a long time to process.  But here are some things to check.

1. Most of the time, if you reduce the number of rows the query "scans", you will get your results back faster.

When the database runs a query, it has to read through the tables made of rows.  Often this ends up being performed by a set of disk drives that use a mechanical head moving to different regions on a spinning platter.  The less you make the heads move, the faster you get your results.

2.  Use the LIMIT clause to reduce the number of rows returned.

3. If the query JOINs one or more tables, be sure the columns you are joining on are indexed.

4. If the query JOINs one or more columns, and they are indexed, be sure you do not put a function around the specified joining columns.  Otherwise it cannot use the index since it has to perform the function first.

5. If you have a WHERE clause with conditions for ranges (e.g. WHERE g > 5.0 and g < 7.0) and the column is not indexed, see if there is another table named with a "_cs" on the end, for CSTORE.   This is a special type of table stored in columnar format instead of rows, and will run much faster for non-indexed columns.
answered May 31, 2017 by ascott (720 points)
+1 vote
Don't blindly reuse code
This issue is also very common. It's very easy to copy someone else's code because you know it pulls the data you need. The problem is that quite often it pulls much more data than you need, and developers rarely bother trimming it down, so they end up with a huge superset of data. This usually comes in the form of an extra outer join or an extra condition in the WHERE clause. You can get huge performance gains if you trim reused code to your exact needs.
Do pull only the number of columns you need
This issue is similar to issue No. 2, but it's specific to columns. It's all too easy to code all your queries with SELECT * instead of listing the columns individually. The problem again is that it pulls more data than you need. I've seen this error dozens and dozens of times. A developer does a SELECT * query against a table with 120 columns and millions of rows, but winds up using only three to five of them. At that point, you're processing so much more data than you need it's a wonder the query returns at all.
answered Jul 7, 2017 by anonymous

252 questions

254 answers


1,716 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.