Section author: Adam Scott <ascott@noao.edu>

1.12.1. Guidance on constructing queries

To turn your science question into an experiment using the data hosted by the Data Lab, you need to:

  • design a basic workflow
  • construct an SQL/ADQL query to retrieve the data that you need

For example, to answer the question, “Are there faint Milky Way dwarf galaxies yet to be discovered in SMASH or DECaLS?”, you might adopt the technique of looking for overdensities of blue, potentially metal-poor, stars in the SMASH and DECaLS catalogs. The columns you would retrieve would at minimum contain RA, Dec, and magnitudes in the survey bands. You would place a constraint on the color of the objects, make a cut on the shape of objects to retrieve only point sources, and make any data quality cuts suggested by the surveys to limit the number of spurious objects. To identify the tables/views and columns that you would use, you would browse the table schema following the guidance on schema.

Retrieving all blue stars from SMASH and DECaLS would likely return a very large number of rows and take a long time. Thus, as the first step in a workflow, you should design a query that works off a limited area of sky, e.g. a single brick (a box approximately 0.25x0.25 degrees on the sky) in DECaLS or a single field in SMASH. Here is the query from the notebook demonstrating the discovery of the Hydra II dwarf galaxy (Martin et al.2015) in SMASH:

field = 169 # SMASH field number to query

# Create the query string; SQL keyword capitalized for clarity
#   depth > 1 = no short exposures please
#   ndetr, ndetg > 3 = more than 3 detections in r & g bands
#   abs(sharp) < 0.5 = avoid broad objects
query = """SELECT ra,dec,gmag,rmag,imag
           FROM smash_dr1.object
           WHERE fieldid = '%d' AND
               depthflag > 1 AND
               ndetr > 3 AND ndetg > 3 AND
               abs(sharp) < 0.5 AND
               gmag BETWEEN 9 AND 25 AND
               (gmag-rmag) BETWEEN -0.4 AND 0.4""" % field

The scientific analysis would then proceed using the data returned from this query.

For more guidance on constructing queries and references for learning SQL, check out: