1.1.3. Survey datasets

Most Data Lab workflows begin with a question whose answer first requires a query of one or more survey catalog datasets. In the Data Lab, catalogs are stored in databases, and each catalog consists of a number of separate but linked tables. These tables are accessed via Structured Query Language (SQL) or its astronomy-oriented variant, Astronomical Data Query Language (ADQL). From the beginning, users thus face a set of challenges:

  • Learning what measurements the tables from a given survey dataset contain and what they are named
  • Learning how to construct a database query that will retrieve all of the measurements needed for a given question
  • If measurements from more than one table or more than one survey are needed, learning how to join tables in such a way that all of the information is retrieved
  • For complex questions in particular, learning how to optimize the database query for performance

For many users, the first step in answering a question through the Data Lab will thus be to learn about the particular datasets that it contains.
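The query-construction and table-joining challenges listed above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and a toy in-memory database rather than the Data Lab itself. The table names (object, xmatch) and all column names are assumptions made for illustration and do not reproduce any real Data Lab schema.

```python
import sqlite3

# Toy in-memory database standing in for a survey catalog.
# Table and column names are hypothetical, not a real Data Lab schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (id INTEGER PRIMARY KEY, ra REAL, dec REAL, gmag REAL);
    CREATE TABLE xmatch (id INTEGER, ext_id TEXT, dist_arcsec REAL);
    -- An index on the join column keeps the join cheap on large tables.
    CREATE INDEX xmatch_id_idx ON xmatch (id);
    INSERT INTO object VALUES (1, 10.01, -30.02, 18.2),
                              (2, 10.05, -30.10, 19.0),
                              (3, 12.50, -31.00, 19.4),
                              (4, 11.00, -30.50, 22.3);
    INSERT INTO xmatch VALUES (1, 'ext-A', 0.12), (3, 'ext-B', 0.40);
""")


def bright_matched_objects(conn, ra_min, ra_max, dec_min, dec_max, gmag_max):
    """Return (id, ra, dec, gmag, ext_id) for objects in a box brighter than gmag_max.

    The LEFT JOIN keeps objects that have no crossmatch; their ext_id is None.
    """
    query = """
        SELECT o.id, o.ra, o.dec, o.gmag, x.ext_id
        FROM object AS o
        LEFT JOIN xmatch AS x ON x.id = o.id
        WHERE o.ra BETWEEN ? AND ?
          AND o.dec BETWEEN ? AND ?
          AND o.gmag < ?
        ORDER BY o.id
    """
    params = (ra_min, ra_max, dec_min, dec_max, gmag_max)
    return conn.execute(query, params).fetchall()


rows = bright_matched_objects(conn, 10.0, 13.0, -32.0, -30.0, 20.0)
for row in rows:
    print(row)
```

Selecting only the columns that are needed and constraining on position and magnitude keeps the result small. Whether an inner join or a left join is appropriate depends on whether unmatched objects should be retained.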

1.1.3.1. What kinds of datasets does the Data Lab contain?

1.1.3.1.1. Core datasets

These are large, high-value datasets served by the Data Lab, possibly with value-added data such as pre-computed columns or crossmatches to external tables. Tables are optimized and indexed to support the most common science cases. Examples of current and upcoming datasets are DECaLS and the DESI Targeting Surveys, DES, the DESI survey, and the pixel data contained in the NOAO Science Data Archive.

1.1.3.1.2. Hosted datasets

These are smaller-scale Survey Team or PI datasets, delivered as collections of high-level data products by users who want to share them via Data Lab services. They are relatively static in terms of release frequency and versions, but require some level of Data Lab operational support to be made available to community users. Examples are SMASH, the NEO Survey (PI Allen), and the Galactic Bulge Survey (coming soon; PI Saha).

1.1.3.1.3. Reference datasets

These are large external datasets, mirrored through the Data Lab because of their value as photometric, spectroscopic, or astrometric references. Examples are SDSS, WISE, GAIA, and USNO A/B.

1.1.3.2. Guidance on understanding table schema

Given the variety of datasets available through the Data Lab database, learning how to identify the tables and table columns of interest can be a challenge. There are several tools to help with this:

  • The Data Lab query webpage contains a schema browser through which you can browse the available datasets, their tables, and the column descriptions.
  • The datalab command has a schema method that will display the schema and table descriptions.
  • The Survey Data webpage contains full dataset descriptions and links to survey documentation.
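The tools above are the way to explore the real Data Lab schema. As a rough illustration of what schema browsing amounts to, the sketch below lists the tables and columns of a toy in-memory database using Python's built-in sqlite3 module. It does not use the datalab command or its schema method, and the table and column names are assumptions made for the example.

```python
import sqlite3

# Toy database with two hypothetical tables; not a real Data Lab schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE field (fieldid INTEGER PRIMARY KEY, ra REAL, dec REAL);
    CREATE TABLE object (id INTEGER PRIMARY KEY, fieldid INTEGER,
                         ra REAL, dec REAL, gmag REAL);
""")


def describe_schema(conn):
    """Map each table name to a list of (column name, declared type)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in cols]
    return schema


schema = describe_schema(conn)
for table, columns in schema.items():
    print(table)
    for name, ctype in columns:
        print(f"    {name}: {ctype}")
```

Inspecting column names and types before writing a query helps avoid selecting columns that do not exist or pulling far more columns than a question needs.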

In general, the survey datasets hosted by the Data Lab contain a few kinds of tables:

  • Overview: These are tables that provide summary information about the survey, such as the spatial organization of the catalog data. These tables generally have far fewer rows than the main catalog tables, as they do not contain individual objects.
  • Object: These are typically the main catalog tables, and contain aggregated information for the astronomical objects identified by the survey. There are often views of these main tables that apply a constraint to yield subsets of objects with similar properties, e.g. the star and galaxy views of DECaLS DR3. The object tables are sometimes broken into several tables, each with different columns of information but linked by a unique object identifier.
  • Measurement: These are typically tables containing time-stamped individual measurements of objects in the main catalog tables, generally organized with one row per epoch per object. While these tables typically have fewer columns than the object tables, they can have many more rows, so care should be exercised when pulling data from them.
  • Neighbors: These are specialized tables that contain information on all the internal spatial matches within a specified radius of all objects in the object table. Depending on the density of the objects on the sky and matching radius, these tables can be very large.
  • Crossmatch: These tables typically contain the spatially matched cross-identifications of the main object table with object catalogs from one or more external surveys.
  • Exposure: These tables typically contain metadata, such as calibration information, airmass, etc., for every individual exposure taken during the survey. By joining these data through the measurement and object tables, users can assign these metadata values to their objects of interest.
  • Chip: These tables are similar to the Exposure tables, but contain metadata relevant to the individual chips in the mosaics that make up the exposures, e.g. chip-dependent photometric calibration information.
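The relationship between Object, Measurement, and Exposure tables described above can be sketched as a pair of joins. This is a minimal example with Python's built-in sqlite3 module and a toy in-memory database. The table names (object, source, exposure) and all column names are assumptions chosen for illustration, not the schema of any particular survey.

```python
import sqlite3

# Toy database: one object table, one measurement table, one exposure table.
# All names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (id INTEGER PRIMARY KEY, ra REAL, dec REAL);
    CREATE TABLE source (id INTEGER, expnum INTEGER, mjd REAL, mag REAL);
    CREATE TABLE exposure (expnum INTEGER PRIMARY KEY, band TEXT, airmass REAL);
    CREATE INDEX source_id_idx ON source (id);
    INSERT INTO object VALUES (1, 10.01, -30.02), (2, 10.05, -30.10);
    INSERT INTO exposure VALUES (100, 'g', 1.05), (101, 'g', 1.32), (102, 'r', 1.10);
    INSERT INTO source VALUES (1, 100, 57000.1, 18.21),
                              (1, 101, 57001.2, 18.25),
                              (1, 102, 57001.3, 17.80),
                              (2, 100, 57000.1, 19.02);
""")


def epochs_with_metadata(conn, object_id, limit=1000):
    """Return (mjd, mag, band, airmass) for every epoch of one object.

    The measurement table links objects to exposures, so exposure
    metadata reaches the object through two joins.
    """
    query = """
        SELECT s.mjd, s.mag, e.band, e.airmass
        FROM object AS o
        JOIN source AS s ON s.id = o.id
        JOIN exposure AS e ON e.expnum = s.expnum
        WHERE o.id = ?
        ORDER BY s.mjd
        LIMIT ?
    """
    return conn.execute(query, (object_id, limit)).fetchall()


# Measurement tables can be very large, so count rows before pulling them.
n_epochs = conn.execute(
    "SELECT COUNT(*) FROM source WHERE id = ?", (1,)).fetchone()[0]
light_curve = epochs_with_metadata(conn, 1)
print(n_epochs, light_curve)
```

Counting rows first and applying a LIMIT are simple guards against accidentally retrieving an entire measurement table.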

The tables below organize the database tables for the core and reference catalogs hosted by the Data Lab by their type. Table views are listed in italics underneath the primary table from which they are drawn.


Table of Tables: DECaLS, SMASH, and GAIA

| Table Type | DECaLS (DR3), root: ls_dr3 | SMASH (DR1), root: smash_dr1 | GAIA (DR1), root: gaia_dr1 |
|---|---|---|---|
| Overview | bricks, bricks_dr3, depth, depth_summary | field | |
| Object | tractor, tractor_primary, tractor_secondary, star, galaxy, apflux, wise | object, stars, galaxies | gaia_source, tgas_source, variable_summary, rrlyrae, cepheid, phot_variable_time_series_gfov_statistical_parameters |
| Measurement | | source | phot_variable_time_series_gfov |
| Neighbors | neighbors | | |
| Crossmatch | dr3_dr12q, dr3_dr7q, dr3_specobj_dr13, dr3_superset_dr12q | xmatch | |
| Exposure | | exposure | |
| Chip | survey_ccds, ccds_annotated | chip | |

Table of Tables: DES SVA1, SDSS DR13, NEO, and USNO

| Table Type | DES (SVA1), root: des_sva1 | SDSS (DR13), root: sdss_dr13 | NEO (DR1), root: neo_dr1 | USNO, root: usno |
|---|---|---|---|---|
| Overview | | | | |
| Object | gold_catalog, redmapper_exp_catalog, redmapper_pub_catalog, and many subtables | specobj, photoposplate, calibobj_star, calibobj_gal, dr12qso, emissionlinesport_dr12, galspecline_dr8 | movgrp, movmpc | a2, b1 |
| Measurement | | | movds, movobs | |
| Neighbors | | | | |
| Crossmatch | | wisematch_gal, wisematch_star | | |
| Exposure | | | movexp | |
| Chip | | | | |