In [1]:
__author__ = 'Robert Nikutta, Stéphanie Juneau, Knut Olsen & the NOAO Data Lab team <>'
__version__ = '20180107' # yyyymmdd; version stamp of this notebook
__datasets__ = ['smash_dr1'] # datasets used in this notebook

Detecting the Hydra II dwarf galaxy

Robert Nikutta, Stéphanie Juneau, Knut Olsen & the NOAO Data Lab team

Background

Ultrafaint dwarf galaxies are crucial to understanding many aspects of the universe. For instance, they are dominated by dark matter, so their locations in space can trace the large-scale structure of the dark matter distribution. Furthermore, dwarf galaxies are suspected to host intermediate-mass black holes (IMBHs), which have so far eluded efforts to find them. IMBHs would naturally bridge the gap between stellar-mass black holes and the supermassive black holes that reside at the center of virtually every large galaxy.

Data retrieval

We will retrieve data from Field 169 of the SMASH catalog (Nidever et al. 2017, AJ, 154, 199) and look for overdensities of blue objects.

The required columns are RA, Dec, and the g, r, i magnitudes.

Detection

We will convolve the spatial distribution of our dataset with a pair of Gaussian kernels (one narrow, one broad) and subtract the results, as done in e.g. Stanford et al. (2005, ApJ, 634, 2, L129) for galaxy clusters, or Koposov et al. (2008, ApJ, 686, 279) for Milky Way satellites. This has the effect of convolving the spatial distribution with a Mexican-hat-shaped filter, which is useful for detecting objects at a desired spatial scale.
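
As a rough illustration of why the difference of two Gaussians behaves like a Mexican hat, the sketch below evaluates the radial profile of a narrow normalized Gaussian minus a broad one. The kernel widths are arbitrary illustrative values, and the function names are hypothetical, not part of the notebook. The profile is positive at the center, negative in a surrounding ring, and integrates to roughly zero over the plane, so a smooth background is suppressed while compact overdensities stand out.

```python
import math

def gaussian_2d(r, sigma):
    """Normalized circular 2D Gaussian evaluated at radius r."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (2.0 * math.pi * sigma ** 2)

def difference_of_gaussians(r, sigma_small, sigma_big):
    """Narrow Gaussian minus broad Gaussian at radius r."""
    return gaussian_2d(r, sigma_small) - gaussian_2d(r, sigma_big)

sigma_small, sigma_big = 1.0, 10.0   # illustrative widths in pixels (assumed)

center = difference_of_gaussians(0.0, sigma_small, sigma_big)  # peak of the "hat"
ring = difference_of_gaussians(5.0, sigma_small, sigma_big)    # surrounding "brim"

# Integrate the radial profile over the plane: sum of f(r) * 2*pi*r * dr
dr = 0.01
total = sum(
    difference_of_gaussians(i * dr, sigma_small, sigma_big) * 2.0 * math.pi * (i * dr) * dr
    for i in range(1, 20000)
)

print(center > 0, ring < 0, abs(total) < 1e-3)
```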

Disclaimer & attribution

If you use this notebook for your published science, please acknowledge the following:

Imports & initialization

In [2]:
# std lib
from __future__ import print_function # to use print() as a function in Python 2

try:
    input = raw_input # use 'input' function name in both Python 2 and 3
except NameError:
    pass

from getpass import getpass
import warnings
warnings.filterwarnings('ignore') # to suppress some astropy deprecation warnings

# 3rd party
import numpy as np
from astropy import utils, io, convolution, stats
from astropy.visualization import make_lupton_rgb
from photutils import find_peaks
from pyvo.dal import sia
import pylab as plt
%matplotlib inline

# Data Lab
from dl import authClient as ac, queryClient as qc
try:
    from dl.helpers.utils import convert
except ImportError:
    from dl.helpers import convert

# plots default setup
plt.rcParams['font.size'] = 14

Log in

First, obtain a Data Lab authentication token, which needs to be passed along to all Data Lab server-side operations.

In [3]:
token = ac.login(input('Enter user name (+ENTER): '),getpass('Enter password (+ENTER): '))  # here we can use the 'anonymous' user name, and an empty password
Enter user name (+ENTER): anonymous
Enter password (+ENTER): ········

Query the SMASH DR1 database

We will query the averaged photometry table from the SMASH catalog and select field #169. We will restrict the query to objects whose photometry is not based on short exposures only (depthflag > 1), avoid broad objects (|sharp| < 0.5), and pick blue objects (-0.4 < g-r < 0.4). We will also exclude objects with fewer than 4 detections in either the g or r band to improve the spatial SNR.

Construct the query string

In [4]:
field = 169 # SMASH field number to query

# Create the query string; SQL keyword capitalized for clarity
#   depthflag > 1 = no short exposures please
#   ndetr, ndetg > 3 = more than 3 detections in r & g bands
#   abs(sharp) < 0.5 = avoid broad objects
query =\
"""SELECT ra,dec,gmag,rmag,imag
   FROM smash_dr1.object
   WHERE fieldid = '{:d}' AND
         depthflag > 1 AND
         ndetr > 3 AND ndetg > 3 AND
         abs(sharp) < 0.5 AND
         gmag BETWEEN 9 AND 25 AND
         (gmag-rmag) BETWEEN -0.4 AND 0.4""".format(field)
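
If you want to repeat the search for other fields or color cuts, the same query can be wrapped in a small function. This is only a sketch: `build_query` and its parameter names are hypothetical and not part of the Data Lab API. It produces the same SQL text as the cell above when called with the defaults.

```python
def build_query(field, color_min=-0.4, color_max=0.4, min_detections=4):
    """Return the SMASH DR1 query string for one field (hypothetical helper)."""
    return (
        "SELECT ra,dec,gmag,rmag,imag\n"
        "   FROM smash_dr1.object\n"
        "   WHERE fieldid = '{field:d}' AND\n"
        "         depthflag > 1 AND\n"
        "         ndetr > {ndet:d} AND ndetg > {ndet:d} AND\n"
        "         abs(sharp) < 0.5 AND\n"
        "         gmag BETWEEN 9 AND 25 AND\n"
        "         (gmag-rmag) BETWEEN {cmin:g} AND {cmax:g}"
    ).format(field=field, ndet=min_detections - 1, cmin=color_min, cmax=color_max)

example_query = build_query(169)
print(example_query)
```

"At least 4 detections" is written as `> 3` in the SQL, which is why the helper subtracts one from `min_detections`.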

Submit the query

Running the query in synchronous mode is very easy.

In [5]:
response = qc.query(token,query) # response is by default a CSV-formatted string
CPU times: user 356 ms, sys: 415 ms, total: 771 ms
Wall time: 34.6 s

We can use a helper function to convert the query result into a data structure. Let's convert to a Pandas dataframe:

In [6]:
R = convert(response,'pandas') # R is a pandas dataframe
print("Number of objects:", R.shape[0])
Returning Pandas dataframe
Number of objects: 104973
           ra        dec       gmag       rmag       imag
0  184.876674 -32.873511  24.746605  24.838743  24.185682
1  184.876606 -32.870861  24.156397  24.068817  23.074945
2  184.875853 -32.867214  24.084047  24.028061  23.630045
3  184.877080 -32.869780  24.482061  24.446104  23.858896
4  184.878492 -32.866905  24.678942  24.714973  24.624266
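
For readers without pandas at hand, a CSV-formatted response can also be parsed with the standard library. The sketch below is not what `convert` does internally. It assumes the response starts with a header line of column names, and it uses two rows copied from the output above as stand-in data.

```python
import csv
from io import StringIO  # imported this way to avoid shadowing astropy's `io`

# Two rows copied from the table above, laid out as CSV (header line assumed)
sample = """ra,dec,gmag,rmag,imag
184.876674,-32.873511,24.746605,24.838743,24.185682
184.876606,-32.870861,24.156397,24.068817,23.074945
"""

# Convert every field of every row to float
rows = [{key: float(value) for key, value in row.items()}
        for row in csv.DictReader(StringIO(sample))]

# g-r color of each object; the query kept only -0.4 <= g-r <= 0.4
colors = [row['gmag'] - row['rmag'] for row in rows]
print(len(rows), colors)
```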

Make a figure of the spatial distribution

You might spot some overdensities already.

In [7]:
fig = plt.figure(figsize=(7,6))
plt.hexbin(R['ra'], R['dec'],gridsize=200)
plt.colorbar(label='number of objects per spatial bin');

The Dwarf Filter

Here we define the dwarf filter as a differential convolution of a two-dimensional image using two Gaussian kernels; this has the effect of convolution with a Mexican hat filter. The default kernel shapes look for objects on the scale of a few arcmin. The output includes a clipped array of the convolved spatial distribution, which we will use for peak detection.

In [8]:
def dwarf_filter (ra,dec,fwhm_small=2.0,fwhm_big=20):

    """Differential convolution with 2D Gaussian kernels.

       Based on Koposov et al. (2008).
       Code by Ken Mighell and Mike Fitzpatrick.
       Minor edits by RN.

       Parameters
       ----------
       ra, dec : array
           RA & Dec in degrees.
       fwhm_small, fwhm_big : float
           Full-width half maximum sizes of the small and big Gaussian kernels
           to use in convolution, in arcminutes.
    """

    x, y = ra, dec

    print("Computing differential convolution .... ",)

    # Information about declination (y) [degrees]
    ymean = (y.min() + y.max()) / 2.0
    ydiff_arcmin = (y.max() - y.min()) * 60.0 # convert from degrees to arcmin

    # Information about right ascension (x) [degrees]:
    xdiff = x.max() - x.min() # extent in RA coordinate [degrees]
    xmean = (x.min() + x.max()) / 2.0

    # convert the RA coordinate extent to an on-sky angular extent [degrees]
    # by scaling with cos(Dec) at the mean declination:
    xdiff_angular = (x.max() - x.min()) * np.cos(ymean*(np.pi/180.0))

    # convert from degrees to arcmin
    xdiff_angular_arcmin = xdiff_angular * 60.0 

    # Get the number of one-arcmin pixels in the X and Y directions:
    nx = np.rint(xdiff_angular_arcmin).astype('int')
    ny = np.rint(ydiff_arcmin).astype('int')

    # Create a two-dimensional histogram of the raw counts:
    Counts, xedges, yedges  = np.histogram2d (x, y, (nx,ny) )
    extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
    raw_hist = np.rot90(Counts).copy() # hack around Pythonic weirdness

    # Make the small and big Gaussian kernels. Gaussian2DKernel expects a
    # standard deviation in pixels (1 pixel = 1 arcmin), so convert FWHM / 2.35.
    kernel_small = convolution.Gaussian2DKernel(fwhm_small/2.35,factor=1)
    kernel_big = convolution.Gaussian2DKernel(fwhm_big/2.35,factor=1)

    # Convolve with each kernel and take the difference (small minus big).
    conv_big = convolution.convolve(raw_hist, kernel_big)
    conv_small = convolution.convolve(raw_hist, kernel_small)
    conv_delta = conv_small - conv_big
    delta = conv_delta.copy()

    # Compute statistics and the floor
    mean = np.mean(delta, dtype='float64')
    sigma = np.std(delta, dtype='float64')
    sigmaRaw = np.std(raw_hist,dtype='float64')
    median = np.median(delta) # not used
    floor = mean

    print('dwarf_filter: mean = {:g}  sigma = {:g} sigmaRaw = {:g}'.format(mean, sigma, sigmaRaw))

    clipped = delta.copy()
    clipped[delta < floor] = floor

    # Return the computed fields.
    return raw_hist, extent, delta, clipped, sigma
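
The grid size used inside `dwarf_filter` follows from the cos(Dec) scaling of the RA extent. The sketch below reproduces that arithmetic in plain Python for an assumed field spanning 2 degrees in both RA and Dec around Dec = -32. The function name `grid_shape` and the field bounds are illustrative assumptions, not values taken from the SMASH data.

```python
import math

def grid_shape(ra_min, ra_max, dec_min, dec_max):
    """Number of ~1-arcmin pixels along RA and Dec for a small field (inputs in degrees)."""
    dec_mid = 0.5 * (dec_min + dec_max)
    # RA coordinate extent shrinks on the sky by cos(Dec)
    width_arcmin = (ra_max - ra_min) * math.cos(math.radians(dec_mid)) * 60.0
    height_arcmin = (dec_max - dec_min) * 60.0
    return int(round(width_arcmin)), int(round(height_arcmin))

# Assumed example field: 2 deg x 2 deg in coordinates, centered at Dec = -32 deg
nx, ny = grid_shape(184.0, 186.0, -33.0, -31.0)
print(nx, ny)
```

Because cos(32 deg) is about 0.85, the same 2-degree coordinate span covers fewer arcminutes on the sky in RA than in Dec, so the histogram has fewer columns than rows.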

Run the dwarf filter

We'll use the default convolution kernels of 2 and 20 arcminutes in size.
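
For reference, a Gaussian's FWHM and standard deviation are related by FWHM = 2 sqrt(2 ln 2) sigma, which is about 2.355 sigma; the filter rounds this factor to 2.35. The short sketch below (with a hypothetical helper name) converts the two kernel sizes and checks that the profile really drops to half its peak at a radius of FWHM/2.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # about 1 / 2.355

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given FWHM (same units)."""
    return fwhm * FWHM_TO_SIGMA

sigma_small = fwhm_to_sigma(2.0)    # small kernel, arcmin
sigma_big = fwhm_to_sigma(20.0)     # big kernel, arcmin

# Gaussian profile relative to its peak, evaluated at half the FWHM
half_max = math.exp(-0.5 * (1.0 / sigma_small) ** 2)

print(round(sigma_small, 3), round(sigma_big, 3), round(half_max, 3))
```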

In [9]:
small_k, big_k = 2., 20.  # kernel sizes in arcminutes
raw, extent, delta, clipped, dsigma = dwarf_filter(R['ra'],R['dec'],fwhm_small=small_k,fwhm_big=big_k)
Computing differential convolution .... 
dwarf_filter: mean = 0.0890546  sigma = 1.79451 sigmaRaw = 5.3359
CPU times: user 615 ms, sys: 529 µs, total: 616 ms
Wall time: 614 ms

Plot the convolved 2D histogram

In [10]:
fig, ax = plt.subplots(figsize=(7,6))
im = plt.imshow(clipped)
plt.colorbar(label='relative spatial density after convolution');

Some peaks are visible. Let's locate them automatically.

Identify peaks

We'll use the photutils package to identify peaks in the clipped filtered image that rise more than 10 (in units of the filtered density) above the sigma-clipped median.

In [11]:
# find peaks
mean, median, std = stats.sigma_clipped_stats(clipped,sigma=3.0,iters=5)    
tbl = find_peaks(clipped,median+10,box_size=small_k*2)

# add ra & dec positions of peaks found
a, b = extent[:2]
xvec = np.arange(a,b,(b-a)/clipped.shape[1])
a, b = extent[2:]
yvec = np.arange(a,b,(b-a)/clipped.shape[0])

tbl['ra'] = xvec[tbl['x_peak']]
tbl['dec'] = yvec[-tbl['y_peak']-1]
x_peak y_peak   peak_value        ra           dec      
------ ------ ------------- ------------- --------------
    86     89 11.3115245379 185.410559533 -31.9760034474
    34    100 11.5126266478 184.393480413 -32.1595418034

Show the identified density peaks

In [12]:
ecs = ['w','y'] # colors of box frames
ax.scatter(tbl['x_peak'],tbl['y_peak'],marker='s',s=tbl['peak_value']*40,c='none',edgecolors=ecs,lw=3) # keeps writing to previous ax
fig  # repeats (the updated) figure

Inspect the image cutouts around the peaks

Simple Image Access service

Data Lab comes with batteries included. Image cutout and download services are built in.

We'll just write two little functions:

  • one to download the deepest stacked images found in the given bands at a given position in the sky
  • and a function to plot several images side-by-side.
In [13]:
# set up SIA
svc = sia.SIAService(DEF_ACCESS_URL)

# a little func to download the deepest stacked images
def download_deepest_images(ra,dec,fov=0.1,bands=list('gri')):
    imgTable = svc.search((ra,dec), (fov/np.cos(dec*np.pi/180), fov), verbosity=2).votable.to_table()
    print("The full image list contains {:d} entries.".format(len(imgTable)))
    sel0 = (imgTable['proctype'] == b'Stacked') & (imgTable['prodtype']==b'image') # basic selection
    images = []
    for band in bands:
        print("Band {:s}: ".format(band)) #, end='')
        sel = sel0 & (imgTable['obs_bandpass'] == band.encode()) # add 'band' to selection
        Table = imgTable[sel] # select
        row = Table[np.argmax(Table['exptime'].astype('float'))] # pick image with longest exposure time
        url = row['access_url'] # get the download URL
        print('downloading deepest stacked image...')
        img = io.fits.getdata(utils.data.download_file(url.decode(),cache=True,show_progress=False,timeout=120)) # .decode() b/c in Python 3 url is of "byte" type and getdata() expects "string" type
        images.append(img) # collect the cutout for this band
    print("Downloaded {:d} images.".format(len(images)))
    return images

# multi panel image plotter
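def plot_images(images,geo=None,panelsize=5,titles=list('gri')):
    if geo is None:
        geo = (len(images),1)  # ncols, nrows
    fig = plt.figure(figsize=(geo[0]*panelsize,geo[1]*panelsize))
    for j,img in enumerate(images):
        ax = fig.add_subplot(geo[1],geo[0],j+1)
        # minimal completion (assumed): display the image and label the panel
        ax.imshow(img,origin='lower',cmap='gray')
        ax.set_title(titles[j])
        ax.axis('off')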
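
The selection logic in `download_deepest_images` (keep stacked science images, then take the longest exposure per band) can be sketched without any network access. The records below are hypothetical stand-ins for rows of the SIA image table: the field names, product types and exposure times are invented for illustration. The helper `deepest_per_band` is likewise not part of the notebook.

```python
# Hypothetical records standing in for rows of the SIA image table;
# all names and values here are invented for illustration.
records = [
    {'band': 'g', 'proctype': 'Stacked', 'prodtype': 'image', 'exptime': 60.0,  'url': 'g_shallow'},
    {'band': 'g', 'proctype': 'Stacked', 'prodtype': 'image', 'exptime': 900.0, 'url': 'g_deep'},
    {'band': 'g', 'proctype': 'Stacked', 'prodtype': 'wtmap', 'exptime': 999.0, 'url': 'g_weight'},
    {'band': 'r', 'proctype': 'Stacked', 'prodtype': 'image', 'exptime': 300.0, 'url': 'r_deep'},
    {'band': 'r', 'proctype': 'InstCal', 'prodtype': 'image', 'exptime': 950.0, 'url': 'r_single'},
]

def deepest_per_band(records, bands):
    """Map each band to its stacked image record with the longest exposure time."""
    chosen = {}
    for band in bands:
        candidates = [rec for rec in records
                      if rec['band'] == band
                      and rec['proctype'] == 'Stacked'
                      and rec['prodtype'] == 'image']
        if candidates:  # skip bands without any matching image
            chosen[band] = max(candidates, key=lambda rec: rec['exptime'])
    return chosen

chosen = deepest_per_band(records, 'gri')
print({band: rec['url'] for band, rec in chosen.items()})
```

Unlike the notebook function, this sketch skips a band that has no matching image instead of raising an error; which behavior is preferable depends on how you want missing data reported.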

Get images for the "left yellow" box

Download the deepest stacked image cutouts (in 3 bands) around the position of the peak marked with a yellow box. Depending on network speed and system load, this can take a few seconds. Also create a 3-band false-color composite.

In [14]:
bands = list('gri')
idx = 1
print(tbl['ra'][idx], tbl['dec'][idx])
images = download_deepest_images(tbl['ra'][idx], tbl['dec'][idx], fov=0.1, bands=bands) # FOV in deg
184.393480413 -32.1595418034
The full image list contains 2616 entries.
Band g: 
downloading deepest stacked image...
Band r: 
downloading deepest stacked image...
Band i: 
downloading deepest stacked image...
Downloaded 3 images.

Plot the images, plus a false-color 3-band image:

In [15]:
images = [im-np.median(im) for im in images] # subtract median from all images for better scaling
images += [make_lupton_rgb(*images[::-1],stretch=30)] # add a 3-color composite image
plot_images(images,geo=(4,1),titles=bands+['False-color 3-band image'])
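
Two small details of the cell above are easy to miss. `make_lupton_rgb` takes its images in red, green, blue order, so the g, r, i list is reversed to map i to red, r to green and g to blue. Subtracting the median removes the typical sky level so that the background sits near zero. The toy pixel values in this sketch are made up.

```python
from statistics import median

band_order = list('gri')
rgb_order = band_order[::-1]      # reversed list: i -> red, r -> green, g -> blue
print(rgb_order)

# Toy "image": sky level near 4 plus one bright pixel (invented values)
pixels = [3.0, 5.0, 4.0, 100.0, 4.0]
sky = median(pixels)
sky_subtracted = [p - sky for p in pixels]
print(sky_subtracted)
```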