dl.helpers package


dl.helpers.all module

dl.helpers.cluster module

Data Lab helpers for clustering.

dl.helpers.cluster.constructOutlines(x, y, clusterlabels)[source]

Construct the convex hull (outline) of points in (x,y) feature space.


x, y – Location of points in (x,y) feature space (e.g. RA & Dec).


hull – The convex hull of points (x,y), an instance of scipy.spatial.qhull.ConvexHull.

Return type



Given x & y coordinates as 1d sequences:

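import numpy as np
import matplotlib.pyplot as plt
from dl.helpers.cluster import constructOutlines

points = np.vstack((x,y)).T  # make 2-d array of correct shape
hull = constructOutlines(x,y,clusterlabels)  # clusterlabels is the third argument required by the signature
plt.plot(points[hull.vertices,0], points[hull.vertices,1], 'r-', lw=2) # plot the hull
plt.plot(points[hull.vertices[[-1,0]],0], points[hull.vertices[[-1,0]],1], 'r-', lw=2) # close the hull: last vertex back to the first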
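
To make the role of hull.vertices in the example above more concrete, here is a minimal pure-Python sketch of a 2-D convex hull using Andrew's monotone chain. It is not the Qhull algorithm that scipy uses, and the name hull_vertex_indices is hypothetical. It only illustrates that a hull can be represented as indices into the input points, listed counterclockwise, and that the outline is closed by repeating the first vertex. It assumes at least three points that are not all collinear.

```python
def hull_vertex_indices(x, y):
    """Return indices of the convex-hull vertices in counterclockwise order.

    Illustrative sketch only (Andrew's monotone chain); not the scipy/Qhull code.
    """
    pts = list(zip(x, y))
    order = sorted(range(len(pts)), key=lambda i: pts[i])  # sort by x, then y

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counterclockwise turn
        return ((pts[a][0] - pts[o][0]) * (pts[b][1] - pts[o][1])
                - (pts[a][1] - pts[o][1]) * (pts[b][0] - pts[o][0]))

    def half_hull(indices):
        chain = []
        for i in indices:
            # drop points that would make a clockwise or straight turn
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    # the last point of each half is the first point of the other half
    return lower[:-1] + upper[:-1]


# four corners of a square plus one interior point
x = [0.0, 2.0, 2.0, 0.0, 1.0]
y = [0.0, 0.0, 2.0, 2.0, 1.0]
vertices = hull_vertex_indices(x, y)
closed = vertices + vertices[:1]  # repeat the first vertex to close the outline
```

The closed list can be used to index the points for plotting in a single call, instead of drawing the closing segment separately.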
dl.helpers.cluster.findClusters(x, y, method='MiniBatchKMeans', **kwargs)[source]

Find 2D clusters from x & y data.

  • x, y – Location of points in (x,y) feature space, e.g. RA & Dec, but x & y need not be spatial in nature.

  • method (str) – Cluster finding method from sklearn.cluster to use. Default: ‘MiniBatchKMeans’ (a streaming implementation of KMeans), which is very fast, but not the most robust. ‘DBSCAN’ is much more robust, but MUCH slower. For other methods, consult sklearn.cluster.

  • **kwargs

    Any other keyword arguments will be passed to the cluster finding method. If method=‘MiniBatchKMeans’ or ‘KMeans’, n_clusters (integer number of clusters to find) must be passed, e.g.:

    clusters = findClusters(x,y,method='MiniBatchKMeans',n_clusters=3)

dl.helpers.crossmatch module

Data Lab helpers for (local) positional cross-matching.

dl.helpers.crossmatch.make_catalog(n, min, max)[source]
dl.helpers.crossmatch.plot_time_vs_catalogsizes(times, extent=(2, 7))[source]
dl.helpers.crossmatch.xmatch(ra1, dec1, ra2, dec2, maxdist=None, units='deg', method='astropy', **kwargs)[source]

Cross-match two sets of ra & dec coordinates locally (i.e. all coordinates are in RAM).

The function will search for counterparts of the ra1/dec1 coordinates in the ra2/dec2 coordinate set, i.e. one can consider ra2/dec2 to be the catalog that will be searched.

  • ra1, dec1 – RA and declination of first coordinate set, in units of units

  • ra2, dec2 – RA and declination of second coordinate set, in units of units

  • maxdist (float or None) – If not None, then it is the maximum angular distance (in units of units) to be considered. All distances greater than that will be considered non-matches. If None, then all ra1/dec1 will have matches in ra2/dec2.

  • units (str) – Units of ra1, dec1, ra2, dec2. Default: ‘deg’ (decimal degrees).

  • method (str) – Currently only astropy’s match_to_catalog_sky() method is supported, i.e. the default ‘astropy’.

Other Parameters

nthneighbor (int, optional) – Used only if method='astropy'. Which closest neighbor to search for. Typically 1 is desired here, as that is correct for matching one set of coordinates to another. The next likely use case is 2, for matching a coordinate catalog against itself (1 is inappropriate because each point will find itself as the closest match).


  • idx (1-d array) – Index values of the ra1/dec1 counterparts found in ra2/dec2. Thus ra2[idx], dec2[idx] will select from the ra2/dec2 catalog the matched counterparts of the ra1/dec1 coordinate pairs.

    If maxdist was not None but a number instead, then ‘idx’ only contains the objects matched up to the maxdist radius.

  • dist2d (1-d array) – The angular distances of the matches found in the ra2/dec2 catalog. In units of units.

    If maxdist was not None but a number instead, then ‘dist2d’ only contains the objects matched up to the maxdist radius.
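
The matching semantics described above (nearest neighbor, optional maxdist cut, nthneighbor for self-matches) can be illustrated with a brute-force sketch in plain numpy. This is not how xmatch or astropy's match_to_catalog_sky() is implemented; the function name brute_force_xmatch is hypothetical, all inputs are assumed to be in degrees, and the O(N×M) pairwise distance matrix is only practical for small catalogs.

```python
import numpy as np


def brute_force_xmatch(ra1, dec1, ra2, dec2, maxdist=None, nthneighbor=1):
    """Match each (ra1, dec1) to its n-th nearest (ra2, dec2); all in degrees.

    Hypothetical illustration of the idx/dist2d semantics, not the dl.helpers code.
    """
    lon1, lat1 = np.radians(ra1)[:, None], np.radians(dec1)[:, None]
    lon2, lat2 = np.radians(ra2)[None, :], np.radians(dec2)[None, :]

    # haversine formula: angular separation of every (set 1, set 2) pair
    hav = (np.sin((lat2 - lat1) / 2) ** 2
           + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    sep = np.degrees(2 * np.arcsin(np.sqrt(np.clip(hav, 0.0, 1.0))))

    # index of the n-th closest catalog entry for every row
    idx = np.argsort(sep, axis=1)[:, nthneighbor - 1]
    dist2d = sep[np.arange(sep.shape[0]), idx]

    if maxdist is not None:
        keep = dist2d <= maxdist  # drop matches beyond the radius
        idx, dist2d = idx[keep], dist2d[keep]
    return idx, dist2d


ra1, dec1 = [10.0, 50.0, 120.0], [0.0, 0.0, 0.0]
ra2, dec2 = [50.001, 10.001, 200.0], [0.0, 0.0, 0.0]

idx_all, dist_all = brute_force_xmatch(ra1, dec1, ra2, dec2)
idx_cut, dist_cut = brute_force_xmatch(ra1, dec1, ra2, dec2, maxdist=0.01)
idx_self, _ = brute_force_xmatch(ra2, dec2, ra2, dec2, nthneighbor=2)
```

Without maxdist every input coordinate receives a match, however distant; with maxdist=0.01 the third coordinate has no counterpart and is dropped from both returned arrays. The self-match with nthneighbor=2 skips each point's zero-distance match with itself.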

dl.helpers.legacy module

Legacy helpers for Data Lab. Most are deprecated.

class dl.helpers.legacy.Querist(username='anonymous')[source]

Bases: object


Check the first async job in the FIFO queue (if queue is not empty).




Always returns a 3-tuple. If no async job was in the queue, returns (None,None,None). If there was an async query in the queue but its status did not return ‘COMPLETED’, re-inserts the query at its old position in the queue, and returns (None,None,None). If the status was ‘COMPLETED’, returns the tuple (query result,outfmt,preview).
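
The queue behavior described above can be sketched with a collections.deque. This is a hypothetical stand-in, not the Querist implementation: the class name AsyncQueueSketch, its method names, and the get_status and get_result callables are all assumptions made for illustration.

```python
from collections import deque


class AsyncQueueSketch:
    """Hypothetical stand-in for a FIFO queue of async query jobs."""

    def __init__(self, get_status, get_result):
        self.queue = deque()          # FIFO: append on the right, pop on the left
        self.get_status = get_status  # assumed callable: jobid -> status string
        self.get_result = get_result  # assumed callable: jobid -> query result

    def submit(self, jobid, outfmt, preview):
        self.queue.append((jobid, outfmt, preview))

    def check_first(self):
        """Check the first job in the queue; always returns a 3-tuple."""
        if not self.queue:
            return (None, None, None)
        jobid, outfmt, preview = self.queue.popleft()
        if self.get_status(jobid) != 'COMPLETED':
            # not done yet: put the job back at its old (front) position
            self.queue.appendleft((jobid, outfmt, preview))
            return (None, None, None)
        return (self.get_result(jobid), outfmt, preview)


status = {'job1': 'EXECUTING'}
q = AsyncQueueSketch(status.get, lambda jobid: 'result of ' + jobid)
q.submit('job1', 'pandas', 5)

first = q.check_first()       # job still running, stays in the queue
status['job1'] = 'COMPLETED'
second = q.check_first()      # job finished, removed from the queue
third = q.check_first()       # queue is now empty
```

Because an unfinished job is re-inserted at the front, repeated checks keep polling the same job until it completes, which preserves the submission order of the results.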


Clears the async job queue, i.e. the queued jobs become unretrievable.


Sets the token to an empty string. Useful e.g. before saving a notebook.

property output_formats

Pretty-print to STDOUT the available outfmt values.




Return type




dl.helpers.plot module

dl.helpers.utils module

Data Lab utility helper functions.

dl.helpers.utils.convert(inp, outfmt='pandas', verbose=False, **kwargs)[source]

Convert input inp to a data structure defined by outfmt.

  • inp (str) – String representation of the result of a query. Usually this is a CSV-formatted string, but it can also be, e.g., an XML-formatted votable (as a string).

  • outfmt (str) –

    The desired data structure for converting inp to. Default: ‘pandas’, which returns a Pandas dataframe. Other available conversions are:

    • string - no conversion
    • array - Numpy array
    • structarray - Numpy structured array (also called record array)
    • table - Astropy Table
    • votable - Astropy VOtable

    For outfmt=‘votable’, the input string must be an XML-formatted string. For all other values, it must be a CSV-formatted string.

  • verbose (bool) – If True, print status message after conversion. Default: False

  • kwargs (optional params) – Will be passed as **kwargs to the converter method.


Convert a CSV-formatted string to a Numpy array or to a Pandas dataframe:

arr = convert(inp,'array')
arr.shape  # arr is a Numpy array

df = convert(inp,outfmt='pandas')
df.head()  # df is a Pandas dataframe, with all its methods

df = convert(inp,'pandas',na_values='Infinity') # na_values is a kwarg; adds 'Infinity' to the list of values converted to NaN
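
The idea of selecting a converter by outfmt and forwarding **kwargs to it can be sketched as follows. This is an illustration under assumptions, not the dl.helpers.utils code: convert_sketch is a hypothetical name, only three of the formats are covered, and numpy readers are used in place of whatever converters the real function calls.

```python
import io

import numpy as np


def convert_sketch(inp, outfmt='array', **kwargs):
    """Convert a CSV-formatted string according to outfmt (illustrative subset)."""
    converters = {
        # no conversion
        'string': lambda s, **kw: s,
        # plain 2-d float array; the CSV header line is skipped
        'array': lambda s, **kw: np.loadtxt(
            io.StringIO(s), delimiter=',', skiprows=1, **kw),
        # structured array; field names are taken from the CSV header line
        'structarray': lambda s, **kw: np.genfromtxt(
            io.StringIO(s), delimiter=',', names=True, **kw),
    }
    if outfmt not in converters:
        raise ValueError('Unknown outfmt: %r' % (outfmt,))
    return converters[outfmt](inp, **kwargs)


csv_text = "ra,dec\n10.5,-1.0\n11.0,2.5\n"

arr = convert_sketch(csv_text, 'array')               # 2-d Numpy array
rec = convert_sketch(csv_text, 'structarray')         # columns by name: rec['ra']
ra_only = convert_sketch(csv_text, 'array', usecols=(0,))  # usecols is forwarded to np.loadtxt
```

A dictionary of converters keeps the supported formats in one place, and unknown keyword arguments fail inside the chosen reader rather than being silently ignored.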
dl.helpers.utils.normalizeCoordinates(x, y, frame_in='icrs', units_in='deg', frame_out=None, wrap_at=180)[source]

Makes 2D spatial coordinates (e.g. RA & Dec) suitable for use with matplotlib’s all-sky projection plotting.

  • x, y – Location of points in (x,y) feature space (e.g. RA & Dec in degrees). Avoid supplying x and y as columns from a pandas dataframe, as this unfortunately makes the coordinate conversions much slower. Numpy arrays, lists, astropy table and votable columns, all are fine.

  • frame_in (str) – Coordinate frame of x & y. Default: ‘icrs’. ‘galactic’ is also available. If the user desires other frames from astropy.coordinates, please contact the module author.

  • units_in (str) – Units of x & y. Default ‘deg’ (degrees).

  • frame_out (None or str) – If not None, and not same as frame_in, the x & y coordinates will be transformed from frame_in to frame_out.

  • wrap_at (float) – matplotlib plotting functions such as matplotlib.pyplot.scatter() with all-sky projections expect the x-coordinate (e.g. RA) to be between -180 and +180 degrees (or more precisely: between -pi and +pi radians). The default wrap_at=180 shifts the input coordinate x (e.g. RA) accordingly.
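
The wrapping described for wrap_at can be sketched in plain numpy. This is only an illustration of the arithmetic, not the normalizeCoordinates implementation: wrap_for_allsky is a hypothetical name, no frame conversion is performed, inputs are assumed to be in degrees, and returning radians is an assumption based on the -pi to +pi range mentioned above.

```python
import numpy as np


def wrap_for_allsky(x_deg, y_deg, wrap_at=180.0):
    """Wrap x into [wrap_at - 360, wrap_at) degrees and return (x, y) in radians."""
    x = np.asarray(x_deg, dtype=float)
    y = np.asarray(y_deg, dtype=float)
    # shift so that values at or above wrap_at continue from the negative side
    x_wrapped = (x - wrap_at) % 360.0 + (wrap_at - 360.0)
    return np.radians(x_wrapped), np.radians(y)


lon, lat = wrap_for_allsky([0.0, 190.0, 350.0], [0.0, 45.0, -30.0])
lon_deg = np.degrees(lon)  # 190 becomes -170, 350 becomes -10
```

The resulting lon and lat arrays can then be passed to a matplotlib axes created with an all-sky projection such as 'aitoff' or 'mollweide'.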


Resolve object name to coordinates.


name (str or None) – If str, it is the name of the object to resolve. If None (default), a prompt for the object name will be presented.


sc – Instance of SkyCoord from astropy. Get e.g. RA via sc.ra (with units), or sc.ra.value (without units). Or explicitly in a different coordinate system, e.g. sc.galactic.b, etc.

Return type


dl.helpers.utils.vospace_readable_fileobj(name_or_obj, token=None, **kwargs)[source]

Read data from VOSpace or some other place.


Most of the heavy lifting is done with get_readable_fileobj(). Any additional keywords passed to this function will get passed directly to that function.

  • name_or_obj (str or file-like object) –

    The filename of the file to access (if given as a string), or the file-like object to access.

    If a file-like object, it must be opened in binary mode.

  • token (str) – A token granting access to VOSpace.


A readable file-like object.

Return type