1.7.1. The datalab Command

1.7.1.1. Data Lab Command Line Client

The Data Lab Command Line Client (DCLC) is a Python-based package that provides an alternative way to interact with the various Data Lab services. It can be installed with pip install datalab-client. It is invoked via the datalab command. The list of supported Data Lab tasks can be obtained via datalab --help:

> datalab --help

Usage:

    % datalab <task> [task_options]

where <task> is one of:

           addcapability - Activate a capability on a Virtual Storage container
               broadcast - broadcast a SAMP message
                      cp - copy a file in Data Lab
                  dropdb - Drop a user MyDB table
                    exec - launch a remote task in the Data Lab
                     get - get a file from Data Lab
                  launch - launch a plugin
     list_query_profiles - List the available Query Manager profiles
   list_storage_profiles - List the available Storage Manager profiles
          listcapability - List the capabilities supported by this Virtual Storage
                  listdb - List the user MyDB tables
                      ln - link a file in Data Lab
                   login - Login to the Data Lab
                  logout - Logout of the Data Lab
                      ls - list a location in Data Lab
                   mkdir - create a directory in Data Lab
                   mount - mount the default Virtual Storage
                      mv - move a file in Data Lab
                     put - Put a file into Data Lab
                qresults - Get the async query results
                 qstatus - Get an async query job status
                   query - Query a remote data service in the Data Lab
                      rm - delete a file in Data Lab
                   rmdir - delete a directory in Data Lab
                  schema - Print data service schema info
                siaquery - query a SIA service in the Data Lab
                  status - Report on the user status
                     tag - tag a file in Data Lab
                  whoami - Print the current active user

All subcommands take the optional arguments:

debug - print debug log level messages [optional]

verbose - print verbose level log messages [optional]

warning - print warning level log messages [optional]

If a required argument is not specified on the command line, the client prompts for it. To specify an argument on the command line, put two dashes (--) in front of the argument name and an equals sign (=) before its value:

--argument=value
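The --argument=value convention is easy to generate from a script. The following is a minimal sketch in Python and is not part of the DCLC itself: build_datalab_argv is a hypothetical helper that only assembles the argument list for a datalab invocation and does not run the client.

```python
def build_datalab_argv(task, **options):
    """Return the argument list for a `datalab <task>` call.

    Hypothetical helper for illustration only: it formats arguments
    and never invokes the client.
    """
    argv = ["datalab", task]
    for name, value in options.items():
        # Each option is written as --name=value, per the convention above.
        argv.append(f"--{name}={value}")
    return argv


# Example: the mkdir task with its required `name` argument.
print(build_datalab_argv("mkdir", name="vos://test"))
```

Because from is a reserved word in Python, options with that name have to be passed through a dictionary, for example build_datalab_argv("cp", **{"from": "vos://dbs/test.vot", "to": "vos://results/test.vot"}). The resulting list could then be handed to subprocess.run on a machine where the datalab command is installed.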

1.7.1.2. Referencing files in Data Lab

When you want to refer to a file in the Data Lab (also called a remote file), you need to put a 'vos://' prefix before it so that the DCLC knows it is a remote file you are referring to. If you want to be really precise, you can use the full identifier for the file (also known as the VOSpace identifier) which would be something like:

vos://datalab.noao.edu!vospace/nodes/sarah/data/table1.vot

However, you can also just use the location within your virtual storage area, in this case - 'vos://data/table1.vot' - and the DCLC will translate this into the proper form for you. Note that if you need to identify a file within someone else’s virtual storage, e.g., a data file that a collaborator is sharing with you, then you will need to use the full VOSpace identifier to refer to it.
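The translation from the short form to the full identifier can be pictured with a small Python sketch. This is not the DCLC's actual code: expand_vos_uri and is_remote are hypothetical helpers, and the authority string and the /nodes/<user>/ layout are assumptions copied from the example identifier above.

```python
VOS_PREFIX = "vos://"
# Assumption: taken from the example identifier above; the real client may differ.
DEFAULT_AUTHORITY = "datalab.noao.edu!vospace"


def is_remote(path):
    """Return True if the path carries the vos:// prefix."""
    return path.startswith(VOS_PREFIX)


def expand_vos_uri(path, user, authority=DEFAULT_AUTHORITY):
    """Expand a short vos:// path into a full VOSpace identifier (sketch)."""
    if not is_remote(path):
        raise ValueError(f"not a remote path: {path!r}")
    rest = path[len(VOS_PREFIX):]
    first_segment = rest.split("/", 1)[0]
    # A '!' or '~' in the first segment marks an identifier that is already full.
    if "!" in first_segment or "~" in first_segment:
        return path
    return f"{VOS_PREFIX}{authority}/nodes/{user}/{rest}"


print(expand_vos_uri("vos://data/table1.vot", user="sarah"))
```

A full identifier, such as one pointing into a collaborator's storage, passes through unchanged, which matches the rule that such files must be named in full.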

1.7.1.3. Task reference

• addcapability

This activates the specified capability in the specified directory by uploading the appropriate capability’s configuration from $VOSPACE_CAPSDIR. It takes the following additional arguments:

fmt - List of formats to accept [required]

dir - Container name [required]

cap - Capability name [required]

listcap - List available capabilities [optional]

> datalab addcapability --dir=vos://dbs --cap=tableingester --fmt=votable,fits,csv

• broadcast

This sends a SAMP message of the specified type and with the given parameters. It takes the following additional arguments:

type - SAMP message type [required]

pars - Message parameters [required]

> datalab broadcast --type=... --pars=...

• cp

This copies a remote file between the two specified locations. (The get and put commands are used to copy files between Data Lab virtual storage and local storage.) It takes the following additional arguments:

from - Source location in Data Lab [required]

to - Destination location in Data Lab [required]

> datalab cp --from=vos://dbs/test.vot --to=vos://results/test.vot
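The division of labor among cp, get, and put described above can be summarized in a few lines of Python. This is an illustrative sketch only: pick_transfer_task is a hypothetical helper, and it assumes that every remote location is written with the vos:// prefix.

```python
def pick_transfer_task(source, destination):
    """Pick the datalab task for a file transfer (illustrative sketch).

    Assumes remote locations always start with 'vos://'.
    """
    src_remote = source.startswith("vos://")
    dst_remote = destination.startswith("vos://")
    if src_remote and dst_remote:
        return "cp"   # remote to remote
    if src_remote:
        return "get"  # remote to local
    if dst_remote:
        return "put"  # local to remote
    # Local-to-local copies are outside the scope of the client.
    raise ValueError("neither location is in Data Lab")


print(pick_transfer_task("vos://dbs/test.vot", "vos://results/test.vot"))
```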

• dropdb

This drops a user MyDB table. It takes the following additional argument:

table - Table name [optional]

> datalab dropdb --table=mydb://mydbtable

• exec

This executes a remote processing job within the Data Lab. It takes the following additional arguments:

cmd - name of remote task to run [required]

args - list of key-value arguments to submit to remote task [optional]

> datalab exec --cmd=cutout --args="pos=/tmp/vospace/sarah/ltg/ltg.csv, urls=/tmp/vospace/sarah/ltg/img.vot, outdir=vos://datalab.noao.edu~vospace/sarah/ltg, nthreads=20"

• get

This retrieves the specified remote file and optionally saves it to a local file. It takes the following additional arguments:

fr - Remote Data Lab file name [required]

to - Local disk file name [optional]

> datalab get --fr=vos://data/test.vot --to=test.vot

• launch - THIS COMMAND IS NOT YET AVAILABLE

It takes the following additional arguments:

> datalab launch

• list_query_profiles

This lists the available Query Manager profiles. It takes the following additional arguments:

profile - Profile to list [optional]

format - Output format (csv|text) [optional]

> datalab list_query_profiles --profile=... --format=csv

• list_storage_profiles

This lists the available Storage Manager profiles. It takes the following additional arguments:

profile - Profile to list [optional]

format - Output format (csv|text) [optional]

> datalab list_storage_profiles --profile=... --format=csv

• listcapability

This lists the capabilities supported by the VOSpace service.

> datalab listcapability
The available capabilities are:
tableingester

• listdb

This lists the user MyDB tables. It takes the following additional argument:

table - Table name [optional]

> datalab listdb --table=mydb://mydbtable

• ln

This creates a (soft) link to the specified file at the given location. It takes the following additional arguments:

fr - location in Data Lab to link from [required]

to - location the link points to [required]

> datalab ln --fr=vos://dbs --to=http://some/data/file

• login

This logs a user into a Data Lab session and optionally mounts their remote storage space (see mount). It takes the following additional arguments:

user - username of account in Data Lab [required]

password - password for account in Data Lab [required]

mount - mountpoint of remote Virtual Storage [optional]

> datalab login --user=sarah --password=herr1ng --mount=/tmp/vospace
Welcome to the Data Lab, sarah
Initializing mount

• logout

This logs out the user from a Data Lab session and optionally unmounts their remote storage space. It takes the following additional arguments:

unmount - mountpoint of remote Virtual Storage [optional]

> datalab logout --unmount=/tmp/vospace
Unmounting remote space
You are now logged out of the Data Lab

• ls

This lists a remote directory. It takes the following additional arguments:

name - Location in Data Lab to list [optional]

format - Format for listing (ascii|csv|raw) [optional]

> datalab ls --name=...

• mkdir

This creates the specified directory. It takes the following additional arguments:

name - directory in Data Lab to create [required]

> datalab mkdir --name=vos://test

• mount

This mounts the specified Virtual Storage (remote storage) via FUSE to appear as a local filesystem. It takes the following additional arguments:

vospace - space to mount [required]

mount - mount point for Virtual Storage [required]

foreground - mount the filesystem as a foreground operation [optional]

cache_limit - upper limit on local diskspace to use for file caching (in MB) [optional]

cache_dir - local directory to use for file caching [optional]

readonly - mount VOSpace readonly [optional]

cache_nodes - cache dataNode Properties [optional]

allow_other - allow all users access to this mountpoint [optional]

max_flush_threads - upper limit on number of flush (upload) threads [optional]

secure_get - use HTTPS instead of HTTP [optional]

nothreads - Only run in a single thread, causes some blocking. [optional]

> datalab mount --vospace=... --mount=/tmp/vospace

• mv

This moves the specified remote file/directory between the two locations. It takes the following additional arguments:

from - location in Data Lab to move from [required]

to - location in Data Lab to move to [required]

> datalab mv --from=vos://data/test.vot --to=vos://work/test.vot

• put

This uploads a local file to the remote storage space. It takes the following additional arguments:

fr - Local disk file name [required]

to - Remote Data Lab file name [required]

> datalab put --fr=/home/sarah/simulations/run5.txt --to=vos://dbs/simul1.dat

• qresults

Returns the async query results. It takes the following additional arguments:

jobId - Query Job ID [required]

> datalab qresults --jobId=...

• qstatus

Returns the async query job status. It takes the following additional arguments:

jobId - Query Job ID [required]

> datalab qstatus --jobId=...
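An asynchronous query is typically followed by repeated qstatus checks until the job finishes, and then by a qresults call. The Python sketch below shows one way such a polling loop could look. It is an assumption-laden illustration: wait_for_job is a hypothetical helper, the status lookup is supplied by the caller, and the phase names are borrowed from the IVOA UWS job model, since the text above does not say what qstatus actually returns.

```python
import time

# Assumption: terminal phase names follow the IVOA UWS job model.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def wait_for_job(get_status, job_id, poll_seconds=2.0, max_polls=30,
                 sleep=time.sleep):
    """Poll get_status(job_id) until the job reaches a terminal phase.

    get_status is any callable returning the job phase as a string,
    for example a wrapper around `datalab qstatus` (hypothetical).
    """
    for _ in range(max_polls):
        phase = get_status(job_id).strip().upper()
        if phase in TERMINAL_PHASES:
            return phase
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Usage with a stand-in status source instead of the real service.
phases = iter(["QUEUED", "EXECUTING", "COMPLETED"])
print(wait_for_job(lambda job_id: next(phases), "job1", sleep=lambda s: None))
```

Passing the status lookup and the sleep function as parameters keeps the loop independent of how the client is actually invoked and makes it simple to exercise without a live service.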

• query

This runs a query either directly against the database (synchronous) or via the TAP service (asynchronous). It takes the following additional arguments:

adql - ADQL statement [optional]

sql - input SQL filename [optional]

fmt - requested output format (csv|ascii|votable) [optional]

out - output filename [required]

async - asynchronous query [optional]

profile - Service profile to use [optional]

timeout - Requested query timeout [optional]

Note that tables within your MyDB need to be identified with a mydb:// prefix in either the query or the output argument.

> datalab query --adql="select id, ra_j2000, dec_j2000, g, g_i, i_z from lsdr2.stars" --fmt="csv" --out="vos://lsdr2.csv"
> datalab query --adql="select id, ra_j2000, dec_j2000, g, g_i, i_z from lsdr2.stars" --out="mydb://table1"
> datalab query --adql="select id, g_i, i_z from lsdr2.stars l, mydb://table1 m where l.id = m.id" --fmt="csv"
> datalab query --sql=complex.sql --async=true --fmt="csv"
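The examples above send results to three kinds of destination, distinguished only by the prefix of the out value. The short Python sketch below makes that rule explicit. It is illustrative only: classify_destination is a hypothetical helper, and treating any unprefixed value as a local file name is an assumption.

```python
def classify_destination(out):
    """Classify a query output target by its prefix (illustrative sketch)."""
    if out.startswith("mydb://"):
        return "mydb"   # a table in the user's MyDB
    if out.startswith("vos://"):
        return "vos"    # a file in Data Lab virtual storage
    # Assumption: anything without a prefix is a local file name.
    return "local"


for target in ("mydb://table1", "vos://lsdr2.csv", "lsdr2.csv"):
    print(target, "->", classify_destination(target))
```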

• rm

This deletes a remote file. It takes the following additional arguments:

name - file in Data Lab to delete [required]

> datalab rm --name=vos://dbs/test.vot

• rmdir

This deletes a remote directory. It takes the following additional arguments:

name - directory in Data Lab to delete [required]

> datalab rmdir --name=vos://dbs

• schema

This prints data service schema information. It takes the following additional arguments:

val - Value to list ([[<schema>][.<table>][.<col>]]) [optional]

format - Output format (csv|text|json) [optional]

profile - Service profile [optional]

> datalab schema

Schema Name   Description
-----------   -----------
   gaia_dr1   GAIA Data Release 1
       ivoa   IVOA ObsCore tables
   des_sva1   DES SVA1 Data Products
       usno   USNO Astrometry Catalogs
     ls_dr3   The DECam Legacy Survey Data Release 3

• siaquery

This runs a query against a SIA service in the Data Lab. It takes the following additional arguments:

out - Output filename [optional]

input - Input filename [optional]

search - Search radius [optional]

> datalab siaquery --search=SearchString --input="infile" --out="vos://lsdr2.csv"
> datalab siaquery --search=SearchString --out="mydb://table1"
> datalab siaquery --search=SearchString

• status

This shows the status of the current user: whether they are logged in, and the list of their current jobs/queries.

> datalab status
User sarah is logged into the Data Lab

• tag

This tags a remote file with a user-specified label. It takes the following additional arguments:

file - file in Data Lab to tag [required]

tag - tag to add to file [required]

> datalab tag --file=vos://dbs/votable.vot --tag="A crucial data file"

• whoami

This shows the current active user:

> datalab whoami
sarah