Section author: Robert Nikutta <nikutta@noao.edu>

Version: 20190106

1.7.1. The datalab Command

1.7.1.1. Data Lab Command Line Client

The Data Lab Command Line Client (DCLC) is a Python-based package that provides an alternative way to interact with the various Data Lab services. Install it with pip install datalab-client and invoke it via the datalab command. To list the supported Data Lab tasks, run datalab help:

> datalab help

Usage:

   % datalab <task> [task_options]

where <task> is one of:

              cp - copy a file in Data Lab
          dropdb - Drop a user MyDB table
             get - get a file from Data Lab
          listdb - List the user MyDB tables
              ln - link a file in Data Lab
           login - Login to the Data Lab
          logout - Logout of the Data Lab
              ls - list a location in Data Lab
           mkdir - create a directory in Data Lab
              mv - move a file in Data Lab
       mydb_copy - Rename a user MyDB table
     mydb_create - Create a user MyDB table
       mydb_drop - Drop a user MyDB table
     mydb_import - Import data into a user MyDB table
     mydb_insert - Insert data into a user MyDB table
       mydb_list - List the user MyDB tables
     mydb_rename - Rename a user MyDB table
   mydb_truncate - Truncate a user MyDB table
        profiles - List the available Query Manager profiles
             put - Put a file into Data Lab
        qresults - Get the async query results
         qstatus - Get an async query job status
           query - Query a remote data service in the Data Lab
              rm - delete a file in Data Lab
           rmdir - delete a directory in Data Lab
          schema - Print data service schema info
        services - Print available data services
          status - Report on the user status
        svc_urls - Print service URLs in use
             tag - tag a file in Data Lab
         version - Print task version
          whoami - Print the current active user

All tasks accept the following optional arguments:

debug - print debug level log messages [optional]

verbose - print verbose level log messages [optional]

warning - print warning level log messages [optional]

If a required argument is not specified on the command line, the client prompts for it. To specify an argument on the command line, put two dashes (--) in front of the argument name and an equals sign before its value:

--argument=value
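The --argument=value convention described above can be sketched in Python, for example when a script needs to assemble a datalab call. This is only an illustration: build_datalab_command is a hypothetical helper, not part of the DCLC, and it only builds the argument list without running anything.

```python
import shlex


def build_datalab_command(task, args=None):
    """Return an argv list for the datalab command (hypothetical helper).

    Each entry of `args` is rendered in the --name=value form.
    """
    argv = ["datalab", task]
    for name, value in (args or {}).items():
        argv.append(f"--{name}={value}")
    return argv


# Argument names taken from the "get" task in the task reference.
cmd = build_datalab_command("get", {"fr": "vos://data/test.vot", "to": "test.vot"})

# shlex.join renders the list as a single shell-safe command line.
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a machine where the datalab command is installed.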

1.7.1.2. Referencing files in Data Lab

When you want to refer to a file in the Data Lab (also called a remote file), you need to put a 'vos://' prefix before it so that the DCLC knows it is a remote file you are referring to. If you want to be really precise, you can use the full identifier for the file (also known as the VOSpace identifier) which would be something like:

vos://datalab.noao.edu!vospace/nodes/sarah/data/table1.vot

However, you can also use just the location within your virtual storage area, in this case 'vos://data/table1.vot', and the DCLC will translate it into the full form for you. Note that to identify a file within someone else’s virtual storage, e.g., a data file that a collaborator is sharing with you, you must use the full VOSpace identifier.
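The relationship between the short and the full form can be illustrated with a small Python sketch. It is not the DCLC's own translation code: the helper names are hypothetical, and the root of the full identifier is an assumption copied from the single example identifier given above.

```python
VOS_PREFIX = "vos://"
# Assumption: root copied from the example identifier in the text above.
FULL_ROOT = "vos://datalab.noao.edu!vospace/nodes"


def is_remote(location):
    """True if the location carries the vos:// prefix."""
    return location.startswith(VOS_PREFIX)


def to_full_identifier(location, user):
    """Expand a short vos:// location into a full identifier (hypothetical helper)."""
    if not is_remote(location):
        raise ValueError(f"not a remote location: {location!r}")
    rest = location[len(VOS_PREFIX):]
    authority = rest.split("/", 1)[0]
    # A '!' (or '~') in the first path segment marks an already-full identifier.
    if "!" in authority or "~" in authority:
        return location
    return f"{FULL_ROOT}/{user}/{rest.lstrip('/')}"


print(to_full_identifier("vos://data/table1.vot", "sarah"))
```

A full identifier is returned unchanged, which matches the case of a collaborator's file that must always be referenced in its full form.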

1.7.1.3. Task reference

• addcapability - THIS COMMAND IS NOT YET AVAILABLE

This activates the specified capability in the specified directory by uploading the appropriate capability’s configuration from $VOSPACE_CAPSDIR. It takes the following additional arguments:

fmt - List of formats to accept [required]

dir - Container name [required]

cap - Capability name [required]

listcap - List available capabilities [optional]

> datalab addcapability --dir=vos://dbs --cap=tableingester --fmt=votable,fits,csv

• broadcast - THIS COMMAND IS NOT YET AVAILABLE

This sends a SAMP message of the specified type and with the given parameters. It takes the following additional arguments:

type - SAMP message type [required]

pars - Message parameters [required]

> datalab broadcast --type=... --pars=...

• cp

This copies a remote file between the two specified locations. (The get and put commands are used to copy files between Data Lab virtual storage and local storage.) It takes the following additional arguments:

from - Source location in Data Lab [required]

to - Destination location in Data Lab [required]

> datalab cp --from=vos://dbs/test.vot --to=vos://results/test.vot

• dropdb

This drops a user MyDB table. It takes the following additional argument:

table - Table name [optional]

> datalab dropdb --table=mydb://mydbtable

• exec - THIS COMMAND IS NOT YET AVAILABLE

This executes a remote processing job within the Data Lab. It takes the following additional arguments:

cmd - name of remote task to run [required]

args - list of key-value arguments to submit to remote task [optional]

> datalab exec --cmd=cutout --args="pos=/tmp/vospace/sarah/ltg/ltg.csv, urls=/tmp/vospace/sarah/ltg/img.vot, outdir=vos://datalab.noao.edu~vospace/sarah/ltg, nthreads=20"

• get

This retrieves the specified remote file and optionally saves it to a local file. It takes the following additional arguments:

fr - Remote Data Lab file name [required]

to - Local disk file name [optional]

> datalab get --fr=vos://data/test.vot --to=test.vot

• launch - THIS COMMAND IS NOT YET AVAILABLE

It takes the following additional arguments:

> datalab launch

• listcapability - THIS COMMAND IS NOT YET AVAILABLE

This lists the capabilities supported by the VOSpace service.

> datalab listcapability
The available capabilities are:
tableingester

• listdb

This lists the user MyDB tables. It takes the following additional argument:

table - Table name [optional]

> datalab listdb --table=mydb://mydbtable

• ln

This creates a (soft) link to the specified file at the given location. It takes the following additional arguments:

fr - location in Data Lab to link from [required]

to - location the link points to [required]

> datalab ln --fr=vos://dbs --to=http://some/data/file

• login

This logs a user into a Data Lab session and optionally mounts their remote storage space (see mount). It takes the following additional arguments:

user - username of account in Data Lab [required]

password - password for account in Data Lab [required]

mount - mountpoint of remote Virtual Storage [optional]

> datalab login --user=sarah --password=herr1ng --mount=/tmp/vospace
Welcome to the Data Lab, sarah
Initializing mount

• logout

This logs the user out of a Data Lab session and optionally unmounts their remote storage space. It takes the following additional argument:

unmount - mountpoint of remote Virtual Storage [optional]

> datalab logout --unmount=/tmp/vospace
Unmounting remote space
You are now logged out of the Data Lab

• ls

This lists a remote directory. It takes the following additional arguments:

name - Location in Data Lab to list [optional]

format - Format for listing (ascii|csv|raw) [optional]

> datalab ls --name=...

• mkdir

This creates the specified directory. It takes the following additional argument:

name - directory in Data Lab to create [required]

> datalab mkdir --name=vos://test

• mount - THIS COMMAND IS NOT YET AVAILABLE

This mounts the specified Virtual Storage (remote storage) via FUSE to appear as a local filesystem. It takes the following additional arguments:

vospace - space to mount [required]

mount - mount point for Virtual Storage [required]

foreground - mount the filesystem as a foreground operation [optional]

cache_limit - upper limit on local diskspace to use for file caching (in MB) [optional]

cache_dir - local directory to use for file caching [optional]

readonly - mount VOSpace readonly [optional]

cache_nodes - cache dataNode Properties [optional]

allow_other - allow all users access to this mountpoint [optional]

max_flush_threads - upper limit on number of flush (upload) threads [optional]

secure_get - use HTTPS instead of HTTP [optional]

nothreads - Only run in a single thread, causes some blocking. [optional]

> datalab mount --vospace=... --mount=/tmp/vospace

• mv

This moves the specified remote file/directory between the two locations. It takes the following additional arguments:

from - location in Data Lab to move from [required]

to - location in Data Lab to move to [required]

> datalab mv --from=vos://data/test.vot --to=vos://work/test.vot

• mydb_copy

tbw

• mydb_create

tbw

• mydb_drop

tbw

• mydb_import

tbw

• mydb_insert

tbw

• mydb_list

tbw

• mydb_rename

tbw

• mydb_truncate

tbw

• profiles

This lists the available Query Manager and Storage Manager profiles.

tbw

• put

This uploads a local file to the remote storage space. It takes the following additional arguments:

fr - Local disk file name [required]

to - Remote Data Lab file name [required]

> datalab put --fr=/home/sarah/simulations/run5.txt --to=vos://dbs/simul1.dat

• qresults

This returns the results of an asynchronous query. It takes the following additional argument:

jobId - Query Job ID [required]

> datalab qresults --jobId=...

• qstatus

This returns the status of an asynchronous query job. It takes the following additional argument:

jobId - Query Job ID [required]

> datalab qstatus --jobId=...

• query

This runs a query either directly against the database (synchronous) or via the TAP service (asynchronous). It takes the following additional arguments:

adql - ADQL statement [optional]

sql - input SQL filename [optional]

fmt - requested output format (csv|ascii|votable) [optional]

out - output filename [required]

async - asynchronous query [optional]

profile - Service profile to use [optional]

timeout - Requested query timeout [optional]

Note that tables within your MyDB need to be identified with a mydb:// prefix in either the query or the output argument.

> datalab query --adql="select id, ra_j2000, dec_j2000, g, g_i, i_z from lsdr2.stars" --fmt="csv" --out="vos://lsdr2.csv"
> datalab query --adql="select id, ra_j2000, dec_j2000, g, g_i, i_z from lsdr2.stars" --out="mydb://table1"
> datalab query --adql="select id, g_i, i_z from lsdr2.stars l, mydb://table1 m where l.id = m.id" --fmt="csv"
> datalab query --sql=complex.sql --async=true --fmt='csv'
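The examples above send query output to virtual storage (vos://) or to a MyDB table (mydb://). A script that prepares such calls may want to tell these destinations apart; the Python sketch below shows one way to do that. The function name is hypothetical, and treating a value without a prefix as a local file name is an assumption, not documented DCLC behavior.

```python
def classify_destination(out):
    """Split an --out value into (kind, name) based on its prefix.

    Hypothetical helper; the "local" fallback is an assumption.
    """
    for prefix, kind in (("mydb://", "mydb"), ("vos://", "vospace")):
        if out.startswith(prefix):
            return kind, out[len(prefix):]
    return "local", out


# Destinations taken from the query examples above, plus an unprefixed name.
for destination in ("vos://lsdr2.csv", "mydb://table1", "results.csv"):
    print(destination, "->", classify_destination(destination))
```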

• rm

This deletes a remote file. It takes the following additional argument:

name - file in Data Lab to delete [required]

> datalab rm --name=vos://dbs/test.vot

• rmdir

This deletes a remote directory. It takes the following additional argument:

name - directory in Data Lab to delete [required]

> datalab rmdir --name=vos://dbs

• schema

This prints data service schema information. It takes the following additional arguments:

val - Value to list ([[<schema>][.<table>][.<col>]]) [optional]

format - Output format (csv|text|json) [optional]

profile - Service profile [optional]

> datalab schema

Schema Name   Description
-----------   -----------
   gaia_dr1   GAIA Data Release 1
       ivoa   IVOA ObsCore tables
   des_sva1   DES SVA1 Data Products
       usno   USNO Astrometry Catalogs
     ls_dr3   The DECam Legacy Survey Data Release 3

• services

tbw

• siaquery - THIS COMMAND IS NOT YET AVAILABLE

This runs a query against a SIA service in the Data Lab. It takes the following additional arguments:

out - Output filename [optional]

input - Input filename [optional]

search - Search radius [optional]

> datalab siaquery --search=SearchString --input="infile" --out="vos://lsdr2.csv"
> datalab siaquery --search=SearchString --out="mydb://table1"
> datalab siaquery --search=SearchString

• status

This shows the status of the current user: whether they are logged in, and the list of their current jobs/queries.

> datalab status
User sarah is logged into the Data Lab

• svc_urls

tbw

• tag

This tags a remote file with a user-specified label. It takes the following additional arguments:

file - file in Data Lab to tag [required]

tag - tag to add to file [required]

> datalab tag --file=vos://dbs/votable.vot --tag="A crucial data file"

• version

tbw

• whoami

This shows the current active user:

> datalab whoami
sarah