Using Eureka App VM

CHANGES TO NOTE IN EUREKA V4

Connecting to Eureka App VM

Each user has been provided a unique URL for connecting to their Eureka App VM. Here is how to connect to Eureka App VM:

If you abandon a session (for example, by closing the web browser instead of disconnecting from Eureka), your session will remain active for 30 minutes before automatically disconnecting. To reconnect to your abandoned session, simply log back in to the custom App VM URL.

Disconnecting from Eureka App VM

When you are done working in your Eureka App VM, it is best to manually log out of Guacamole. Logging out manually frees up the license for another user on your team (each Eureka instance comes with 4 user licenses).

Manually logging out of Guacamole (without shutting down the VM):

Manually shutting down your VM:

Frequently Asked Questions About Connecting to Eureka

Additional Eureka App VM Features

Accessing Compass Data Marts in Eureka

If you have been authorized to access a Health Data Compass data mart in BigQuery, you can safely view it from your Eureka App VM. You can also download it to your Eureka App VM for further analysis.

Via the Web User Interface

You can interact with BigQuery through the BigQuery user interface via your Eureka App VM. Simply open a web browser to https://console.cloud.google.com/bigquery. The user interface should be fairly self-explanatory. Some helpful documentation here: https://cloud.google.com/bigquery/docs/bigquery-web-ui

Important: The BigQuery Web UI at the above links can also be reached from your local workstation, where it would let you access Compass data marts in BigQuery. Access from a local workstation is not approved and will trigger an alert to our security monitoring team. Only access Compass data marts in BigQuery from your Eureka App VM.

Via the Command Line

You can access BigQuery datasets using the “bq” command line utility. This is a powerful utility, and full documentation can be found here: https://cloud.google.com/bigquery/docs/bq-command-line-tool. Below are a few simple examples for common uses: 

Examples using Command Line to access data

Examples: Exploring Data 

See what datasets you can access in a project: 

bq --project_id [project-name] ls

See what tables are in a dataset:

bq --dataset_id [project-name]:[dataset-name] ls

Show the schema of a table:

bq show [project-name]:[dataset-name].[table-name] 

Show the first few rows of a table:

bq head [project-name]:[dataset-name].[table-name] 
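The schema and preview commands above can be combined into one helper for a first look at a table. This is only a sketch: the function name bq_peek and the default of 5 rows are illustrative assumptions, not part of Eureka. It relies on the standard bq options "show --schema --format=prettyjson" and "head -n".

```shell
# Hypothetical helper (not shipped with Eureka): print a table's schema
# followed by its first few rows.
# Usage: bq_peek PROJECT:DATASET.TABLE [ROWS]
bq_peek() {
  table="$1"
  rows="${2:-5}"   # default of 5 rows is an arbitrary choice
  if [ -z "$table" ]; then
    echo "usage: bq_peek PROJECT:DATASET.TABLE [ROWS]" >&2
    return 1
  fi
  # Schema only, as readable JSON
  bq show --schema --format=prettyjson "$table" || return 1
  # First N rows of the table
  bq head -n "$rows" "$table"
}
```

After pasting the function into your terminal session, call it as: bq_peek [project-name]:[dataset-name].[table-name] 10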

Examples: Querying Data 

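*Note 1: These SELECT examples use standard SQL (--use_legacy_sql=false), in which a table is referenced as [project-name].[dataset-name].[table-name], with periods throughout. If the name of the project that contains the data you are querying has a hyphen in it, surround the table identifier with backticks, as follows: `[project-name].[dataset-name].[table-name]`. On the command line, enclose the whole query in single quotes so that the shell does not interpret the backticks.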

*Note 2: In the examples below, [PROJECT] and [DATASET] refer to the project and dataset that contain the data you wish to query, not necessarily your own Eureka project.

Execute a SELECT query from the command line and view the results:

bq query --use_legacy_sql=false 'SELECT * FROM `[PROJECT].[DATASET].[TABLE]`'

Execute a SELECT query from a query that’s stored in a file (for more complex queries) and view the results:

 cat [LOCAL-SQL-FILENAME] | bq query --use_legacy_sql=false
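If you run stored queries often, the pipe above can be wrapped in a small function that first checks that the file is readable. This is a sketch under assumptions: the name bq_run_file is hypothetical, and the function simply feeds the file to bq on standard input, which is equivalent to the cat pipeline shown above.

```shell
# Hypothetical wrapper: run the standard-SQL query stored in a file.
# Usage: bq_run_file my_query.sql
bq_run_file() {
  sql_file="$1"
  if [ ! -r "$sql_file" ]; then
    echo "cannot read SQL file: $sql_file" >&2
    return 1
  fi
  # Same effect as: cat FILE | bq query --use_legacy_sql=false
  bq query --use_legacy_sql=false < "$sql_file"
}
```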

Examples: Downloading Data 

*Note 1: In the examples below, [PROJECT] and [DATASET] refer to the project and dataset that contain the data you wish to query, not necessarily your own Eureka project.

*Note 2: The “bq query” command will return a maximum of 16,000 rows. For larger datasets, see the example for “bq extract”

Output the results of a SELECT command to a CSV file:

bq query --use_legacy_sql=false --format=csv 'SELECT * FROM `[PROJECT].[DATASET].[TABLE]`' > result.csv
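Because "bq query" output is capped (see Note 2), it can be worth checking whether a downloaded CSV came back at the cap, which would suggest that rows were cut off. The sketch below is an illustration only: the function name is hypothetical, the default limit is simply the figure quoted in Note 2, and the count assumes one row per line (no line breaks inside quoted fields).

```shell
# Hypothetical check: report the number of data rows in a CSV and flag
# files that reach LIMIT rows, which suggests truncated query output.
# Returns 0 if the file may be truncated, 1 if not, 2 if unreadable.
# Usage: csv_maybe_truncated result.csv [LIMIT]
csv_maybe_truncated() {
  csv_file="$1"
  limit="${2:-16000}"   # row cap quoted in Note 2 above
  total=$(wc -l < "$csv_file") || return 2
  # Subtract one line for the CSV header row
  data_rows=$((total - 1))
  if [ "$data_rows" -ge "$limit" ]; then
    echo "$csv_file has $data_rows data rows; results may be truncated, consider bq extract"
    return 0
  fi
  echo "$csv_file has $data_rows data rows"
  return 1
}
```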

Export a table to a file in your Google Cloud Storage Staging Bucket:

bq extract --destination_format CSV --field_delimiter "," [PROJECT]:[DATASET].[TABLE] gs://[EUREKA-PROJECT]-staging/[FILENAME]

Copy a file from your Google Cloud Storage Staging Bucket to your App VM:

gsutil cp gs://[EUREKA-PROJECT]-staging/[FILENAME] [FILENAME]

Export the results of a large query (>16,000 resulting rows) to your BigQuery Staging Dataset:

1. Query the data and store the results in a new BigQuery table:

bq query --use_legacy_sql=false --destination_table [EUREKA-PROJECT-ID]:staging.[TABLE] 'SELECT * FROM `[PROJECT].[DATASET].[TABLE]`'

2. Use the instructions above to export the new table to a file in Google Cloud Storage.

3. Use the instructions above to copy the file from your Google Cloud Storage Staging Bucket to your App VM.
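The three steps above can be chained in a single function so that a failure in one step stops the rest. This is a minimal sketch, not an official Eureka tool: the function name and argument order are assumptions, while the "staging" dataset and the "-staging" bucket suffix follow the naming used in the examples above.

```shell
# Hypothetical wrapper around the three steps above.
# Usage: bq_export_large EUREKA_PROJECT STAGING_TABLE QUERY OUT_FILE
bq_export_large() {
  if [ "$#" -ne 4 ]; then
    echo "usage: bq_export_large EUREKA_PROJECT STAGING_TABLE QUERY OUT_FILE" >&2
    return 1
  fi
  eureka_project="$1"
  staging_table="$2"
  query="$3"
  out_file="$4"
  bucket_uri="gs://${eureka_project}-staging/${out_file}"

  # Step 1: store the query results in a table in the staging dataset
  bq query --use_legacy_sql=false \
    --destination_table="${eureka_project}:staging.${staging_table}" \
    "$query" || return 1

  # Step 2: export that table to a CSV file in the staging bucket
  bq extract --destination_format=CSV \
    "${eureka_project}:staging.${staging_table}" "$bucket_uri" || return 1

  # Step 3: copy the file from the staging bucket to the App VM
  gsutil cp "$bucket_uri" "$out_file"
}
```

BigQuery exports at most 1 GB of table data to a single file; larger tables need a wildcard in the destination file name, which this sketch does not handle.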

Moving Files In & Out of Eureka App VM

Eureka is designed first and foremost to protect sensitive data files. One important aspect of this is that you cannot access the Internet directly from your app server in order to upload or download files via the usual mechanisms such as FTP, email, or web sites. Instead, you will use a specially configured location on Google Cloud Storage, called your Eureka Staging Bucket. Your Eureka Staging Bucket can be used to transfer files between your local workstation and your Eureka App VM. 

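This is a two-step process: first copy the file from the source system to your Eureka Staging Bucket, then copy it from the staging bucket to the destination system.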

There are three options for doing this: Google Cloud Console, gsutil, and GCSFuse. See below for how to use each of these options.

Important: 

 - You have the ability to use your Eureka Staging Bucket to download data to your local workstation. 

 - Just because you can do this does not mean that you should. 

 - Sensitive data such as PHI may only be downloaded to workstations or servers that comply with your institutional HIPAA policies. 

Please contact us if you have any questions.

Using the Google Cloud Console

The Google Cloud Console provides a point-and-click graphical user interface to your staging bucket. This is a good option for ad-hoc transfer of a few small files at a time. 

For scripted transfers, transfers of very large files or many files at once, use one of the other options.

Using the gsutil Command-Line Interface

The gsutil command-line interface is extremely useful for transferring large files, large groups of files, or for scripting file transfer. 

Configuring Your Credentials

gsutil is already installed on your Eureka App VM. To install gsutil on the local workstation from which you connect to your Eureka App VM, follow Google's installation instructions for Mac or Windows. You will need to configure your Google credentials on your Eureka App VM if you have not already done so. Run the following command from your Eureka App VM and follow the prompts. It may give you a long URL, which you should paste into a web browser to authenticate using your Eureka credentials.

gcloud auth login 

Transferring Files

The basic syntax for transferring a file using gsutil is as follows:

gsutil cp [source] [destination]

Local files are specified using the usual syntax, for example ~/myfile.txt. Your bucket is specified as gs://[projectid]-staging.

Examples, assuming a project id of hdcekaxmp:

gsutil cp myfile.txt gs://hdcekaxmp-staging

gsutil cp gs://hdcekaxmp-staging/myfile.txt .

gsutil cp gs://hdcekaxmp1-staging/obj gs://hdcekaxmp2-staging/obj2
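For many files at once, gsutil can copy a whole directory in parallel. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it only wraps the standard gsutil options -m (parallel transfers) and -r (recursive copy) around the bucket naming shown above.

```shell
# Hypothetical helper: copy a local directory to the staging bucket.
# Usage: stage_upload_dir PROJECT_ID LOCAL_DIR
stage_upload_dir() {
  project_id="$1"
  local_dir="$2"
  if [ ! -d "$local_dir" ]; then
    echo "not a directory: $local_dir" >&2
    return 1
  fi
  # -m runs transfers in parallel; -r copies the directory recursively
  gsutil -m cp -r "$local_dir" "gs://${project_id}-staging/"
}
```

For example: stage_upload_dir hdcekaxmp ~/results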

More Examples

Using GCSFuse

GCSFuse allows you to mount your staging bucket as a folder within a Linux or MacOS filesystem. (This feature is not available on Windows systems.) You can use GCSFuse to mount your staging bucket on your Eureka App VM, your local workstation, or both. 

Setting Up GCSFuse on Your Eureka App VM

GCSFuse is already installed on your Eureka App VM -- you only need to configure it. 

Advanced users may wish to explore modifying fstab to mount their staging bucket by default at startup, thereby skipping Step 3. See the GCSFuse documentation for details.
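As a rough illustration of what mounting and unmounting look like, the sketch below wraps the generic GCSFuse commands. It is not a replacement for the Eureka setup steps: the function names and the default mount folder ~/staging are assumptions, and it presumes your credentials are already configured.

```shell
# Hypothetical helpers; names and default mount folder are assumptions.
# Usage: stage_mount PROJECT_ID [MOUNT_DIR]
stage_mount() {
  mount_bucket="${1}-staging"
  mount_dir="${2:-$HOME/staging}"
  # The mount point must exist before gcsfuse is called
  mkdir -p "$mount_dir" || return 1
  # gcsfuse takes the bare bucket name, without the gs:// prefix
  gcsfuse "$mount_bucket" "$mount_dir"
}

# Usage: stage_unmount [MOUNT_DIR]
stage_unmount() {
  # Linux unmount command for FUSE file systems (macOS uses umount)
  fusermount -u "${1:-$HOME/staging}"
}
```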

Setting Up GCSFuse on Your Local Workstation

Configuring GCSFuse on your local workstation is nontrivial, but can be very useful. By mounting your staging bucket on both your Eureka App VM and your local workstation, you can seamlessly move files between systems without making calls to gsutil. See the following links for more information:

Frequently Asked Questions: Moving data in/out of Eureka App VM

Preinstalled Applications

Each Eureka App VM is preinstalled with the following default suite of analytical tools and applications:



Tip: To discover which libraries are preinstalled on your Eureka App VM, run the following in the Eureka terminal:

rpm -qa | grep devel

Updating Preinstalled Applications

Users may choose to update preinstalled applications on their Eureka App VMs. Below you will find guidance on updating commonly used applications using the Eureka terminal.

RStudio: run the following once:

sudo yum -y install /srv/repos/eureka/7/v2/files/rstudio-2021.09.1-372-x86_64.rpm

Developer Toolset 10 (devtoolset-10): run the following once:

sudo yum -y install devtoolset-10

Run the following each time before using the tool(s):

source /opt/rh/devtoolset-10/enable

Git GUI tools (git-gui and gitk): run the following once:

sudo yum install rh-git218-git-gui rh-git218-gitk

Frequently Asked Questions: Using R

Installing R packages from CRAN

Access to CRAN is available through the Limited Internet Access feature. Follow the Limited Internet Access steps below to get connected to CRAN. Once CRAN is accessible you can install packages from CRAN using install.packages() in the usual way. 

Installing R packages from GitHub

Many R packages are hosted on GitHub. With Limited Internet Access, you can reach GitHub directly from Eureka. On older versions of Eureka that do not have access to GitHub, the usual install_github() command in R will return an error. In addition, there is an incompatibility between GitHub and R in the way that .zip files are handled, which requires some additional steps.

Do the following:

unzip github-repo.zip

install.packages('~/mypackages/github-repo-master', repos=NULL)
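The same unzip-and-install sequence can also be run entirely from the terminal. The sketch below swaps in "R CMD INSTALL", R's command-line installer for source folders, in place of install.packages(); the function name is hypothetical, and the unzipped folder name must be supplied because it depends on the repository and branch.

```shell
# Hypothetical helper: unzip a downloaded GitHub repository next to the
# .zip file and install the package from the unzipped folder.
# Usage: r_install_zip ~/mypackages/github-repo.zip github-repo-master
r_install_zip() {
  zip_file="$1"
  unpacked_name="$2"
  dest_dir=$(dirname "$zip_file")
  # -o overwrites files from an earlier unpack, -q keeps output quiet
  unzip -o -q "$zip_file" -d "$dest_dir" || return 1
  # Install from the unzipped source folder
  R CMD INSTALL "$dest_dir/$unpacked_name"
}
```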

Installing R packages from other Sources

Installing R Packages from other sources is possible. Take special care that you only download and install packages from trusted sources.

For .zip files, do the following:

install.packages('~/mypackages/the-package.zip', repos=NULL)

*Note: If you get an error stating "embedded nul in string", then the .zip file is probably suffering from the same incompatibility as described in Installing Packages from GitHub. Follow those instructions to unzip the repository and install it from its unzipped subfolder. 

Stringi R Package:

This popular R package requires ICU4C code that cannot be compiled on the present Eureka OS. To install the OS-compatible version of stringi (2020 version) please execute the following code:

install.packages("https://cloud.r-project.org/src/contrib/Archive/stringi/stringi_1.5.3.tar.gz", repos=NULL, type="source", configure.vars="ICUDT_DIR=/srv/repos/eureka/7/v2/files")

Installing missing Linux Libraries

Some R packages depend on Linux operating system libraries that may not be installed on your Eureka virtual machine by default. If install.packages returns errors about missing libraries, you can install these from the CentOS mirror maintained by Health Data Compass. 

Do the following:

sudo yum install curl

Manually Installing Dependencies

Many R packages are dependent on other R packages. Dependencies in CRAN will be resolved and installed automatically through Health Data Compass's CRAN mirror. 

Unfortunately, dependencies hosted in other locations, such as GitHub, will need to be manually installed. You can simply attempt to install the base package using install.packages(), wait for an error complaining of a missing package, install the missing package, attempt to install the base package again, and repeat until all dependencies are found. But if the base package has many dependencies, it may be more efficient to view the DESCRIPTION file found within the base package .zip file. Look for the Imports and Suggests tags, which will list any required and suggested dependencies, respectively. You can then proactively install each of these dependent packages one at a time, using the instructions above, and then install the base package when all dependencies are in place.
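Reading the dependency list out of a DESCRIPTION file can be scripted. The following is a sketch only: desc_field is a hypothetical name, and the parsing assumes the usual DESCRIPTION layout (a "Field:" line followed by indented continuation lines, with dependencies separated by commas).

```shell
# Hypothetical helper: print the package names listed in one field
# (for example Imports or Suggests) of an R DESCRIPTION file, one per line.
# Usage: desc_field DESCRIPTION Imports
desc_field() {
  # awk keeps only the lines of the requested field: a line starting with
  # "Name:" opens a new field, indented lines continue the current one.
  # tr puts each dependency on its own line; sed strips version
  # constraints such as "(>= 1.0.0)" and surrounding whitespace.
  awk -v field="$2" '
    /^[A-Za-z][^:]*:/ {
      in_field = ($0 ~ ("^" field ":"))
      if (in_field) sub(/^[^:]*:/, "")
    }
    in_field { print }
  ' "$1" |
    tr ',' '\n' |
    sed -e 's/([^)]*)//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -v '^$'
}
```

Run it against the DESCRIPTION file in the unzipped package folder, once for Imports and once for Suggests, to get a checklist of packages to install first.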

Limited Internet Access from Eureka App VM

Eureka App VM v3 can connect to the following URLs from within Eureka via the Eureka Limited Internet App. Google Chrome is the only browser optimized for use with the limited internet access functionality in Eureka App VM.

The first time you use the Eureka Limited Internet App you will need to run the following command from your Eureka App VM and follow the prompts. It may provide you with a long URL which you should paste into a web browser in Eureka App VM and authenticate using your Eureka credentials.

gcloud auth login 

There are two ways to interact with the Eureka Limited Internet App. 

Option #1: Locate the Eureka Limited Internet App in the applications directory and select the website to which you wish to connect.

Option #2: Open a terminal window and use one of the eureka-internet commands listed below.

After you select the site from the Eureka Limited Internet App, open a Chrome browser within the Eureka App VM and type in the URL. The connection usually takes around 5 seconds to become available, but can take up to 15. Access to the site is limited to 30 minutes. If you need the connection open for longer, re-select the site from the Eureka Limited Internet App to add another 30 minutes of connection. Below are the options for internet connectivity:


Limited Internet App Button: CRAN & Bioconductor

Console Command: eureka-internet-CRAN-Bioconductor

Sites:

https://cloud.r-project.org

https://www.bioconductor.org


Limited Internet App Button: Github.com

Console Command: eureka-internet-GitHub.com

Sites:

https://github.com

https://raw.githubusercontent.com


Limited Internet App Button: PyPi.org & Python.org

Console Command: eureka-internet-Python.org

Sites:

https://pypi.org

https://www.python.org


Limited Internet App Button: REDCap

Console Command: eureka-internet-RedCap

Sites:

https://redcap.ucdenver.edu


NOTE: Some R packages require access to GitHub at the same time as CRAN, so make sure you select both sites from the Eureka Limited Internet App to ensure complete installation of those packages.
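When working from the terminal, both connections can be requested in one go using the console commands listed above. This sketch assumes that each eureka-internet command returns once access has been requested; the function name and the default 15-second wait are illustrative choices.

```shell
# Hypothetical helper: request CRAN/Bioconductor and GitHub access together.
# Usage: open_r_sites [WAIT_SECONDS]
open_r_sites() {
  wait_seconds="${1:-15}"   # connections can take up to 15 seconds
  eureka-internet-CRAN-Bioconductor || return 1
  eureka-internet-GitHub.com || return 1
  # Give the connections time to become available
  sleep "$wait_seconds"
  echo "CRAN, Bioconductor and GitHub requested; access lasts 30 minutes"
}
```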

When you are done with your session and no longer need to use the Eureka Limited Internet App, you can log out of gcloud by running the following command from your Eureka App VM:

gcloud auth revoke

Internet Security & Eureka App VM

Security is a group effort between you and Compass. We cannot do it without you. Please be sure to follow all rules in the Eureka User Agreement.

Some common problems with software downloaded from the internet include:

You must ensure that you have carefully reviewed software from any source for these problems, but be particularly careful with container hubs (such as Docker Hub) and software from GitHub that is not widely used. Due to the difficulty of determining the trustworthiness of software on container hubs, we discourage their use. You are responsible for vetting software you upload to Eureka.

You must not store confidential information on sites outside Eureka, unless you have received specific permission. You must never store confidential information on GitHub.

Frequently Asked Questions: Limited Internet Access

Using Python with Eureka App VM

Compass highly recommends using Python through PyCharm with a personal Python virtual environment, using these steps:
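For terminal users, a personal virtual environment can also be created with Python's built-in venv module. This is a generic sketch rather than the PyCharm procedure: the function name and the default folder ~/venvs/myproject are assumptions.

```shell
# Hypothetical helper: create a personal Python virtual environment.
# Usage: make_venv [VENV_DIR]
make_venv() {
  venv_dir="${1:-$HOME/venvs/myproject}"
  # Create the environment with the standard-library venv module
  python3 -m venv "$venv_dir" || return 1
  # Activation must happen in your own shell session, so only print the command
  echo "To activate: . $venv_dir/bin/activate"
}
```

Once the environment is activated, packages installed with pip go into that environment instead of the system-wide Python installation.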

Google Cloud Source Repository

Each Eureka App VM instance has Google Cloud Source Repository set up and enabled for sharing code files between multiple users on a shared Eureka instance.

Note that sensitive data like PHI should never be included in code files. This includes those shared on other code sharing platforms like GitHub.

Idle Shutdown of Eureka App VM

Each Eureka instance is pre-configured to shut down the VM after 30 minutes with no detected usage. If you want to temporarily disable the idle shutdown, run the following command from your VM terminal window:

If you disable the idle shutdown, you are responsible for manually shutting down the VM when you are no longer using it.

The pre-configured idle shutdown will be re-enabled whenever the VM is rebooted. Until then, you will need to manually shut down the VM.