Hello world on Google Cloud

Hello World! This time I decided to play with Google Cloud and Scala, run a Hello World application on the cloud, and write about it. I hope you find it useful and can use it as a starting point to play more with Google Cloud!

What is Google Cloud Platform?

Google Cloud Platform (GCP) consists of a set of physical assets, such as computers and hard disk drives, and virtual resources, such as virtual machines (VMs), that are contained in Google’s data centers around the globe. Each data center location is in a global region. Regions include Central US, Western Europe, and East Asia. Each region is a collection of zones, which are isolated from each other within the region. Each zone is identified by a name that combines a letter identifier with the name of the region. For example, zone a in the East Asia region is named asia-east1-a.
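The zone naming rule is simple enough to express in a few lines of code. Here is a minimal Scala sketch of it; zoneName and regionOf are hypothetical helpers of my own (not part of any Google library), and they assume nothing beyond the <region>-<letter> pattern described above.

```scala
object ZoneNames {
  // Build a zone name from a region name and a zone letter,
  // following the "<region>-<letter>" pattern described above.
  def zoneName(region: String, letter: Char): String =
    s"$region-$letter"

  // Recover the region from a zone name by dropping the trailing "-<letter>".
  // Returns None when the input does not end in "-<letter>".
  def regionOf(zone: String): Option[String] = {
    val idx = zone.lastIndexOf('-')
    if (idx > 0 && idx == zone.length - 2 && zone.last.isLetter)
      Some(zone.substring(0, idx))
    else
      None
  }

  def main(args: Array[String]): Unit = {
    println(zoneName("asia-east1", 'a'))  // prints asia-east1-a
    println(regionOf("asia-east1-a"))     // prints Some(asia-east1)
  }
}
```

Going from zone to region is handy because some resources are created per region while others are created per zone.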

This distribution of resources provides several benefits, including redundancy in case of failure and reduced latency by locating resources closer to clients.

In cloud computing, what you might be used to thinking of as software and hardware products become services. These services provide access to the underlying resources. When you develop your website or application on Cloud Platform, you mix and match these services into combinations that provide the infrastructure you need, and then add your code to enable the scenarios you want to build.

How do I interact with GCP services?

You can do this in three ways:

  1. Google Cloud Platform Console - The Google Cloud Platform Console provides a web-based, graphical user interface that you can use to manage your Cloud Platform projects and resources. When you use the Cloud Platform Console, you create a new project, or choose an existing project, and use the resources that you create in the context of that project.

  2. Command-line interface - The Google Cloud SDK provides the gcloud command-line tool, which gives you access to the commands you need. The gcloud tool can be used to manage both your development workflow and your Cloud Platform resources.

  3. Client libraries - The Cloud SDK includes client libraries that enable you to easily create and manage resources.

Now, let's define some key GCP concepts that we will use in our example!

Projects

Any Cloud Platform resources that you allocate and use must belong to a project. You can think of a project as the organizing entity for what you’re building.

A project is made up of the settings, permissions, and other metadata that describe your applications. Resources within a single project can work together easily, for example by communicating through an internal network, subject to the regions-and-zones rules.

Each Cloud Platform project has:

  • A project name, which you provide.
  • A project ID, which you can provide or Cloud Platform can provide for you.
  • A project number, which Cloud Platform provides.

Each project ID is unique across Cloud Platform. Once you have created a project, you can delete the project but its ID can never be used again.

Buckets

Buckets are the basic containers that hold your data. Everything that you store in Google Cloud Storage must be contained in a bucket. You can use buckets to organize your data and control access to your data, but unlike directories and folders, you cannot nest buckets.

When you create a bucket, you can specify a name, default storage class, and geographic location for the bucket.

Google Cloud Dataproc

Cloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Cloud Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don’t need them.

Hello World with Scala

We will follow the tutorial provided by Google. Code used in this example is located here.

Setup!

  1. Set up a GCloud project
  2. Create a Cloud Storage bucket
  3. Create a Cloud Dataproc cluster

Let's write our Hello World Scala code!

  1. Create a HelloWorld Scala object - example here
  2. Add a build.sbt file to your project, like this one.
  3. Package your project
    • Using sbt - run sbt package in your main project folder.
    • This should generate a jar file in the target/scala-2.10 directory.
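In case the linked example is not at hand, here is a minimal sketch of what such an object could look like. The object name HelloWorld and the greeting text are inferred from the job output shown later in this post; the greeting field is my own addition to keep the message in one place, and the linked example may well differ.

```scala
// HelloWorld.scala
// A plain Scala entry point. This sketch uses no Spark APIs: it only
// prints one line from the driver when the jar is run.
object HelloWorld {
  // The message the driver prints.
  val greeting: String = "Hello, world!"

  def main(args: Array[String]): Unit = {
    println(greeting)
  }
}
```

For sbt package to produce the jar name used in the next section, build.sbt would presumably set name to playing-with-gcloud-and-scala, version to 0.1-SNAPSHOT, and scalaVersion to some 2.10.x release. That is inferred from the jar's file name, not copied from the linked build file.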

Let's run HelloWorld on GCP!

  1. Copy the jar generated previously to Cloud Storage (we’ll store it in the bucket we created)

    gsutil cp [path-to]/playing-with-gcloud-and-scala_2.10-0.1-SNAPSHOT.jar gs://<bucket-name>/

  2. Submit your job to the Cloud Dataproc cluster we created earlier, in the project we set up at the start.

    • Go to https://console.cloud.google.com/dataproc/jobs?project=<project-id>
    • Click on Submit Job
      • Fill in the form:
        • Cluster: Select your cluster’s name from the cluster list
        • Job type: Spark
        • Main class or jar: gs://<bucket-name>/playing-with-gcloud-and-scala_2.10-0.1-SNAPSHOT.jar
    • Click on Submit
  3. Go to the jobs page.

  4. Click on the Job ID of the most recent job. Here you can view the job’s driver output.

    • You should see something like the following:
         Hello, world!
         Job output is complete
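If you would rather script the submission than click through the console, the same job can be described as a gcloud dataproc jobs submit spark command. The Scala sketch below only assembles and prints that command; it does not run it. submitCommand and its parameters are hypothetical names of mine, and the --region flag is an assumption: recent gcloud releases expect a region for Dataproc commands, while the console steps above do not mention one.

```scala
object SubmitHelloWorld {
  // Assemble the argument list for `gcloud dataproc jobs submit spark`.
  // All parameter names are illustrative; nothing is executed here.
  def submitCommand(cluster: String, region: String, bucket: String, jar: String): Seq[String] =
    Seq(
      "gcloud", "dataproc", "jobs", "submit", "spark",
      "--cluster", cluster,
      "--region", region,             // assumption: not part of the console steps
      "--jar", s"gs://$bucket/$jar"   // same value as the "Main class or jar" field
    )

  def main(args: Array[String]): Unit = {
    val cmd = submitCommand(
      cluster = "<cluster-name>",
      region = "<region>",
      bucket = "<bucket-name>",
      jar = "playing-with-gcloud-and-scala_2.10-0.1-SNAPSHOT.jar"
    )
    // Print the command so it can be reviewed before use.
    println(cmd.mkString(" "))
  }
}
```

Replace the placeholders with your own cluster, region, and bucket names before pasting the printed command into a shell.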
      

Final Steps

  • Using the command line interface
    1. Shut down your cluster

      gcloud dataproc clusters delete <cluster-name>

    2. Delete the Cloud Storage jar file

      gsutil rm gs://<bucket-name>/playing-with-gcloud-and-scala_2.10-0.1-SNAPSHOT.jar

What’s next!

Well, you can keep having all sorts of fun running jobs on GCP! You could continue with “Write and run Spark Scala code using the cluster’s spark-shell REPL” from here.

Useful commands

  • List your projects:

    gcloud projects list

  • List clusters in current project:

    gcloud dataproc clusters list

  • To see all your clusters go here
