Enrichment (Join)

Deshbandhu Mishra
Dec 20, 2020

This article covers the T (Transformation) step of ETL: data transformation with Scala. It is the first project/article in the series.

Table of Contents

  • Requirements
  • Introduction
  • Code
  • YouTube Link
  • GitHub Link

Requirements

Technologies used: Scala, sbt, IntelliJ IDEA

Link to download dataset: http://www.stm.info/en/about/developers

This project exercises the following in Scala:

1) Scala collections.

2) Classes, traits, objects, and case classes.

3) The T of ETL, i.e. Transformation, here in the form of data enrichment (join).

Introduction:

We enrich each trip with its route, matched on route_id, and then enrich that result with its calendar, matched on service_id. Finally, we write the result to a CSV file with a proper header row.
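As a rough sketch of that last step, writing rows under a header line could look like the following. EnrichedRow, CsvOutput, and the four columns are hypothetical stand-ins chosen for illustration, not the project's actual output schema.

```scala
import java.io.{File, PrintWriter}

// Hypothetical, simplified output row (assumption: the real result has more columns).
final case class EnrichedRow(tripId: String, routeId: String, serviceId: String, routeLongName: String)

object CsvOutput {
  // Header row; the column names are assumed to follow the dataset's naming.
  val header: String = "trip_id,route_id,service_id,route_long_name"

  // Quote a field only when it contains a comma, a double quote, or a line break.
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  def toCsvLine(row: EnrichedRow): String =
    List(row.tripId, row.routeId, row.serviceId, row.routeLongName).map(escape).mkString(",")

  // Writes the header first, then one line per row; the writer is always closed.
  def write(path: String, rows: Seq[EnrichedRow]): Unit = {
    val writer = new PrintWriter(new File(path))
    try {
      writer.println(header)
      rows.foreach(row => writer.println(toCsvLine(row)))
    } finally writer.close()
  }
}
```

Keeping toCsvLine separate from write makes the formatting easy to check without touching the file system.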

Project setup:

  1. Download the IntelliJ IDEA build for your system, create a new project named EnrichmentDemo, and select the default Scala and sbt versions.
  2. Write a case class for each text file. Here we have three text files: trip.txt, route.txt, and calender.txt, so there are three case classes, one to hold the rows of each file.
  3. The intermediate data needs case classes too: triproute (a trip joined with its route) and enrichedtrip (a triproute joined with its calendar).

Development/Business logic:

  1. First, read all three text files and store their rows in the case classes written earlier.
  2. The business logic then requires taking the route_id of each trip and finding the row with the same route_id in routes. We repeat this for every trip.
  3. So we need a search function that returns the matching route row for a given route_id. Let’s write this function!

Code

Part-2

Step 1: Download the dataset and create a case class for each file.
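A minimal sketch of what these case classes might look like is shown below. The field selection is an assumption: it models only a few columns with GTFS-style names, and the class names TripRoute and EnrichedTrip are this sketch's spelling of the triproute and enrichedtrip types mentioned earlier.

```scala
// One case class per input file (assumption: only a subset of columns is modelled).
final case class Trip(routeId: String, serviceId: String, tripId: String, tripHeadsign: String)
final case class Route(routeId: String, routeLongName: String, routeColor: String)
final case class Calendar(serviceId: String, startDate: String, endDate: String)

// Intermediate result: a trip paired with its route.
final case class TripRoute(trip: Trip, route: Route)

// Final result: the trip-route pair enriched with its service calendar.
final case class EnrichedTrip(tripRoute: TripRoute, calendar: Calendar)
```

Nesting the records rather than flattening every column keeps each join step a one-line constructor call, at the cost of longer accessors such as enriched.tripRoute.route.routeLongName.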

Step 2: Write the search functions:
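One possible shape for the search functions is sketched here, not necessarily the one used in the original project. It indexes the rows in a Map once so that each search is a key lookup rather than a scan of the whole list. RouteLookup, CalendarLookup, and Enrichment are hypothetical names, and the small case classes are stand-ins so the snippet compiles on its own.

```scala
// Minimal stand-ins for the project's case classes (assumed fields).
final case class Trip(tripId: String, routeId: String, serviceId: String)
final case class Route(routeId: String, routeLongName: String)
final case class Calendar(serviceId: String, startDate: String, endDate: String)
final case class TripRoute(trip: Trip, route: Route)
final case class EnrichedTrip(tripRoute: TripRoute, calendar: Calendar)

// Index routes by route_id; if an id repeats, the last row wins.
class RouteLookup(routes: Seq[Route]) {
  private val byId: Map[String, Route] = routes.map(r => r.routeId -> r).toMap
  def search(routeId: String): Option[Route] = byId.get(routeId)
}

// Index calendars by service_id in the same way.
class CalendarLookup(calendars: Seq[Calendar]) {
  private val byId: Map[String, Calendar] = calendars.map(c => c.serviceId -> c).toMap
  def search(serviceId: String): Option[Calendar] = byId.get(serviceId)
}

object Enrichment {
  // Inner join: a trip with no matching route or calendar is dropped.
  def enrich(trips: Seq[Trip], routes: RouteLookup, calendars: CalendarLookup): Seq[EnrichedTrip] =
    for {
      trip     <- trips
      route    <- routes.search(trip.routeId)
      calendar <- calendars.search(trip.serviceId)
    } yield EnrichedTrip(TripRoute(trip, route), calendar)
}
```

Returning Option from search makes the "no match" case explicit, and the for-comprehension then skips unmatched trips without any null checks.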

Step 3: Read the text files: trip.txt, route.txt, and calender.txt
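Reading one of these files might be sketched as follows, using scala.io.Source from the standard library. RouteReader is a hypothetical name, and the two-column layout (route_id, route_long_name) is an assumption made to keep the example short; the real files have more columns, and a plain split on commas does not handle quoted fields.

```scala
import scala.io.Source

// Stand-in for the project's route case class (assumed fields).
final case class Route(routeId: String, routeLongName: String)

object RouteReader {
  // Parse one data line; lines that do not have exactly two fields are skipped.
  def parse(line: String): Option[Route] =
    line.split(",", -1) match {
      case Array(id, name) => Some(Route(id.trim, name.trim))
      case _               => None
    }

  // Drop the header line, then keep every line that parses.
  def fromLines(lines: Iterator[String]): List[Route] =
    lines.drop(1).flatMap(line => parse(line).toList).toList

  // Read the whole file eagerly so the source can be closed before returning.
  def fromFile(path: String): List[Route] = {
    val source = Source.fromFile(path, "UTF-8")
    try fromLines(source.getLines()) finally source.close()
  }
}
```

The same pattern, one parse function plus a shared fromLines, would apply to the trip and calendar files with their own case classes.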
