Enrichment (Join)
The T in ETL: data transformation with Scala. This is the first project/article in the series.
Table of Contents
- Requirements
- Introduction
- Code
- YouTube Link
- GitHub Link
Requirements
Technologies used: Scala, sbt, IntelliJ
Link to download dataset: http://www.stm.info/en/about/developers
This project uses the following in Scala:
1) Scala collections.
2) Classes, traits, objects, and case classes.
3) The T of ETL, i.e. Transformation, here in the form of data enrichment (join).
Introduction:
We enrich each trip with its route, matched on route_id, and then enrich that result with the calendar, matched on service_id. In the end, we’ll write the final result to a CSV file with a proper header.
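To make the flow concrete, here is a minimal sketch of the two-step enrichment and the CSV output on in-memory data. The case class fields, the object name EnrichmentSketch, the header columns, and the choice to drop trips without a matching route or calendar (an inner join) are all assumptions made for illustration, not necessarily what the project does.

```scala
import java.io.PrintWriter

// Hypothetical, trimmed-down rows; the real files have more columns.
case class Trip(tripId: String, routeId: String, serviceId: String)
case class Route(routeId: String, routeLongName: String)
case class Calendar(serviceId: String, startDate: String, endDate: String)

object EnrichmentSketch {
  // Assumed header for the output file.
  val header = "trip_id,route_id,route_long_name,service_id,start_date,end_date"

  // Step 1 joins trip with route on route_id, step 2 joins with calendar on service_id.
  // Assumption: inner join, so a trip without a match in either table is dropped.
  def enrich(trips: List[Trip], routes: List[Route], calendars: List[Calendar]): List[String] = {
    val routeById = routes.map(r => r.routeId -> r).toMap
    val calendarById = calendars.map(c => c.serviceId -> c).toMap
    for {
      trip <- trips
      route <- routeById.get(trip.routeId)
      calendar <- calendarById.get(trip.serviceId)
    } yield List(trip.tripId, route.routeId, route.routeLongName,
      calendar.serviceId, calendar.startDate, calendar.endDate).mkString(",")
  }

  // Writes the header followed by the rows. Naive CSV: values are not quoted or escaped.
  def writeCsv(path: String, rows: List[String]): Unit = {
    val out = new PrintWriter(path)
    try (header :: rows).foreach(line => out.println(line))
    finally out.close()
  }
}
```

Building the two maps once keeps each match a constant-time lookup instead of a scan over all routes or calendars for every trip.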
Project setup:
- Download the IntelliJ IDE build suitable for your system, create a new project named EnrichmentDemo, and select the default Scala and sbt versions.
- Write a case class for each of your text files. Here we have three text files: trip.txt, route.txt, and calender.txt, so I have written three case classes, one to hold the data of each.
- We also need case classes for the intermediate data: triproute and enrichedtrip. So let us write case classes for these two as well.
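The case classes described above could look roughly like this. The field lists are assumptions loosely based on common GTFS column names; adjust them to the columns actually present in your files.

```scala
// One case class per input file. Field names and types are assumed, not taken from the project.
case class Trip(routeId: String, serviceId: String, tripId: String, tripHeadsign: String)
case class Route(routeId: String, routeLongName: String)
case class Calendar(serviceId: String, startDate: String, endDate: String)

// Intermediate result of the first join: a trip together with its route.
case class TripRoute(trip: Trip, route: Route)

// Final result of the second join: the trip and route together with the calendar.
case class EnrichedTrip(tripRoute: TripRoute, calendar: Calendar)
```

Nesting the earlier result inside the next case class keeps every original field available without copying field lists from one class to another.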
Development/Business logic:
- The first thing we’ll do is read all three text files and store their rows in the case classes we wrote earlier.
- According to the business logic of the project, the next task is to take the route_id of each trip and find the row with the same route_id in routes. We will repeat this for every route_id that appears in the trips.
- So we need a search function that returns all rows of routes matching a given route_id from the trips. Let’s write this function!
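One possible shape for such a search function is sketched below. The class name Lookup, its parameters, and the two-field Route are hypothetical. The sketch indexes the rows once with groupBy, so each search is a map access rather than a scan of the whole list.

```scala
// Hypothetical generic lookup: groups rows by a key when it is constructed.
class Lookup[K, V](rows: List[V], keyOf: V => K) {
  private val index: Map[K, List[V]] = rows.groupBy(keyOf)

  // Returns every row with the given key, or an empty list when none match.
  def search(key: K): List[V] = index.getOrElse(key, Nil)
}

// Assumed, trimmed-down route row used only for this demo.
case class Route(routeId: String, routeLongName: String)

object LookupDemo {
  def main(args: Array[String]): Unit = {
    val routes = List(Route("1", "Green Line"), Route("2", "Orange Line"))
    val routeLookup = new Lookup[String, Route](routes, _.routeId)
    println(routeLookup.search("2")) // the matching route rows
    println(routeLookup.search("9")) // no match, so an empty list
  }
}
```

Because the key function is a parameter, the same class can serve the calendar lookup on service_id as well.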
Code
Part-2
Step 1: Download the dataset and create a case class for each file.
Step 2: Write the search functions:
Step 3: Let's write the code to read the text files: trip.txt, route.txt, and calender.txt
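As a rough sketch of this step, the reader below loads one file into a list of case class instances. The object name RouteReader, the two-column layout, and the presence of a header row are assumptions; the same pattern would apply to the trip and calendar files.

```scala
import scala.io.Source

// Assumed, trimmed-down route row: first column is the id, second is the long name.
case class Route(routeId: String, routeLongName: String)

object RouteReader {
  // Parses data lines (header already removed), skipping blank lines.
  // Naive split: assumes values contain no quoted commas.
  def parse(lines: List[String]): List[Route] =
    lines.filter(_.trim.nonEmpty).map { line =>
      val cols = line.split(",", -1).map(_.trim)
      Route(cols(0), cols(1))
    }

  // Reads the file at `path`, drops the first (header) line, and parses the rest.
  def read(path: String): List[Route] = {
    val source = Source.fromFile(path)
    try parse(source.getLines().drop(1).toList)
    finally source.close()
  }
}
```

Keeping parse separate from read makes the parsing logic easy to check on a few in-memory lines before pointing it at the real files.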