Boost your file queries with this project powered by DuckDB for faster results

Database Query Using JSON

This project queries a database using JSON and DuckDB.

Project Structure

complete_database_project/
├── firebase_database/
├── localhost_database/
└── localhost_docker/

Project Structure Explanation

localhost_database

The localhost_database folder contains the files that perform the following operations:

  1. data_converter.py reads the CSV files and creates a database using sqlite3.

  2. main.py uses FastAPI to build a server that runs on the local machine. It loads the database into DuckDB and listens on the localhost default port 8080. The server receives JSON requests at http://127.0.0.1:8080/query, runs the corresponding query on the database, and responds with JSON output (a sketch follows this list).

  3. example.json contains the JSON schema for the project.

  4. vision.py sends requests to the local server: it reads the example.json file, posts it to the server, and prints the response.

  5. requirements.txt lists all the Python modules used in the project.

  6. The datasets folder contains the CSV files for the project. data_converter.py uses the CSV files in this folder to create the database.
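
For illustration, here is a minimal sketch of what a main.py-style server could look like. It assumes the sqlite3 file is named database.db, that it contains a table named data, and that the JSON payload carries a single sql field; the actual file names and request schema live in the project itself and in example.json.

```python
# Hypothetical sketch of a main.py-style server. The database path, table name,
# and request field are assumptions; the real schema is defined in example.json.
import sqlite3

import duckdb
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the sqlite3 database produced by data_converter.py into DuckDB.
sqlite_con = sqlite3.connect("database.db")                     # assumed file name
table_df = pd.read_sql_query("SELECT * FROM data", sqlite_con)  # assumed table name
con = duckdb.connect()
con.register("data", table_df)  # expose the DataFrame to DuckDB as a view named "data"

class QueryRequest(BaseModel):
    sql: str  # assumed field: the SQL statement to run against the loaded data

@app.post("/query")
def run_query(req: QueryRequest):
    result = con.execute(req.sql)
    columns = [col[0] for col in result.description]
    return {"columns": columns, "rows": result.fetchall()}
```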

Setting Up The Project

Prerequisites

You need a Python environment with all the modules listed in the requirements.txt file installed on your system to run the project. The project was developed on Linux; below are the steps to install it on Linux.

Installation

  1. Create a Python virtual environment using the command python3 -m venv venv
  2. Activate the virtual environment with the command source venv/bin/activate
  3. Install all the required Python modules using the command pip install -r requirements.txt

Usage

  1. To start using the project, open the terminal and run python3 data_converter.py
  2. Then start the REST API with the command uvicorn main:app --reload, which listens at http://127.0.0.1:8080
  3. Then run vision.py with the command python3 vision.py to send a JSON request and receive the response (see the example below)
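
For example, a vision.py-style request could look like the following sketch; the endpoint matches the README, and the payload is whatever example.json defines.

```python
# Hypothetical sketch of a vision.py-style client: read example.json, post it to
# the local server, and print the JSON response.
import json

import requests

with open("example.json") as f:
    payload = json.load(f)

response = requests.post("http://127.0.0.1:8080/query", json=payload)
print(response.json())
```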

Built With

  1. Python
  2. JSON
  3. FastAPI
  4. DuckDB
  5. sqlite3
  6. pandas
  7. requests

firebase_database

The firebase_database folder contains the files and folders whose uses are defined below:

firebase_database/
├── app/
│   ├── __init__.py
│   ├── example.json
│   ├── main.py
│   └── reastapi-firebase-adminsdk-rby00-97aeff0c91.json
├── datasets/
├── Dockerfile
├── firestore_write.py
└── requirements.txt

  1. example.json contains the JSON schema for the project.

  2. main.py creates the API using FastAPI.

  3. reastapi-firebase-adminsdk-rby00-97aeff0c91.json is the Firebase authentication file. Add your security key to this file to access your service account.

  4. The datasets folder contains the .csv files used to create the database in the cloud.

  5. The Dockerfile contains the instructions to dockerize the project so that you can host it on a cloud platform easily.

  6. requirements.txt lists all the Python modules used in the project.

  7. firestore_write.py reads the CSV files in the datasets folder and adds the data to the Firebase database (a sketch follows this list).
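
For illustration, a firestore_write.py-style loader could look roughly like the sketch below. It assumes one Firestore collection per CSV file and uses the row index as the document ID; both choices are assumptions, not the repository's actual layout.

```python
# Hypothetical sketch of a firestore_write.py-style loader: read every CSV in
# datasets/ with pandas and write each row as a Firestore document.
import glob
import json
import os

import firebase_admin
import pandas as pd
from firebase_admin import credentials, firestore

cred = credentials.Certificate("app/reastapi-firebase-adminsdk-rby00-97aeff0c91.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

for csv_path in glob.glob("datasets/*.csv"):
    collection = os.path.splitext(os.path.basename(csv_path))[0]  # one collection per CSV (assumption)
    df = pd.read_csv(csv_path)
    for doc_id, row in df.iterrows():
        # The to_json/loads round trip converts pandas/numpy scalars to plain Python types.
        db.collection(collection).document(str(doc_id)).set(json.loads(row.to_json()))
```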

Setting Up The Project

Prerequisites

You need a Python environment with all the modules listed in the requirements.txt file installed on your system to run the project. The project was developed on Linux; below are the steps to install it on Linux.

Installation

  1. Create a Python virtual environment using the command python3 -m venv venv
  2. Activate the virtual environment with the command source venv/bin/activate
  3. Install all the required Python modules using the command pip install -r requirements.txt

Usage

  1. To start using the project, run python3 firestore_write.py in the terminal; it writes the data to the Firestore database
  2. Then run docker build -t myimage . to build the image
  3. Then run docker run -d --name mycontainer -p 8080:80 myimage to run the container on your local machine

You can deploy the container on any cloud platform that supports Docker.

Built With

  1. Python
  2. JSON
  3. FastAPI
  4. DuckDB
  5. sqlite3
  6. pandas
  7. requests
  8. firebase-admin
  9. Docker

localhost_docker

The localhost_docker folder is similar to localhost_database, but this version can be dockerized using the Dockerfile in the folder.

localhost_docker/
├── app/
│   ├── data_converter.py
│   └── main.py
├── datasets/
├── example.json
├── vision.py
└── requirements.txt

  1. data_converter.py creates a sqlite3 database inside the Docker container (a sketch follows this list)

  2. main.py creates the API using FastAPI
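
For illustration, a data_converter.py-style script could look roughly like this, assuming the database file is named database.db and each CSV file becomes one table (both assumptions).

```python
# Hypothetical sketch of a data_converter.py-style script: read every CSV in
# datasets/ with pandas and write it into a sqlite3 database.
import glob
import os
import sqlite3

import pandas as pd

con = sqlite3.connect("database.db")  # assumed file name

for csv_path in glob.glob("datasets/*.csv"):
    table = os.path.splitext(os.path.basename(csv_path))[0]  # one table per CSV (assumption)
    pd.read_csv(csv_path).to_sql(table, con, if_exists="replace", index=False)

con.close()
```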

Setting Up The Project

Prerequisites

  1. This project was built on Linux; Docker must be installed on your machine
  2. Python is optional and only needed if you want to send requests to the Docker container

Installation

  1. Navigate to the localhost_docker folder
  2. Build the Docker image by running docker build -t image_name . in the terminal
  3. Run the Docker container by running docker run -d --name uh -p 80:80 -v /path/to/local/datasets:/datasets image_name in the terminal
  • --name uh: sets the container name to "uh". You can choose any name you like as long as it is not already taken by another container.
  • -p 80:80: maps port 80 in the container to port 80 on the host machine. If a web server runs in the container on port 80, you can reach it at http://localhost in your web browser.
  • -v /path/to/local/datasets:/datasets: mounts the host directory /path/to/local/datasets into the container at /datasets, allowing the container to access data from the host.
  • image_name: the name of the Docker image used to create the container.

Usage

  1. You can send requests to the container using vision.py (see the example below)
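
Because the run command maps container port 80 to host port 80, a request from the host targets port 80 instead of 8080, for example:

```python
# Hypothetical request against the running container; the /query endpoint is
# assumed to mirror the localhost_database version.
import json

import requests

with open("example.json") as f:
    payload = json.load(f)

print(requests.post("http://localhost/query", json=payload).json())  # host port 80
```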

Built With

  1. Python
  2. JSON
  3. FastAPI
  4. DuckDB
  5. sqlite3
  6. pandas
  7. requests
  8. Docker

Houseware

Company information

Houseware's vision is to empower the next generation of knowledge workers by putting the data warehouse in their hands, in the language they speak. Houseware is purpose-built for the Data Cloud’s untouched creators, empowering internal apps across organizations.

Why participate in an Octernship with Houseware

Houseware is changing the way the data warehouse is leveraged, and we want you to help build Houseware! Our team came together to answer the singular question, "how can we flip the value of the data warehouse to the ones who really need it, to the ones who drive decisions?"

In this role, you'll have the opportunity to work as a Data engineer with the Houseware team on multiple customer-facing projects. You'd be involved with delivering the data platform for the end user, while taking complete ownership of engineering challenges - this would include communicating with the stakeholders, setting the right expectations, and ensuring top quality for the code & the product being shipped.

Octernship role description

We're looking for data engineers to join the Houseware team.

We are hell-bent on building a forward-looking product, something that constantly pushes us to think by first principles and question assumptions, building a team that is agile in adapting and ever curious. While fast-paced execution is one of the prerequisites in this role, equally important is the ability to pause and take stock of where product/engineering is heading from a long-term perspective. Your initiative is another thing that we would expect to shine through here, as you continuously navigate through ambiguous waters while working with vigor on open-ended questions - all to solve problems for and empathize with the end users.

Octernship info: Timelines and Stipend

  • Assignment Deadline: 26 March 2023
  • Octernship Duration: 3-6 Months
  • Monthly Stipend: $600 USD

Recommended qualifications

  • You have a solid problem-solving framework.
  • You are well-versed with the Modern Data Stack, and have worked with Cloud Data Warehouses before.
  • You have prior experience writing backend systems, and are proficient in SQL/dbt.

Eligibility

To participate, you must be:

  • A verified student on Global Campus

  • 18 years or older

  • Active contributor on GitHub (monthly)

Assignment

Segment users on DuckMart!

Task instructions

You have been given a task to segment the user audience for a fictional online service called "DuckMart". You have to design and implement a backend service that allows for segmenting the user audience based on user attributes and user events.

As part of this activity, you'll have to do the following:

  • Dummy data generation: Create dummy data using tools like Mockaroo
  • Data transformation: Write a Python script to transform the data from the CSV files into a format suitable for loading into the database.
  • Data loading: You are required to load the transformed data into DuckDB

Database Schema: The following are the requirements for the database schema:

  • User Attributes: User ID, Name, Age, Gender, Location, Signup Date, Subscription Plan, Device Type.
  • User Events: User ID, Event Name, Timestamp.

A few examples of events are "PURCHASE_MADE" or "ADDED_TO_CART".
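
A minimal sketch of how this schema could be created and the dummy CSVs loaded into DuckDB, assuming the generated files are named users.csv and events.csv (the file names and exact column types are assumptions):

```python
# Hypothetical sketch: create the two tables in DuckDB and load the generated CSVs.
import duckdb

con = duckdb.connect("duckmart.db")  # assumed database file name

con.execute("""
    CREATE TABLE IF NOT EXISTS users (
        user_id INTEGER PRIMARY KEY,
        name TEXT,
        age INTEGER,
        gender TEXT,
        location TEXT,
        signup_date DATE,
        subscription_plan TEXT,
        device_type TEXT
    );
""")
con.execute("""
    CREATE TABLE IF NOT EXISTS events (
        user_id INTEGER,
        event_name TEXT,
        "timestamp" TIMESTAMP
    );
""")

# read_csv_auto infers column types; this assumes the CSV column order matches the tables.
con.execute("INSERT INTO users SELECT * FROM read_csv_auto('users.csv');")
con.execute("INSERT INTO events SELECT * FROM read_csv_auto('events.csv');")
```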

Query Requirements: The following are the requirements for the queries:

  • Segment users by age groups: Create a segment of users in the age range 25-34 years and list out the user IDs of all such users.
  • Segment users by location and events: Create a segment of users whose location="California" and who have logged in to the product at least once (event_name='LOGIN'), and list out the User IDs of all such users.
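
For reference, the two example segments could translate into SQL roughly as follows, using the table and column names from the schema sketch above:

```python
# Hypothetical segmentation queries against the users/events tables sketched earlier.
import duckdb

con = duckdb.connect("duckmart.db")  # assumed database file name

# Segment 1: user IDs of users aged 25-34.
age_segment = con.execute(
    "SELECT user_id FROM users WHERE age BETWEEN 25 AND 34"
).fetchall()

# Segment 2: user IDs of users in California with at least one LOGIN event.
login_segment = con.execute("""
    SELECT DISTINCT u.user_id
    FROM users AS u
    JOIN events AS e ON e.user_id = u.user_id
    WHERE u.location = 'California'
      AND e.event_name = 'LOGIN'
""").fetchall()

print(age_segment, login_segment)
```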

You are then required to write a backend API endpoint that can scale to any kind of "segmentation use case" like the two examples mentioned above. Building on top of the mentioned data schema (Users, Events), the consumer of this API should be able to specify the segmentation criteria in a JSON-like format, and the backend API should be able to convert it into the relevant SQL. Please specify what the spec for the JSON-like payload looks like.
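
Purely as an illustration, one possible shape for such a payload, together with a naive sketch of turning it into SQL, could look like this; the field names are assumptions, not the required spec.

```python
# Hypothetical payload shape and translator; the actual spec is for the student to define.
example_payload = {
    "segment_name": "california_logins",
    "attribute_filters": [
        {"field": "location", "op": "=", "value": "California"},
    ],
    "event_filters": [
        {"event_name": "LOGIN", "min_count": 1},
    ],
}

def payload_to_sql(payload: dict) -> str:
    """Naive translator sketch: AND together all attribute and event requirements.

    A real implementation should validate fields and use parameter binding
    instead of string interpolation to avoid SQL injection.
    """
    attribute_clauses = [
        f"u.{flt['field']} {flt['op']} '{flt['value']}'"
        for flt in payload["attribute_filters"]
    ]
    event_clauses = [
        f"(SELECT COUNT(*) FROM events e WHERE e.user_id = u.user_id "
        f"AND e.event_name = '{flt['event_name']}') >= {flt['min_count']}"
        for flt in payload["event_filters"]
    ]
    conditions = " AND ".join(attribute_clauses + event_clauses)
    return f"SELECT u.user_id FROM users u WHERE {conditions}"

print(payload_to_sql(example_payload))
```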

Task Expectations

You will be evaluated based on the following criteria:

  • Correctness and completeness of the implementation.
  • The JSON spec that powers the "Segmentation API"
  • Performance and scalability of the implementation.
  • Quality of the SQL queries and their optimization.
  • Quality of the code and documentation.
  • Ability to explain and justify design decisions.

Task submission

Students are expected to use the GitHub Flow when working on their project.

  1. Please push your final code changes to your main branch
  2. Please add your instructions/assumptions/any other remarks at the beginning of the README file and the reviewers will take a look
  3. The PR called Feedback will be used by the reviewers for sharing feedback and asking questions; please make sure you do not close the Feedback PR.
  4. The assignment will be automatically submitted on the "Assignment Deadline" date -- you don't need to do anything apart from what is mentioned above.
  5. Use GitHub Issues to ask any relevant questions regarding the project
