
This project was created for the Hack-O-Hire hackathon conducted by Barclays India.


parthpetkar/Barcleys_Hack_O_Hire_Project


Barclays Hack-O-Hire Project

Anomaly Detection

🎨 Table of Contents

Contributors

Company Logo

Barclays Logo

Description

This project was created for Hack-O-Hire, a hackathon held by Barclays. The problem statement was to develop an Anomaly Detection Framework that identifies potential issues and irregularities in data when compared with regular submissions.

Technology Stack

  • ETL - Apache Spark
  • Python
  • MongoDB
  • Open-source encryption/decryption algorithm
  • ML - Isolation Forest
  • Tableau
  • Apache Airflow

Initial Idea

Abstract:

In the realm of financial transactions, ensuring accuracy and reliability is paramount. However, trade data's vast volume and complexity pose significant challenges in detecting anomalies, which could result in erroneous calculations and payments. This project addresses this critical issue by proposing an Anomaly Detection Framework designed to identify irregularities and potential issues in trade data submissions. Leveraging advanced technologies such as Apache Spark, Airflow, and Tableau, combined with robust data engineering practices and machine learning algorithms, our solution aims to enhance accuracy, reduce manual effort, and foster self-learning across banking functions.

Aim:

This project aims to develop an Anomaly Detection Framework capable of efficiently identifying irregularities and potential issues within trade data submissions. By integrating cutting-edge technologies and best practices in data engineering and machine learning, our solution seeks to enhance the accuracy of payments, reduce manual effort, and promote self-learning across banking functions.

Our Solution:

Our solution is built around a system architecture designed to detect anomalies in large, heterogeneous datasets. Data is retrieved from diverse sources such as Yahoo Finance, Upstox, and Tradefeed, passes through extraction, transformation, and loading (ETL) processes, and is stored in a suitable database system.

A scheduling pipeline reduces the end-to-end processing time. It divides the data into multiple segments and queues them so that the stages overlap: while one segment is in the ETL stage, the segment ahead of it is already in the loading stage, and so on. Apache Spark forms the backbone of the system, enabling distributed, parallel processing of data for scalability and performance. Airflow serves as the workflow management system, automating the stages of the data pipeline for reliability and efficiency.

Pre-processing steps, including feature engineering, improve the accuracy of anomaly detection. For detection we use the ADTK library, which uses historical trends in stock prices to identify abnormal patterns such as outlier data points, spikes, and volatility shifts. The results are visualized in Tableau so that users can investigate and address anomalies promptly.
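
The segmented pipeline described above can be illustrated with a small, standard-library-only sketch. It is not the project's implementation: two threads stand in for the Spark/Airflow stages, a Python list stands in for the database, and a simple interquartile-range rule stands in for the ADTK detectors. All function names (`split_into_segments`, `transform`, `flag_outliers`, `run_pipeline`) are hypothetical.

```python
import queue
import statistics
import threading

_DONE = object()  # sentinel marking the end of the segment stream


def split_into_segments(rows, segment_size):
    """Divide the input rows into consecutive fixed-size segments."""
    return [rows[i:i + segment_size] for i in range(0, len(rows), segment_size)]


def transform(segment):
    """Hypothetical ETL step: drop missing prices and convert the rest to float."""
    return [float(price) for price in segment if price is not None]


def flag_outliers(prices, k=1.5):
    """Flag prices outside [Q1 - k*IQR, Q3 + k*IQR] (a simple stand-in detector)."""
    if len(prices) < 4:
        return []  # too few points to estimate quartiles meaningfully
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if p < low or p > high]


def etl_stage(segments, handoff):
    """Stage 1: transform each segment and hand it to the next stage."""
    for segment in segments:
        handoff.put(transform(segment))
    handoff.put(_DONE)


def load_stage(handoff, store, anomalies):
    """Stage 2: load each transformed segment and record its outliers."""
    while True:
        segment = handoff.get()
        if segment is _DONE:
            break
        store.extend(segment)  # stands in for a database write
        anomalies.extend(flag_outliers(segment))


def run_pipeline(rows, segment_size=8):
    """Run both stages concurrently so one segment loads while the next transforms."""
    handoff = queue.Queue(maxsize=1)  # bounded so stage 1 stays one segment ahead
    store, anomalies = [], []
    segments = split_into_segments(rows, segment_size)
    workers = [
        threading.Thread(target=etl_stage, args=(segments, handoff)),
        threading.Thread(target=load_stage, args=(handoff, store, anomalies)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return store, anomalies
```

Calling `run_pipeline(prices, segment_size=8)` returns the loaded values and the flagged outliers. Because outliers are judged per segment here, the segment size affects what gets flagged; a production detector would more likely work against historical trends, as the description above states.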

Conclusion:

In conclusion, our Anomaly Detection Framework offers a robust and scalable solution to the challenges of identifying irregularities in trade data submissions. By leveraging advanced technologies and best practices in data engineering and machine learning, we empower organizations to enhance accuracy, reduce manual effort, and foster self-learning across banking functions. Our solution aims to safeguard data integrity and promote operational excellence in the financial domain through continuous innovation and refinement.

Initial PPT

Final Presentation

Badges

License GitHub issues GitHub stars GitHub forks

Installation

To install and run this project locally, follow these steps:

  1. Clone the repository:

    git clone https://github.com/parthpetkar/Barcleys_Hack_O_Hire_Project.git
    
  2. Navigate to the project directory:

    cd Barcleys_Hack_O_Hire_Project
    
  3. Create a virtual environment:

    python -m venv Barcleys_Hack_O_Hire_Project 
    
  4. Create the MongoDB collections:

    • DB name - Hackathon
      • Collection_1 - Live-Stock-Data
      • Collection_2 - Stock-Data-Final
      • Collection_3 - Anomalies
  5. Set up Docker Images:

    docker build -t etlimage <path-to-ETL-build-context>
    docker build -t mlimage <path-to-ML-build-context>
    
  6. Run Docker Compose:

    docker-compose up -d
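
Step 4 can also be scripted instead of being done by hand. The sketch below is one possible approach, assuming the `pymongo` driver is installed and MongoDB is reachable at `mongodb://localhost:27017`; neither assumption comes from this repository. The helper names `ensure_collections` and `main` are hypothetical. The database and collection names are the ones listed in step 4.

```python
DB_NAME = "Hackathon"
COLLECTIONS = ["Live-Stock-Data", "Stock-Data-Final", "Anomalies"]


def ensure_collections(db, names=COLLECTIONS):
    """Create any collection in `names` that `db` lacks; return the names created."""
    existing = set(db.list_collection_names())
    created = []
    for name in names:
        if name not in existing:
            db.create_collection(name)
            created.append(name)
    return created


def main(uri="mongodb://localhost:27017"):
    """Connect to MongoDB (the URI is an assumption) and create the collections."""
    # Imported here so ensure_collections itself has no hard pymongo dependency.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        print("Created:", ensure_collections(client[DB_NAME]))
    finally:
        client.close()
```

Run `main()` once after MongoDB is up. The helper skips collections that already exist, so running it again is harmless.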
    
    
    

How to Use the Project

To use the project, follow these steps:

  1. Launch the application.
  2. Create a new invoice by filling in the required details.
  3. Save or print the generated invoice.

🛠️ Contribution guidelines for this project

We welcome contributions from the community! To contribute to this project, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/contribution).
  3. Make your changes and commit them (git commit -am 'Add new feature').
  4. Push to the branch (git push origin feature/contribution).
  5. Create a new Pull Request.

License

This project is licensed under the MIT License.

Security

🔒 If you discover any security-related issues, please email parth.petkar221@vit.edu instead of using the issue tracker.
