A full pipeline AutoML tool for tabular data
Updated Feb 28, 2024 - Python
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Evaluation Tool for Anomaly Detection Algorithms on Time Series
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Unified Distributed Execution
User documentation website for the Sulis tier 2 HPC service. Built using Jekyll.
Test LightGBM's Dask integration on different cluster types
Open Data Profiling, Quality and Analysis on NYC OpenData dataset with semantic profiling using fuzzy ratio, Levenshtein distance and regex
Fraud detection ML pipeline and serving POC using Dask and hopeit.engine. Project created with nbdev: https://www.fast.ai/2019/12/02/nbdev/
Python library to query and transform genomic data from indexed files
Code for "Training models when data doesn't fit in memory" post
Parallel LAMMPS Python interface: control an mpi4py-parallel LAMMPS instance from a serial Python process or a Jupyter notebook
Perform I/O intensive workloads on high-volume data sparsely located across multiple AWS regions through the use of Dask.
Loop like a pro, make parameter studies fun.
HPC cluster deployment and management for the Hetzner Cloud
A custom Dask remote jobqueue for HTCondor.
Efficiently read climate/meteorology data into Xarray using Dask for parallelization. Transform the data for your modelling needs.
Magic commands to support running MPI Python code as well as multi-node Dask workloads in Jupyter notebooks.
Dask tutorial; Chinese-language Dask tutorial