Python library to query and transform genomic data from indexed files

epiviz/epivizFileServer

Epiviz File Server

Compute and Query Parser for Genomic Files

Description

Epiviz File Server is a Python library for querying genomic files, both for visualization and for transformation. The library provides the following modules:

  • Parser: reads various genomic file formats
  • Query: accesses only the necessary bytes of a file
  • Compute: applies transformations to the data
  • Server: instantly exposes the datasets as an API
  • Visualization

A quick overview of the library and its features is available in an IPython notebook at https://epiviz.github.io/post/2019-02-04-epiviz-fileserver/

Note

  1. The library requires the server hosting the data files to support HTTP range requests, so that the file server's parser module can request only the byte ranges needed to process a query.
  2. The library currently supports indexed genomic file formats such as BigWig, BigBed, BAM (with bai), SAM (with sai), or any genomic data file that can be indexed using tabix.
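
To illustrate the first note, here is a minimal sketch of an HTTP range request using only the Python standard library. It is not the library's own parser code; the helper names `range_header` and `fetch_bytes` are hypothetical, and the URL you pass in is assumed to point at a server that honors the `Range` header.

```python
import urllib.request


def range_header(start, length):
    """Build a Range header value for `length` bytes starting at offset `start`.

    Hypothetical helper for illustration; HTTP byte ranges are inclusive,
    so the last byte requested is start + length - 1.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def fetch_bytes(url, start, length, timeout=30):
    """Fetch only the requested byte range of a remote file.

    Raises if the server ignores the Range header and does not
    answer with 206 Partial Content.
    """
    request = urllib.request.Request(
        url, headers={"Range": range_header(start, length)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:
            raise RuntimeError(
                f"expected 206 Partial Content, got {response.status}"
            )
        return response.read()
```

A server that does not support range requests typically replies with status 200 and the whole file, which is why the sketch checks for 206 before trusting the payload.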

Developer Notes

This project has been set up using PyScaffold 3.1. For details and usage information on PyScaffold see https://pyscaffold.org/.

Use a virtualenv for testing and development. To set it up, run the following commands from the project directory:

virtualenv env --python=python3
source env/bin/activate # (activate.fish if using the fish-shell)
pip install -r requirements.txt

# to deactivate virtualenv
deactivate
  1. Test - `python setup.py test`
  2. Docs - `python setup.py docs`
  3. Build
    • source distribution `python setup.py sdist`
    • binary distribution `python setup.py bdist`
    • wheel distribution `python setup.py bdist_wheel`

Download and/or Build Genome or Transcript files for use with Epiviz File Server

Either --ucsc or --gtf must be provided.

  • To generate a genome file, `efs build_genome --ucsc=mm10 --output=mm10`
  • To generate a transcripts file, `efs build_transcript --ucsc=mm10 --output=mm10` (transcript file names are prefixed with `transcripts.`)
  • To generate both files, `efs build_both --ucsc=mm10 --output=mm10`

Usage:

`efs.py (build_genome | build_transcript | build_both) (--ucsc=<genome> | --gtf=<file>) [--compressed] [--output=<output>]`

Options:

  • `--ucsc=<genome>` genome build to download and parse from UCSC, e.g. mm10
  • `--gtf=<file>` local GTF file
  • `-c --compressed` the input file is gzip compressed
  • `--output=<output>` the directory where the file is saved; defaults to the current working directory
  • `-h --help` Show this screen.
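
When scripting these builds from Python, the usage line above can be mirrored by a small argument builder. The sketch below is illustrative only: `efs_argv` is a hypothetical helper, not part of the library, and it assumes the command is installed as `efs`, as in the examples above.

```python
def efs_argv(action, ucsc=None, gtf=None, compressed=False, output=None):
    """Assemble an argument list matching the usage line above.

    Hypothetical helper: it only builds the list and does not run anything.
    """
    actions = ("build_genome", "build_transcript", "build_both")
    if action not in actions:
        raise ValueError(f"action must be one of {actions}")
    # The usage line requires exactly one of --ucsc or --gtf.
    if (ucsc is None) == (gtf is None):
        raise ValueError("provide exactly one of ucsc or gtf")

    argv = ["efs", action]
    argv.append(f"--ucsc={ucsc}" if ucsc is not None else f"--gtf={gtf}")
    if compressed:
        argv.append("--compressed")
    if output is not None:
        argv.append(f"--output={output}")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so this works without efs installed.
    print(" ".join(efs_argv("build_both", ucsc="mm10", output="mm10")))
```

With the library installed, the resulting list could be passed to `subprocess.run(..., check=True)` to run the build.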