Welcome to the documentation for WG1 of the sceQTL-Gen Consortium!

https://user-images.githubusercontent.com/44268007/89252548-35b96f80-d659-11ea-97e9-4b4176df5f08.png

The purpose of this repository is to provide references and instructions for preparing data for the sceQTL-Gen Consortium. You can run this pipeline in parallel with the Working Group 2 (Cell Classification) pipeline; the outputs of both will be used by Working Group 3 (eQTL Detection). Upon completion of the WG1 pipelines, please contact Drew Neavin (d.neavin @ garvan.org.au) so a meeting can be set up to discuss QC thresholds for the dataset that are consistent with the thresholds used for other datasets in the consortium.

We ask that you upload the results from each of these pipelines (except the SNP genotype data) to a shared ownCloud once they are completed. To get an upload link, please email Marc Jan Bonder at bondermj @ gmail.com and provide the dataset name and the name of the PI associated with the dataset. The same link is used for the WG2 data upload. Please note that you can’t change filenames after uploading!

There are four major steps that this group is addressing with data preprocessing:

  1. The first step is to impute the SNP genotypes. The imputed genotypes will be used for demultiplexing and for eQTL detection.

  2. The second step is to demultiplex and identify doublets. This will allow droplets containing single cells to be assigned to an individual and droplets that contain two cells to be removed.

  3. The third step is to analyze the quality metrics of the data. These data and results should be fully discussed with members of WG1 to choose effective thresholds for each dataset.

  4. The last step is the final data preparation to filter out doublets.
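
To make the goal of steps 2 and 4 concrete, the sketch below shows the idea of keeping only singlet droplets and grouping them by assigned individual. It is a minimal illustration, not the consortium pipeline: the records, the field names (`barcode`, `call`, `individual`) and the helper functions are all hypothetical, and the real inputs come from the demultiplexing and doublet detection tools described in the WG1 instructions.

```python
# Hypothetical droplet-level calls; field names and values are illustrative only
# and do not reflect the actual output format of the WG1 pipeline.
droplets = [
    {"barcode": "AAAC-1", "call": "singlet", "individual": "donor_1"},
    {"barcode": "AAAG-1", "call": "doublet", "individual": None},
    {"barcode": "AACT-1", "call": "singlet", "individual": "donor_2"},
    {"barcode": "AAGT-1", "call": "singlet", "individual": "donor_1"},
]


def keep_singlets(records):
    """Return only the droplets called as singlets, dropping doublets."""
    return [r for r in records if r["call"] == "singlet"]


def group_by_individual(records):
    """Map each assigned individual to the list of their droplet barcodes."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["individual"], []).append(r["barcode"])
    return grouped


singlets = keep_singlets(droplets)
by_individual = group_by_individual(singlets)
print(by_individual)
```

In practice the thresholds and calls feeding such a filter should be the ones agreed with WG1 during the quality-metric discussion in step 3.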

If you have any questions or issues, feel free to open an issue or email Drew Neavin directly (d.neavin @ garvan.org.au).