rliger Next

Ability to reorganize datasets
- Allow a call such as reorganize(ligerObj, variable = "somethingNotDataset") to return a new liger object with a different ligerDataset grouping.
Ability to do downstream analysis on H5 data
- Pseudo-bulk should be easy because we are just aggregating cells.
- Wilcoxon might be a bit harder because ranks are calculated per gene, whereas the H5 sparse data is stored in column-major order. This might require a fast on-disk transposition method.
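
To illustrate why the two planned items differ in difficulty, here is a minimal base R sketch on a toy in-memory matrix. The data, the `group` vector, and the variable names are hypothetical, and nothing here is rliger API or the eventual H5 implementation. Pseudo-bulk only sums cells (columns) into groups, while rank-based testing needs all values of one gene (a row) at once.

```r
# Toy raw count matrix: genes in rows, cells in columns (hypothetical data).
raw <- matrix(
  c(1, 0, 3, 2,
    0, 5, 1, 0,
    2, 2, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Hypothetical assignment of cells to pseudo-bulk samples.
group <- c("A", "A", "B", "B")

# Pseudo-bulk: sum counts over the cells in each group.
# rowsum() sums rows by group, so transpose to cells x genes first.
pseudoBulk <- t(rowsum(t(raw), group))

# Rank-based testing ranks each gene across all cells, i.e. along a row.
# With column-major storage, one row is scattered across every column.
geneRanks <- t(apply(raw, 1, rank))
```

The aggregation can proceed one cell at a time, which suits column-major storage; the ranking cannot, which is why a transposition step is mentioned above.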

Improved scalability of downstream analysis and visualization
- Reduced the need for pre-calculated normalized data when performing the Wilcoxon test and producing feature expression plots
- Normalized data is now only calculated on the fly from the raw data and the pre-stored size factor (obj$nUMI).
Added theme_axis_shortArrow() for tidy dimensional reduction plot axis theme.
Migrating to patchwork for multi-plot layout, mainly for ease of alignment, legend collection, and subplot extraction.
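
The on-the-fly normalization noted above can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: `normalizeOnTheFly()` is a hypothetical helper, the data is a toy dense matrix, and normalization is assumed to mean dividing each cell's counts by its total UMI count.

```r
# Toy raw counts: genes in rows, cells in columns (hypothetical data).
raw <- matrix(
  c(1, 3, 0,
    4, 2, 2),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2))
)

# Size factor computed once and stored, analogous to obj$nUMI.
nUMI <- colSums(raw)

# Hypothetical helper: normalize only the requested genes when needed,
# instead of keeping a full normalized copy of the matrix.
normalizeOnTheFly <- function(raw, nUMI, genes) {
  sweep(raw[genes, , drop = FALSE], 2, nUMI, "/")
}

norm <- normalizeOnTheFly(raw, nUMI, genes = "gene2")
```

Because the size factor is stored separately, a plot or test that touches a handful of genes only needs those rows of the raw data.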

Added naive GSEA analysis on factor gene loading (W) to test whether any known gene set (e.g. cell cycle) is enriched in any factor. Implemented in factorGSEA().
Added dense data loading support for H5AD files
Optimized obs metadata parsing for H5AD files
Fixed ggplot2 color picking when coloring by logical value
Fixed H5AD file layer detection bug
Fixed some other minor bugs

Implemented highly efficient on-disk iNMF that scales to a million cells, taking only slightly more time than the in-memory version and requiring only laptop-level memory.
Added 10X H5 data and H5AD loading functions that load the data either into a regular dgCMatrix in memory or into a DelayedArray representation backed on disk. The latter is used by the on-disk iNMF implementation.
Added selectBatchHVG() which implements another HVG selection strategy, credit to SCIB
Added suggestK() back with a new methodology
Clarified optimal runGOEnrich() workflow and added fold enrichment metric in the returned result
Fixed important bug in online iNMF scenario 2
Fixed multiple problems related to ATAC analysis
- Fixed Wilcoxon rank-sum test bug when using ATAC peak counts
- Fixed gene coordinate parsing bug from BED file
- Optimized peak parsing speed

Added centroidAlign() for new cell factor loading alignment method
Added plotProportionBox() for visualizing compositional analysis
Added plotClusterGeneViolin() for visualizing gene expression in clusters
Added plotBarcodeRank() for basic QC visualization
Added plotPairwiseDEGHeatmap() for visualizing pairwise DEG results
Added plotGODot() for visualizing GO enrichment results
Added calcNMI() for evaluating clustering results against ground truth
Added ligerToH5AD(), allowing reticulate/Python-free export of a liger object to H5AD format. This is provided in the extension source code (i.e. not loaded with library(rliger)).
Added organism support in runGeneralQC() and refined hemoglobin gene matching regex pattern.
Optimized the memory scalability of DE tests for both the pseudo-bulk method and the Wilcoxon test
Optimized plotProportionPie() by adding argument circleColors
Optimized plotVolcano() text annotation positioning and gene highlighting logic.
Improved the documentation of additional arguments for visualization functions
Changed runMarkerDEG() and runPairwiseDEG() default method from "wilcoxon" to "pseudoBulk"
Fixed runMarkerDEG(method = "pseudoBulk") bug in assigning pseudo-replicates, and optimized error/warning signaling.
Fixed bugs in calcAlignment(), subsetMemLigerDataset(), cellMeta()
Fixed bug in old version updating functions
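
For readers unfamiliar with the metric behind calcNMI() listed above, here is a minimal base R sketch of normalized mutual information between two labelings. `nmiSketch()` is a hypothetical function written for illustration. It assumes normalization by the arithmetic mean of the two entropies, which may differ from what calcNMI() actually uses.

```r
# Hypothetical helper: NMI between two label vectors of equal length.
nmiSketch <- function(x, y) {
  # Joint distribution of the two labelings, and its marginals.
  joint <- table(x, y) / length(x)
  px <- rowSums(joint)
  py <- colSums(joint)

  # Shannon entropy, skipping empty categories.
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  # Mutual information summed over non-empty cells of the joint table.
  nz <- joint > 0
  mi <- sum(joint[nz] * log(joint[nz] / outer(px, py)[nz]))

  # Assumed normalization: arithmetic mean of the two entropies.
  mi / ((entropy(px) + entropy(py)) / 2)
}

perfect <- nmiSketch(c(1, 1, 2, 2), c("a", "a", "b", "b"))
unrelated <- nmiSketch(c(1, 1, 2, 2), c("a", "b", "a", "b"))
```

A clustering that matches the ground truth up to relabeling scores 1, and one that is independent of it scores 0. The sketch does not handle the degenerate case where both labelings have a single category.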

Fixed wrong UINMF aborting criteria
Fixed example/test skipping criteria for non-existing dependencies
Fixed file access issue when checking on CRAN
Updated installed data files system.file("extdata/ctrl.h5", "extdata/stim.h5") to be in standard 10X H5 format
Updated quantileNorm() automatic reference selection according to #297
Other minor fixes (including #308)

Added ligerDataset class for per-dataset information storage, with inheritance for specific modalities
Added a number of plotting functions with clear function names and useful functionality
Added Leiden clustering method, now the default instead of Louvain
Added pseudo-bulk DEG method
Added DEG analysis with one-vs-rest marker detection in runMarkerDEG() and pairwise comparison in runPairwiseDEG()
Added gene name pattern for expression percentage QC metric
Added native Seurat object support for the core integration workflow
Added a documentation website built with pkgdown
Added new iNMF variant method, consensus iNMF (c-iNMF), in runCINMF(). Not stable.
Added GO enrichment downstream analysis in runGOEnrich()
Changed liger object class structure
Moved iNMF (previously optimizeALS()), UINMF (previously optimizeALS(unshared = TRUE)) and online iNMF (previously online_iNMF()) implementation to new package RcppPlanc with vastly improved performance. Now wrapped in runINMF(), runUINMF() and runOnlineINMF() respectively, and all can be invoked with runIntegration().
Updated H5AD support to match up with Python anndata package 0.8.0 specs
Renamed many functions and arguments to follow camelCase style. The original names remain available but issue deprecation warnings.