Journal of Integrative Bioinformatics

Welcome to JIB.tools!

The official bioinformatics tool list for the Journal of Integrative Bioinformatics (JIB).


All bioinformatics tools published in JIB are automatically added to JIB.tools. First authors can edit their entries and import the tool information directly into bio.tools.



To better understand the dynamic behavior of metabolic networks under a wide variety of conditions, the field of Systems Biology has shown increasing interest in kinetic models. Currently available databases do not contain enough data on this topic. Given that a significant part of the information relevant to the development of such models is still scattered throughout the literature, it becomes essential to develop specific and powerful text mining tools to collect these data. In this context, the main objective of this work is the development of a text mining tool to extract kinetic parameters, their respective values and their relations with enzymes and metabolites from the scientific literature. The proposed approach involves the development of a novel plug-in for the text mining framework @Note2. Finally, the developed pipeline was validated with a case study on Kluyveromyces lactis, spanning the analysis and results of 20 full text documents.

JIB Publications
  • Alão Freitas A, Costa H, Rocha I. Extracting kinetic information from literature with KineticRE. J Integr Bioinform. 2015;12(4). doi 10.2390/biecoll-jib-2015-282; PubMed 26673933
Homepage

This paper presents a case study to show the competence of our evolutionary and visual framework for cluster analysis of DNA microarray data. The proposed framework joins a genetic algorithm for hierarchical clustering with a set of visual components of cluster tasks given by a tool. The cluster visualization tool allows us to display different views of clustering results as a means of cluster visual validation. The results of the genetic algorithm for clustering have shown that it can find better solutions than the other methods for the selected data set. Thus, this shows the reliability of the proposed framework.

JIB Publications
  • Castellanos-Garzón JA, Díaz F. An evolutionary and visual framework for clustering of DNA microarray data. J Integr Bioinform. 2013;10(3):232. doi 10.2390/biecoll-jib-2013-232; PubMed 24231146
Homepage

Desktop application 
Sequence analysis 
Maximum-likelihood methods based on models of codon substitution have been widely used to infer positively selected amino acid sites that are responsible for adaptive changes. Nevertheless, in order to use such an approach, software applications are required to align protein and DNA sequences, infer a phylogenetic tree and run the maximum-likelihood models. Therefore, a significant effort is made in order to prepare input files for the different software applications and in the analysis of the output of every analysis. In this paper we present the ADOPS (Automatic Detection Of Positively Selected Sites) software. It was developed with the goal of providing an automatic and flexible tool for detecting positively selected sites given a set of unaligned nucleotide sequence data. An example of the usefulness of such a pipeline is given by showing, under different conditions, positively selected amino acid sites in a set of 54 Coffea putative S-RNase sequences. ADOPS software is freely available and can be downloaded from http://sing.ei.uvigo.es/ADOPS.

JIB Publications
  • Reboiro-Jato D, Reboiro-Jato M, Fdez-Riverola F, Vieira CP, Fonseca NA, Vieira J. ADOPS - Automatic Detection Of Positively Selected Sites. J Integr Bioinform. 2012;9(3):200. doi 10.2390/biecoll-jib-2012-200; PubMed 22829571
Homepage bio.tools

In this demo paper, we sketch B-Fabric, an all-in-one solution for management of life sciences data. B-Fabric has two major purposes. First, it is a system for the integrated management of experimental data and scientific annotations. Second, it is a system infrastructure supporting on-the-fly coupling of user applications, thus serving as an extensible platform for fast-paced, cutting-edge, collaborative research.

JIB Publications Homepage

BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources, and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance, and describe the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely-used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.

JIB Publications
  • Misirli G, Wipat A, Mullen J, et al. BacillOndex: an integrated data resource for systems and synthetic biology. J Integr Bioinform. 2013;10(2):224. doi 10.2390/biecoll-jib-2013-224; PubMed 23571273
Homepage

As high-throughput technologies become cheaper and easier to use, raw sequence data and corresponding annotations for many organisms are becoming available. However, sequence data alone is not sufficient to explain the biological behaviour of organisms, which arises largely from complex molecular interactions. There is a need to develop new platform technologies that can be applied to the investigation of whole-genome datasets in an efficient and cost-effective manner. One such approach is the transfer of existing knowledge from well-studied organisms to closely-related organisms. In this paper, we describe a system, BacillusRegNet, for the use of a model organism, Bacillus subtilis, to infer genome-wide regulatory networks in less well-studied close relatives. The putative transcription factors, their binding sequences and predicted promoter sequences along with annotations are available from the associated BacillusRegNet website (http://bacillus.ncl.ac.uk).

JIB Publications
  • Misirli G, Hallinan J, Röttger R, Baumbach J, Wipat A. BacillusRegNet: A transcriptional regulation database and analysis platform for Bacillus species. J Integr Bioinform. 2014;11(2). doi 10.2390/biecoll-jib-2014-244; PubMed 25001169
Homepage

The constant drive towards a more personalized medicine has led to an increasing interest in temporal gene expression analyses. It is now broadly accepted that considering a temporal perspective represents a great advantage for better understanding disease progression and treatment results at a molecular level. In this context, biclustering algorithms emerged as an important tool to discover local expression patterns in biomedical applications, and CCC-Biclustering arose as an efficient algorithm relying on the temporal nature of data to identify all maximal temporal patterns in gene expression time series. In this work, CCC-Biclustering was integrated in new biclustering-based classifiers for prognostic prediction. As a case study, we analyzed multiple gene expression time series in order to classify the response of Multiple Sclerosis patients to the standard treatment with Interferon-β, to which nearly half of the patients reveal a negative response. In this scenario, using an effective predictive model of a patient's response would avoid useless and possibly harmful therapies for the non-responder group. The results revealed interesting potential to be further explored in classification problems involving other (clinical) time series.

JIB Publications
  • Carreiro AV, Anunciação O, Carriço JA, Madeira SC. Prognostic Prediction through Biclustering-Based Classification of Clinical Gene Expression Time Series. J Integr Bioinform. 2011;8(3). doi 10.2390/biecoll-jib-2011-175; PubMed 21926438

This paper presents a novel bioinformatics data warehouse software kit that integrates biological information from multiple public life science data sources into a local database management system. It stands out from other approaches by providing up-to-date integrated knowledge, platform and database independence as well as high usability and customization. This open source software can be used as a general infrastructure for integrative bioinformatics research and development. The advantages of the approach are realized by using a Java-based system architecture and object-relational mapping (ORM) technology. Finally, a practical application of the system is presented within the emerging area of medical bioinformatics to show the usefulness of the approach. The BioDWH data warehouse software is available for the scientific community at http://sourceforge.net/projects/biodwh/.

JIB Publications
  • Töpel T, Kormeier B, Klassen A, Hofestädt R. BioDWH: A Data Warehouse Kit for Life Science Data Integration. J Integr Bioinform. 2008;5(2). doi 10.2390/biecoll-jib-2008-93; PubMed 20134070
Homepage

The study of microorganism consortia, also known as biofilms, is associated with a number of applications in biotechnology, ecotechnology and clinical domains. Nowadays, biofilm studies are heterogeneous and data-intensive, encompassing different levels of analysis. Computational modelling of biofilm studies has thus become a requirement to make sense of these vast and ever-expanding biofilm data volumes. The rationale of the present work is a machine-readable format for representing biofilm studies and supporting biofilm data interchange and data integration. This format is supported by the Biofilm Science Ontology (BSO), the first ontology on biofilm information. The ontology is decomposed into a number of areas of interest, namely: the Experimental Procedure Ontology (EPO), which describes biofilm experimental procedures; the Colony Morphology Ontology (CMO), which characterises the morphology of microorganism colonies; and other modules concerning biofilm phenotype, antimicrobial susceptibility and virulence traits. The overall objective behind BSO is to develop semantic resources to capture, represent and share data on biofilms and related experiments in a standardized manner. Furthermore, the present work also introduces a framework to assist biofilm data interchange and analysis - BiofOmics (http://biofomics.org) - and a public repository on colony morphology signatures - MorphoCol (http://stardust.deb.uminho.pt/morphocol).

JIB Publications
  • Sousa AM, Ferreira A, Azevedo NF, Pereira MO, Lourenço A. Computational approaches to standard-compliant biofilm data for reliable analysis and integration. J Integr Bioinform. 2012;9(3). doi 10.2390/biecoll-jib-2012-203; PubMed 22829574
Homepage

In today's life science projects, sharing data and data interpretation is becoming increasingly important. This calls for novel information technology approaches that enable the integration of expert knowledge from different disciplines in combination with advanced data analysis facilities in a collaborative manner. Since the recent development of web technologies offers scientific communities new ways for cooperation and communication, we propose a fully web-based software approach for the collaborative analysis of bioimage data and demonstrate the applicability of Web2.0 techniques to ion mobility spectrometry image data. Our approach allows collaborating experts to easily share, explore and discuss complex image data without any installation of software packages. Scientists only need a username and a password to get access to our system and can directly start exploring and analyzing their data.

JIB Publications
  • Loyek C, Bunkowski A, Vautz W, Nattkemper TW. Web2.0 paves new ways for collaborative and exploratory analysis of chemical compounds in spectrometry data. J Integr Bioinform. 2011;8(2):158. doi 10.2390/biecoll-jib-2011-158; PubMed 21768655
Homepage

The speed and accuracy of new scientific discoveries - be it by humans or artificial intelligence - depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIRness data principles).

JIB Publications
  • Brandizi M, Singh A, Rawlings C, Hassani-Pak K. Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach. J Integr Bioinform. 2018;15(3). doi 10.1515/jib-2018-0023; PubMed 30085931
Homepage

Given the great potential impact of the growing number of complete genome-scale metabolic network reconstructions of microorganisms, bioinformatics tools are needed to simplify and accelerate the advance of knowledge in this field. One essential component of a genome-scale metabolic model is its biomass equation, whose maximization is one of the most common objective functions used in Flux Balance Analysis formulations. Some components of biomass, such as amino acids and nucleotides, can be estimated from genome information, providing reliable data without the need to perform lab experiments. In this work, a Java tool is proposed that estimates microbial biomass composition in amino acids and nucleotides from genome and transcriptomic information, using as input sequence files in FASTA format and transcriptomic data files in CSV format. This application produces results rapidly and is user-friendly for users with little or no background in informatics (http://darwin.di.uminho.pt/biomass/). The results obtained using this tool are fairly close to experimental data, showing that the estimation of amino acid and nucleotide compositions from genome information and from transcriptomic data is a good alternative when no experimental data is available.
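The idea of estimating amino acid composition from sequence and expression data can be illustrated with a minimal Python sketch. It is not the published Java tool: the function names, the weighting scheme (residue counts multiplied by an expression value) and the default weight of 1.0 for proteins without expression data are all assumptions made for illustration.

```python
from collections import Counter
from io import StringIO

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the identifier.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def amino_acid_composition(records, expression=None):
    """Return the molar fraction of each amino acid.

    Residue counts of each protein are multiplied by its expression value.
    Proteins missing from `expression` get weight 1.0 (an assumption).
    """
    expression = expression or {}
    totals = Counter()
    for name, seq in records:
        weight = expression.get(name, 1.0)
        for residue, count in Counter(seq).items():
            if residue in AMINO_ACIDS:  # skip ambiguity codes such as X
                totals[residue] += weight * count
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {aa: totals[aa] / grand_total for aa in AMINO_ACIDS}


# Hypothetical two-protein proteome; p1 is expressed twice as strongly as p2.
fasta = StringIO(">p1\nAAK\n>p2\nKKKK\n")
composition = amino_acid_composition(read_fasta(fasta), {"p1": 2.0, "p2": 1.0})
```

In a real setting, the expression dictionary would be read from the CSV file and the molar fractions converted to mass fractions using residue molecular weights before they enter a biomass equation.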

JIB Publications
  • Santos S, Rocha I. Estimation of biomass composition from genomic and transcriptomic information. J Integr Bioinform. 2016;13(2):285. doi 10.2390/biecoll-jib-2016-285; PubMed 28187415
Homepage

While high-throughput technology, advanced techniques in biochemistry and molecular biology have become increasingly powerful, the coherent interpretation of experimental results in an integrative context is still a challenge. BioModelKit (BMK) approaches this challenge by offering an integrative and versatile framework for biomodel-engineering based on a modular modelling concept with the purpose: (i) to represent knowledge about molecular mechanisms by consistent executable sub-models (modules) given as Petri nets equipped with defined interfaces facilitating their reuse and recombination; (ii) to compose complex and integrative models from an ad hoc chosen set of modules including different omic and abstraction levels with the option to integrate spatial aspects; (iii) to promote the construction of alternative models by either the exchange of competing module versions or the algorithmic mutation of the composed model; and (iv) to offer concepts for (omic) data integration and integration of existing resources, and thus facilitate their reuse. BMK is accessible through a public web interface (www.biomodelkit.org), where users can interact with the modules stored in a database, and make use of the model composition features. BMK facilitates and encourages multi-scale model-driven predictions and hypotheses supporting experimental research in a multilateral exchange.

JIB Publications
  • Blätke MA. BioModelKit - An Integrative Framework for Multi-Scale Biomodel-Engineering. J Integr Bioinform. 2018;15(3). doi 10.1515/jib-2018-0021; PubMed 30205646
Homepage

The visualization of biological data has gained increasing importance in recent years. A large number of methods and software tools are available that visualize biological data, including the combination of measured experimental data and biological networks. As networks grow in size, their handling and exploration becomes a challenging task for the user. In addition, scientists are interested not just in investigating a single kind of network, but in the combination of different types of networks, such as metabolic, gene regulatory and protein interaction networks. Therefore, fast access, abstract and dynamic views, and intuitive exploratory methods should be provided to search and extract information from the networks. This paper introduces a conceptual framework for handling and combining multiple network sources that enables abstract viewing and exploration of large data sets including additional experimental data. It introduces a three-tier structure that links network data to multiple network views, discusses a proof of concept implementation, and shows a specific visualization method for combining metabolic and gene regulatory networks in an example.

JIB Publications

The BIOchemical PathwaY DataBase (BIOPYDB) is developed as a manually curated, readily updatable, dynamic resource of human cell specific pathway information along with an integrated computational platform to perform various pathway analyses. Presently, it comprises 46 pathways, 3189 molecules, 5742 reactions and 6897 different types of diseases linked with pathway proteins, which are referenced by 520 literature sources and 17 other pathway databases. With its repertoire of biochemical pathway data and computational tools for performing topological, logical and dynamic analyses, BIOPYDB enables both experimental and computational biologists to acquire a comprehensive understanding of signaling cascades in cells. Automated pathway image reconstruction, cross referencing of pathway molecules and interactions with other databases and literature sources, complex search operations to extract information from other similar resources, and an integrated platform for pathway data sharing and computation are among the novel and useful features included in this database to make it more attractive to users in the pathway research community. A RESTful API service is also available to advanced users and developers for accessing the database more conveniently through their own computer programmes.

JIB Publications
  • Chowdhury S, Sinha N, Ganguli P, Bhowmick R, Singh V, Nandi S, Sarkar RR. BIOPYDB: A Dynamic Human Cell Specific Biochemical Pathway Database with Advanced Computational Analyses Platform. J Integr Bioinform. 2018;15(3). doi 10.1515/jib-2017-0072; PubMed 29547394
Homepage

Metagenomics provides quantitative measurements for microbial species over time. To obtain a global overview of an experiment and to explore the full potential of a given dataset, intuitive and interactive visualization tools are needed. Therefore, we established BioSankey to visualize microbial species in microbiome studies over time as a Sankey diagram. These diagrams are embedded into a project-specific webpage which depends only on JavaScript and Google API to allow searches of interesting species without requiring a web server or connection to a database. BioSankey is a valuable tool to visualize different data elements from single or dual RNA-seq datasets and additionally enables a straightforward exchange of results among collaboration partners.

JIB Publications
  • Platzer A, Polzin J, Rembart K, Han PP, Rauer D, Nussbaumer T. BioSankey: Visualization of Microbial Communities Over Time. J Integr Bioinform. 2018;15(4). doi 10.1515/jib-2017-0063; PubMed 29897884
Homepage

The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of different experiments for the same genome.

JIB Publications
  • Borowski K, Soh J, Sensen CW. Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context. J Integr Bioinform. 2008;5(2). doi 10.2390/biecoll-jib-2008-97; PubMed 20134066

One of the major challenges in bioinformatics is to integrate and manage data from different sources as well as experimental microarray data and present them in a user-friendly format. Therefore, we present CardioVINEdb, a data warehouse approach developed to interact with and explore life science data. The data warehouse architecture provides a platform-independent web interface that can be used with any common web browser. A monitor component controls and updates the data from the different sources to guarantee that they remain up to date. In addition, the system provides a "static" and "dynamic" visualization component for interactive graphical exploration of the data.

JIB Publications
  • Kormeier B, Hippe K, Töpel T, Hofestädt R. CardioVINEdb: a data warehouse approach for integration of life science data in cardiovascular diseases. J Integr Bioinform. 2010;7(1):142. doi 10.2390/biecoll-jib-2010-142; PubMed 20585146
Homepage

CRISPR Cas9 and other sequence-specific endonucleases are fundamental genome editors supporting gene knockout and gene therapy. A speedy and accurate computational allele designer is required for a high-throughput gene mutagenesis pipeline using these new techniques. An automatic system, Cas9 online designer (COD), was created to screen Cas9 targets and off-targets, as well as to provide gene knockout and genotyping strategies. A gene knockout rat model was successfully created and genotyped under the direction of this online system, confirming its ability to predict real targets and off-targets. Gene knockout strategies to mutate 72 rat cytochrome P450 genes were designed instantly by the system to demonstrate its high-throughput efficiency. In addition, the system uses an off-target scoring matrix which can be applied to any sequence-specific genome editing tool besides Cas9. The COD system (http://cas9.wicp.net) has established a speedy, accurate, flexible and high-throughput computational gene knockout pipeline supporting sequence-specific endonuclease-induced mutagenesis.
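The target and off-target screening step can be sketched in Python as follows. This is an illustration only and not the COD implementation: it assumes the commonly used SpCas9 convention of a 20-nt protospacer followed by an NGG PAM, and it ranks off-target candidates by a plain mismatch count, which stands in for the scoring matrix mentioned in the abstract. All function names are hypothetical.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(seq, protospacer_len=20):
    """List candidate sites as (strand, start, protospacer, pam) tuples.

    A site is a protospacer immediately followed by an NGG PAM (assumed
    SpCas9 convention). `start` is 0-based on the scanned strand, so
    minus-strand positions refer to the reverse complement.
    """
    seq = seq.upper()
    # The lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def off_targets(guide, genome, max_mismatches=3):
    """Return sites that resemble the guide but are not identical to it."""
    return [
        hit
        for hit in find_targets(genome, len(guide))
        if 0 < mismatches(guide, hit[2]) <= max_mismatches
    ]


# Hypothetical toy sequence with a single site on the plus strand.
sites = find_targets("A" * 20 + "TGG")
```

A position-weighted score, in which mismatches near the PAM count more than distal ones, could replace `mismatches` without changing the rest of the sketch.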

JIB Publications
  • Guo D, Li X, Zhu P, Feng Y, Yang J, Zheng Z, Yang W, Zhang E, Zhou S, Wang H. Online High-throughput Mutagenesis Designer Using Scoring Matrix of Sequence-specific Endonucleases. J Integr Bioinform. 2015;12(1). doi 10.1515/jib-2015-283; PubMed 29220955
Homepage

Desktop application 
ChIP-seq; Mapping; Molecular interactions, pathways and networks; Data architecture, analysis and design
The mapping of DNA-protein interactions is crucial for a full understanding of transcriptional regulation. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) has become the standard technique for analyzing these interactions on a genome-wide scale. We have developed a software system called CASSys (ChIP-seq data Analysis Software System) spanning all steps of ChIP-seq data analysis. It supersedes the laborious application of several single command line tools. CASSys provides functionality ranging from quality assessment and control of short reads, through the mapping of reads against a reference genome (read mapping) and the detection of enriched regions (peak detection), to various follow-up analyses. The latter are accessible via a state-of-the-art web interface and can be performed interactively by the user. The follow-up analyses allow for flexible user-defined association of putative interaction sites with genes, visualization of their genomic context with an integrated genome browser, the detection of putative binding motifs, the identification of over-represented Gene Ontology terms, pathway analysis and the visualization of interaction networks. The system is client-server based, accessible via a web browser and does not require any software installation on the client side. To demonstrate CASSys's functionality, we used the system for the complete data analysis of a publicly available ChIP-seq study that investigated the role of the transcription factor estrogen receptor-α in breast cancer cells.

JIB Publications
  • Alawi M, Kurtz S, Beckstette M. CASSys: an integrated software-system for the interactive analysis of ChIP-seq data. J Integr Bioinform. 2011;8(2):155. doi 10.2390/biecoll-jib-2011-155; PubMed 21690655
bio.tools

Command-line tool; Desktop application
Computational biology 
Using the lac operon as a paradigmatic example for a gene regulatory system in prokaryotes, we demonstrate how qualitative knowledge can be initially captured using simple discrete (Boolean) models and then stepwise refined to multivalued logical models and finally to continuous (ODE) models. At all stages, signal transduction and transcriptional regulation are integrated in the model description. We first show the potential benefit of a discrete binary approach and then discuss problems and limitations due to indeterminacy arising in cyclic networks. These limitations can be partially circumvented by using multilevel logic as a generalization of the Boolean framework, enabling one to formulate a more realistic model of the lac operon. Ultimately, a dynamic description is needed to fully appreciate the potential dynamic behavior that can be induced by regulatory feedback loops. As a very promising method, we show how the use of multivariate polynomial interpolation allows transformation of the logical network into a system of ordinary differential equations (ODEs), which then enables the analysis of key features of the dynamic behavior.
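The step from a Boolean model to ODEs can be illustrated with a small Python sketch. It uses multilinear interpolation, one common form of multivariate polynomial interpolation; whether this matches the paper's exact construction is an assumption. The two update rules are a deliberately simplified toy (mRNA is produced when lactose is present and glucose is absent; protein follows mRNA) and not the published lac operon model, and the time constant `tau` and the Euler integration scheme are illustrative choices.

```python
from itertools import product


def multilinear(boolean_fn, n_inputs):
    """Extend a Boolean function on {0,1}^n to a polynomial on [0,1]^n.

    The result agrees with `boolean_fn` at every corner of the unit cube
    and is linear in each argument separately.
    """
    corners = [
        (v, float(boolean_fn(*v))) for v in product((0, 1), repeat=n_inputs)
    ]

    def interpolated(*x):
        total = 0.0
        for corner, value in corners:
            weight = 1.0
            for x_i, v_i in zip(x, corner):
                weight *= x_i if v_i else (1.0 - x_i)
            total += value * weight
        return total

    return interpolated


def simulate(lactose, glucose, steps=2000, dt=0.01, tau=1.0):
    """Integrate dx/dt = (B(x) - x) / tau with the explicit Euler method.

    Toy rules (hypothetical): mRNA = lactose AND NOT glucose; protein = mRNA.
    Returns the final (mRNA, protein) levels, each in [0, 1].
    """
    f_mrna = multilinear(lambda lac, glc: lac and not glc, 2)
    f_protein = multilinear(lambda m: m, 1)
    mrna = protein = 0.0
    for _ in range(steps):
        d_mrna = (f_mrna(lactose, glucose) - mrna) / tau
        d_protein = (f_protein(mrna) - protein) / tau
        mrna, protein = mrna + dt * d_mrna, protein + dt * d_protein
    return mrna, protein


# Induced state: lactose present, glucose absent.
induced = simulate(lactose=1.0, glucose=0.0)
```

Because the interpolated function reproduces the Boolean rule at the corners of the unit cube, the steady states of the Boolean model remain steady states of the ODE system, while intermediate input levels now produce graded responses.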

JIB Publications
  • Franke R, Theis FJ, Klamt S. From Binary to Multivalued to Continuous Models: The lac Operon as a Case Study. J Integr Bioinform. 2010;7(1). doi 10.2390/biecoll-jib-2010-151; PubMed 21200084
Homepage bio.tools

With the advent of modern-day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these "big data" analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber's goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.
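Balancing parallel submissions across several HPC resources can be illustrated with a generic greedy scheduler. The sketch below is not clubber's algorithm, which the abstract does not describe: it simply assigns each job, largest first, to the cluster expected to become free earliest. The job costs, the cluster names and the relative speed factors are hypothetical inputs.

```python
import heapq


def balance(jobs, clusters):
    """Assign jobs to clusters with a greedy longest-job-first heuristic.

    jobs:     list of (job_id, estimated_cost) pairs
    clusters: dict mapping cluster name to relative speed (higher is faster)
    Returns a dict mapping cluster name to the list of job ids it receives.
    """
    # Heap entries are (projected finish time, cluster name).
    heap = [(0.0, name) for name in sorted(clusters)]
    heapq.heapify(heap)
    assignment = {name: [] for name in clusters}
    for job_id, cost in sorted(jobs, key=lambda job: -job[1]):
        finish, name = heapq.heappop(heap)
        assignment[name].append(job_id)
        # A faster cluster works off the same cost in less time.
        heapq.heappush(heap, (finish + cost / clusters[name], name))
    return assignment


# Hypothetical workload: four jobs, two equally fast clusters.
plan = balance([("a", 4), ("b", 3), ("c", 2), ("d", 1)], {"x": 1.0, "y": 1.0})
```

With this toy workload both clusters end up with a total cost of 5. A production system would additionally have to account for queue waiting times, failures and resources that are added on demand.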

JIB Publications
  • Miller M, Zhu C, Bromberg Y. clubber: removing the bioinformatics bottleneck in big data analyses. J Integr Bioinform. 2017;14(2). doi 10.1515/jib-2017-0020; PubMed 28609295
Homepage

Desktop application 
Molecular interactions, pathways and networks; Bioinformatics; Cell biology; Computer science; Structural biology
Detailed investigation of socially important diseases with modern experimental methods has resulted in the generation of large volumes of valuable data. However, analysis and interpretation of these data require the application of efficient computational techniques and systems biology approaches. In particular, techniques allowing the reconstruction of associative networks of various biological objects and events can be useful. In this publication, the combination of different techniques to create such a network associated with an abstract cell environment is discussed in order to gain insights into the functional as well as spatial interrelationships. It is shown that experimentally gained knowledge enriched with data warehouse content and text mining data can be used for the reconstruction and localization of a cardiovascular disease developing network beginning with MUPP1/MPDZ (multi-PDZ domain protein).

Screenshot of CmPI - CELLmicrocosmos 4 PathwayIntegration
JIB Publications
  • Sommer B, Tiys ES, Kormeier B, et al. Visualization and Analysis of a Cardio Vascular Disease- and MUPP1-related Biological Network combining Text Mining and Data Warehouse Approaches. J Integr Bioinform. 2010;7(1). doi 10.2390/biecoll-jib-2010-148; PubMed 21068463
  • Kovanci G, Ghaffar M, Sommer B. Web-based hybrid-dimensional Visualization and Exploration of Cytological Localization Scenarios. J Integr Bioinform. 2016;13(4):47–58. doi 10.2390/biecoll-jib-2016-298; PubMed 28187414
  • Sommer B. The CELLmicrocosmos Tools: A Small History of Java-Based Cell and Membrane Modelling Open Source Software Development. J Integr Bioinform. 2019;. doi 10.1515/jib-2019-0057; PubMed 31560649
Homepage bio.tools

The CELLmicrocosmos 4.2 PathwayIntegration (CmPI) is a tool which provides hybrid-dimensional visualization and analysis of intracellular protein and gene localizations in the context of a virtual 3D environment. This tool is developed based on Java/Java3D/JOGL and provides a standalone application compatible with all relevant operating systems. However, it requires Java and the local installation of the software. Here we present the prototype of an alternative web-based visualization approach, using Three.js and D3.js. In this way it is possible to visualize and explore CmPI-generated localization scenarios, including networks mapped to 3D cell components, by just providing a URL to a collaboration partner. This publication describes the integration of the different technologies – Three.js, D3.js and PHP – as well as an application case: a localization scenario of the citrate cycle. The CmPI web viewer is available at: http://CmPIweb.CELLmicrocosmos.org.

Screenshot of CmPIweb
JIB Publications
  • Kovanci G, Ghaffar M, Sommer B. Web-based hybrid-dimensional Visualization and Exploration of Cytological Localization Scenarios. J Integr Bioinform. 2016;13(4):47–58. doi 10.2390/biecoll-jib-2016-298; PubMed 28187414
  • Sommer B. The CELLmicrocosmos Tools: A Small History of Java-Based Cell and Membrane Modelling Open Source Software Development. J Integr Bioinform. 2019. doi 10.1515/jib-2019-0057; PubMed 31560649
Homepage bio.tools

Comparative analysis of biological networks is a major problem in computational integrative systems biology. By computing the maximum common edge subgraph between a set of networks, one is able to detect conserved substructures between them and quantify their topological similarity. To aid such analyses we have developed CytoMCS, a Cytoscape app for computing inexact solutions to the maximum common edge subgraph problem for two or more graphs. Our algorithm uses an iterative local search heuristic for computing conserved subgraphs, optimizing a squared edge conservation score that is able to detect not only fully conserved edges but also partially conserved edges. It can be applied to any set of directed or undirected, simple graphs loaded as networks into Cytoscape, e.g. protein-protein interaction networks or gene regulatory networks. CytoMCS is available as a Cytoscape app at http://apps.cytoscape.org/apps/cytomcs.
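
As an illustration of what a squared edge conservation score can look like, the following Python sketch scores a given node alignment across several undirected graphs. The exact scoring used by CytoMCS is not given in this abstract, so the function name, the data layout and the scoring rule (square the number of graphs that contain each aligned edge) are assumptions made for this example only.

```python
from itertools import combinations


def squared_edge_conservation(graphs, alignment):
    """Score an alignment of k undirected graphs (illustrative, not CytoMCS code).

    graphs:    list of k edge sets; each edge is a frozenset of two node names.
    alignment: list of columns; each column is a tuple holding one node per graph.
    """
    score = 0
    for col_a, col_b in combinations(alignment, 2):
        # Count in how many graphs the two aligned nodes are connected.
        conserved = sum(
            1
            for edges, (u, v) in zip(graphs, zip(col_a, col_b))
            if frozenset((u, v)) in edges
        )
        # Squaring rewards edges shared by many graphs, while edges shared
        # by only some graphs (partial conservation) still contribute.
        score += conserved ** 2
    return score


# Hypothetical toy input: three small graphs and one alignment of their nodes.
g1 = {frozenset(("a", "b")), frozenset(("b", "c"))}
g2 = {frozenset(("x", "y")), frozenset(("y", "z"))}
g3 = {frozenset(("p", "q"))}
alignment = [("a", "x", "p"), ("b", "y", "q"), ("c", "z", "r")]
score = squared_edge_conservation([g1, g2, g3], alignment)
```

In this toy input one edge is present in all three graphs and one in only two of them, so both contribute, with the fully conserved edge weighted more heavily. A local search heuristic would repeatedly modify the alignment and keep changes that increase such a score.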

JIB Publications
  • Larsen SJ, Baumbach J. CytoMCS: A Multiple Maximum Common Subgraph Detection Tool for Cytoscape. J Integr Bioinform. 2017;14(2). doi 10.1515/jib-2017-0014; PubMed 28731857
Homepage

This work presents DaTo, a semi-automatically generated world atlas of biological databases and tools. It extracts raw information from all PubMed articles which contain exact URLs in their abstract section, followed by a manual curation of the abstract and the URL accessibility. DaTo features a user-friendly query interface, providing extensible URL-related annotations, such as the status, the location and the country of the URL. A graphical interaction network browser has also been integrated into the DaTo web interface to facilitate exploration of the relationship between different tools and databases with respect to their ontology-based semantic similarity. Using DaTo, the geographical locations, the health statuses, as well as the journal associations were evaluated with respect to the historical development of bioinformatics tools and databases over the last 20 years. We hope it will inspire the biological community to gain a systematic insight into bioinformatics resources. DaTo is accessible via http://bis.zju.edu.cn/DaTo/.

JIB Publications Homepage

DBE2 is an information system for the management of biological experiment data from different data domains in a unified and simple way. It provides persistent data storage, worldwide accessibility of the data and the opportunity to load, save, modify, and annotate the data. It is seamlessly integrated in the VANTED system as an add-on, thereby extending the VANTED platform towards data management. DBE2 also utilizes controlled vocabulary from the Ontology Lookup Service to allow the management of terms such as substance names, species names, and measurement units, aiming at an eased data integration.

JIB Publications

Web service 
Sequence analysis 
In recent years, several new tools applicable to protein analysis have been made available on the IBIVU web site. Recently, a number of tools, ranging from multiple sequence alignment construction to domain prediction, have been updated and/or extended with services for programmatic access using SOAP. We provide an overview of these tools and their application.

JIB Publications Homepage bio.tools

Proteins and their interactions are essential for the functioning of all organisms and for understanding biological processes. Alternative splicing is an important molecular mechanism for increasing the protein diversity in eukaryotic cells. Splicing events that alter the protein structure and the domain composition can be responsible for the regulation of protein interactions and the functional diversity of different tissues. Discovering the occurrence of splicing events and studying protein isoforms have become feasible using Affymetrix Exon Arrays. Therefore, we have developed the versatile Cytoscape plugin DomainGraph that allows for the visual analysis of protein domain interaction networks and their integration with exon expression data. Protein domains affected by alternative splicing are highlighted and splicing patterns can be compared.

JIB Publications
  • Emig D, Cline MS, Klein K, et al. Integrative visual analysis of the effects of alternative splicing on protein domain interaction networks. J Integr Bioinform. 2008;5(2). doi 10.2390/biecoll-jib-2008-101; PubMed 20134061
Homepage
DPD

In this paper we present two case studies of Proteomics applications development using the AIBench framework, a Java desktop application framework mainly focused on scientific software development. The applications presented in this work are Decision Peptide-Driven, for rapid and accurate protein quantification, and Bacterial Identification, for Tuberculosis biomarker search and diagnosis. Both tools work with mass spectrometry data, specifically with MALDI-TOF spectra, minimizing the time required to process and analyze the experimental data.

JIB Publications
  • López-Fernández H, Reboiro-Jato M, Glez-Peña D, et al. Rapid development of Proteomic applications with the AIBench framework. J Integr Bioinform. 2011;8(3):171. doi 10.2390/biecoll-jib-2011-171; PubMed 21926434
Homepage

Expression efficiency is one of the major characteristics describing genes in various modern investigations. Expression efficiency of genes is regulated at various stages: transcription, translation, posttranslational protein modification and others. In this study, a special EloE (Elongation Efficiency) web application is described. EloE sorts an organism's genes in descending order of their theoretical rate of the elongation stage of translation, based on the analysis of their nucleotide sequences. The theoretical data obtained correlate significantly with available experimental data on gene expression in various organisms. In addition, the program identifies preferential codons in the organism's genes and determines the distribution of potential secondary structure energy in the 5´ and 3´ regions of mRNA. EloE can be useful for preliminary estimation of translation elongation efficiency for genes for which experimental data are not yet available. Some results can be used, for instance, in other programs that model artificial genetic structures in genetic engineering experiments.

JIB Publications
  • Sokolov V, Zuraev B, Lashin S, Matushkin Y. Web application for automatic prediction of gene translation elongation efficiency. J Integr Bioinform. 2015;12(1). doi 10.2390/biecoll-jib-2015-256; PubMed 26527190
Homepage

The prevalence of comorbid diseases poses a major health issue for millions of people worldwide and an enormous socio-economic burden for society. The molecular mechanisms for the development of comorbidities need to be investigated. For this purpose, a workflow system was developed to aggregate data on biomedical entities from heterogeneous data sources. The process of integrating and merging all data sources of the workflow system was implemented as a semi-automatic pipeline that provides the import, fusion, and analysis of the highly connected biomedical data in a Neo4j database GenCoNet. As a starting point, data on the common comorbid diseases essential hypertension and bronchial asthma was integrated. GenCoNet (https://genconet.kalis-amts.de) is a curated database that provides a better understanding of hereditary bases of comorbidities.

JIB Publications
  • Shoshi A, Hofestädt R, Zolotareva O, Friedrichs M, Maier A, Ivanisenko VA, Dosenko VE, Bragina EY. GenCoNet - A Graph Database for the Analysis of Comorbidities by Gene Networks. J Integr Bioinform. 2018;15(4). doi 10.1515/jib-2018-0049; PubMed 30864352
Homepage

Script Library 
Bioinformatics DNA Gene regulation 
The interconversion of sequences that constitute the genome and the proteome is becoming increasingly important due to the generation of large amounts of DNA sequence data. Following mapping of DNA segments to the genome, one fundamentally important task is to find the amino acid sequences which are coded within a list of genomic sections. Conversely, given a series of protein segments, an important task is to find the genomic loci which code for a list of protein regions. To perform these tasks on a region by region basis is extremely laborious when a large number of regions are being studied. We have therefore implemented an R package geno2proteo which performs the two mapping tasks and subsequent sequence retrieval in a batch fashion. In order to make the tool more accessible to users, we have created a web interface of the R package which allows the users to perform the mapping tasks by going to the web page http://sharrocksresources.manchester.ac.uk/tofigaps and using the web service.
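
The genome-to-protein direction of this mapping can be sketched in a few lines of Python. This is not the geno2proteo R package or its interface; the function name, the 1-based inclusive coordinate convention and the restriction to forward-strand transcripts are assumptions chosen to keep the example short.

```python
def genomic_to_protein(cds_exons, start, end):
    """Map a genomic interval to 1-based amino acid positions (illustrative only).

    cds_exons:  list of (start, end) coding exon intervals, 1-based inclusive,
                on the forward strand.
    start, end: genomic interval of interest, 1-based inclusive.
    Returns (first_aa, last_aa), or None if the interval covers no coding base.
    """
    cds_offset = 0  # number of coding bases in the exons already visited
    hits = []       # 0-based CDS positions at the edges of each overlap
    for ex_start, ex_end in sorted(cds_exons):
        lo, hi = max(start, ex_start), min(end, ex_end)
        if lo <= hi:  # the interval overlaps this exon
            hits.append(cds_offset + (lo - ex_start))
            hits.append(cds_offset + (hi - ex_start))
        cds_offset += ex_end - ex_start + 1
    if not hits:
        return None
    # Three coding bases per codon; convert to 1-based residue numbers.
    return min(hits) // 3 + 1, max(hits) // 3 + 1


# Hypothetical transcript with two coding exons of 9 bases each (6 codons).
exons = [(101, 109), (201, 209)]
residues = genomic_to_protein(exons, 105, 203)
intronic = genomic_to_protein(exons, 150, 160)
```

Running such a function over a whole list of regions is what turns the laborious region-by-region lookup into a batch operation; a complete tool would also have to handle reverse-strand genes and retrieve the sequences themselves.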

JIB Publications
  • Li Y, Aguilar-Martinez E, Sharrocks AD. Geno2proteo, a Tool for Batch Retrieval of DNA and Protein Sequences from Any Genomic or Protein Regions. J Integr Bioinform. 2019. doi 10.1515/jib-2018-0090; PubMed 31301672
Homepage

The need to process large quantities of data generated from genomic sequencing has resulted in a difficult task for life scientists who are not familiar with the use of command-line operations or developments in high performance computing and parallelization. This knowledge gap, along with unfamiliarity with necessary processes, can hinder the execution of data processing tasks. Furthermore, many of the commonly used bioinformatics tools for the scientific community are presented as isolated, unrelated entities that do not provide an integrated, guided, and assisted interaction with the scheduling facilities of computational resources or distribution, processing and mapping with runtime analysis. This paper presents the first approximation of a Web Services platform-based architecture (GITIRBio) that acts as a distributed front-end system for autonomous and assisted processing of parallel bioinformatics pipelines that has been validated using multiple sequences. Additionally, this platform allows integration with semantic repositories of genes for search annotations. GITIRBio is available at: http://c-head.ucaldas.edu.co:8080/gitirbio.

JIB Publications
  • Castillo LF, López-Gartner G, Isaza GA, et al. GITIRBio: A Semantic and Distributed Service Oriented-Architecture for Bioinformatics Pipeline. J Integr Bioinform. 2015;12(1):1–15. doi 10.2390/biecoll-jib-2015-255; PubMed 26527189
Homepage

Bioinformatics applications manage complex biological data stored in distributed and often heterogeneous databases and require large computing power. These databases are too big and complicated to be rapidly queried every time a user submits a query, due to the overhead involved in decomposing the queries, sending the decomposed queries to remote databases, and composing the results. There are also considerable communication costs involved. This study addresses these problems in a Grid-based environment for bioinformatics. We propose a Grid middleware called GMB that alleviates these problems by caching the results of Frequently Used Queries (FUQ). Queries are classified based on their types and frequencies. FUQ are answered from the middleware, which improves their response time. GMB acts as a gateway to the TeraGrid Grid: it resides between users’ applications and the TeraGrid Grid. We evaluate GMB experimentally.
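
The idea of answering frequently used queries from the middleware can be illustrated with a small frequency-based cache. This Python sketch is not GMB code; the class name, the `backend` callable standing in for the remote databases and the fixed frequency threshold are all assumptions introduced for the example.

```python
from collections import Counter


class FrequentQueryCache:
    """Cache results only for queries that are seen often enough (illustrative)."""

    def __init__(self, backend, threshold=2):
        self.backend = backend      # callable that runs a query remotely
        self.threshold = threshold  # requests needed before a query counts as frequent
        self.counts = Counter()     # how often each query has been submitted
        self.cache = {}             # stored results of frequent queries

    def query(self, q):
        self.counts[q] += 1
        if q in self.cache:
            return self.cache[q]    # answered locally, no remote round trip
        result = self.backend(q)
        if self.counts[q] >= self.threshold:
            self.cache[q] = result  # query is now frequent, keep its result
        return result


# Hypothetical backend that records every remote call it receives.
remote_calls = []


def slow_backend(q):
    remote_calls.append(q)
    return f"result for {q}"


middleware = FrequentQueryCache(slow_backend, threshold=2)
answers = [middleware.query("gene:BRCA1") for _ in range(3)]
```

With a threshold of two, the third identical request is served from the cache, so only two remote calls are made. A production middleware would additionally need cache invalidation and a size limit, which are left out here.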

JIB Publications

Web application 
Mapping Ontology and terminology Proteins Sequence analysis Sequencing 
The functional annotation of genomic data has become a major task for the ever-growing number of sequencing projects. In order to address this challenge, we recently developed GOblet, a free web service for the annotation of anonymous sequences with Gene Ontology (GO) terms. However, to overcome limitations of the GO terminology, and to aid in understanding not only single components but also systemic interactions between the individual components, we have now extended the GOblet web service to integrate pathway annotations as well. Furthermore, we extended and upgraded the data analysis pipeline with improved summaries, and added term enrichment and clustering algorithms. Finally, we are now making GOblet available as a stand-alone application for high-throughput processing on local machines. The advantages of this frequently requested feature are that a) the user can avoid restrictions of our web service for uploading and processing large amounts of data, and that b) confidential data can be analysed without insecure transfer to a public web server. The stand-alone version of the web service has been implemented using platform-independent Tcl scripts, which can be run with just a single runtime file utilizing the Starkit technology. The GOblet web service and the stand-alone application are freely available at http://goblet.molgen.mpg.de.
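
The abstract does not say which statistic GOblet uses for term enrichment. A common choice for this kind of analysis is a one-sided hypergeometric test, sketched below in Python as an assumption rather than as a description of GOblet itself; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) under the hypergeometric distribution (illustrative only).

    population: number of sequences in the background set
    annotated:  background sequences carrying the term
    sample:     number of sequences in the set under study
    hits:       sequences in the study set carrying the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    # Sum the probabilities of drawing `hits` or more annotated sequences.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 10 background sequences, 5 carry the term,
# and all 5 sequences in the study set carry it too.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

A small p-value suggests the term occurs in the study set more often than expected by chance; when many terms are tested, a multiple-testing correction would normally follow.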

JIB Publications
  • Groth D, Hartmann S, Panopoulou G, Poustka AJ, Hennig S. GOblet: annotation of anonymous sequence data with gene ontology and pathway terms. J Integr Bioinform. 2008;5(2). doi 10.2390/biecoll-jib-2008-104; PubMed 20134064
Homepage bio.tools

Despite the large number of software tools developed to address different areas of microarray data analysis, very few offer an all-in-one solution with little learning curve. For microarray core labs, there are even fewer software packages available to help with their routine but critical tasks, such as data quality control (QC) and inventory management. We have developed a simple-to-use web portal to allow bench biologists to analyze and query complicated microarray data and related biological pathways without prior training. Both experiment-based and gene-based analysis can be easily performed, even for the first-time user, through the intuitive multi-layer design and interactive graphic links. While being friendly to inexperienced users, most parameters in Goober can be easily adjusted via drop-down menus to allow advanced users to tailor the analysis to their needs and perform more complicated analysis. Moreover, we have integrated graphic pathway analysis into the website to help users examine microarray data within the relevant biological context. Goober also contains features that cover most of the common tasks in microarray core labs, such as real-time array QC, data loading, array usage and inventory tracking. Overall, Goober is a complete microarray solution to help biologists instantly discover valuable information from a microarray experiment and enhance the quality and productivity of microarray core labs. The whole package is freely available at http://sourceforge.net/projects/goober. A demo web server is available at http://www.goober-array.org.

JIB Publications
  • Luo W, Gudipati M, Jung K, Chen M, Marschke KB. Goober: a fully integrated and user-friendly microarray data management and analysis solution for core labs and bench biologists. J Integr Bioinform. 2009;6(1):108. doi 10.2390/biecoll-jib-2009-108; PubMed 20134074
Homepage

Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short read Next Generation Sequencing data. This is based on determining intra-exon correlations between specific motifs. This requires a mild assumption that short reads sampled from specific regions of the same exon will be correlated with each other. This has been implemented on Apache Spark and used to analyse two D. melanogaster eye-antennal disc data sets generated at the same laboratory. The wild type Drosophila data set indicates a variation due to motif GC content that is more significant than that found due to exon GC content. The software is available online and could be applied for cross-experiment transcriptome data analysis in eukaryotes.

JIB Publications
  • Alnasir J, Shanahan HP. A Novel Method to Detect Bias in Short Read NGS Data. J Integr Bioinform. 2017;14(3). doi 10.1515/jib-2017-0025; PubMed 28941355
Homepage

This work presents a sophisticated information system, the Integrated Analysis Platform (IAP), an approach supporting large-scale image analysis for different species and imaging systems. In its current form, IAP supports the investigation of Maize, Barley and Arabidopsis plants based on images obtained in different spectra. Several components of the IAP system, which are described in this work, cover the complete end-to-end pipeline, starting with the image transfer from the imaging infrastructure, (grid distributed) image analysis, data management for raw data and analysis results, to the automated generation of experiment reports.

JIB Publications
  • Klukas C, Pape JM, Entzian A. Analysis of high-throughput plant image data with the information system IAP. J Integr Bioinform. 2012;9(2):191. doi 10.2390/biecoll-jib-2012-191; PubMed 22745177
Homepage

Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, "Identifying Interactions between Chemical Entities" (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identify the interactions and then an ensemble classifier validates and classifies the type of each interaction. This relation extraction module was evaluated with the corpus released for the DDI Extraction task of SemEval 2013, obtaining results comparable to state-of-the-art methods for this type of task. We integrated this module with our chemical named entity recognition module and made the whole system available as a web tool at www.lasige.di.fc.ul.pt/webtools/iice.

JIB Publications
  • Lamurias A, Ferreira JD, Couto FM. Identifying interactions between chemical entities in biomedical text. J Integr Bioinform. 2014;11(3):247. doi 10.2390/biecoll-jib-2014-247; PubMed 25339081

Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. In general, this biological knowledge is represented worldwide in a network of databases. These data are spread among thousands of databases, which overlap in content, but differ substantially with respect to content detail, interface, formats and data structure. To support a functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as a semi-automated data exploration in information retrieval environments, an integrated view of databases is essential. Search engines have the potential of assisting in data retrieval from these structured sources, but fall short of providing a comprehensive knowledge excerpt from the interlinked databases. A prerequisite of supporting the concept of an integrated data view is to acquire insights into cross-references among database entities. This is hampered by the fact that only a fraction of all possible cross-references are explicitly tagged in the particular biomedical information systems. In this work, we investigate to what extent an automated construction of an integrated data network is possible. We propose a method that predicts and extracts cross-references from multiple life science databases and possible referenced data targets. We study the retrieval quality of our method and report on first, promising results. The method is implemented as the tool IDPredictor, which is published under the DOI 10.5447/IPK/2012/4 and is freely available using the URL: http://dx.doi.org/10.5447/IPK/2012/4.

JIB Publications
  • Mehlhorn H, Lange M, Scholz U, Schreiber F. IDPredictor: predict database links in biomedical database. J Integr Bioinform. 2012;9(2):1–15. doi 10.2390/biecoll-jib-2012-190; PubMed 22736059
Homepage

It has been recognized that the development of new therapeutic drugs is a complex and expensive process. A large number of factors affect the activity in vivo of putative candidate molecules and the propensity for causing adverse and toxic effects is recognized as one of the major hurdles behind the current "target-rich, lead-poor" scenario. Structure-Activity Relationship (SAR) studies, using relational Machine Learning (ML) algorithms, have already been shown to be very useful in the complex process of rational drug design. Despite the ML successes, human expertise is still of the utmost importance in the drug development process. An iterative process and tight integration between the models developed by ML algorithms and the know-how of medicinal chemistry experts would be a very useful symbiotic approach. In this paper we describe a software tool that achieves that goal: iLogCHEM. The tool allows the use of Relational Learners in the task of identifying molecules or molecular fragments with potential to produce toxic effects, and thus helps in streamlining drug design in silico. It also allows the expert to guide the search for useful molecules without the need to know the details of the algorithms used. The models produced by the algorithms may be visualized using a graphical interface that is in common use amongst researchers in structural biology and medicinal chemistry. The graphical interface enables the expert to provide feedback to the learning system. The developed tool also has facilities to handle the similarity bias typical of large chemical databases. For that purpose the user can filter out similar compounds when assembling a data set. Additionally, we propose ways of providing background knowledge for Relational Learners using the results of Graph Mining algorithms.

JIB Publications
  • Camacho R, Pereira M, Costa VS, et al. A relational learning approach to Structure-Activity Relationships in drug design toxicity studies. J Integr Bioinform. 2011;8(3):182. doi 10.2390/biecoll-jib-2011-182; PubMed 21926445
Homepage

Workbench 
Metabolomics Data mining Data management 
Over the last decade the evaluation of odors and vapors in human breath has gained more and more attention, particularly in the diagnostics of pulmonary diseases. Ion mobility spectrometry coupled with multi-capillary columns (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs) in air. It is a comparatively inexpensive, non-invasive, high-throughput method, which is able to handle the moisture that comes with human exhaled air, and allows for the characterization of VOCs in very low concentrations. To identify discriminating compounds as biomarkers, it is necessary to have a clear understanding of the detailed composition of human breath. Therefore, in addition to the clinical studies, there is a need for a flexible and comprehensive centralized data repository, which is capable of gathering all kinds of related information. Moreover, there is a demand for automated data integration and semi-automated data analysis, in particular with regard to the rapid data accumulation emerging from the high-throughput nature of the MCC/IMS technology. Here, we present a comprehensive database application and analysis platform, which combines metabolic maps with heterogeneous biomedical data in a well-structured manner. The design of the database is based on a hybrid of the entity-attribute-value (EAV) model and the EAV-CR, which incorporates the concepts of classes and relationships. Additionally, it offers an intuitive user interface that provides easy and quick access to the platform’s functionality: automated data integration and integrity validation, versioning and roll-back strategy, data retrieval as well as semi-automatic data mining and machine learning capabilities. The platform will support MCC/IMS-based biomarker identification and validation. The software, schemata, data sets and further information are publicly available at http://imsdb.mpi-inf.mpg.de.
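
For readers unfamiliar with the entity-attribute-value (EAV) model mentioned above, the following Python sketch builds a minimal EAV layout in an in-memory SQLite database. The table names, column names and sample values are hypothetical and are not taken from the published schemata; the `entity_class` column only hints at the class concept that EAV-CR adds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per stored object; entity_class loosely mirrors the EAV-CR idea of classes.
    CREATE TABLE entity (
        id INTEGER PRIMARY KEY,
        entity_class TEXT NOT NULL
    );
    -- One row per (object, attribute) pair instead of one column per attribute.
    CREATE TABLE attribute_value (
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        attribute TEXT NOT NULL,
        value TEXT
    );
    """
)

# Hypothetical measurement with two attributes.
conn.execute("INSERT INTO entity (id, entity_class) VALUES (1, 'measurement')")
conn.executemany(
    "INSERT INTO attribute_value (entity_id, attribute, value) VALUES (?, ?, ?)",
    [(1, "compound", "acetone"), (1, "intensity", "0.42")],
)

# Reassemble the sparse rows into a single record for entity 1.
rows = conn.execute(
    "SELECT attribute, value FROM attribute_value "
    "WHERE entity_id = ? ORDER BY attribute",
    (1,),
).fetchall()
record = dict(rows)
```

The benefit of this layout is that new attributes can be stored without changing the schema, which suits heterogeneous biomedical data; the cost is that values lose their native types and queries must reassemble records, as the last step shows.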

JIB Publications
  • Schneider T, Hauschild A-C, Baumbach JI, Baumbach J. An Integrative Clinical Database and Diagnostics Platform for Biomarker Identification and Analysis in Ion Mobility Spectra of Human Exhaled Air. J Integr Bioinform. 2013;10(2). doi 10.2390/biecoll-jib-2013-218; PubMed 23545212
Homepage bio.tools

At present, coding sequences (CDS) continue to be discovered, and larger CDS are being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference using the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two ways, using Soaplab2 and Apache Axis2. These are SOAP and Java Web Service (JWS) services that provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This new integrated automatic workflow will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

JIB Publications
  • Damkliang K, Tandayya P, Sangket U, Pasomsub E. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services. J Integr Bioinform. 2016;13(1):287. doi 10.2390/biecoll-jib-2016-287; PubMed 28187423
Homepage

AstraZeneca’s Oncology in vivo data integration platform brings multidimensional data from animal model efficacy, pharmacokinetic and pharmacodynamic data to animal model profiling data and public in vivo studies. Using this platform, scientists can cluster model efficacy and model profiling data together, quickly identify responder profiles and correlate molecular characteristics to pharmacological response. Through meta-analysis, scientists can compare pharmacology between single and combination treatments, between different drug scheduling and administration routes.

JIB Publications

Measuring differential methylation of the DNA is currently the most common approach to linking epigenetic modifications to diseases (called epigenome-wide association studies, EWAS). Owing to their low cost, efficiency and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular techniques for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial and still lack dedicated software libraries enabling high-quality and statistically sound downstream analyses. As of yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or where EWAS data analysis is an integral part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions covering reading and preprocessing the raw data, down to statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html.
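
To make the permutation-test step concrete, here is a minimal Python sketch for a single methylation site. It is not the Jllumina API (which is a Java library); the function name, the use of the absolute difference in group means as the test statistic and the sample beta values are assumptions for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference in group means (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        if mean_diff(pooled) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical beta values for one CpG site in cases and controls.
cases = [0.80, 0.82, 0.85, 0.90, 0.88]
controls = [0.20, 0.25, 0.22, 0.30, 0.28]
p_value = permutation_p_value(cases, controls)
```

Because each site and each permutation is independent, this kind of computation parallelizes naturally across sites, which is consistent with the parallelizable design described above.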

JIB Publications
  • Almeida D, Skov I, Lund J, et al. Jllumina - A comprehensive Java-based API for statistical Illumina Infinium HumanMethylation450 and MethylationEPIC data processing. J Integr Bioinform. 2016;13(4):294. doi 10.2390/biecoll-jib-2016-294; PubMed 28187410
Homepage

Deducing common properties or degrees of phylogenetic relationship by analyzing a grouping or clustering of sequence sets is a frequently used technique in computational biology. If interpreted by means of visual inspection, the conclusions depend for many of these applications on meaningful names for the input data. In accordance with the aim of the analysis, the sequences should be provided with names indicating the function of the genes or gene-products, the phylogenetic position or other properties characterizing the contributing species. However, sequences extracted from databases are most often annotated with identifiers which only implicitly contain the desired information. To solve this problem, we have designed and implemented a tool named Key2Ann, which replaces in multiple fasta files the database keys with short terms indicating the taxonomic position or other features like the gene name or the EC-number. In addition, properties like habitat, growth temperature or the degree of pathogenicity can be coded for microbial species. To allow for highest flexibility, the user can control the composition of the names by means of command line parameters. Key2Ann is written in Java and can be downloaded via http://www-bioinf.uni-regensburg.de/downl/Key2Ann.zip. We demonstrate the usage of Key2Ann by discussing three typical examples of phylogenetic analysis.
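
The core replacement step can be sketched as follows in Python. Key2Ann itself is a Java command-line tool whose options are not listed here, so the function name, the plain dictionary used as the annotation source and the rule of taking the first whitespace-delimited token as the database key are assumptions made for this example.

```python
def rename_fasta_headers(fasta_text, annotations):
    """Replace the database key in each FASTA header with a readable label.

    fasta_text:  contents of a (multiple) FASTA file as one string
    annotations: mapping from database key to the desired short name
    Headers whose key is not in the mapping are reduced to the key itself.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            key = parts[0] if parts else ""
            out.append(">" + annotations.get(key, key))
        else:
            out.append(line)  # sequence lines are copied unchanged
    return "\n".join(out)


# Hypothetical input: two records, only the first has an annotation.
fasta = ">P12345 some database description\nMKT\n>Q99999\nGGA\n"
labels = {"P12345": "Ecoli_trpA"}
renamed = rename_fasta_headers(fasta, labels)
```

In a real run the labels would be composed from taxonomy, gene names or EC numbers according to the user's command-line parameters, rather than supplied as a fixed dictionary.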

JIB Publications
  • Pürzer A, Grassmann F, Birzer D, Merkl R. Key2Ann: a tool to process sequence sets by replacing database identifiers with a human-readable annotation. J Integr Bioinform. 2011;8(1). doi 10.2390/biecoll-jib-2011-153; PubMed 21372341

Web application 
Plant biology Genomics 
Search engines and retrieval systems are popular tools at a life science desktop. The manual inspection of hundreds of database entries that reflect a life science concept or fact is time-intensive daily work. Here, it is not the number of query results that matters, but their relevance. In this paper, we present the LAILAPS search engine for life science databases. The concept is to combine a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement by user feedback tracking and an intuitive and slim web user interface that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and will be expanded by synonyms. Supporting a flexible text index and a simple data import format, LAILAPS can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. With a set of features extracted from each database hit, in combination with user relevance preferences, a neural network predicts user-specific relevance scores. Using expert knowledge as training data for a predefined neural network or using users' own relevance training sets, a reliable relevance ranking of database hits has been implemented. In this paper, we present the LAILAPS system, the concepts, benchmarks and use cases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.

JIB Publications
  • Lange M, Spies K, Bargsten J, et al. The LAILAPS Search Engine: Relevance Ranking in Life Science Databases. J Integr Bioinform. 2010;7(2):1–11. doi 10.2390/biecoll-jib-2010-110; PubMed 20134080
  • Lange M, Spies K, Colmsee C, Flemming S, Klapperstück M, Scholz U. The LAILAPS Search Engine: A Feature Model for Relevance Ranking in Life Science Databases. J Integr Bioinform. 2010;7(3). doi 10.2390/biecoll-jib-2010-118; PubMed 20375444
  • Esch M, Chen J, Weise S, Hassani-Pak K, Scholz U, Lange M. A query suggestion workflow for life science IR-systems. J Integr Bioinform. 2014;11(2):237. doi 10.2390/biecoll-jib-2014-237; PubMed 24953306
Homepage bio.tools

Distinct bacteria are able to cope with highly diverse lifestyles; for instance, they can be free living or host-associated. Thus, these organisms must possess a large and varied genomic arsenal to withstand different environmental conditions. To facilitate the identification of genomic features that might influence bacterial adaptation to a specific niche, we introduce LifeStyle-Specific-Islands (LiSSI). LiSSI combines evolutionary sequence analysis with statistical learning (Random Forest with feature selection, model tuning and robustness analysis). In summary, our strategy aims to identify conserved consecutive homology sequences (islands) in genomes and to identify the most discriminant islands for each lifestyle.

JIB Publications
  • Barbosa E, Röttger R, Hauschild A-C, et al. LifeStyle-Specific-Islands (LiSSI): Integrated Bioinformatics Platform for Genomic Island Analysis. J Integr Bioinform. 2017;14(2). doi 10.1515/jib-2017-0010; PubMed 28678736
Homepage

There are a number of databases on the Listeria species and their genomes. However, these databases do not specifically address a set of networks that is important in the defence mechanism of the bacteria. Listeria monocytogenes EGDe is a well-established intracellular model organism to study host pathogenicity because of its versatility in the host environment. Here, we have focused on thiol disulphide redox metabolic network proteins, specifically in L. monocytogenes EGDe. The thiol redox metabolism is involved in the oxidative stress mechanism and is found in all living cells. It functions to maintain the thiol disulphide balance required for protein folding by providing reducing power. These proteins are also involved in the reversible oxidation of thiol groups in biomolecules by creating disulphide bonds; hence the term thiol disulphide redox metabolism (TDRM). TDRM network genes play an important role in the oxidative stress mechanism and during host–pathogen interaction. Therefore, it is essential to have detailed information on these proteins with regard to other bacteria, together with genome analysis to understand the presence of tRNA, transposons, and insertion elements for horizontal gene transfer. The LmTDRM database is a new comprehensive web-based database on thiol proteins and their functions. It includes: Description, Search, TDRM analysis, and genome viewer. The quality of these data has been evaluated before they were aggregated to produce a final representation. The web interface allows for various queries to understand the protein functions and their annotation with respect to their relationship with other bacteria. LmTDRM is a major step towards the development of databases on thiol disulphide redox proteins; it should help researchers to understand the mechanism of these proteins and their interactions. Database URL: www.lmtdrm.com.

JIB Publications
  • Srinivas V, Gopal S. LmTDRM Database: A Comprehensive Database on Thiol Metabolic Gene/Gene Products in Listeria monocytogenes EGDe. J Integr Bioinform. 2014;11(1). doi 10.2390/biecoll-jib-2014-245; PubMed 25228549
Homepage

The rapid increase of ~omics datasets generated by microarray, mass spectrometry and next generation sequencing technologies requires an integrated platform that can combine results from different ~omics datasets to provide novel insights into the understanding of biological systems. MADMAX is designed to provide a solution for storage and analysis of complex ~omics datasets. In addition, analysis results (such as lists of genes) will be merged to reveal candidate genes supported by all datasets. The system constitutes an ISA-Tab compliant LIMS part which is independent of different analysis pipelines. A pilot study of different types of ~omics data in Brassica rapa demonstrates the possible use of MADMAX. The web-based user interface provides easy access to data and analysis tools on top of the database.
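
Merging analysis results to find candidate genes supported by all datasets can be pictured as a set intersection. The following Python sketch makes that assumption explicit; it is not MADMAX code, and the function name, dataset labels and gene identifiers are all hypothetical.

```python
def candidates_supported_by_all(gene_lists):
    """Return the genes that appear in every result list, sorted by name.

    `gene_lists` maps a dataset label to the list of genes reported
    by the analysis of that dataset.
    """
    gene_sets = [set(genes) for genes in gene_lists.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Hypothetical result lists from three kinds of ~omics analyses.
analysis_results = {
    "microarray": ["geneA", "geneB", "geneC"],
    "proteomics": ["geneB", "geneC", "geneD"],
    "sequencing": ["geneC", "geneB", "geneE"],
}

shared_candidates = candidates_supported_by_all(analysis_results)
print(shared_candidates)
```

A looser criterion, such as support from a majority of datasets, would replace the intersection with a per-gene count.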

JIB Publications
  • Lin K, Kools H, De groot PJ, et al. MADMAX - Management and analysis database for multiple ~omics experiments. J Integr Bioinform. 2011;8(2):160. doi 10.2390/biecoll-jib-2011-160; PubMed 21778530
Homepage

Mass spectrometry is an important analytical technology for the identification of metabolites and small compounds by their exact mass. However, dozens or hundreds of different compounds may have a similar mass or even the same molecular formula. Further elucidation requires tandem mass spectrometry, which provides the masses of compound fragments, but in silico fragmentation programs require substantial computational resources if applied to large numbers of candidate structures. We present and evaluate an approach to obtain candidates from a relational database which contains 28 million compounds from PubChem. A training phase associates tandem-MS peaks with corresponding fragment structures. For the candidate search, the peaks in a query spectrum are translated to fragment structures, and the candidates are retrieved and sorted by the number of matching fragment structures. In the cross validation, the evaluation of the relative ranking positions (RRP) using different sizes of training sets confirms that a larger coverage of training data improves the average RRP from 0.65 to 0.72. Our approach allows downstream algorithms to process candidates in order of importance.
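
The candidate search described above (translate query peaks to fragment structures, then sort candidates by the number of matching fragments) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' database implementation: the in-memory dictionaries, the mass tolerance, the fragment labels and the candidate identifiers are hypothetical, and fragments are represented by plain strings.

```python
def translate_peaks(peaks, peak_index, tolerance=0.01):
    """Map query peak masses to fragment labels learned in a training phase.

    `peak_index` maps a reference m/z value to a set of fragment labels.
    A query peak matches a reference value if they differ by at most `tolerance`.
    """
    fragments = set()
    for mz in peaks:
        for reference_mz, labels in peak_index.items():
            if abs(mz - reference_mz) <= tolerance:
                fragments.update(labels)
    return fragments


def rank_candidates(query_fragments, candidate_fragments):
    """Sort candidates by descending number of shared fragments (ties by id)."""
    scored = [
        (candidate_id, len(query_fragments & fragments))
        for candidate_id, fragments in candidate_fragments.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical training result and candidate table.
peak_index = {77.039: {"C6H5"}, 105.034: {"C7H5O"}, 43.018: {"C2H3O"}}
candidates = {
    "cand1": {"C6H5", "C7H5O"},
    "cand2": {"C6H5"},
    "cand3": {"C2H3O"},
}

query = translate_peaks([77.04, 105.03], peak_index)
ranking = rank_candidates(query, candidates)
print(ranking)
```

In the described system the lookup runs against a relational database rather than in-memory dictionaries.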

JIB Publications
  • Hildebrandt C, Wolf S, Neumann S. Database supported candidate search for metabolite identification. J Integr Bioinform. 2011;8(2):157. doi 10.2390/biecoll-jib-2011-157; PubMed 21734330
Homepage

In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of the ability to compare data originating from different sources, and in terms of exchanging data in standard forms, e.g. when running processes on a distributed computing infrastructure. However, standards thrive on stability whereas science tends to constantly move, with new methods being developed and old ones modified. Therefore maintaining both metadata standards, and all the code that is required to make them useful, is a non-trivial problem. Memops is a framework that uses an abstract definition of the metadata (described in UML) to generate internal data structures and subroutine libraries for data access (application programming interfaces--APIs--currently in Python, C and Java) and data storage (in XML files or databases). For the individual project these libraries obviate the need for writing code for input parsing, validity checking or output. Memops also ensures that the code is always internally consistent, massively reducing the need for code reorganisation. Across a scientific domain a Memops-supported data model makes it easier to support complex standards that can capture all the data produced in a scientific area, share them among all programs in a complex software pipeline, and carry them forward to deposition in an archive. The principles behind the Memops generation code will be presented, along with example applications in Nuclear Magnetic Resonance (NMR) spectroscopy and structural biology.

JIB Publications
  • Fogh RH, Boucher W, Ionides JMC, Vranken WF, Stevens TJ, Laue ED. MEMOPS: Data modelling and automatic code generation. J Integr Bioinform. 2010;7(3). doi 10.2390/biecoll-jib-2010-123; PubMed 20375445
Homepage

Helicobacter pylori is a pathogenic bacterium that colonizes the human epithelia, causing duodenal and gastric ulcers, and gastric cancer. The genome of H. pylori 26695 has been previously sequenced and annotated. In addition, two genome-scale metabolic models have been developed. In order to maintain accurate and relevant information on coding sequences (CDS) and to retrieve new information, the assignment of new functions to the genes of Helicobacter pylori 26695 was performed in this work. The use of software tools, on-line databases and an annotation pipeline for inspecting each gene allowed the attribution of validated EC numbers and TC numbers to metabolic genes encoding enzymes and transport proteins, respectively. 1212 genes encoding proteins were identified in this annotation, of which 712 are metabolic genes and 500 non-metabolic, while 191 new functions were assigned to the CDS of this bacterium. This information provides relevant biological information for the scientific community dealing with this organism and can be used as the basis for a new metabolic model reconstruction.

JIB Publications
  • Resende T, Correia DM, Rocha M, Rocha I. Re-annotation of the genome sequence of Helicobacter pylori 26695. J Integr Bioinform. 2013;10(3):233. doi 10.2390/biecoll-jib-2013-233; PubMed 24231147
Homepage

Database portal 
Endocrinology and metabolism Plant biology Molecular interactions, pathways and networks Enzymes 
Crop plants play a major role in human and animal nutrition and increasingly contribute to chemical or pharmaceutical industry and renewable resources. In order to achieve important goals, such as the improvement of growth or yield, it is indispensable to understand biological processes on a detailed level. Therefore, the well-structured management of fine-grained information about metabolic pathways is of high interest. Thus, we developed the MetaCrop information system, a manually curated repository of high quality information concerning the metabolism of crop plants. However, the data access to and flexible export of information of MetaCrop in standard exchange formats had to be improved. To automate and accelerate the data access we designed a set of web services to be integrated into external software. These web services have already been used by an add-on for the visualisation toolkit VANTED. Furthermore, we developed an export feature for the MetaCrop web interface, thus enabling the user to compose individual metabolic models using SBML.

JIB Publications
  • Hippe K, Colmsee C, Czauderna T, et al. Novel Developments of the MetaCrop Information System for Facilitating Systems Biological Approaches. J Integr Bioinform. 2010;7(3). doi 10.2390/biecoll-jib-2010-125; PubMed 20375443
Homepage bio.tools

Recently, there has been increasing research to discover genomic biomarkers, haplotypes, and potentially other variables that together contribute to the development of diseases. Single Nucleotide Polymorphisms (SNPs) are the most common form of genomic variations and they can represent an individual’s genetic variability in greatest detail. Genome-wide association studies (GWAS) of SNPs, high-dimensional case-control studies, are among the most promising approaches for identifying disease causing variants. METU-SNP software is a Java based integrated desktop application specifically designed for the prioritization of SNP biomarkers and the discovery of genes and pathways related to diseases via analysis of the GWAS case-control data. Outputs of METU-SNP can easily be utilized for the downstream biomarkers research to allow the prediction and the diagnosis of diseases and other personalized medical approaches. Here, we introduce and describe the system functionality and architecture of the METU-SNP. We believe that the METU-SNP will help researchers with the reliable identification of SNPs that are involved in the etiology of complex diseases, ultimately supporting the development of personalized medicine approaches and targeted drug discoveries.

JIB Publications
  • Üstünkar G, Son YA. METU-SNP: An Integrated Software System for SNP-Complex Disease Association Analysis. J Integr Bioinform. 2011;8(2). doi 10.2390/biecoll-jib-2011-187; PubMed 22156365
Homepage

As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years' worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.

JIB Publications
  • Flanagan K, Nakjang S, Hallinan J, et al. Microbase2.0: A Generic Framework for Computationally Intensive Bioinformatics Workflows in the Cloud. J Integr Bioinform. 2012;9(2). doi 10.2390/biecoll-jib-2012-212; PubMed 23001322
Homepage

Molecularly imprinted polymers (MIPs) are high affinity robust synthetic receptors, which can be optimally synthesized and manufactured more economically than their biological equivalents (i.e. antibody). In MIPs production, rational design based on molecular modeling is a commonly employed technique. This mostly aids in (i) virtual screening of functional monomers (FMs), (ii) optimization of monomer-template ratio, and (iii) selectivity analysis. We present MIRATE, an integrated science gateway for the intelligent design of MIPs. By combining and adapting multiple state-of-the-art bioinformatics tools into automated and innovative pipelines, MIRATE guides the user through the entire process of MIPs' design. The platform allows the user to fully customize each stage involved in the MIPs' design, with the main goal to support the synthesis in the wet-laboratory. Availability: MIRATE is freely accessible with no login requirement at http://mirate.di.univr.it/. All major browsers are supported.

JIB Publications
  • Busato M, Distefano R, Bates F, Karim K, Bossi AM, López Vilariño JM, Piletsky S, Bombieri N, Giorgetti A. MIRATE: MIps RATional dEsign Science Gateway. J Integr Bioinform. 2018;15(4). doi 10.1515/jib-2017-0075; PubMed 29897885
Homepage

MicroRNAs (miRNAs/miRs) are important cellular components that regulate gene expression at the posttranscriptional level. Various upstream components regulate miR expression, and any deregulation causes disease conditions. Therefore, understanding the miR regulatory network at both the upstream and downstream level is crucial, and a resource on this aspect will be helpful. Currently available miR databases are mostly related to downstream targets, sequences, or diseases. As of now, no database is available that provides a complete picture of miR regulation in a specific condition. Our miR regulation web resource (miReg) is a manually curated one that represents validated upstream regulators (transcription factor, drug, physical, and chemical) along with downstream targets, associated biological process, experimental condition or disease state, up or down regulation of the miR in that condition, and corresponding PubMed references in a graphical and user-friendly manner, browsable through 5 browsing options. We have presented exact facts that have been described in the corresponding literature in relation to a given miR, whether it is a feed-back/feed-forward loop or inhibition/activation. Moreover, we have given various links to integrate data and to get a complete picture of any miR listed. The current version (Version 1.0) of miReg contains 47 important human miRs with 295 relations using 190 absolute references. We have also provided an example of the usefulness of miReg to establish signalling pathways involved in cardiomyopathy. We believe that miReg will be an essential miRNA knowledge base for the research community, with its continuous upgrade and data enrichment. This HTML based miReg can be accessed from: www.iioab-mireg.webs.com or www.iioab.webs.com/mireg.htm.

JIB Publications Homepage

Identification of microRNA (miRNA) precursors has seen increased efforts in recent years. The difficulty in experimental detection of pre-miRNAs increased the usage of computational approaches. Most of these approaches rely on machine learning especially classification. In order to achieve successful classification, many parameters need to be considered such as data quality, choice of classifier settings, and feature selection. For the latter one, we developed a distributed genetic algorithm on HTCondor to perform feature selection. Moreover, we employed two widely used classification algorithms libSVM and random forest with different settings to analyze the influence on the overall classification performance. In this study we analyzed 5 human retro virus genomes; Human endogenous retrovirus K113, Hepatitis B virus (strain ayw), Human T lymphotropic virus 1, Human T lymphotropic virus 2, Human immunodeficiency virus 2, and Human immunodeficiency virus 1. We then predicted pre-miRNAs by using the information from known virus and human pre-miRNAs. Our results indicate that these viruses produce novel unknown miRNA precursors which warrant further experimental validation.

JIB Publications
  • Saçar Demirci MD, Toprak M, Allmer J. A Machine Learning Approach for MicroRNA Precursor Prediction in Retro-transcribing Virus Genomes. J Integr Bioinform. 2016;13(5):303. doi 10.2390/biecoll-jib-2016-303; PubMed 28187417
Homepage

Small non-coding RNAs, in particular microRNAs, are critical for normal physiology and are candidate biomarkers, regulators, and therapeutic targets for a wide variety of diseases. There is an ever-growing interest in the comprehensive and accurate annotation of microRNAs across diverse cell types, conditions, species, and disease states. High-throughput sequencing technology has emerged as the method of choice for profiling microRNAs. Specialized bioinformatic strategies are required to mine as much meaningful information as possible from the sequencing data to provide a comprehensive view of the microRNA landscape. Here we present miRquant 2.0, an expanded bioinformatics tool for accurate annotation and quantification of microRNAs and their isoforms (termed isomiRs) from small RNA-sequencing data. We anticipate that miRquant 2.0 will be useful for researchers interested not only in quantifying known microRNAs but also in mining the rich well of additional information embedded in small RNA-sequencing data.

JIB Publications
  • Kanke M, Baran-Gale J, Villanueva J, Sethupathy P. miRquant 2.0: an Expanded Tool for Accurate Annotation and Quantification of MicroRNAs and their isomiRs from Small RNA-Sequencing Data. J Integr Bioinform. 2016;13(5). doi 10.2390/biecoll-jib-2016-307; PubMed 28187421
Homepage

A precise experimental identification of transcription factor binding motifs (TFBMs), accurate to a single base pair, is time-consuming and difficult. For several databases, TFBM annotations are extracted from the literature and stored 5' --> 3' relative to the target gene. Mixing the two possible orientations of a motif results in poor information content of subsequently computed position frequency matrices (PFMs) and sequence logos. Since these PFMs are used to predict further TFBMs, we address the question of whether the TFBMs underlying a PFM can be re-annotated automatically to improve both the information content of the PFM and subsequent classification performance.
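
To see why mixed orientations lower the information content of a PFM, consider the following Python sketch. It is a simple greedy heuristic written for illustration, not the MoRAine algorithm: it computes the information content of a set of equal-length sites (in bits, without small-sample correction) and flips a site to its reverse complement whenever that raises the total. The example sites are hypothetical.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def information_content(seqs):
    """Sum over alignment columns of (2 - Shannon entropy), in bits."""
    n = len(seqs)
    total = 0.0
    for column in zip(*seqs):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


def greedy_reorient(seqs):
    """Flip single sites to their reverse complement while that raises the score."""
    current = list(seqs)
    improved = True
    while improved:
        improved = False
        for i, seq in enumerate(current):
            flipped = current[:i] + [reverse_complement(seq)] + current[i + 1:]
            if information_content(flipped) > information_content(current) + 1e-12:
                current = flipped
                improved = True
    return current


# Hypothetical binding sites; the third one is stored on the opposite strand.
sites = ["TTGACA", "TTGACA", "TGTCAA", "TTGACA"]
reoriented = greedy_reorient(sites)
print(information_content(sites), information_content(reoriented))
```

Because every accepted flip strictly increases the score and the number of orientation assignments is finite, the loop terminates; it may stop in a local optimum on harder inputs.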

JIB Publications
  • Baumbach J, Wittkop T, Weile J, Kohl T, Rahmann S. MoRAine--a web server for fast computational transcription factor binding motif re-annotation. J Integr Bioinform. 2008;5(2). doi 10.2390/biecoll-jib-2008-91; PubMed 20134062
Homepage

We investigated the problem of imprecisely determined prokaryotic transcription factor (TF) binding sites (TFBSs). We found that the identification and reinvestigation of questionable binding motifs may result in improved models of these motifs. Subsequent model-based predictions of gene regulatory interactions may be performed with increased accuracy when the TFBS annotation underlying these models has been re-adjusted. We present MoRAine 2.0, a significantly improved version of MoRAine. It can automatically identify cases of unfavorable TFBS strand annotations and imprecisely determined TFBS positions. With release 2.0, we close the gap between reasonable running time and high accuracy. Furthermore, it requires only minimal input from the user: (1) the input TFBS sequences and (2) the length of the flanking sequences.

JIB Publications
  • Wittkop T, Rahmann S, Baumbach J. Efficient online transcription factor binding site adjustment by integrating transitive graph projection with MoRAine 2.0. J Integr Bioinform. 2010;7(3). doi 10.2390/biecoll-jib-2010-117; PubMed 20375458
Homepage

Web application 
Sequence analysis Proteins Molecular interactions, pathways and networks Sequencing Protein interactions 
In recent years, several new tools applicable to protein analysis have been made available on the IBIVU web site. Recently, a number of tools, ranging from multiple sequence alignment construction to domain prediction, have been updated and/or extended with services for programmatic access using SOAP. We provide an overview of these tools and their application.

JIB Publications Homepage bio.tools

Command-line tool 
DNA Mobile genetic elements Sequence analysis 
Background: Miniature inverted repeat transposable element (MITE) is a short transposable element, carrying no protein-coding regions. However, its high proliferation rate and sequence-specific insertion preference make it a good genetic tool for both natural evolution and experimental insertion mutagenesis. Recently active MITE copies are those with clear signals of Terminal Inverted Repeats (TIRs) and Direct Repeats (DRs) that were recently translocated into their current sites. Their proliferation ability renders them good candidates for the investigation of genomic evolution. Results: This study optimizes the C++ code and running pipeline of the MITE Uncovering SysTem (MUST) by assuming no prior knowledge of MITEs required from the users, and the current version, MUSTv2, shows significantly increased detection accuracy for recently active MITEs, compared with similar programs. The running speed is also significantly increased compared with MUSTv1. We prepared a benchmark dataset, a simulated genome with 150 MITE copies, for interested researchers. Conclusions: MUSTv2 represents an accurate detection program for recently active MITE copies, which is complementary to the existing template-based MITE mapping programs. We believe that the release of MUSTv2 will greatly facilitate genome annotation and structural analysis for bioOMIC big data researchers.

JIB Publications
  • Ge R, Mai G, Zhang R, Wu X, Wu Q, Zhou F. MUSTv2: An Improved De Novo Detection Program for Recently Active Miniature Inverted Repeat Transposable Elements (MITEs). J Integr Bioinform. 2017;14(3). doi 10.1515/jib-2017-0029; PubMed 28796642
Homepage bio.tools

Biological networks can be large and complex, often consisting of different sub-networks or parts. Separation of networks into parts, network partitioning, and layouts of overview and sub-graphs are important for understandable visualisations of those networks. This article presents NetPartVis to visualise non-overlapping clusters or partitions of graphs in the Vanted framework, based on a method for laying out an overview graph and several sub-graphs (partitions) in a coordinated, mental-map preserving way.

JIB Publications
  • Garkov D, Klein K, Klukas C, Schreiber F. Mental-Map Preserving Visualisation of Partitioned Networks in Vanted. J Integr Bioinform. 2019. doi 10.1515/jib-2019-0026; PubMed 31199771

Organisms try to maintain homeostasis by balanced uptake of nutrients from their environment. From an atomic perspective this means that, for example, carbon:nitrogen:sulfur ratios are kept within given limits. Upon limitation of, for example, sulfur, its acquisition is triggered. For yeast it was shown that transporters and enzymes involved in sulfur uptake are encoded as paralogous genes that express different isoforms. Sulfur deprivation leads to up-regulation of isoforms that are poor in sulfur-containing amino acids, that is, methionine and cysteine. Accordingly, sulfur-rich isoforms are down-regulated. We developed a web-based software, dubbed Nutrilyzer, that extracts paralogous protein coding sequences from an annotated genome sequence and evaluates their atomic composition. When fed with gene-expression data for nutrient-limited and normal conditions, Nutrilyzer provides a list of genes that are significantly differently expressed and simultaneously contain significantly different amounts of the limited nutrient in their atomic composition. Its intended use is in the field of ecological stoichiometry. Nutrilyzer is available at http://nutrilyzer.hs-mittweida.de. Here we describe the workflow and results with an example from a whole-genome Arabidopsis thaliana gene-expression analysis upon oxygen deprivation. 43 paralogs distributed over 37 homology clusters were found to be significantly differently expressed while containing significantly different amounts of oxygen.
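
The idea of comparing the atomic composition of paralogous proteins can be illustrated for sulfur, since methionine and cysteine each contain one sulfur atom. The Python sketch below only counts atoms; it does not reproduce Nutrilyzer's statistics or its handling of expression data, and the two paralog sequences are hypothetical.

```python
# Sulfur atoms per residue: only methionine (M) and cysteine (C)
# among the 20 standard amino acids contain sulfur.
SULFUR_ATOMS = {"M": 1, "C": 1}


def sulfur_count(protein):
    """Count sulfur atoms in a protein given as one-letter amino acid codes."""
    return sum(SULFUR_ATOMS.get(residue, 0) for residue in protein.upper())


def sulfur_per_residue(protein):
    """Return sulfur atoms per residue, or 0.0 for an empty sequence."""
    if not protein:
        return 0.0
    return sulfur_count(protein) / len(protein)


# Hypothetical paralogous isoforms: one sulfur-rich, one sulfur-poor.
paralog_rich = "MKCCMLA"
paralog_poor = "MKAALLA"

print(sulfur_count(paralog_rich), sulfur_count(paralog_poor))
print(sulfur_per_residue(paralog_rich) > sulfur_per_residue(paralog_poor))
```

Other elements, such as the oxygen analysed in the Arabidopsis example, would need a per-residue table covering all amino acids.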

JIB Publications
  • Lotz K, Schreiber F, Wünschiers R. Nutrilyzer: A Tool for Deciphering Atomic Stoichiometry of Differentially Expressed Paralogous Proteins. J Integr Bioinform. 2012;9(2). doi 10.2390/biecoll-jib-2012-196; PubMed 22796635
Homepage

We present Omics Fusion, a new web-based platform for integrative analysis of omics data. Omics Fusion provides a collection of new and established tools and visualization methods to support researchers in exploring omics data, validating results or understanding how to adjust experiments in order to make new discoveries. It is easily extendible and new visualization methods are added continuously. It is available for free under: https://fusion.cebitec.uni-bielefeld.de/.

JIB Publications
  • Brink BG, Seidel A, Kleinbölting N, Nattkemper TW, Albaum SP. Omics Fusion - A Platform for Integrative Analysis of Omics Data. J Integr Bioinform. 2016;13(4):296. doi 10.2390/biecoll-jib-2016-296; PubMed 28187412
Homepage

High throughput genomic studies can identify large numbers of potential candidate genes, which must be interpreted and filtered by investigators to select the best ones for further analysis. Prioritization is generally based on evidence that supports the role of a gene product in the biological process being investigated. The two most important bodies of information providing such evidence are bioinformatics databases and the scientific literature. In this paper we present an extension to the Ondex data integration framework that uses text mining techniques over Medline abstracts as a method for accessing both these bodies of evidence in a consistent way. In an example use case, we apply our method to create a knowledge base of Arabidopsis proteins implicated in plant stress response and use various scoring metrics to identify key protein-stress associations. In conclusion, we show that the additional text mining features are able to highlight proteins using the scientific literature that would not have been seen using data integration alone. Ondex is an open-source software project and can be downloaded, together with the text mining features described here, from www.ondex.org.

JIB Publications
  • Hassani-Pak K, Legaie R, Canevet C, van den Berg HA, Moore JD, Rawlings CJ. Enhancing data integration with text analysis to find proteins implicated in plant stress response. J Integr Bioinform. 2010;7(3). doi 10.2390/biecoll-jib-2010-121; PubMed 20375451
Homepage

The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.

JIB Publications
  • Pesch R, Lysenko A, Hindle M, et al. Graph-based sequence annotation using a data integration approach. J Integr Bioinform. 2008;5(2). doi 10.2390/biecoll-jib-2008-94; PubMed 20134069
Homepage

Electronic laboratory notebooks (ELNs) are more accessible and reliable than their paper-based alternatives and thus find widespread adoption. While a large number of commercial products are available, small- to mid-sized laboratories often cannot afford the costs or are concerned about the longevity of the providers. Turning towards free alternatives, however, raises questions about data protection, which are not sufficiently addressed by available solutions. To serve as legal documents, ELNs must prevent scientific fraud through technical means such as digital signatures. It would also be advantageous if an ELN were integrated with a laboratory information management system to allow for a comprehensive documentation of experimental work including the location of samples that were used in a particular experiment. Here, we present OpenLabNotes, which adds state-of-the-art ELN capabilities to OpenLabFramework, a powerful and flexible laboratory information management system. In contrast to comparable solutions, it protects the intellectual property of its users by offering data protection with digital signatures. OpenLabNotes effectively closes the gap between research documentation and sample management, thus making OpenLabFramework more attractive for laboratories that seek to increase productivity through electronic data management.

JIB Publications
  • List M, Franz M, Tan Q, Mollenhauer J, Baumbach J. OpenLabNotes--An Electronic Laboratory Notebook Extension for OpenLabFramework. J Integr Bioinform. 2015;12(3):274. doi 10.2390/biecoll-jib-2015-274; PubMed 26673790
Homepage

Command-line tool Library 
Microarray experiment Gene expression 
Correlation analysis assuming coexpression of the genes is a widely used method for gene expression analysis in molecular biology. Yet the growing extent, quality and dimensionality of molecular biological data permit emerging, more sophisticated approaches such as Boolean implications. We present an approach which combines the SOM (self organizing maps) machine learning method with Boolean implication analysis to identify relations between genes, metagenes and similarly behaving metagene groups (spots). Our method provides a way to assign Boolean states to genes/metagenes/spots and offers a functional view over significantly variant elements of gene expression data on these three different levels. While able to cover relations between weakly correlated entities, the Boolean implication method also decomposes these relations into six implication classes. Our method allows one to validate or identify potential relationships between genes and functional modules of interest and to assess their switching behaviour. Furthermore, the output of the method makes it possible to construct and study the network of genes. By providing logical implications as updating rules for the network, it can also serve to aid modelling approaches.

JIB Publications
  • Çakır MV, Binder H, Wirth H. Profiling of Genetic Switches using Boolean Implications in Expression Data. J Integr Bioinform. 2014;11(1). doi 10.2390/biecoll-jib-2014-246; PubMed 25318120
Homepage bio.tools

Multiple sequence alignment is one of the most common tasks in bioinformatics. It organizes a set of molecular sequences so as to expose their similarities and differences. Although exact methods exist for solving this problem, their use is limited by the computing demands of exploring such a large and complex search space. Genetic Algorithms are adaptive search methods which perform well in large and complex spaces. Parallel Genetic Algorithms not only speed up the search but also improve its efficiency, producing results that are better than those provided by the sum of several sequential Genetic Algorithms. Although these methods are often used to optimize a single objective, they can also be used in multidimensional domains, finding all possible tradeoffs among multiple conflicting objectives. Parallel AlineaGA is an Evolutionary Algorithm which uses a Parallel Genetic Algorithm to perform multiple sequence alignment. We now present the Parallel Niche Pareto AlineaGA, a multiobjective version of Parallel AlineaGA. We compare the performance of both versions using eight BAliBASE datasets. We also compare the quality of the obtained solutions with those achieved by T-Coffee and ClustalW2, and observe that our algorithm reaches better solutions on the majority of the datasets.
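The notion of "tradeoffs among multiple conflicting objectives" rests on Pareto dominance. The sketch below shows dominance and non-dominated filtering for objectives that are all maximised. It is a generic illustration, not the AlineaGA implementation. The objective tuples in the usage note are hypothetical alignment scores, and the niching step of a Niche Pareto algorithm is not shown.

```python
from typing import List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is at least as good as q on every objective and strictly
    better on at least one (all objectives are maximised)."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, with two hypothetical objectives per candidate alignment, `pareto_front([(10, 3), (8, 5), (7, 4), (10, 2)])` keeps `(10, 3)` and `(8, 5)`: neither is better than the other on both objectives, while the remaining two candidates are each dominated.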

JIB Publications
  • Silva FJM da, Pérez JMS, Pulido JAG, Rodríguez MAV. Parallel Niche Pareto AlineaGA – an Evolutionary Multiobjective approach on Multiple Sequence Alignment. J Integr Bioinform. 2011;8(3). doi 10.2390/biecoll-jib-2011-174; PubMed 21926437

Web API 
Molecular interactions, pathways and networks Data management 
Biological pathways are crucial to much of today's scientific research, including the study of specific biological processes related to human diseases. PathJam is a new comprehensive and freely accessible web-server application integrating scattered human pathway annotation from several public sources. The tool has been designed both (i) to be intuitive for wet-lab users, providing statistical enrichment analysis of pathway annotations, and (ii) to support the development of new integrative pathway applications. PathJam’s unique features and advantages include interactive graphs linking pathways and genes of interest, downloadable results in fully compatible formats, GSEA compatible output files and a standardized RESTful API.
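The abstract does not say which statistic underlies the enrichment analysis. A common choice for pathway enrichment is the one-sided hypergeometric test, sketched below as an assumption rather than as PathJam's actual procedure. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(hits: int, population: int,
                       annotated: int, sample: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated to the pathway
    sample:     size of the user's gene list
    hits:       genes in the list that are annotated to the pathway
    """
    upper = min(annotated, sample)
    # math.comb returns 0 when k > n, so impossible terms vanish.
    favourable = sum(comb(annotated, i) * comb(population - annotated, sample - i)
                     for i in range(hits, upper + 1))
    return favourable / comb(population, sample)
```

In practice such p-values would be corrected for multiple testing across pathways, which is omitted here.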

JIB Publications
  • Glez-Peña D, Reboiro-Jato M, Domínguez R, Gómez-López G, Pisano DG, Fdez-Riverola F. PathJam: a new service for integrating biological pathway information. J Integr Bioinform. 2010;7(1). doi 10.2390/biecoll-jib-2010-147; PubMed 20980714
Homepage bio.tools

Desktop application 
Systems biology Molecular interactions, pathways and networks 
Our understanding of complex biological processes can be enhanced by combining different kinds of high-throughput experimental data, but the use of incompatible identifiers makes data integration a challenge. We aimed to improve methods for integrating and visualizing different types of omics data. To validate these methods, we applied them to two previous studies on starvation in mice, one using proteomics and the other using transcriptomics technology. We extended the PathVisio software with new plugins to link proteins, transcripts and pathways. A low overall correlation between proteome and transcriptome data was detected (Spearman rank correlation: 0.21). At the level of individual genes, correlation was highly variable. Many mRNA/protein pairs, such as fructose bisphosphate aldolase B and ATP synthase, show good correlation. For other pairs, such as ferritin and elongation factor 2, an interesting effect is observed, where mRNA and protein levels change in opposite directions, suggesting they are not primarily regulated at the transcriptional level. We used pathway diagrams to visualize the integrated datasets and found it encouraging that transcriptomics and proteomics data supported each other at the pathway level. Visualization of the integrated dataset on pathways led to new observations on gene regulation in the response of the gut to starvation. Our methods are generic and can be applied to any multi-omics study. The PathVisio software can be obtained at http://www.pathvisio.org. Supplemental data are available at http://www.bigcat.unimaas.nl/data/jib-supplemental/ , including instructions on reproducing the pathway visualizations of this manuscript.
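The Spearman rank correlation quoted above is the Pearson correlation of the ranks of the two variables. A minimal pure-Python sketch, using average ranks for ties, is given below. It illustrates the statistic only and is not taken from the PathVisio plugins. Function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between paired measurements x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Applied to paired mRNA and protein fold changes (one pair per gene), a value near 0.2 indicates only weak monotonic agreement between the two data types.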

JIB Publications
  • Van iersel MP, Sokolović M, Lenaerts K, et al. Integrated visualization of a multi-omics study of starvation in mouse intestine. J Integr Bioinform. 2014;11(1):235. doi 10.2390/biecoll-jib-2014-235; PubMed 24675236
Homepage bio.tools

MicroRNAs (miRs) are known to interfere with mRNA expression, and much work has been put into predicting and inferring miR-mRNA interactions. Both sequence-based interaction predictions and interaction inference based on expression data have proven somewhat successful; furthermore, models that combine the two methods have had even more success. In this paper, I further refine and enrich the methods of miR-mRNA interaction discovery by integrating a Bayesian clustering algorithm into a model of prediction-enhanced miR-mRNA target inference, creating an algorithm called PEACOAT, which is written in the R language. I show that PEACOAT improves the inference of miR-mRNA target interactions using both simulated data and a data set of microarrays from samples of multiple myeloma patients. In simulated networks of 25 miRs and mRNAs, our methods using clustering can improve inference in roughly two-thirds of cases, and in the multiple myeloma data set, KEGG pathway enrichment was found to be more significant with clustering than without. Our findings are consistent with previous work in clustering of non-miR genetic networks and indicate that there could be a significant advantage to clustering of miR and mRNA expression data as a part of interaction inference.

JIB Publications
  • Godsey B. Discovery of miR-mRNA interactions via simultaneous Bayesian inference of gene networks and clusters using sequence-based predictions and expression data. J Integr Bioinform. 2013;10(1). doi 10.2390/biecoll-jib-2013-227; PubMed 23846182
Homepage

Systems biology plays a central role in biological network analysis in the post-genomic era. Cytoscape is the standard bioinformatics tool offering the community an extensible platform for computational analysis of emerging cellular networks together with experimental omics data sets. However, only a few apps, plugins or tools are available for simulating network dynamics in Cytoscape 3. Many approaches of varying complexity exist, but none of them has yet been integrated into Cytoscape as an app or plugin. Here, we introduce PetriScape, the first Petri net simulator for Cytoscape. Although discrete Petri nets are quite simplistic models, they are capable of modeling global network properties and simulating their behaviour. In addition, they are easily understood and easy to visualize. PetriScape comes with the following main functionalities: (1) import of biological networks in SBML format, (2) conversion into a Petri net, (3) visualization as a Petri net, and (4) simulation and visualization of the token flow in Cytoscape. It allows straightforward Petri net model creation, simulation and visualization with Cytoscape, providing clues about the activity of key components in biological networks.
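Token flow in a discrete Petri net can be sketched in a few lines. The code below is a generic illustration and does not reflect PetriScape's code or its Cytoscape integration. It assumes that one enabled transition is picked uniformly at random per step, and all names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Marking = Dict[str, int]
# A transition is (input arc weights, output arc weights), keyed by place.
Transition = Tuple[Dict[str, int], Dict[str, int]]


def enabled(marking: Marking, transition: Transition) -> bool:
    """A transition is enabled if every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= w for place, w in inputs.items())


def fire(marking: Marking, transition: Transition) -> Marking:
    """Consume tokens from input places and produce tokens in output places."""
    inputs, outputs = transition
    new = dict(marking)
    for place, w in inputs.items():
        new[place] = new.get(place, 0) - w
    for place, w in outputs.items():
        new[place] = new.get(place, 0) + w
    return new


def simulate(marking: Marking, transitions: Dict[str, Transition],
             steps: int, seed: int = 0) -> List[Marking]:
    """Fire one randomly chosen enabled transition per step; stop on deadlock."""
    rng = random.Random(seed)
    history = [dict(marking)]
    for _ in range(steps):
        candidates = [t for t in transitions.values() if enabled(marking, t)]
        if not candidates:
            break
        marking = fire(marking, rng.choice(candidates))
        history.append(marking)
    return history
```

A reaction A + B -> C maps to one transition with input places A and B and output place C; species converted from an SBML model would become places and reactions would become transitions.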

JIB Publications
  • Almeida D, Azevedo V, Silva A, Baumbach J. PetriScape - A plugin for discrete Petri net simulations in Cytoscape. J Integr Bioinform. 2016;13(1):284. doi 10.2390/biecoll-jib-2016-284; PubMed 27402693
Homepage

Improvements in genome sequencing technology have increased the availability of full genomes and transcriptomes of many organisms. However, the major benefit of massively parallel sequencing is a better understanding of the organization and function of genes, which in turn leads to an understanding of phenotypes. Several tools are currently available for interpreting genomic data through automated gene annotation. Even though the accuracy of computational gene annotation is increasing, multiple lines of experimental evidence should be gathered in combination. Mass spectrometry allows the identification and sequencing of proteins as major gene products, and it is only these proteins that conclusively show whether or not a part of a genome is a coding region that results in phenotypes. Therefore, in the field of proteogenomics, computational methods are validated by exploiting mass spectrometric data. As a result, identification of novel protein coding regions, validation of current gene models, and determination of upstream and downstream regions of genes can be achieved. In this paper, we present new functionality for our proteogenomic tool, PGMiner, which performs all proteogenomic steps, such as acquisition of mass spectrometric data, peptide identification against preprocessed sequence databases, assignment of statistical confidence to identified peptides, mapping of confident peptides to gene models, and result visualization. The extensions cover the determination of proteotypic peptides and thus unambiguous protein identification. Furthermore, peptides conflicting with gene models can now be automatically assessed within the context of predicted alternative open reading frames.

JIB Publications
  • Has C, Lashin SA, Kochetov A, Allmer J. PGMiner reloaded, fully automated proteogenomic annotation tool linking genomes to proteomes. J Integr Bioinform. 2016;13(4):16–23. doi 10.2390/biecoll-jib-2016-293; PubMed 28187409
Homepage

With the large variety of proteomics workflows, as well as the large variety of instruments and data-analysis software available, researchers today face major challenges in validating and comparing their proteomics data. Here we present a new generation of the ProteinScape bioinformatics platform, which now enables researchers to manage proteomics data from generation and data warehousing to a central data repository, with a strong focus on the improved accuracy, reproducibility and comparability demanded by many researchers in the field. It addresses scientists' current needs in proteomics identification, quantification and validation. But producing large protein lists is not the end point in proteomics, where one ultimately aims to answer specific questions about the biological condition or disease model of the analyzed sample. In this context, a new tool termed PIKE (Protein information and Knowledge Extractor) has been developed at the Proteomics Facility of the Spanish Centro Nacional de Biotecnologia. It allows researchers to control, filter and access specific information from genomics and proteomics databases in order to understand the role and relationships of the proteins identified in the experiments. Additionally, an EU-funded project, ProDac, has coordinated systematic data collection in public standards-compliant repositories like PRIDE. This will cover all aspects from generating MS data in the laboratory to assembling the whole annotation information and storing it together with identifications in a standardised format.

JIB Publications
  • Thiele H, Glandorf J, Hufnagel P. Bioinformatics strategies in life sciences: from data processing and data warehousing to biological knowledge extraction. J Integr Bioinform. 2010;7(1):141. doi 10.2390/biecoll-jib-2010-141; PubMed 20508300
Homepage

Type III polyketide synthases (PKS) are a family of proteins considered to have significant roles in the biosynthesis of various polyketides in plants, fungi and bacteria. As these proteins show positive effects on human health, they are the subject of increasing research. A tool that estimates the probability of a sequence being a type III polyketide synthase would reduce the time and manpower required. We have designed and implemented PKSIIIpred, a high-performance prediction server for type III PKS that uses Support Vector Machines (SVMs) as the classifier. Based on the limited training dataset, the tool efficiently predicts the type III PKS superfamily of proteins with high sensitivity and specificity. PKSIIIpred is available at http://type3pks.in/prediction/. We expect that this tool may serve as a useful resource for type III PKS researchers. Work is currently in progress to further improve prediction accuracy by including more sequence features in the training dataset.

JIB Publications
  • Mallika V, Sivakumar KC, Jaichand S, Soniya EV. Kernel based machine learning algorithm for the efficient prediction of type III polyketide synthase family of proteins. J Integr Bioinform. 2010;7(1). doi 10.2390/biecoll-jib-2010-143; PubMed 20625199
Homepage

Web application 
Structure prediction Protein secondary structure Sequence analysis Protein folds and structural domains Nucleic acid structure analysis 
In recent years, several new tools applicable to protein analysis have been made available on the IBIVU web site. Recently, a number of tools, ranging from multiple sequence alignment construction to domain prediction, have been updated and/or extended with services for programmatic access using SOAP. We provide an overview of these tools and their application.

JIB Publications Homepage bio.tools

MicroRNAs are short non-coding RNA transcripts that act as master cellular regulators with roles in orchestrating virtually all biological functions. The recent affordability and widespread use of high-throughput microRNA profiling technologies have grown along with the advancement of bioinformatics tools available for analysis of the mounting data flow. While there are many computational resources available for the management of data from genome-sequenced animals, researchers are often faced with the challenge of identifying the biological implications of the daunting amount of data generated by these high-throughput technologies. In this article, we review the current state of high-throughput microRNA expression profiling platforms, data analysis processes, and computational tools in the context of comparative molecular physiology. We also present RBioMIR and RBioFS, our R package implementations for differential expression analysis and random forest-based gene selection. Detailed installation guides are available at kenstoreylab.com.

JIB Publications
  • Zhang J, Hadj-Moussa H, Storey KB. Current Progress of High-Throughput MicroRNA Differential Expression Analysis and Random Forest Gene Selection for Model and Non-Model Systems: an R Implementation. J Integr Bioinform. 2016;13(5). doi 10.2390/biecoll-jib-2016-306; PubMed 28187420
Homepage

Desktop application 
Systems biology Biochemistry Chemical biology Simulation experiment 
Reaction-diffusion systems are mathematical models that describe how the concentrations of substances distributed in space change under the influence of local chemical reactions and of diffusion, which causes the substances to spread out in space. The classical representation of a reaction-diffusion system is given by semi-linear parabolic partial differential equations, whose solution predicts how diffusion causes the concentration field to change with time. This change is proportional to the diffusion coefficient. If the solute moves in a homogeneous system in thermal equilibrium, the diffusion coefficients are constants that do not depend on the local concentration of solvent and solute. However, in nonhomogeneous and structured media the assumption of a constant intracellular diffusion coefficient is not necessarily valid, and, consequently, the diffusion coefficient is a function of the local concentration of solvent and solutes. In this paper we propose a stochastic model of reaction-diffusion systems in which the diffusion coefficients are functions of the local concentration, viscosity and frictional forces. We then describe the software tool Redi (REaction-DIffusion simulator), which we have developed in order to implement this model in a Gillespie-like stochastic simulation algorithm. Finally, we show the ability of our model, as implemented in the Redi tool, to reproduce the observed gradient of the bicoid protein in the Drosophila melanogaster embryo. With Redi, we were able to simulate with an accuracy of 1% the experimental spatio-temporal dynamics of the bicoid protein, as recorded in time-lapse experiments obtained by direct measurements of transgenic bicoid-enhanced green fluorescent protein.
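A Gillespie-like simulation of reaction and diffusion on a row of compartments can be sketched as follows. This is a simplified illustration, not the Redi algorithm. It assumes a single species, first-order degradation as the only reaction, and a user-supplied function `diffusion_rate(count)` standing in for a concentration-dependent diffusion coefficient. The functional forms used in Redi (involving viscosity and frictional forces) are not reproduced, and all names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Event = Tuple[float, int, Optional[int]]  # (propensity, source, destination)


def gillespie_1d(counts: List[int],
                 diffusion_rate: Callable[[int], float],
                 k_deg: float,
                 t_end: float,
                 seed: int = 0) -> Tuple[float, List[int]]:
    """Simulate molecule counts in a 1D chain of compartments until t_end.

    Returns the time of the last event and the final counts.
    """
    rng = random.Random(seed)
    state = list(counts)
    t = 0.0
    last = len(state) - 1
    while True:
        # Build the list of possible events with their propensities.
        events: List[Event] = []
        for i, c in enumerate(state):
            if c == 0:
                continue
            jump = diffusion_rate(c) * c  # per-direction jump propensity
            if jump > 0:
                if i > 0:
                    events.append((jump, i, i - 1))
                if i < last:
                    events.append((jump, i, i + 1))
            if k_deg > 0:
                events.append((k_deg * c, i, None))  # degradation
        total = sum(rate for rate, _, _ in events)
        if total == 0:
            break
        # Waiting time to the next event is exponential with rate `total`.
        t_next = t + rng.expovariate(total)
        if t_next > t_end:
            break
        t = t_next
        # Choose the event with probability proportional to its propensity.
        pick = rng.uniform(0, total)
        acc = 0.0
        chosen = events[-1]  # fallback against floating-point round-off
        for event in events:
            acc += event[0]
            if pick < acc:
                chosen = event
                break
        _, src, dst = chosen
        state[src] -= 1
        if dst is not None:
            state[dst] += 1
    return t, state
```

Passing `lambda c: 1.0` gives ordinary constant-coefficient diffusion, while a decreasing function such as `lambda c: 1.0 / (1 + c)` (a purely illustrative form) slows diffusion in crowded compartments.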

JIB Publications
  • Lecca P, Ihekwaba AEC, Dematté L, Priami C. Stochastic simulation of the spatio-temporal dynamics of reaction-diffusion systems: the case for the bicoid gradient. J Integr Bioinform. 2010;7(1). doi 10.2390/biecoll-jib-2010-150; PubMed 21098882
Homepage bio.tools

ReMatch is a web-based, user-friendly tool that constructs stoichiometric network models for metabolic flux analysis, integrating user-developed models into a database collected from several comprehensive metabolic data resources, including KEGG, MetaCyc and ChEBI. In particular, ReMatch augments the metabolic reactions of the model with carbon mappings to facilitate 13C metabolic flux analysis. The construction of a network model consisting of biochemical reactions is the first step in most metabolic modelling tasks. This model construction can be a tedious task, as the required information is usually scattered across many separate databases whose interoperability is suboptimal due to the heterogeneous naming conventions of metabolites in different databases. Another, particularly severe data integration problem is faced in 13C metabolic flux analysis, where the mappings of carbon atoms from substrates into products in the model are required. ReMatch has been developed to solve the above data integration problems. First, ReMatch matches the imported user-developed model against the internal ReMatch database while considering a comprehensive metabolite name thesaurus. This, together with wild card support, allows the user to specify the model quickly without having to look the names up manually. Second, ReMatch is able to augment reactions of the model with carbon mappings, obtained either from the internal database or given by the user with an easy-to-use tool. The constructed models can be exported into 13C-FLUX and SBML file formats. Further, a stoichiometric matrix and visualizations of the network model can be generated. The constructed models of metabolic networks can optionally be made available to the other users of ReMatch. Thus, ReMatch provides a common repository for metabolic network models with carbon mappings for the needs of the metabolic flux analysis community.
ReMatch is freely available for academic use at http://www.cs.helsinki.fi/group/sysfys/software/rematch/.
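A stoichiometric matrix of the kind mentioned above has one row per metabolite and one column per reaction, with negative entries for substrates and positive entries for products. The sketch below builds such a matrix from a plain dictionary. The input format and function name are hypothetical and unrelated to ReMatch's own model format or its carbon mappings.

```python
from typing import Dict, List, Tuple


def stoichiometric_matrix(
    reactions: Dict[str, Dict[str, float]]
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Build S with rows = metabolites (sorted) and columns = reactions.

    `reactions` maps a reaction name to {metabolite: coefficient}, where
    substrates have negative and products positive coefficients.
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0.0) for r in names] for m in metabolites]
    return metabolites, names, matrix
```

For a two-step chain A -> B -> C, the row for B contains +1 in the first column and -1 in the second, which is the structure that steady-state flux analysis constrains.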

JIB Publications
  • Pitkänen E, Åkerlund A, Rantanen A, Jouhten P, Ukkonen E. ReMatch: a web-based tool to construct, store and share stoichiometric metabolic models with carbon maps for metabolic flux analysis. J Integr Bioinform. 2008;5(2). doi 10.2390/biecoll-jib-2008-102; PubMed 20134058
Homepage

The rapid expansion of biomedical literature has prompted the development of advanced text mining tools to rapidly extract relevant events from the continuously increasing amount of knowledge published in PubMed. However, bioinvestigators are still reluctant to use these tools for two reasons: i) a query often returns a large volume of extracted events, which is hard to manage, and ii) background events dominate search results and overshadow more pertinent published information, especially for domain experts. In this paper, we propose an approach that incorporates the temporal dimension of published events into the process of information extraction, to improve data selection and prioritize the more pertinent knowledge for scientists. Instead of providing the total knowledge associated with a PubMed query, which is usually a mix of trivial background information and non-background information, our method takes time into account and selects non-background, highly relevant biological entities and events published over time. Before background events are excluded from the total knowledge extracted, their amount is also quantified. This work is illustrated by a case study on Hepcidin gene publications over a decade, a duration that is sufficiently long to generate alternative views on the overall data extracted.

JIB Publications
  • Ameline de cadeville B, Loréal O, Moussouni-marzolf F. RetroMine, or how to provide in-depth retrospective studies from Medline in a glance: the hepcidin use-case. J Integr Bioinform. 2015;12(3):275. doi 10.2390/biecoll-jib-2015-275; PubMed 26673791
Homepage

Understanding how metabolic reactions translate the genome of an organism into its phenotype is a grand challenge in biology. Genome-wide association studies (GWAS) statistically connect genotypes to phenotypes without any recourse to known molecular interactions, whereas a molecular mechanistic description ties gene function to phenotype through gene regulatory networks (GRNs), protein-protein interactions (PPIs) and molecular pathways. Integration of the different regulatory information levels of an organism is expected to provide a good way of mapping genotypes to phenotypes. However, the lack of a curated metabolic model of rice is blocking the exploration of genome-scale multi-level network reconstruction. Here, we have merged GRN, PPI and genome-scale metabolic network (GSMN) approaches into a single framework for rice via the reconstruction and integration of omics regulatory information. Firstly, we reconstructed a genome-scale metabolic model containing 4,462 function genes and 2,986 metabolites involved in 3,316 reactions, compartmentalized into ten subcellular locations. Furthermore, 90,358 pairs of protein-protein interactions, 662,936 pairs of gene regulations and 1,763 microRNA-target interactions were integrated into the metabolic model. Finally, a database was developed for systematically storing and retrieving the genome-scale multi-level network of rice. This provides a reference for understanding the genotype-phenotype relationship of rice and for analysis of its molecular regulatory network.

JIB Publications
  • Liu L, Mei Q, Yu Z, Sun T, Zhang Z, Chen M. An integrative bioinformatics framework for genome-scale multiple level network reconstruction of rice. J Integr Bioinform. 2013;10(2):223. doi 10.2390/biecoll-jib-2013-223; PubMed 23563093
Homepage

A significant proportion of cellular proteins undergo reversible thiol-dependent redox transitions which often control or switch protein functions. Thioredoxins and glutaredoxins constitute two key players in this redox regulatory protein network. Both interact with various categories of proteins containing reversibly oxidized cysteinyl residues. The identification of thioredoxin/glutaredoxin target proteins is a critical step in constructing the redox regulatory network of cells or subcellular compartments. Due to the scarcity of thioredoxin/glutaredoxin target protein records in public databases, a tool called Reversibly Oxidized Cysteine Detector (ROCD) is implemented here to identify potential thioredoxin/glutaredoxin target proteins computationally, so that the in silico construction of redox regulatory networks may become feasible. ROCD was tested on 46 thioredoxin target proteins in plant mitochondria, and the recall rate was 66.7% when 50% sequence identity was chosen for structural model selection. ROCD will be used to predict the thioredoxin/glutaredoxin target proteins in human liver mitochondria for our redox regulatory network construction project. ROCD will be developed further to provide more reliable predictions and will be incorporated into biological network visualization tools as a node prediction component. This work will advance the capability of traditional database- or text mining-based methods in network construction.

JIB Publications
  • Lee HM, Dietz KJ, Hofestädt R. Prediction of thioredoxin and glutaredoxin target proteins by identifying reversibly oxidized cysteinyl residues. J Integr Bioinform. 2010;7(3). doi 10.2390/biecoll-jib-2010-130; PubMed 20375441

SAD_BaSe is blood bank data analysis software created to assist in the management of blood donations and the blood production chain in blood establishments. In particular, the system keeps track of several collection and production indicators and enables the definition of collection and production strategies and the measurement of quality indicators required by the Quality Management System regulating the general operation of blood establishments. This paper describes the general scenario of blood establishments and their main requirements in terms of data management and analysis. It presents the architecture of SAD_BaSe and identifies its main contributions. Specifically, it brings forward the generation of customized reports driven by decision-making needs and the use of data mining techniques in the analysis of donor suspensions and donation discards.

JIB Publications
  • Ramoa A, Maia S, Lourenço A. A rational framework for production decision making in blood establishments. J Integr Bioinform. 2012;9(3):204. doi 10.2390/biecoll-jib-2012-204; PubMed 22829575

Advances in bioinformatics have contributed to a significant increase in available information. Analysing this information requires distributed computing systems. This study proposes a multiagent system that incorporates grid technology to facilitate distributed data analysis by dynamically incorporating the roles associated with each specific case study. The system was applied to genetic sequencing data to extract relevant information about insertions, deletions or polymorphisms.

JIB Publications
  • González R, Zato C, Benito R, et al. Automatic knowledge extraction in sequencing analysis with multiagent system and grid computing. J Integr Bioinform. 2012;9(3):206. doi 10.2390/biecoll-jib-2012-206; PubMed 22829577

The identification of genes and SNPs involved in human diseases remains a challenge. Many public resources, databases and applications collect biological data and perform annotations, increasing global biological knowledge. The need for SNP prioritization is emerging with the development of new high-throughput genotyping technologies, which allow the development of customized disease-oriented chips. Therefore, given a list of genes related to a specific biological process or disease as input, a crucial issue is finding the most relevant SNPs to analyse. The selection of these SNPs may rely on the relevant a priori knowledge of biomolecular features characterising all the annotated SNPs and genes of the provided list. The bioinformatics approach described here retrieves a ranked list of significant SNPs from a set of input genes, such as candidate genes associated with a specific disease. The system enriches the gene set by including other genes that are associated with the original ones by ontological similarity evaluation. The proposed method relies on the integration of data from public resources in a vertical perspective (from genomics to systems biology data), the evaluation of features from biomolecular knowledge, the computation of partial scores for SNPs and finally their ranking according to their global score. Our approach has been implemented in a web-based tool called SNPRanker, which is accessible at the URL http://www.itb.cnr.it/snpranker . An interesting application of the presented system is the prioritisation of SNPs related to genes involved in specific pathologies, in order to produce custom arrays.
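The last steps of the method, combining partial scores into a global score and ranking by it, can be sketched as follows. The abstract does not state how partial scores are combined. A weighted sum is assumed here purely for illustration, and the feature names, weights and SNP identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_snps(partial_scores: Dict[str, Dict[str, float]],
              weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank SNPs by a global score (assumed: weighted sum of partial scores).

    partial_scores maps SNP id -> {feature name: partial score}.
    Features without a weight contribute nothing. Ties are broken by SNP id
    so that the ranking is deterministic.
    """
    scored = [
        (snp, sum(weights.get(feature, 0.0) * value
                  for feature, value in features.items()))
        for snp, features in partial_scores.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

With weights `{"coding": 2.0, "conserved": 1.0}`, a SNP scoring 1.0 on both features obtains a global score of 3.0 and ranks above one scoring 1.0 and 0.5.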

JIB Publications
  • Calabria A, Mosca E, Viti F, Merelli I, Milanesi L. SNPRanker: a tool for identification and scoring of SNPs associated to target genes. J Integr Bioinform. 2010;7(3). doi 10.2390/biecoll-jib-2010-138; PubMed 20375450
Homepage

Command-line tool 
Systems biology Molecular interactions, pathways and networks Genomics 
The generation and use of metabolic network reconstructions have increased over recent years. The development of such reconstructions has typically involved a time-consuming, manual process. Recent work has shown that steps undertaken in reconstructing such metabolic networks are amenable to automation. The SuBliMinaL Toolbox (http://www.mcisb.org/subliminal/) facilitates the reconstruction process by providing a number of independent modules to perform common tasks, such as generating draft reconstructions, determining metabolite protonation state, mass and charge balancing reactions, suggesting intracellular compartmentalisation, adding transport reactions and a biomass function, and formatting the reconstruction to be used in third-party analysis packages. The individual modules manipulate reconstructions encoded in Systems Biology Markup Language (SBML), and can be chained to generate a reconstruction pipeline, or used individually during a manual curation process. This work describes the individual modules themselves, and a study in which the modules were used to develop a metabolic reconstruction of Saccharomyces cerevisiae from the existing data resources KEGG and MetaCyc. The automatically generated reconstruction is analysed for blocked reactions, and suggestions for future improvements to the toolbox are discussed.
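One of the tasks listed above, checking whether a reaction is mass and charge balanced, can be illustrated with a small sketch. It only detects imbalance and does not attempt the automatic balancing the toolbox performs. It works on plain dictionaries rather than SBML, handles only simple formulae without parentheses, and all names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts: Dict[str, int] = defaultdict(int)
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def imbalance(reaction: Dict[str, int],
              species: Dict[str, Tuple[str, int]]) -> Dict[str, int]:
    """Return the net surplus per element and for charge (empty if balanced).

    reaction: {species id: coefficient}, negative for substrates.
    species:  {species id: (formula, charge)}.
    """
    totals: Dict[str, int] = defaultdict(int)
    for sid, coeff in reaction.items():
        formula, charge = species[sid]
        for element, n in parse_formula(formula).items():
            totals[element] += coeff * n
        totals["charge"] += coeff * charge
    return {key: value for key, value in totals.items() if value != 0}
```

A non-empty result points to the missing atoms or charge, for example a missing proton, which is the kind of information a balancing step would act on.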

JIB Publications
  • Swainston N, Smallbone K, Mendes P, Kell DB, Paton NW. The SuBliMinaL Toolbox: automating steps in the reconstruction of metabolic networks. J Integr Bioinform. 2011;8(2). doi 10.2390/biecoll-jib-2011-186; PubMed 22095399
Homepage bio.tools

Visualization is pivotal for gaining insight into systems biology data. As the size and complexity of datasets and supplemental information increase, an efficient, integrated framework for general and specialized views is necessary. MAYDAY is an application for analysis and visualization of general 'omics' data. It follows a threefold approach to data visualization, consisting of flexible data preprocessing, highly customizable data perspective plots for general purpose visualization, and systems-based plots. Here, we introduce two new systems biology visualization tools for MAYDAY. Efficiently implemented genomic viewers allow the display of variables associated with genomic locations. Multiple variables can be viewed using our new track-based ChromeTracks tool. A functional perspective is provided by visualizing metabolic pathways in either KEGG or BioPax format. Multiple options for displaying pathway components are available, including Systems Biology Graphical Notation (SBGN) glyphs. Furthermore, pathways can be viewed together with gene expression data, either as heatmaps or as profiles. We apply our tools to two 'omics' datasets of Pseudomonas aeruginosa. The general analysis and visualization tools of MAYDAY as well as our ChromeTracks viewer are applied to a transcriptome dataset. We furthermore integrate this dataset with a metabolome dataset and compare the activity of amino acid degradation pathways between these two datasets by visually enhancing the pathway diagrams produced by MAYDAY.

JIB Publications
  • Symons S, Zipplies C, Battke F, Nieselt K. Integrative Systems Biology Visualization with MAYDAY. J Integr Bioinform. 2010;7(3):1–14. doi 10.2390/biecoll-jib-2010-115; PubMed 20375461
Homepage

The huge and constantly changing body of bioinformatics resources (e.g., data and tools) now available on the Internet poses a major challenge for biologists, who must manage and visualize them, and for bioinformaticians, who need to rapidly create and execute in-silico experiments involving resources and activities spread across the WWW hyperspace. Any framework that aims to integrate such resources as in a physical laboratory must tackle, and ideally handle in a transparent and uniform way, physical distribution, semantic heterogeneity, the co-existence of different computational paradigms and, as a consequence, of different invocation interfaces (e.g., OGSA for Grid nodes, SOAP for Web Services, Java RMI for Java objects). The UBioLab framework has been designed and developed as a prototype with this objective. Several architectural features, such as being fully Web-based and combining domain ontologies, Semantic Web and workflow techniques, reflect an effort in this direction. By integrating a semantic knowledge management system for distributed (bioinformatics) resources, a semantics-driven graphical environment for defining and monitoring ubiquitous workflows, and an intelligent agent-based technology for their distributed execution, UBioLab acts as a semantic guide for bioinformaticians and biologists. It provides (i) a flexible environment for visualizing, organizing and inferring any (semantic and computational) "type" of domain knowledge (e.g., resources and activities, expressed in a declarative form), (ii) a powerful engine for defining and storing semantics-driven ubiquitous in-silico experiments on the domain hyperspace, and (iii) a transparent, automatic and distributed environment for correct experiment execution.

JIB Publications
  • Bartocci E, Cacciagrano D, Di Berardini MR, Merelli E, Vito L. UBioLab: a web-laboratory for ubiquitous in-silico experiments. J Integr Bioinform. 2012;9(1):192. doi 10.2390/biecoll-jib-2012-192; PubMed 22773116
Homepage

Workflow 
Phylogenetics Genomics Sequence analysis Protein structure analysis 
Unipro UGENE is an open-source bioinformatics toolkit that integrates popular tools along with original instruments for molecular biologists within a unified user interface. Nowadays, most bioinformatics desktop applications, including UGENE, use a local data model while processing different types of data. Such an approach is inconvenient for scientists who work cooperatively and rely on the same data: each workplace needs its own copies of certain files, and these copies must be kept synchronized whenever they are modified. Therefore, we focused on bringing collaborative work into the UGENE user experience. Currently, several UGENE installations can be connected to a designated shared database and users can interact with it simultaneously. Such databases can be created by UGENE users and used at their discretion. Objects of each data type supported by UGENE, such as sequences, annotations and multiple alignments, can now be easily imported from or exported to remote storage. One of the main advantages of this system, compared to existing ones, is the almost simultaneous access of client applications to shared data regardless of its volume. Moreover, the system is capable of storing millions of objects. The storage itself is a regular database server, so even an inexperienced user is able to deploy it. Thus, UGENE may provide access to shared data for users located, for example, in the same laboratory or institution. UGENE is available at: http://ugene.net/download.html.

JIB Publications
  • Protsyuk IV, Grekhov GA, Tiunov AV, Fursov MY. Shared bioinformatics databases within the Unipro UGENE platform. J Integr Bioinform. 2015;12(1):257. doi 10.2390/biecoll-jib-2015-257; PubMed 26527191
Homepage bio.tools

Desktop application 
Systems biology Molecular interactions, pathways and networks Data visualisation Biomedical science 
VANESA is a modeling software for the automatic reconstruction and analysis of biological networks based on life-science database information. Using VANESA, scientists are able to model any kind of biological process or system as a biological network. Scientists can automatically reconstruct important molecular systems with information from the databases KEGG, MINT, IntAct, HPRD, and BRENDA. Additionally, experimental results can be expanded with database information to better analyze the investigated elements and processes in an overall context. Users can also apply graph theoretical approaches in VANESA to identify regulatory structures and significant actors within the modeled systems. These structures can then be further investigated in the Petri net environment of VANESA. It is platform-independent, free-of-charge, and available at http://vanesa.sf.net.

JIB Publications
  • Brinkrolf C, Janowski SJ, Kormeier B, et al. VANESA - a software application for the visualization and analysis of networks in system biology applications. J Integr Bioinform. 2014;11(2):239. doi 10.2390/biecoll-jib-2014-239; PubMed 24953454
  • Brinkrolf C, Henke NA, Ochel L, Pucker B, Kruse O, Lutter P. Modeling and Simulating the Aerobic Carbon Metabolism of a Green Microalga Using Petri Nets and New Concepts of VANESA. J Integr Bioinform. 2018;15(3). doi 10.1515/jib-2018-0018; PubMed 30218605
  • Kormeier B, Hippe K, Arrigo P, Töpel T, Janowski S, Hofestädt R. Reconstruction of biological networks based on life science data integration. J Integr Bioinform. 2010;7(2). doi 10.2390/biecoll-jib-2010-146; PubMed 20978286
  • Hamzeiy H, Suluyayla R, Brinkrolf C, Janowski SJ, Hofestaedt R, Allmer J. Visualization and Analysis of MicroRNAs within KEGG Pathways using VANESA. J Integr Bioinform. 2017;14(1). doi 10.1515/jib-2016-0004; PubMed 28609293
Homepage bio.tools

We have developed a tool for the visualization of temporal changes of disease patterns, using stacks of medical images collected in time-series experiments. With this tool, users can generate 3D surface models representing disease patterns and observe changes over time in size, shape, and location of clinically significant image patterns. Statistical measurements of the volume of the observed disease patterns can be performed simultaneously. Spatial data integration occurs through the combination of 2D slices of an image stack into a 3D surface model. Temporal integration occurs through the sequential visualization of the 3D models from different time points. Visual integration enables the tool to show 2D images, 3D models and statistical data simultaneously. As an example, the tool has been used to visualize brain MRI scans of several multiple sclerosis patients. It has been developed in Java™, to ensure portability and platform independence, with a user-friendly interface and can be downloaded free of charge for academic users.
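The volume measurement described above can be approximated from a stack of segmented 2D slices. The sketch below uses simple voxel counting rather than the 3D surface models the tool builds, so it is a stand-in technique, and the function name, mask encoding and spacing parameters are assumptions for illustration only.

```python
def lesion_volume_mm3(masks, pixel_area_mm2, slice_thickness_mm):
    """Estimate volume by counting segmented pixels across all slices.

    masks: list of 2D slices, each a list of rows of 0/1 values, where 1
           marks a pixel that belongs to the disease pattern (assumed encoding).
    """
    voxel_count = sum(sum(sum(row) for row in mask) for mask in masks)
    return voxel_count * pixel_area_mm2 * slice_thickness_mm


# Two tiny hypothetical slices from one time point.
time_point = [
    [[1, 1], [1, 0]],  # slice 1: three segmented pixels
    [[0, 1], [0, 0]],  # slice 2: one segmented pixel
]
print(lesion_volume_mm3(time_point, pixel_area_mm2=0.25, slice_thickness_mm=2.0))
```

Calling the function once per time point gives a series of volumes that can be compared over time.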

JIB Publications
  • Soh J, Xiao M, Do T, Meruvia-Pastor O, Sensen CW. Integrative visualization of temporally varying medical image patterns. J Integr Bioinform. 2011;8(2):161. doi 10.2390/biecoll-jib-2011-161; PubMed 21778531
Homepage

Web application 
Sequence sites, features and motifs Sequence analysis Database management Gene and protein families Molecular modelling 
In recent years, several new tools applicable to protein analysis have been made available on the IBIVU web site. Recently, a number of tools, ranging from multiple sequence alignment construction to domain prediction, have been updated and/or extended with services for programmatic access using SOAP. We provide an overview of these tools and their application.

JIB Publications
Homepage bio.tools

Structure is a widely used software tool for investigating population genetic structure with multi-locus genotyping data. The software uses an iterative algorithm to group individuals into "K" clusters, representing possibly K genetically distinct subpopulations. The serial implementation of this program is processor-intensive even with small datasets. We describe an implementation of the program within a parallel framework. Speedup was achieved by running different replicates and values of K on each node of the cluster. A web-based, user-oriented GUI has been implemented in PHP, through which the user can specify input parameters for the program. The number of processors to be used can be specified in the background command. A web-based visualization tool, "VisualStruct", written in PHP (with embedded HTML and JavaScript), allows for the graphical display of population clusters output from Structure, where each individual may be visualized as a line segment with K colors defining its possible genomic composition with respect to the K genetic subpopulations. The advantage over available programs is the increased number of individuals that can be visualized. Analyses of real datasets indicate a speedup of up to four when comparing execution on clusters of eight processors with execution on one desktop. The software package is freely available to interested users upon request.
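The parallelisation strategy described above, in which independent runs for different replicates and values of K are spread over the cluster nodes, can be sketched as a simple scheduling step. The function names and the round-robin assignment are assumptions for illustration; the abstract does not say how the original implementation assigns runs to nodes.

```python
from itertools import product


def build_jobs(k_values, n_replicates):
    """List every independent (K, replicate) run that has to be executed."""
    return [(k, rep) for k, rep in product(k_values, range(1, n_replicates + 1))]


def assign_round_robin(jobs, n_nodes):
    """Distribute jobs over nodes in turn; each node gets its own job list."""
    schedule = [[] for _ in range(n_nodes)]
    for index, job in enumerate(jobs):
        schedule[index % n_nodes].append(job)
    return schedule


# Hypothetical setup: K = 1..4, four replicates each, eight processors.
jobs = build_jobs(k_values=[1, 2, 3, 4], n_replicates=4)
for node_id, node_jobs in enumerate(assign_round_robin(jobs, n_nodes=8)):
    print(node_id, node_jobs)
```

Because the runs share no state, the wall-clock time is bounded by the most heavily loaded node, which is why an even distribution of jobs matters.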

JIB Publications
  • Jayashree B, Rajgopal S, Hoisington D, Prasanth VP, Chandra S. WebStruct and VisualStruct: Web interfaces and visualization for Structure software implemented in a cluster environment. J Integr Bioinform. 2008;5(1). doi 10.2390/biecoll-jib-2008-89; PubMed 20134055

The construction of integrated datasets from potentially hundreds of sources with bespoke formats, and their subsequent visualization and analysis, is a recurring challenge in systems biology. We present WIBL, a visualization and model development environment initially geared towards logic-based modelling of biological systems using integrated datasets. WIBL combines data integration, visualisation and modelling in a single portal-based workbench providing a comprehensive solution for interdisciplinary systems biology projects.

JIB Publications
  • Lesk V, Taubert J, Rawlings C, Dunbar S, Muggleton S. WIBL: Workbench for Integrative Biological Learning. J Integr Bioinform. 2011;8(2). doi 10.2390/biecoll-jib-2011-156; PubMed 21705808
Homepage

Scientific legacy workflows are often developed over many years, poorly documented and implemented with scripting languages. In the context of our cross-disciplinary projects we face the problem of maintaining such scientific workflows. This paper presents the Workflow Instrumentation for Structure Extraction (WISE) method used to process several ad-hoc legacy workflows written in Python and automatically produce their workflow structural skeleton. Unlike many existing methods, WISE does not assume input workflows to be preprocessed in a known workflow formalism. It is also able to identify and analyze calls to external tools. We present the method and report its results on several scientific workflows.
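One part of the task described above, identifying calls to external tools in a Python workflow script, can be illustrated with the standard-library `ast` module. This is a static-analysis sketch, not the WISE instrumentation method itself; the function name, the set of call names it looks for, and the sample script with its tool names are assumptions made for this example.

```python
import ast

# Standard-library calls that commonly launch external programs.
EXTERNAL_CALLS = {
    ("subprocess", "run"),
    ("subprocess", "call"),
    ("subprocess", "check_call"),
    ("subprocess", "check_output"),
    ("subprocess", "Popen"),
    ("os", "system"),
}


def find_external_calls(source):
    """Return sorted (line_number, call_name) pairs for external tool calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
        ):
            key = (node.func.value.id, node.func.attr)
            if key in EXTERNAL_CALLS:
                found.append((node.lineno, f"{key[0]}.{key[1]}"))
    return sorted(found)


# Hypothetical legacy script; the tool names are placeholders.
script = (
    "import subprocess, os\n"
    "subprocess.run(['aligner_tool', 'input.fa'])\n"
    "n = len('abc')\n"
    "os.system('tree_builder input.aln')\n"
)
print(find_external_calls(script))
```

A static scan like this misses aliased imports and dynamically constructed calls, which is one reason a method might instrument the running workflow instead.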

JIB Publications
Homepage