Intelligent data analysis aims at a more effective representation and explanation of the complex relationships between scientific phenomena and other kinds of factors. In addition, intelligent data analysis enables faster planning and implementation of guidelines. Workflow systems have been used for years in scientific studies that combine computer science with other scientific fields, for instance chemistry, biology, genetics, materials modeling and simulation, drug discovery, and physics, and some have even been applied to business informatics and analytics. Different workflow systems exist for different needs. In our case, the most commonly used workflow tools are Pipeline Pilot [1], Taverna, and KNIME [2]. Some of the most illustrative visual workflow systems are presented in more detail below:
Pipeline Pilot: In Pipeline Pilot, users can graphically compose protocols using hundreds of different configurable "components" for operations such as data retrieval, manipulation, computational filtering, and display [4]. Initially, Pipeline Pilot was mainly used in chemistry. It is now used in a variety of scientific areas, such as NGS and imaging, because it can scale to large development projects.
Using Pipeline Pilot does not require programming expertise, even to introduce new data types into the internal code. Since users must have a high level of confidence in the security of their corporate data collections, Pipeline Pilot is designed to address security concerns. However, Pipeline Pilot is an "expensive" tool with a specialized focus that does not easily adapt to broader challenges.
KNIME: KNIME initially started as a tool used mainly in chemistry and, over time, came to be used by other industries, making it a tool with a wide range of users, from software vendors and academia to banks, telecommunications companies, pharmaceutical institutes, and even customer relationship managers. Moreover, KNIME has a graphical user interface for combining "nodes" [5]. Collections of nodes are known as "extensions".
KNIME is based on the open-source Eclipse platform [6] and Java, and includes two kinds of functionality, aimed at developers and non-developers respectively. On the one hand, developers can create new components with the Java node API or build new clients with the Java SDK; on the other hand, non-IT scientists are served as well, because KNIME supports programming languages but also provides drag-and-drop activities instead of coding. KNIME relies on the philosophy that data must be an open product and that science is part of this process. However, this tool is not well suited to data visualization. KNIME itself is not sold; it is an open-source tool. Its business model [7] includes licensing commercial extensions that allow users to share workflows and create web portals.
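To make the node-and-pipeline idea behind tools like KNIME and Pipeline Pilot concrete, the following is a minimal sketch in Python. The class names are hypothetical, not the actual KNIME API: each "node" wraps one configurable operation, and a workflow is an ordered chain of nodes, much like what a user would wire up visually.

```python
# Minimal sketch of the dataflow idea shared by visual workflow tools.
# Class names are hypothetical, for illustration only.

class Node:
    """One configurable processing step in a workflow."""
    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self, data):
        return self.func(data)


class Workflow:
    """An ordered chain of nodes; the output of one feeds the next."""
    def __init__(self, nodes):
        self.nodes = nodes

    def run(self, data):
        for node in self.nodes:
            data = node.run(data)
        return data


# Example chain: filter -> aggregate, as a user would compose graphically.
wf = Workflow([
    Node("filter_positive", lambda rows: [r for r in rows if r > 0]),
    Node("total", sum),
])
print(wf.run([-2, 5, 3, -1]))  # -> 8
```

The point of the sketch is that the graphical canvas only edits the node list; execution is a simple pass of data along the chain, which is why such tools can serve non-programmers.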
Taverna: Taverna is the common name for a scientific workflow system comprising the Taverna Workbench graphical workflow authoring client, together with the SCUFL [8] workflow representation language and the Freefluo [9] enactment engine. It was developed by the University of Manchester, mainly for scientific use. The main goal of Taverna is to organize services into useful data collections, bringing together tools and databases available on the Internet to serve the needs of bioinformatics. A further aim is to build scientific workflows from many remote web services. The only control structures this tool provides are coordination links and conditional constructs.
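The two control structures just mentioned can be illustrated with a small Python sketch. The service functions below are hypothetical stand-ins for the remote web services a Taverna workflow would call; the call order plays the role of a coordination link, and the branch plays the role of a conditional construct.

```python
# Sketch of service orchestration in the style of a Taverna workflow.
# All functions and the threshold are hypothetical, for illustration only.

def fetch_sequence(seq_id):
    # Stand-in for a remote database lookup returning a sequence record.
    return {"id": seq_id, "length": 420}

def short_analysis(record):
    # Stand-in for a lightweight remote analysis service.
    return f"quick scan of {record['id']}"

def full_analysis(record):
    # Stand-in for a heavyweight remote analysis service.
    return f"full alignment of {record['id']}"

def run_workflow(seq_id, threshold=500):
    record = fetch_sequence(seq_id)   # coordination link: fetch before analyze
    if record["length"] < threshold:  # conditional construct: pick a branch
        return short_analysis(record)
    return full_analysis(record)

print(run_workflow("P12345"))  # -> quick scan of P12345
```

Everything else in such a workflow (iteration over collections, data shimming) is expressed through these same two mechanisms plus the services themselves.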
Taverna offers some notable features, such as support for large data sets, traceability, and support for embedded workflows. Workflows can be run on local machines or on a distributed computing infrastructure, for example in the cloud via the Taverna Server. A server installation provides access to a collection of workflows. However, in this execution mode, users can neither edit their published workflows on the server nor add new workflows to the set deployed on the server [10]. Having given condensed descriptions of the features of the most illustrative workflow systems, it is also important to outline the limitations and gaps that motivate our own work.
Although the aforementioned systems are easy to use and cover a wide range of functions, they are not suitable for particularly large and complex processes, because their capacity for handling large data sets is limited [11].