about ETL TOOLS

from: http://www.etltools.net/

and http://www.etltools.net/etl-tools-comparison.html

Nowadays, most companies’ existence depends on data flow. When plenty of information is generally accessible and one can find almost everything he needs, managing became easier than ever before. The Internet simplifies cooperation – time needed to send and receive requested data gets shorter as more and more institutions computerize their resources. Also, the communication between separate corporation departments became easier – no one needs to send normal letters (or even the office boys) as the process is replaced by e-mails. Although the new ways of communication improved and facilitated managing, the ubiquitous computerization has its significant disadvantages.

The variety of data – as positive phenomenon as possible – got a little bit out of control. The unlimited growth of databases’ size caused mess that often slows down (or even disable) data finding process.

It’s all about effective information storing. Uncategorized data is assigned to different platforms and systems. As a consequence, finding wanted data brings a lot of troubles – user needs to know what data he administers, where it is located (and whether he has proper access), finally how to take them out.
Wrong was someone who thought that the hardest task was making decisions basing on data. No – finding data itself is often much more annoying. But users are not the only ones suffering for databases’ overgrowth. The IT departments – usually responsible for keeping the systems work – have to struggle with data in different formats and systems. ‘Keeping it alive’ is extremely time-consuming what delays the company’s work.
Slow (or sometimes omitted at all) transformation of data causes that it’s usually impossible to provide demanded information in demanded time. Formed divergence between data provided and data really existing in the moment of need harms the IT departments’ image.

To achieve better results, companies invest in external systems – computing power and resources. Not enough power causes lacks of synchronization of data. Transporting information between separate units lasts too long to work effectively. On the other side, computing power increasing – that might be an example solution – is expensive and lead to overgrowth of the operation costs.
Supposing that example company managed to prepare well-working database responsible for supporting designated operation. A lot of money and time got spent. Everything seems wonderful until it comes to another operation. Suddenly it appears that once created system doesn’t really fit the requirements of new operation and the best idea is to create a new system from the beginning. Yes, modifications might be made but there is no single developer common for all parts of the projects, so it demands cooperation of at least a few subjects – that hardly disables the idea.

ETL PROCESS

ETL process
The three-stage ETL process and the ETL tools implementing the concept might be a response for the needs described above.

The ‘ETL’ shortcut comes from ‘Extract, transform, and load’ – the words that describe the idea of the system. The ETL tools were created to improve and facilitate data warehousing.

    The Etl process consists of the following steps:

  1. Initiation
  2. Build reference data
  3. Extract from sources
  4. Validate
  5. Transform
  6. Load into stages tables
  7. Audit reports
  8. Publish
  9. Archive
  10. Clean up

Sometimes those steps are supervised and performed indirectly but its very time-consuming and may be not so accurate.
The purpose of using ETL Tools is to save the time and make the whole process more reliable.

ETL TOOLS

The times of increasing data-dependence forced a lot of companies to invest in complicated data warehousing systems. Their differentiation and incompatibility led to an uncontrolled growth of costs and time needed to coordinate all the processes. The ETL (Extract, transform, load) tools were created to simplify the data management with simultaneous reduction of absorbed effort.

Depending on the needs of customers there are several types of tools.
One of them perform and supervise only selected stages of the ETL process like data migration tools(EtL Tools , “small t”tools) , data transformation tools(eTl Tools , “capital T”tools).Another are complete (ETL Tools ) and have many functions that are intended for processing large amounts of data or more complicated ETL projects.

Some of them like server engine tools execute many ETL steps at the same time from more than one developer , while other like client engine tools are simpler and execute ETL routines on the same machine as they are developed.
There are two more types. First called code base tools is a family of programing tools which allow you to work with many operating systems and programing languages.The second one called GUI base tools remove the coding layer and allow you to work without any knowledge (in theory) about coding languages.

How do the ETL tools work?

The first task is data extraction from internal or external sources. After sending queries to the source system data may go indirectly to the database. However usually there is a need to monitor or gather more information and then go to Staging Area . Some tools extract only new or changed information automatically so we dont have to update it by our own.
The second task is transformation which is a broad category:
-transforming data into a stucture wich is required to continue the operation (extracted data has usually a sructure typicall to the source)
-sorting data
-connecting or separating
-cleansing
-checking quality

The third task is loading into a data warehouse.

As you can see the ETL Tools have many other capabilities (next to the main three: extraction , transformation and loading) like for instance sorting , filtering , data profiling , quality control, cleansing , monitoring , synchronization and consolidation.

ETL TOOLS PROVIDERS

Here is a list of the most popular comercial and freeware(open-sources) ETL Tools.

    Comercial ETL Tools:

  • IBM Infosphere DataStage
  • Informatica PowerCenter
  • Oracle Warehouse Builder (OWB)
  • Oracle Data Integrator (ODI)
  • SAS ETL Studio
  • Business Objects Data Integrator(BODI)
  • Microsoft SQL Server Integration Services(SSIS)
  • Ab Initio
    Freeware, open source ETL tools:

  • Pentaho Data Integration (Kettle)
  • Talend Integrator Suite
  • CloverETL
  • Jasper ETL

As you can see there are many types of ETL Tools and all you have to do right now is to choose appropriate one for you. Some of them are relatively quite expensive, some may be too complex, if you dont want to transform a lot of information or use many sources or use sophisticated features.
It is always necessary to start with defining the business requirements, then consider the technical aspects and then choose the right ETL tool.

ETL TOOLS – GENERAL INFORMATION

ETL tools are designed to save time and money by eliminating the need of ‘hand-coding’ when a new data warehouse is developed. They are also used to facilitate the work of the database administrators who connect different branches of databases as well as integrate or change the existing databases.

    The main purpose of the ETL tool is:

  • extraction of the data from legacy sources (usually heterogenous)
  • data transformation (data optimized for transaction –> data optimized for analysis)
  • synchronization and cleansing of the data
  • loading the data into data warehouse.

There are several requirements that must be had by ETL tools in order to deliver an optimal value to users, supporting a full range of possible scenarios.

Those are:
– data delivery and transformation capabilities
– data and metadata modelling capabilities
– data source and target support
– data governance capability
– runtime platform capabilities
– operations and administration capabilities
– service-enablements capability.

ETL TOOLS COMPARISON CRITERIA

The research presented in this article is based on Gartner’s data integration magic quadrant, forrester researches and our professional experience. The etltools.org portal isnot affiliated with any of the companies listed below in the comparison.

The research inclusion and exclusion criteria are as follows:
– range and mode of connectivity/adapter support
– data transformation and delivery modes support
– metadata and data modelling support
– design, development and data governance support
– runtime platform support
– enablement of service and three additional requirements for vendors:
– $20 milion or more of software revenue from data integration tools every year or not less than 300 production customers
– support of customers in not less than two major geographic regions
– have customer implementations at crossdepartamental and multiproject level.

ETL TOOLS COMPARISON

The information provided below lists major strengths and weaknesses of the most popular ETL vendors.

IBM (Information Server Infosphere platform)

    Advantages:

  • strongest vision on the market, flexibility
  • progress towards common metadata platform
  • high level of satisfaction from clients and a variety of initiatives
    Disadvantages:

  • difficult learning curve
  • long implementation cycles
  • became very heavy (lots of GBs) with version 8.x and requires a lot of processing power

Informatica PowerCenter

    Advantages:

  • most substantial size and resources on the market of data integration tools vendors
  • consistent track record, solid technology, straightforward learning curve, ability to address real-time data integration schemes
  • Informatica is highly specialized in ETL and Data Integration and focuses on those topics, not on BI as a whole
  • focus on B2B data exchange
    Disadvantages:

  • several partnerships diminishing the value of technologies
  • limited experience in the field.

Microsoft (SQL Server Integration Services)

    Advantages:

  • broad documentation and support, best practices to data warehouses
  • ease and speed of implementation
  • standardized data integration
  • real-time, message-based capabilities
  • relatively low cost – excellent support and distribution model
    Disadvantages:

  • problems in non-Windows environments. Takes over all Microsoft Windows limitations.
  • unclear vision and strategy

Oracle (OWB and ODI)

    Advantages:

  • based on Oracle Warehouse Builder and Oracle Data Integrator – two very powerful tools;
  • tight connection to all Oracle datawarehousing applications;
  • tendency to integrate all tools into one application and one environment.
    Disadvantages:

  • focus on ETL solutions, rather than in an open context of data management;
  • tools are used mostly for batch-oriented work, transformation rather than real-time processes or federation data delivery;
  • long-awaited bond between OWB and ODI brought only promises – customers confused in the functionality area and the future is uncertain

SAP BusinessObjects (Data Integrator / Data Services)

    Advantages:

  • integration with SAP
  • SAP Business Objects created a firm company determined to stir the market;
  • Good data modeling and data-management support;
  • SAP Business Objects provides tools for data mining and quality; profiling due to many acquisitions of other companies.
  • Quick learning curve and ease of use
    Disadvantages:

  • SAP Business Objects is seen as two different companies
  • Uncertain future. Controversy over deciding which method of delivering data integration to use (SAP BW or BODI).
  • BusinessObjects Data Integrator (Data Services) may not be seen as a stand-alone capable application to some organizations.

SAS

    Advantages:

  • experienced company, great support and most of all very powerful data integration tool with lots of multi-management features
  • can work on many operating systems and gather data through number of sources – very flexible
  • great support for the business-class companies as well for those medium and minor ones
    Disadvantages:

  • misplaced sales force, company is not well recognized
  • SAS has to extend influences to reach non-BI community
  • Costly

Sun Microsystems

    Advantages:

  • Data integration tools are a part of huge Java Composite Application Platform Suite – very flexible with ongoing development of the products
  • ‘Single-view’ services draw together data from variety of sources; small set of vendors with a strong vision
    Disadvantages:

  • relative weakness in bulk data movement
  • limited mindshare in the market
  • support and services rated below adequate

Sybase

    Advantages:

  • assembled a range of capabilities to be able to address a mulitude of data delivery styles
  • size and global presence of Sybase create opportunities in the market
  • pragmatic near-term strategy – better of current market demand
  • broad partnerships with other data quality and data integration tools vendors
    Disadvantages:

  • falls behind market leaders and large vendors
  • gaps in many aspects of data management

Syncsort

    Advantages:

  • functionality; well-known brand on the market (40 years experience); loyal customer and experience base;
  • easy implementation, strong performance, targeted functionality and lower costs
    Disadvantages:

  • struggle with gaining mind share in the market
  • lack of support for other than ETL delivery styles
  • unsatisfactory with lack of capability of professional services

Tibco Software

    Advantages:

  • message-oriented application integration; capabilities based on common SOA structures;
  • support for federated views; easy implementation, support andperformance
    Disadvantages:

  • scarce references from customers; not widely enough recognised for data integration competencies
  • lacking in data quality capabilities.

ETI

    Advantages:

  • proven and mature code-generating architecture
  • one of the earliest vendors on the data integration market; support for SOA service-oriented deployments;
  • successfully deals with large data volumes and a high degree of complexity, extension of the range of data platforms and data sources;
  • customers’ positive responses to ETI technology
    Disadvantages:

  • relatively slow growth of customer base
  • rather not attractive and inventive technology.

iWay Software

    Advantages:

  • offers physical data movement and delivery; support of wide range of adapters and access to numerous sources;
  • well integrated, standard tools;
  • reasonable ease of implementation effort
    Disadvantages:

  • gaps in specific capabilities
  • relatively costly – not competitive versus market leaders

Pervasive Software

    Advantages:

  • many customers, years of experience, solid applications and support;
  • good use of metadata
  • upgrade from older versions into newer is straightforward.
    Disadvantages:

  • inconsistency in defining the target for their applications;
  • no federation capability;
  • limitated presence due to poor marketing.

Open Text

    Advantages

  • Simplicity of use in less-structured sources
  • Easy licensing for business solutions
  • cooperates with a wide range of sources and targets
  • increasingly high functionality
    Disadvantages:

  • limited federation, replication and data quality support; rare upgrades due to its simplicity;
  • weak real-time support due to use third party solutions and other database utilities.

Pitney Bowes Software

    Advantages:

  • Data Flow concentrates on data integrity and quality;
  • supports mainly ETL patterns; can be used for other purposes too;
  • ease of use, fast implementation, specific ETL functionality.
    Disadvantages:

  • rare competition with other major companies, repeated rebranding trigger suspicions among customers.
  • narrow vision of possibilities even though Data Flow comes with variety of applications.
  • weak support, unexperienced service.

Be the first to comment on "about ETL TOOLS"

Leave a comment