ECL is a declarative, data-centric programming language designed in 2000 to allow a team of programmers to process large volumes of data across a high-performance computing cluster without the programmer being involved in many of the lower-level, imperative decisions. [1] [2]
History
ECL was originally designed and developed in 2000 by David Bayliss as an in-house productivity tool within Seisint Inc. and was considered to be a ‘secret weapon’ that allowed Seisint to gain market share in its data business. Equifax had an SQL-based process for predicting who would go bankrupt in the next 30 days, but it took 26 days to run the data. The first ECL implementation solved the same problem in 6 minutes. The technology was cited as a driving force behind the acquisition of Seisint by LexisNexis and then again as a major source of synergies when LexisNexis acquired ChoicePoint Inc. [3]
Language constructs
ECL, at least in its purest form, is a declarative, data centric language. Programs, in the strict sense, do not exist. Rather an ECL application will specify a number of core datasets (or data values) and then the operations that are to be performed on those values.
Hello world
ECL is designed to offer succinct solutions to problems and sensible defaults. The “Hello World” program is characteristically short:
'Hello World'
Perhaps a more flavorful example would take a list of strings, sort them into order, and then return the sorted list as the result.
// Datasets can also be binary, CSV, XML, or externally defined structures
D := DATASET([{'ECL'}, {'Declarative'}, {'Data'}, {'Centric'}, {'Programming'}, {'Language'}], {STRING Value;});
SD := SORT(D, Value);
OUTPUT(SD);
Statements containing := are defined in ECL as attribute definitions. They do not denote an action; rather, each one defines a term. Thus, logically, an ECL program can be read “bottom to top”:
OUTPUT (SD)
What is an SD?
SD := SORT(D, Value);
SD is D sorted by ‘Value’.
What is a D?
D := DATASET([{'ECL'}, {'Declarative'}, {'Data'}, {'Centric'}, {'Programming'}, {'Language'}], {STRING Value;});
D is a dataset with one column labeled ‘Value’ and containing the following list of data.
ECL primitives
ECL primitives that act upon datasets include: SORT, ROLLUP, DEDUP, ITERATE, PROJECT, JOIN, NORMALIZE, DENORMALIZE, PARSE, CHOOSEN, ENTH, TOPN, DISTRIBUTE
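To make a few of these primitives concrete, here is a minimal sketch that reuses the single-column dataset shape from the sorting example. The data values and the attribute names (Unique, FirstTwo, TopTwo) are assumptions chosen for illustration, not taken from any LexisNexis material.

```ecl
// Illustrative data only: a one-column dataset with a duplicate value
D := DATASET([{'ECL'}, {'Data'}, {'ECL'}, {'Centric'}], {STRING Value;});

// DEDUP removes adjacent records with the same Value, so sort first
Unique := DEDUP(SORT(D, Value), Value);

// CHOOSEN keeps the first n records of a dataset
FirstTwo := CHOOSEN(Unique, 2);

// TOPN returns the first n records under the given sort order;
// the leading minus sign requests descending order
TopTwo := TOPN(D, 2, -Value);

OUTPUT(FirstTwo);
OUTPUT(TopTwo);
```

As with the earlier example, each := line is only a definition; nothing is evaluated until an action such as OUTPUT asks for it.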
ECL encapsulation
Whilst ECL is terse, and LexisNexis claims that 1 line of ECL is roughly equivalent to 120 lines of C++, it still has significant support for large-scale programming, including data encapsulation and code re-use. The constructs available include: MODULE, FUNCTION, FUNCTIONMACRO, INTERFACE, MACRO, EXPORT, SHARED
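A small sketch of how some of these constructs fit together may help. The module name MathUtils and everything inside it are hypothetical, invented purely for illustration.

```ecl
// Hypothetical module, for illustration only
MathUtils := MODULE
  // SHARED: usable by other definitions in the module, not by outside code
  SHARED INTEGER Offset := 10;

  // EXPORT: part of the module's public surface
  EXPORT INTEGER AddOffset(INTEGER x) := x + Offset;

  // FUNCTION groups local definitions and returns a single result
  EXPORT INTEGER Twice(INTEGER x) := FUNCTION
    doubled := x * 2;
    RETURN doubled;
  END;
END;

OUTPUT(MathUtils.AddOffset(5));
OUTPUT(MathUtils.Twice(21));
```

Code outside the module can reach AddOffset and Twice through the module name, while Offset stays an internal detail.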
Support for Parallelism in ECL
In the HPCC implementation, by default, most ECL constructs will be executed in parallel across the hardware being used. Many of the primitives also have a LOCAL option to specify that the operation is to occur locally on each node.
Comparison to Map-Reduce
The Hadoop Map-Reduce paradigm actually consists of three phases that correlate to ECL primitives as follows.
Hadoop Name / Term | ECL equivalent | Comments |
---|---|---|
MAPing within the MAPper | PROJECT / TRANSFORM | Takes a record and converts it to a different format; in the Hadoop case the conversion is into a key-value pair |
SHUFFLE (Phase 1) | DISTRIBUTE (, HASH (KeyValue)) | The records from the mapper are distributed depending on the KEY value |
SHUFFLE (Phase 2) | SORT (, LOCAL) | The records arriving at a particular reducer are sorted into KEY order |
REDUCE | ROLLUP (, Key, LOCAL) | The records for a particular KEY value are now combined |
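The correspondence in the table can be sketched as a small key-counting job. This is a minimal sketch under stated assumptions: the record layouts (InRec, PairRec), the field names, the sample data, and the transform names (ToPair, Combine) are all hypothetical, and the code is not drawn from any Hadoop or HPCC source.

```ecl
// Hypothetical layouts and data, for illustration only
InRec := RECORD
  STRING Value;
END;

PairRec := RECORD
  STRING   KeyValue;
  UNSIGNED Cnt;
END;

Raw := DATASET([{'ecl'}, {'data'}, {'ecl'}], InRec);

// MAP: convert each input record into a key-value pair
PairRec ToPair(InRec l) := TRANSFORM
  SELF.KeyValue := l.Value;
  SELF.Cnt      := 1;
END;
Mapped := PROJECT(Raw, ToPair(LEFT));

// SHUFFLE (Phase 1): records with the same key land on the same node
Dist := DISTRIBUTE(Mapped, HASH(KeyValue));

// SHUFFLE (Phase 2): each node sorts its own records into key order
Sorted := SORT(Dist, KeyValue, LOCAL);

// REDUCE: combine adjacent records that share a key
PairRec Combine(PairRec l, PairRec r) := TRANSFORM
  SELF.Cnt := l.Cnt + r.Cnt;
  SELF     := l;
END;
Reduced := ROLLUP(Sorted, LEFT.KeyValue = RIGHT.KeyValue, Combine(LEFT, RIGHT), LOCAL);

OUTPUT(Reduced);
```

Because DISTRIBUTE has already placed every record for a given key on one node, the SORT and ROLLUP steps can use the LOCAL option and avoid further data movement between nodes.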
References
- A Guide to ECL, Lexis-Nexis.
- “Evaluating the use of data flow systems for large graph analysis,” by A. Yoo and I. Kaplan. Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, MTAGS, 2009.
- Acquisition of Seisint