Areas of application


  • Smart cities
  • Internet of Things
  • Massive data processing
  • Fraud and risk

Sectors of application


  • Banking and insurance
  • Public administration
  • Infrastructure
  • Industry and energy
Success story

CHALLENGES

1. Capture, obtain and process huge amounts of information, both in batches and in real time.

2. Ensure the quality and adequacy of the information, avoiding losses or duplicates.

3. Respect security requirements in integration with other systems, storage and confidentiality.

Solution

We use the appropriate component of the Spark suite for each case and rely on Kafka to ensure there is no loss of information. In addition, we aim to make the solutions scalable and real-time whenever possible.
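
By way of illustration, the sketch below shows the ingestion side of that guarantee: a Kafka producer configured so that acknowledged records survive a broker failure. It is a minimal Scala example; the broker addresses and the "events" topic name are placeholders, not details taken from the project.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object ReliableProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Hypothetical broker list; in practice this points at the Kafka cluster.
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092")
    // Wait until all in-sync replicas have the record, so a single
    // broker failure cannot lose an acknowledged write.
    props.put(ProducerConfig.ACKS_CONFIG, "all")
    // Idempotent retries: transient failures are retried without
    // duplicating or reordering records.
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // "events" is a placeholder topic name.
      producer.send(new ProducerRecord("events", "device-42", "reading=17.3"))
    } finally {
      producer.close() // flushes any buffered records before shutdown
    }
  }
}
```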

Benefits

The environment can grow by including new sources and capabilities in the analysis layer, generating valuable information through the implementation of models and algorithms.

Results

A big data architecture with horizontal scaling capabilities aimed at managing both real-time and batch information, based on Spark as a core element of the project.

SOLUTIONS

The solution approach is based on using the appropriate component of the Spark suite for each case: Spark Core for batch processing, Spark Streaming for real-time processing, and Spark SQL to connect with other applications and explore the data. We rely on Kafka to ensure that no information is lost, receiving the packages and distributing them according to the relevant criteria. We aim to make the solutions scalable, in real time and with hot swapping where possible; one example of this is the assembly of the Kafka cluster with ZooKeeper, distributed over a set of nodes that can be expanded at any time with new nodes to scale the service.
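
A minimal sketch of how such a pipeline can be wired is shown below. It uses Spark Structured Streaming (the newer API playing the role the text assigns to Spark Streaming) to consume a Kafka topic, exposes the stream to Spark SQL, and checkpoints offsets so a restart loses nothing. The topic name, broker addresses and paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object StreamingIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-spark")
      .getOrCreate()

    // Subscribe to the placeholder "events" topic; Spark tracks the Kafka
    // offsets itself, so nothing is skipped or read twice.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers binary key/value pairs; cast them to strings
    // before exposing them to the SQL layer.
    val events = raw.selectExpr("CAST(key AS STRING) AS key",
                                "CAST(value AS STRING) AS value")

    // A temporary view lets other applications query the stream via Spark SQL.
    events.createOrReplaceTempView("events_stream")

    // The checkpoint persists consumed offsets, so a restarted job
    // resumes exactly where it stopped instead of losing data.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/events")
      .option("checkpointLocation", "/data/checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```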

RESULTS

The project results in the implementation of a big data architecture with horizontal scaling capabilities, aimed at managing both real-time and batch-based mass information and based on Spark as a core element of the project. The architecture
also scales in the number of machines and incorporates capabilities that add security and integrity to the processed data when working in environments where losing information is not acceptable. From this point on, the environment can grow, including new sources as well as capabilities in the data analysis layer, facilitating the work of the data science team and generating valuable information through the implementation of models and algorithms.
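
As a sketch of what these integrity and horizontal-scaling properties look like at the Kafka level, the following Scala snippet uses Kafka's AdminClient to create a partitioned, replicated topic. The partition count and replication factor are illustrative assumptions, not the project's actual settings.

```scala
import java.util.{Collections, Properties}

import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

object CreateReplicatedTopic {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Hypothetical broker address.
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")
    val admin = AdminClient.create(props)
    try {
      // Eight partitions let consumers scale out horizontally; a
      // replication factor of three keeps each record on three brokers,
      // so losing a node does not lose data.
      val topic = new NewTopic("events", 8, 3.toShort)
      admin.createTopics(Collections.singleton(topic)).all().get()
    } finally {
      admin.close()
    }
  }
}
```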

#spark #kafka #scala #osBrain #sapic #hbase #storm #flume #hadoop

CLIENTS
