Blog posts tagged "spark"

15 posts


Giulia Lanzafame
26 June 2025

Accelerating data science with Apache Spark and GPUs

Article Data Platform

Apache Spark is well known for distributing computation across multiple nodes by means of partitions, with CPU cores processing the data within each partition. What's less widely known is that Spark can also be accelerated with GPUs. Harnessing this power in the right...



Giulia Lanzafame
10 June 2025

Apache Spark security: start with a solid foundation

Article Data Platform

Everyone agrees security matters – yet when it comes to big data analytics with Apache Spark, it’s not just another checkbox. Spark’s open source Java architecture introduces special security concerns that, if neglected, can quietly reveal sensitive information and interrupt vital functions. Unlike standard software,...



Giulia Lanzafame
10 December 2024

Spark or Hadoop: the best choice for big data teams?

Article Data Platform

I always find the Olympics to be an unusual experience. I’m hardly an athletics fanatic, yet I can’t help but get swept up in the spirit of the competition. When the Olympics took place in Paris last summer, I suddenly began rooting for my country in sports I barely knew existed. I would spend random



robgibbon
15 October 2024

Apache Spark 4.0 beta release – try it now

Article Data Platform

Apache Spark is a popular framework for developing distributed, parallel data processing applications. Our solution for Apache Spark on Kubernetes has made significant progress in the year since we launched it, adding support for Apache Iceberg, a new GPU-accelerated image using the NVIDIA Spark-RAPIDS plugin, and...



robgibbon
15 July 2024

Deploying and scaling Apache Spark on Amazon AWS EKS

Article Data Platform

Move over Hadoop, it's time for Spark on Kubernetes. Apache Spark, a framework for parallel distributed data processing, has become a popular choice for building streaming applications, data lakehouses and big data extract-transform-load (ETL) pipelines. It is horizontally scalable, fault-tolerant, and performs...



robgibbon
23 May 2024

Can it play Doom? Running an AI LAN party on a Spark cluster with ViZDoom

Article AI

It’s all about AI these days, so I decided to try and answer the important question: can you make a Spark cluster run AI agents that play a game of Doom, in a multiplayer LAN party? Although I’m no data scientist, I was able to get this to work and I’ll show you how so



robgibbon
17 October 2023

Why we built a Spark solution for Kubernetes

Article Data Platform

We're super excited to announce that we have shipped the first release of our solution for big data – Charmed Spark. Charmed Spark packages a supported distribution of Apache Spark and optimises it for deployment to Kubernetes, which is where most of the industry is moving these days.

Reimagining how to work with big data



Hasmik Zmoyan
21 September 2023

Open source tooling at GITEX Global

Article AI

Innovate at speed with AI. Stay secure and compliant with Ubuntu Pro.
Date: 16-20 October 2023
Location: Dubai, UAE
Booth: B31, Hall 26, DevSlam
Canonical is excited to attend GITEX Global 2023, the largest event in the Middle East. Generative AI, predictive analytics and multi-cloud environments are at the heart...



robgibbon
10 August 2023

Write a Spark big data job with ChatGPT

Article AI

I've read and watched more than a few articles about ChatGPT in the last couple of months. It seems the large language model AI hype machine just can't stop. As somebody with a passion for music production, some of the more interesting things I've seen included a guy using ChatGPT to build a virtual effect



robgibbon
3 May 2023

Big data security foundations in five steps

Article Data Platform

We’ve all read the headlines about spectacular data breaches and other security incidents, and the impact that they have had on the victim organisations. And in some ways there’s no place more vulnerable to attack than a big data environment like a data lake.



Guest
13 September 2016

Deploying a Spark job using Juju

Article Cloud and server

Juju makes it easy to set up and monitor a Spark cluster with a few commands. In this guide we will set up a new cluster and deploy a Spark job using this tool. According to the official definition, Juju is a service modelling tool that allows people to model, configure and deploy applications in the cloud.



Christian Reis
15 March 2016

Renting bare-metal (as a Service) with MAAS

Article Cloud and server

With the amount of industry excitement around virtualized cloud, it is easy to downplay the importance of directly utilizing bare metal infrastructure. And yet bare-metal is the one thing every experienced operator knows they can rely on when deploying mission-critical services, representing guaranteed resource,...



John Dolen
8 October 2015

Ubuntu with IBM Power Systems LC models for big data

Article Cloud and server

Juju makes it easy to model and deploy Spark and other analytics solution bundles. IBM has extended its range of Ubuntu supported systems with today's announcement of the Power Systems LC models. Canonical, along with fellow OpenPOWER Foundation members – Mellanox, NVIDIA, Tyan and Wistron, collaborated with IBM on the...



Maarten Ectors
23 July 2015

More Juju, Big Data and Snappy Beauty from Dataart

Article Cloud and server

What do you get when you combine Juju with Spark & Apache Zeppelin, Raspberry Pi with Snappy, Bluetooth Low Energy, DeviceHive and a TI SensorTag? Check out the video from Dataart.


