Big Data processing frameworks (Spark and Dask) on PBS


#1

Hi everyone,

A quick word to share with you some work that has been done at CNES (the French space agency) on using Spark and Dask with PBS. PBS scripts to launch Spark- or Dask-based clusters are available in this repo: https://github.com/guillaumeeb/big-data-frameworks-on-pbs.

Has anyone already done this? Do you have something to share?
Are there plans to add similar functionality to PBS? I discussed this briefly with @subhasisb some time ago, but I don't know what the current situation is.
I would be happy to get any feedback on this, so feel free to reply or ask anything.

Cheers,
Guillaume.


#2

Hi @guillaumeeb,

Thanks for the update on this work. It is going to benefit the entire PBS community.
We can certainly look at linking to your work from the PBS Professional GitHub pages, etc.

To start with, we will help by testing these scripts in the short term.

Regards,
Subhasis