Make DataCleaner simple - remove complexity of Spark, Scala and dynamic extensions #1968

@kaspersorensen

DataCleaner is a complex tool. As its lead developer for years, I'm sorry to say that I don't think it's maintainable in its current state. I'd like to propose making DataCleaner maintainable by retaining what it is at its core for 99% of its users, and ditching the complexity that is not really used anymore anyway. This is specifically about making it easy to build and develop on DC, but also about making it easy to run on modern JVMs.

  • Remove the dynamic classloading / extensions / drivers and such. This has a huge technical complexity cost and makes the tool incompatible with newer JDKs.
  • Remove the Spark engine - nobody uses DC for that sort of thing anymore.
  • Remove the Scala components - too much build complexity for the value that it brings. This would mean getting rid of the "Visualizations" components though.

I'm going to make a branch for this, if nothing else for my own benefit of being able to build and run DC. But I think it should be considered the next major version of DC.
