Class vs data structure

I recommend reading Clean Code, chapter 6: Objects and Data Structures. The whole chapter is about this. If you don't want to buy the book, you can read a summary of it here.

According to it, you can use classes effectively in two different ways. This is called data/object anti-symmetry. Depending on your goals, you have to decide whether your classes will follow the open/closed principle (OCP) or not.
If they follow the OCP, they will be polymorphic, and their instances will be used as objects. They hide data and implementation behind a common interface, so it is easy to add a new type which implements that interface as well. Most design patterns fulfill the OCP, for example MVC, IoC, every wrapper, adapter, etc.
If they don't follow the OCP, they won't be polymorphic, and their instances will be used as data structures. They expose data, and that data is manipulated by other classes. This is the typical approach in procedural programming as well. Several common constructs don't use OCP, for example DTOs, exceptions, config objects, the visitor pattern, etc.

A typical pattern where you should think about fulfilling OCP and moving the code to a lower abstraction level:

class Manipulator {
    void doSomething(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // doSomething implementation 1
        } else if (dataStructure instanceof MyType2) {
            // doSomething implementation 2
        }
        // ...
    }

    void domSomethingElse(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // domSomethingElse implementation 1
        } else if (dataStructure instanceof MyType2) {
            // domSomethingElse implementation 2
        }
        // ...
    }
}

class MyType1 {}
class MyType2 {}
// If you want to add a new type, every method of the Manipulator has to change.

Fix: move the implementation to a lower abstraction level and fulfill OCP:

interface MyType {
    void doSomething();
    void domSomethingElse();
}

class MyType1 implements MyType {
    public void doSomething() {
        // doSomething implementation 1
    }

    public void domSomethingElse() {
        // domSomethingElse implementation 1
    }
}

class MyType2 implements MyType {
    public void doSomething() {
        // doSomething implementation 2
    }

    public void domSomethingElse() {
        // domSomethingElse implementation 2
    }
}

// The newly added type: no existing class has to change.
class MyType3 implements MyType {
    public void doSomething() {
        // doSomething implementation 3
    }

    public void domSomethingElse() {
        // domSomethingElse implementation 3
    }
}

A typical pattern where you should think about violating OCP and moving the code to a higher abstraction level:

interface MyType {
    void doSomething();
    void domSomethingElse();

    // If you want to add a new method here, every class which implements this interface has to be modified.
}

class MyType1 implements MyType {
    public void doSomething() {
        // doSomething implementation 1
    }

    public void domSomethingElse() {
        // domSomethingElse implementation 1
    }
}

class MyType2 implements MyType {
    public void doSomething() {
        // doSomething implementation 2
    }

    public void domSomethingElse() {
        // domSomethingElse implementation 2
    }
}

or

interface MyType {
    void doSomething();
    void domSomethingElse();
}

class MyType1 implements MyType {
    public void doSomething() {
        // doSomething implementation 1
    }

    public void domSomethingElse() {
        // domSomethingElse implementation 1
    }
}

class MyType2 implements MyType {
    public void doSomething() {
        // doSomething implementation 2
    }

    public void domSomethingElse() {
        // domSomethingElse implementation 2
    }
}

// Adding a new type for which one or more of the methods are meaningless.
class MyType3 implements MyType {
    public void doSomething() {
        // An unchecked exception, so the method signature still matches the interface.
        throw new UnsupportedOperationException("Not implemented, because it does not make any sense.");
    }

    public void domSomethingElse() {
        // domSomethingElse implementation 3
    }
}

Fix: move the implementation to a higher abstraction level and violate OCP:

class Manipulator {
    void doSomething(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // doSomething implementation 1
        } else if (dataStructure instanceof MyType2) {
            // doSomething implementation 2
        }
        // ...
    }

    void domSomethingElse(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // domSomethingElse implementation 1
        } else if (dataStructure instanceof MyType2) {
            // domSomethingElse implementation 2
        }
        // ...
    }

    // The newly added method: no existing type has to change.
    void doAnotherThing(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // doAnotherThing implementation 1
        } else if (dataStructure instanceof MyType2) {
            // doAnotherThing implementation 2
        }
        // ...
    }
}

class MyType1 {}
class MyType2 {}

Or split the classes up into subclasses.
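One possible reading of the splitting alternative, sketched in Java: the wide interface is broken into narrower ones, so a type only implements the operations that make sense for it. The interface names `CanDoSomething` and `CanDoSomethingElse` and the string return values are hypothetical, chosen only for illustration.

```java
// Hypothetical narrower interfaces; the names are illustrative only.
interface CanDoSomething {
    String doSomething();
}

interface CanDoSomethingElse {
    String domSomethingElse();
}

// MyType1 supports both operations.
class MyType1 implements CanDoSomething, CanDoSomethingElse {
    public String doSomething() {
        return "doSomething 1";
    }

    public String domSomethingElse() {
        return "domSomethingElse 1";
    }
}

// MyType3 implements only the operation that is meaningful for it,
// so it never has to throw a "not implemented" exception.
class MyType3 implements CanDoSomethingElse {
    public String domSomethingElse() {
        return "domSomethingElse 3";
    }
}
```

With this split, code that needs `doSomething()` asks for a `CanDoSomething`, and passing it a `MyType3` is rejected at compile time instead of failing at runtime.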

People usually follow OCP once there are more than one or two methods, because repeating the same if-else chain in every method is not DRY.

I don't recommend using mixed classes which partially fulfill and partially violate the OCP, because the code then becomes very hard to maintain. You should decide in each situation which approach to follow. This is usually an easy decision, and if you make a mistake, you can still refactor your code later.


A class is simply a collection of data and methods which can act on that data. You can use a class to implement a data structure, but they are different things.

Take the Linked List for example. You can implement a Linked List data structure using a class, and in some languages this is the cleanest and most obvious way of doing it. It is not the only way to implement a Linked List, but it might be the best depending on the language.

A Linked List, however, has nothing to do with being a class. A Linked List is instead a way of representing data as separate nodes where each node is linked to the next in some fashion.
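As a minimal sketch of the class-based approach in Java (the names `IntLinkedList` and `Node` are made up for this example, and it only stores `int` values):

```java
// A minimal singly linked list of ints, implemented with classes.
class IntLinkedList {
    // Each node holds a value and a reference to the next node (null at the end).
    private static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    private Node head;
    private Node tail;
    private int size;

    // Appends a value at the end; O(1) because a tail reference is kept.
    void add(int value) {
        Node node = new Node(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // Walks the chain of nodes from the head; O(n) in the index.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Node current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current.value;
    }

    int size() {
        return size;
    }
}
```

The class is the packaging; the data structure is the chain of `next` references inside it.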

A data structure is a conceptual way of modeling data, each different data structure having different properties and use cases. A class is a syntactic way that some languages offer to group data and methods.

Classes often lend themselves to being used to implement data structures, but it would be incorrect to say that a class == a data structure.
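To illustrate that the data structure does not depend on a node class, here is a rough sketch of a linked list whose "nodes" are just slots in two parallel arrays and whose "links" are array indices. Java still forces a class around the code, but that class is only a container here; `IndexLinkedList` and its fixed capacity are assumptions made for the example.

```java
// A linked list without node objects: slot i stores its payload in values[i]
// and the index of the following slot in next[i] (-1 marks the end).
final class IndexLinkedList {
    private final int[] values;
    private final int[] next;
    private int head = -1; // index of the first slot, -1 while the list is empty
    private int used = 0;  // number of slots handed out so far

    IndexLinkedList(int capacity) {
        values = new int[capacity];
        next = new int[capacity];
    }

    // Inserts at the front by pointing the new slot at the old head.
    void addFirst(int value) {
        if (used == values.length) {
            throw new IllegalStateException("list is full");
        }
        int slot = used++;
        values[slot] = value;
        next[slot] = head;
        head = slot;
    }

    // Follows the index links from the head and collects the values in list order.
    int[] toArray() {
        int[] out = new int[used];
        int i = 0;
        for (int slot = head; slot != -1; slot = next[slot]) {
            out[i++] = values[slot];
        }
        return out;
    }
}
```

The list order comes from the links, not from the slot positions: the value added last sits in the highest slot but is returned first.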


Whether a custom class is a data structure depends on whom you ask. At the very least, the yes people would acknowledge that it's a user-defined data structure which is more domain-specific and less established than data structures such as arrays, linked lists or binary trees. For this answer, I consider them distinct.

While it's easy to apply Big O algorithm analysis to data structures, it's a little more complex for classes since they wrap many of these structures, as well as other instances of other classes... but a lot of operations on class instances can be broken down into primitive operations on data structures and represented in terms of Big O. As a programmer, you can endeavour to make your classes more efficient by avoiding unnecessary copying of members and ensuring that method invocations don't go through too many layers. And of course, using performant algorithms in your methods goes without saying, but that's not OOP specific. However, functionality, design and clarity should not be sacrificed in favour of performance unless necessary. And premature optimisation is the devil yada yada yada.
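As a small sketch of that breakdown, consider a hypothetical `Inventory` class (invented for this example) that wraps a `HashMap` and an `ArrayList`. The cost of each method follows from the primitive operations it performs on the wrapped structures:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, used only to illustrate the cost breakdown.
class Inventory {
    private final Map<String, Integer> stockByName = new HashMap<>();
    private final List<String> insertionOrder = new ArrayList<>();

    // Hash lookup + hash update + list append:
    // expected O(1), amortized for the ArrayList append.
    void add(String name, int quantity) {
        if (!stockByName.containsKey(name)) {
            insertionOrder.add(name);
        }
        stockByName.merge(name, quantity, Integer::sum);
    }

    // A single hash lookup: expected O(1).
    int quantityOf(String name) {
        return stockByName.getOrDefault(name, 0);
    }

    // A linear scan over the list: O(n) in the number of distinct names.
    int indexOf(String name) {
        return insertionOrder.indexOf(name);
    }
}
```

The class itself has no complexity of its own; each method inherits the cost of the structure operations it delegates to.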

I'm certain that some academic, somewhere, has attempted to formulate a metric for quantifying class performance or even a calculus for classes and their operations, but I haven't come across it yet. However, there exists QA research like this which measures dependencies between classes in a project... one could possibly argue that there's a correlation between the number of dependencies and the layering of method invocations (and therefore lower class performance). But if someone has researched this, I'm sure you could find a more relevant metric which doesn't require sweeping inferences.


I would say that conceptually a class is NOT a data structure. A class represents, well, a class of objects, and objects are abstract entities (in the English meaning of the word, not the C++ or C# meaning).

I'd say classes and objects are like the theory behind the practice, and the practice is the implementation of objects using methods and data. The data may be simple or complex (the so-called advanced data structure).