super() raises "TypeError: must be type, not classobj" for new-style class

You can also use class TextParser(HTMLParser, object):. This makes TextParser a new-style class, and super() can be used.


Alright, it's the usual "super() cannot be used with an old-style class".

However, the important point is that the correct test for "is this a new-style instance (i.e. object)?" is

>>> class OldStyle: pass
>>> instance = OldStyle()
>>> issubclass(instance.__class__, object)
False

and not (as in the question):

>>> isinstance(instance, object)
True

For classes, the correct "is this a new-style class" test is:

>>> issubclass(OldStyle, object)  # OldStyle is not a new-style class
False
>>> issubclass(int, object)  # int is a new-style class
True

The crucial point is that with old-style classes, the class of an instance and its type are distinct. Here, OldStyle().__class__ is OldStyle, which does not inherit from object, while type(OldStyle()) is the instance type, which does inherit from object. Basically, an old-style class just creates objects of type instance (whereas a new-style class creates objects whose type is the class itself). This is probably why the instance OldStyle() is an object: its type() inherits from object (the fact that its class does not inherit from object does not count: old-style classes merely construct new objects of type instance). Partial reference: https://stackoverflow.com/a/9699961/42973.

PS: The difference between a new-style class and an old-style one can also be seen with:

>>> type(OldStyle)  # OldStyle creates objects but is not itself a type
classobj
>>> isinstance(OldStyle, type)
False
>>> type(int)  # A new-style class is a type
type

(old-style classes are not types, so they cannot be the type of their instances).


super() can be used only in the new-style classes, which means the root class needs to inherit from the 'object' class.

For example, the top class need to be like this:

class SomeClass(object):
    def __init__(self):
        ....

not

class SomeClass():
    def __init__(self):
        ....

So, the solution is that call the parent's init method directly, like this way:

class TextParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.all_data = []