So, this is viable then? It's not something I'm pulling out of my ass and trying to fit into DL?
I ask because this happens a lot with these things.
That being said, if there is any semantic information in the HTML (which I assume there is), a pretrained language model like BERT, or even a seq2seq model like T5, would provide valuable leverage.
Actually, I thought about doing something like that; I just wasn't sure it would go anywhere. The nodes are likely to have a lot of text in them, so you can probably get a lot of information out of them that way, and that information will likely influence how you would classify a node.
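To make the "text in the nodes" idea a bit more concrete, here is a rough sketch of how you might pull the text out of each HTML node so it could later be fed to a text encoder such as BERT or T5. It only uses Python's standard `html.parser`; the names `NodeTextExtractor` and `node_inputs`, and the `path: text` input format, are made up for illustration and are not from any library. The encoding and classification step itself is left out.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NodeTextExtractor(HTMLParser):
    """Collect (tag path, text) pairs for every node that directly holds text."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open tags from the root down to the current node
        self.nodes = []  # collected (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text and self.stack:
            self.nodes.append(("/".join(self.stack), text))


def node_inputs(html):
    """Return one 'path: text' string per text-bearing node (hypothetical format)."""
    parser = NodeTextExtractor()
    parser.feed(html)
    parser.close()
    return [f"{path}: {text}" for path, text in parser.nodes]


if __name__ == "__main__":
    sample = "<div><h1>Title</h1><p>Some <b>bold</b> text</p></div>"
    for line in node_inputs(sample):
        print(line)
```

Each string keeps the node's position in the tree next to its text, which is one simple way to let a language model see both structure and content. Whether that format actually helps classification is something you would have to test.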