Using GNNs to classify HTML data?

So, this is viable then? It's not something I'm pulling out of my ass and trying to fit into DL?

I ask because this happens a lot with these things.

That being said, if there is any semantic information in the HTML (which I assume there is), a large language model like BERT or even a seq2seq model like T5 would provide valuable leverage.

Actually, I thought about doing something like that; I just wasn't sure it would go anywhere. The nodes are likely going to have a lot of text in them. You can probably extract a lot of information from that text, and that information will likely influence how you would classify a node.
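
For what it's worth, here is a rough sketch of how the DOM could be turned into a graph with per-node text, using only Python's standard `html.parser`. The names `DomGraphBuilder` and `html_to_graph` are made up for illustration, and treating parent-child links as the only edges is an assumption; sibling edges or other relations might work better for a given task.

```python
from html.parser import HTMLParser


class DomGraphBuilder(HTMLParser):
    """Hypothetical helper: collects DOM elements as nodes with parent->child edges."""

    # Void elements never get a closing tag, so they must not stay on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.tags = []    # tags[i] is the tag name of node i
        self.texts = []   # texts[i] is the text sitting directly inside node i
        self.edges = []   # (parent_index, child_index) pairs
        self._stack = []  # indices of the currently open elements

    def handle_starttag(self, tag, attrs):
        idx = len(self.tags)
        self.tags.append(tag)
        self.texts.append("")
        if self._stack:
            self.edges.append((self._stack[-1], idx))
        if tag not in self.VOID:
            self._stack.append(idx)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray closing tags.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.tags[self._stack[pos]] == tag:
                del self._stack[pos:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            i = self._stack[-1]
            self.texts[i] = (self.texts[i] + " " + text).strip()


def html_to_graph(html):
    """Return (tags, texts, edges) for an HTML string."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.tags, builder.texts, builder.edges


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    tags, texts, edges = html_to_graph(sample)
    for i, (tag, text) in enumerate(zip(tags, texts)):
        print(i, tag, repr(text))
    print(edges)
```

From there, each entry in `texts` could be run through a pretrained encoder such as BERT to get a node feature vector, and `edges` could serve as the connectivity for whatever GNN library you pick. That part is left out here because it depends on the model and framework you choose.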
