Hi. I can apply KMeans or a Gaussian mixture model (GMM) to your problem, estimate the number of clusters or mixture components K using the BIC or AIC information criteria, and then do the classification with logistic regression or any other classifier you prefer. I am experienced in using Python to work with data and apply machine learning to it, and I have also worked on text classification.
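
To make the approach concrete, here is a minimal sketch of what I have in mind, using scikit-learn. I have not seen your data yet, so everything specific here is an assumption: the synthetic blobs standing in for your features, the candidate range of K from 1 to 6, and the choice to train the logistic regression on the cluster assignments as labels. The real pipeline would be adjusted once I know the problem.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: 3 well-separated groups).
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

# Fit one GMM per candidate K and record its BIC (lower is better).
# Swapping .bic(X) for .aic(X) gives the AIC-based choice instead.
candidate_ks = range(1, 7)
bics = []
for k in candidate_ks:
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
    bics.append(gmm.bic(X))

best_k = list(candidate_ks)[int(np.argmin(bics))]
print("K selected by BIC:", best_k)

# Refit with the selected K and use the cluster assignments as labels.
best_gmm = GaussianMixture(n_components=best_k, n_init=3, random_state=0).fit(X)
labels = best_gmm.predict(X)

# Train a logistic regression on the cluster labels (assumed setup) and
# check how well it reproduces them on held-out points.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("Held-out accuracy:", accuracy)
```

If you already have labels, the classifier would be trained on those instead, and the cluster assignments or GMM posterior probabilities (`predict_proba`) could be added as extra features.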