Sourcetable supports two ML approaches: TabPFN for instant zero-shot predictions, and scikit-learn for full model training pipelines.

TabPFN — zero-shot predictions

TabPFN is a pre-trained neural network that makes predictions on tabular data without a separate training step. It conditions on your labeled rows at prediction time, so there is nothing to fit or tune before you get results.
"Predict which customers will churn based on their usage data using TabPFN"

When to use TabPFN

  • Small to medium datasets (under 10,000 rows works best)
  • Quick prototyping — get results in seconds
  • No hyperparameter tuning needed
  • Classification and regression tasks

Available modes

| Mode | Chat mode | Description |
| --- | --- | --- |
| Classification | Classify | Predict categorical outcomes (churn/no churn, fraud/legitimate) |
| Regression | Predict | Predict continuous values (price, score, duration) |

scikit-learn — full ML pipeline

For larger datasets or when you need more control, Sourcetable uses scikit-learn under the hood.

Classification

"Build a random forest classifier to predict loan default from the applicant data"
Available algorithms:
| Algorithm | Best for |
| --- | --- |
| Random Forest | General purpose, handles mixed features well |
| Gradient Boosting | High accuracy, handles non-linear relationships |
| Logistic Regression | Interpretable, good baseline |
| SVM | High-dimensional data, clear margin of separation |
| k-Nearest Neighbors | Simple, non-parametric |
| Decision Tree | Interpretable, visual output |
| Naive Bayes | Text classification, very fast |
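
For orientation, a prompt like the one above corresponds roughly to the following scikit-learn code. This is a minimal sketch, not the code Sourcetable generates: the synthetic dataset stands in for applicant data, and the hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for applicant data (hypothetical, for illustration only).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Hold out 20% of the rows for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Swapping in another classifier from the table (for example `LogisticRegression` or `GradientBoostingClassifier`) only changes the estimator line; the split, fit, and score steps stay the same.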

Regression

"Train a gradient boosting model to predict house prices"
Available algorithms: Linear Regression, Ridge, Lasso, ElasticNet, Random Forest Regressor, Gradient Boosting Regressor, SVR, Decision Tree Regressor.
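
As a rough illustration of what a gradient boosting regression involves in scikit-learn, the sketch below fits a model on synthetic data. The dataset is a hypothetical stand-in for house price data, and the default hyperparameters are an assumption rather than what Sourcetable necessarily uses.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for house price data (hypothetical, for illustration only).
X, y = make_regression(
    n_samples=400, n_features=6, n_informative=4, noise=10.0, random_state=0
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

reg = GradientBoostingRegressor(random_state=0)
reg.fit(X_train, y_train)

# For regressors, score() returns R-squared on the given data.
r2 = reg.score(X_test, y_test)
predictions = reg.predict(X_test)
print(f"Test R-squared: {r2:.3f}")
```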

Clustering

"Cluster customers into segments based on purchase behavior"
| Algorithm | Best for |
| --- | --- |
| K-Means | Spherical clusters, known number of groups |
| DBSCAN | Arbitrary shapes, automatic cluster count |
| Hierarchical | Dendrogram visualization, nested groups |
| Gaussian Mixture | Overlapping clusters, soft assignments |
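
A segmentation request like the one above could be sketched with K-Means as follows. The three well-separated synthetic groups are an assumption made so the example is easy to follow; real purchase data is rarely this clean.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic groups (hypothetical stand-in for customers).
centers = [[0, 0], [10, 10], [-10, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # one cluster label per row

# Number of rows assigned to each cluster.
counts = np.bincount(labels)
print(counts)
```

The `labels` array has one entry per input row, so it can be stored as a new segment column alongside the original data.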

Dimensionality reduction

"Reduce the dataset to 2 dimensions with PCA and plot the clusters"
Available methods: PCA, t-SNE, UMAP, LDA.
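
As an illustration of the PCA case, the sketch below projects a five-feature synthetic dataset onto two components. Standardizing the features first is a common choice but an assumption here, not something the original text specifies.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic five-feature dataset (hypothetical, for illustration only).
X, _ = make_blobs(n_samples=150, n_features=5, centers=3, random_state=0)

# PCA is sensitive to feature scale, so standardize before projecting.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Share of total variance retained by the two components.
explained = pca.explained_variance_ratio_.sum()
print(X_2d.shape)
print(f"Variance explained by 2 components: {explained:.2%}")
```

The two columns of `X_2d` can then be used as the x and y coordinates of a scatter plot.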

End-to-end ML pipeline

When you ask the AI to build a model, it automatically handles:
  1. Data splitting — train/test split (default 80/20)
  2. Feature preprocessing — encoding categoricals, scaling numerics, handling missing values
  3. Model training — fits the chosen algorithm
  4. Evaluation — generates metrics and visualizations
  5. Results — writes predictions back to your spreadsheet
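
The five steps above can be sketched with a scikit-learn `Pipeline`. This is an illustrative example under stated assumptions, not Sourcetable's internal implementation: the column names (`monthly_usage`, `plan`, `churned`), the synthetic data, and the choice of logistic regression are all hypothetical, and a pandas DataFrame stands in for the spreadsheet.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical customer table with one numeric and one categorical feature.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "monthly_usage": rng.normal(50, 15, n),
    "plan": rng.choice(["free", "pro", "team"], n),
})
df["churned"] = (df["monthly_usage"] < 45).astype(int)
# Blank out a few numeric cells to simulate missing values.
df.loc[rng.choice(n, 10, replace=False), "monthly_usage"] = np.nan

X = df[["monthly_usage", "plan"]]
y = df["churned"]

# 1. Data splitting: 80/20 train/test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 2. Feature preprocessing: impute and scale numerics, one-hot encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["monthly_usage"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

# 3. Model training.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# 4. Evaluation on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")

# 5. Results: store predictions as a new column next to the inputs.
df["predicted_churn"] = model.predict(X)
```

Keeping preprocessing inside the pipeline means the imputer and scaler are fitted on the training rows only, which avoids leaking information from the test set.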

Model evaluation

The AI reports metrics appropriate to the task.

Classification metrics:
  • Accuracy, Precision, Recall, F1 Score
  • ROC curve and AUC
  • Confusion matrix
  • Classification report by class
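
For reference, the classification metrics listed above can be computed with `sklearn.metrics`. The labels and scores below are a small hand-made example, not output from any real model.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-made example: true labels and predicted probabilities for class 1.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.9, 0.8, 0.7, 0.4]
# Threshold the scores at 0.5 to get hard predictions.
y_pred = [int(s >= 0.5) for s in y_score]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # uses scores, not hard predictions
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted

print(accuracy, precision, recall, f1, auc)
print(cm)
print(classification_report(y_true, y_pred))
```

Note that AUC is computed from the scores, while the other metrics depend on the chosen threshold.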
Regression metrics:
  • R-squared and Adjusted R-squared
  • MAE (Mean Absolute Error)
  • RMSE (Root Mean Squared Error)
  • Residual plots
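
The regression metrics above can be computed as in the following sketch. The values are a small hand-made example; adjusted R-squared is not provided by scikit-learn directly, so it is derived here from the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), with the feature count `p = 1` as an assumption for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hand-made example values (not from a real model).
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0, 11.5])

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# Adjusted R-squared penalizes additional features.
n = len(y_true)  # number of observations
p = 1            # number of features (assumed for this example)
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Residuals are what a residual plot shows against the predictions.
residuals = y_true - y_pred

print(f"R2={r2:.4f} adjusted R2={adjusted_r2:.4f} MAE={mae:.3f} RMSE={rmse:.3f}")
```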
Clustering metrics:
  • Silhouette score
  • Calinski-Harabasz index
  • Inertia (for K-Means)
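
One common use of these clustering metrics is comparing candidate cluster counts. The sketch below does this for K-Means on synthetic data with three clearly separated groups; the data and the range of `k` values are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Three well-separated synthetic groups (hypothetical, for illustration only).
centers = [[0, 0], [10, 10], [-10, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

results = {}
for k in range(2, 6):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    results[k] = {
        "silhouette": silhouette_score(X, labels),            # in [-1, 1], higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "inertia": kmeans.inertia_,                           # within-cluster sum of squares
    }

# Pick the k with the highest silhouette score.
best_k = max(results, key=lambda k: results[k]["silhouette"])
print(best_k)
```

Inertia generally keeps falling as `k` grows, so it is usually read for an "elbow" rather than maximized or minimized outright.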

Hyperparameter tuning

"Tune the random forest hyperparameters using cross-validation"
The AI performs grid search or randomized search with cross-validation to find optimal parameters.
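
A grid search with cross-validation can be sketched with scikit-learn's `GridSearchCV`. The parameter grid, fold count, and synthetic data below are illustrative assumptions, not the settings Sourcetable applies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data (hypothetical, for illustration only).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# Small illustrative grid: 2 x 2 = 4 candidate settings.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation per candidate
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
```

For larger grids, `RandomizedSearchCV` samples a fixed number of candidates instead of trying every combination, which trades exhaustiveness for speed.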