Practical Curve Fitting with Python — From Linear to Nonlinear Models

Curve fitting is the process of finding a mathematical function that best describes the relationship between input variables and observed data. In applied science, engineering, and data analysis, curve fitting helps you summarize trends, interpolate between points, make predictions, and extract model parameters that have physical meaning. This article walks through practical curve fitting in Python, covering linear regression, polynomial fits, and nonlinear models. We’ll discuss model selection, goodness-of-fit metrics, and handling noisy data, with code examples using NumPy, SciPy, and scikit-learn.


Why curve fitting matters

Curve fitting turns raw data into a compact model you can reason about. Use cases include:

  • Estimating physical constants from measurements (e.g., rate constants).
  • Predicting values where direct measurement is expensive.
  • Removing trends (detrending) for signal analysis.
  • Smoothing noisy sensor data.

Key trade-offs: simpler models (fewer parameters) are easier to interpret and less likely to overfit; complex models can fit data better but may generalize poorly.


Libraries we’ll use

  • NumPy — numerical arrays and basic linear algebra.
  • SciPy — optimization routines for nonlinear least squares.
  • scikit-learn — linear models, preprocessing, and evaluation utilities.
  • Matplotlib — plotting results.

Install with:

```shell
pip install numpy scipy scikit-learn matplotlib
```

1. Data preparation and visualization

Always begin by visualizing your data. Look for patterns, outliers, heteroscedasticity (changing variance), and missing values.

Example synthetic dataset:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y_true = 2.5 * x + 1.0
y = y_true + rng.normal(scale=3.0, size=x.shape)

plt.scatter(x, y, label='data')
plt.plot(x, y_true, color='C1', label='true')
plt.legend()
plt.show()
```

Standard steps:

  • Clean or impute missing values.
  • Remove or flag gross outliers.
  • Optionally scale or normalize features for numerical stability (especially for high-degree polynomials or iterative optimizers).
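The steps above can be sketched in a few lines of NumPy. The helper name clean_xy, the choice to screen residuals from a preliminary straight-line fit, and the 3.5 cutoff on the modified z-score (a common rule of thumb) are all assumptions for illustration. Points are flagged rather than deleted, so you can decide what to do with them.

```python
import numpy as np

def clean_xy(x, y, z_thresh=3.5):
    """Hypothetical helper: drop non-finite pairs and flag gross outliers.

    Outliers are screened on residuals from a rough straight-line fit using the
    modified z-score 0.6745 * (r - median(r)) / MAD. The 3.5 cutoff is a rule of
    thumb, not a universal constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(x) & np.isfinite(y)      # drop (rather than impute) NaN/inf pairs
    x, y = x[keep], y[keep]

    slope, intercept = np.polyfit(x, y, 1)      # rough trend, used only for screening
    r = y - (slope * x + intercept)
    mad = np.median(np.abs(r - np.median(r)))
    if mad == 0:
        return x, y, np.zeros(x.shape, dtype=bool)
    z = 0.6745 * (r - np.median(r)) / mad
    return x, y, np.abs(z) > z_thresh           # flag, don't delete

x_raw = np.array([0.0, 1.0, 2.0, np.nan, 4.0, 5.0])
y_raw = np.array([1.0, 3.5, 6.0, 7.0, 100.0, 13.5])
x_c, y_c, is_outlier = clean_xy(x_raw, y_raw)

# Standardize x for numerical stability; keep mu/sd to transform new inputs identically
mu, sd = x_c.mean(), x_c.std()
x_scaled = (x_c - mu) / sd
print(is_outlier)
```

A single large outlier still drags the preliminary line, so treat this as a screening step and inspect the flagged points before dropping anything.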

2. Linear regression (ordinary least squares)

For a linear relationship y = a*x + b, ordinary least squares (OLS) chooses a and b to minimize the sum of squared residuals, Σ(y_i − a*x_i − b)², and has a closed-form solution.

NumPy closed form:

```python
X = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
coeffs, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)
a, b = coeffs
```

Using scikit-learn:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(x.reshape(-1, 1), y)  # scikit-learn expects a 2-D X
a = model.coef_[0]
b = model.intercept_
```
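For a single predictor, the OLS solution can be written explicitly: a = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)², and b = ȳ − a·x̄. The sketch below regenerates the synthetic data from section 1 and checks that this formula agrees with np.linalg.lstsq. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=3.0, size=x.shape)

# Closed-form OLS for one predictor (the normal equations solved by hand)
x_mean, y_mean = x.mean(), y.mean()
a_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b_hat = y_mean - a_hat * x_mean

# Same answer from the least-squares solver on the design matrix [x, 1]
X = np.column_stack([x, np.ones_like(x)])
(a_lstsq, b_lstsq), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_hat, b_hat, a_lstsq, b_lstsq)
```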

Evaluate fit with R-squared and residual analysis:

```python
from sklearn.metrics import r2_score

y_pred = model.predict(x.reshape(-1, 1))
r2 = r2_score(y, y_pred)
residuals = y - y_pred  # plot against x or y_pred to look for leftover structure
```
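R² and the residuals are easy to compute by hand, which makes it clear what r2_score reports: R² = 1 − SS_res/SS_tot. This sketch assumes the same synthetic linear data as above. In practice you would also plot the residuals against x or the fitted values and look for trends or a funnel shape.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=3.0, size=x.shape)

model = LinearRegression().fit(x.reshape(-1, 1), y)
y_pred = model.predict(x.reshape(-1, 1))
residuals = y - y_pred

# R^2 = 1 - SS_res / SS_tot, the quantity r2_score computes
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot
rmse = np.sqrt(np.mean(residuals ** 2))

# With an intercept, OLS residuals sum to ~0; a visible trend in them signals a missing term
print(r2_manual, r2_score(y, y_pred), rmse)
```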

3. Polynomial fitting

Polynomials let you model curvature: y = c0 + c1 x + c2 x^2 + … Use them with care: high-degree polynomials can oscillate wildly between data points, especially near the ends of the range (Runge’s phenomenon).

NumPy polyfit:

```python
deg = 3
coeffs = np.polyfit(x, y, deg)  # highest-degree coefficient first
p = np.poly1d(coeffs)
y_pred = p(x)
```
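To see Runge’s phenomenon directly, the sketch below fits the classic test function 1/(1 + 25x²), sampled at 11 equally spaced points, with np.polyfit. A degree-10 polynomial passes through every sample but overshoots badly between the outermost points, while a lower-degree least-squares fit stays much closer. The test function and the degrees are illustrative choices.

```python
import numpy as np

def runge(x):
    return 1.0 / (1.0 + 25.0 * x**2)

x_nodes = np.linspace(-1, 1, 11)     # 11 equally spaced samples
x_fine = np.linspace(-1, 1, 2001)    # dense grid to measure the error between samples

max_err = {}
for deg in (4, 10):
    coeffs = np.polyfit(x_nodes, runge(x_nodes), deg)   # degree 10 interpolates all 11 points
    max_err[deg] = np.max(np.abs(np.polyval(coeffs, x_fine) - runge(x_fine)))
    print(f"degree {deg:2d}: max |error| = {max_err[deg]:.3f}")
```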

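Better practice: scale the input or use an orthogonal basis (e.g., Chebyshev polynomials) to keep the least-squares problem well conditioned, because powers of large x values quickly differ by many orders of magnitude. scikit-learn pipeline example: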

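```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

deg = 3
model = make_pipeline(
    PolynomialFeatures(degree=deg, include_bias=False),  # LinearRegression fits the intercept
    StandardScaler(),                                    # put x, x^2, x^3 on comparable scales
    LinearRegression(),
)
model.fit(x.reshape(-1, 1), y)
```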
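The conditioning problem behind that advice can be measured. The sketch below uses synthetic data with x up to 1000, an assumption chosen to exaggerate the effect. It compares the condition number of three design matrices: the raw monomial one, the same basis with x mapped to [-1, 1], and a Chebyshev basis. np.polynomial.Chebyshev.fit performs that mapping internally and evaluates the fit in the original x units.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1000, 200)               # large raw x values, e.g. time in seconds
y = 1e-6 * (x - 500) ** 3 + rng.normal(scale=1.0, size=x.shape)

deg = 5
# Condition number of the raw monomial design matrix [x^5, ..., x, 1]
cond_raw = np.linalg.cond(np.vander(x, deg + 1))

# Same monomial basis after mapping x to [-1, 1]
x_scaled = (x - x.min()) / (x.max() - x.min()) * 2 - 1
cond_scaled = np.linalg.cond(np.vander(x_scaled, deg + 1))

# Chebyshev basis on the mapped variable (nearly orthogonal on [-1, 1])
cond_cheb = np.linalg.cond(np.polynomial.chebyshev.chebvander(x_scaled, deg))

# Chebyshev.fit maps x to [-1, 1] itself; the result is evaluated in original units
cheb = np.polynomial.Chebyshev.fit(x, y, deg)
y_fit = cheb(x)
print(f"{cond_raw:.1e}  {cond_scaled:.1f}  {cond_cheb:.1f}")
```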

4. Nonlinear curve fitting with SciPy

When the model is nonlinear in parameters (e.g., exponential, logistic, Gaussian), use scipy.optimize.curve_fit or least_squares.

Example: fit an exponential y = A * exp(-k*x) + C

```python
from scipy.optimize import curve_fit

def exp_model(x, A, k, C):
    return A * np.exp(-k * x) + C

popt, pcov = curve_fit(exp_model, x, y, p0=(10, 0.5, 0))
A, k, C = popt
perr = np.sqrt(np.diag(pcov))  # parameter standard errors
```
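When p0 is omitted, curve_fit starts every parameter at 1, which is often far from the answer. One common heuristic for an exponential decay, shown here as an assumption rather than a SciPy feature, is to estimate C from the tail of the data and then fit log(y − C) = log(A) − k·x with a straight line. The synthetic parameters (A = 3, k = 0.5, C = 2) are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, A, k, C):
    return A * np.exp(-k * x) + C

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = exp_model(x, 3.0, 0.5, 2.0) + rng.normal(scale=0.05, size=x.shape)

# Heuristic starting point (an assumption, not part of curve_fit):
# 1) take C from the tail, 2) fit log(y - C) = log(A) - k*x by a straight line
C0 = y[-5:].mean()
shifted = y - C0
mask = shifted > 0.1 * shifted.max()    # keep points where the decay dominates the noise
slope, intercept = np.polyfit(x[mask], np.log(shifted[mask]), 1)
p0 = (np.exp(intercept), -slope, C0)

popt, pcov = curve_fit(exp_model, x, y, p0=p0)
perr = np.sqrt(np.diag(pcov))           # 1-sigma parameter uncertainties
print(p0, popt, perr)
```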

Tips:

  • Provide reasonable initial guesses (p0); poor guesses can lead to non-convergence.
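  • Use bounds to constrain parameters: curve_fit(…, bounds=(lower, upper)). When bounds are given, curve_fit switches from its default Levenberg-Marquardt method ('lm') to the trust-region reflective method ('trf').
  • For robust fits, consider scipy.optimize.least_squares with loss='soft_l1' or loss='huber' to reduce the influence of outliers. Note that least_squares expects a function that returns the vector of residuals, not the model itself.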
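Putting the last two tips together, the sketch below fits the same exponential model with scipy.optimize.least_squares twice on synthetic data with three injected outliers: once with the default squared loss and once with loss='soft_l1'. The bounds on A and k and the f_scale value (set near the assumed noise level) are illustrative assumptions to tune for real data.

```python
import numpy as np
from scipy.optimize import least_squares

def exp_model(x, A, k, C):
    return A * np.exp(-k * x) + C

def residuals(params, x, y):
    # least_squares minimizes a function of this residual vector
    return exp_model(x, *params) - y

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 60)
y = exp_model(x, 3.0, 0.5, 2.0) + rng.normal(scale=0.05, size=x.shape)
y[[10, 25, 40]] += 2.0                      # inject a few gross outliers

p0 = (3.0, 0.3, 1.0)
bounds = ([0, 0, -np.inf], [np.inf, 5.0, np.inf])   # A >= 0, 0 <= k <= 5 (assumed limits)

fit_l2 = least_squares(residuals, p0, args=(x, y), bounds=bounds)   # ordinary least squares
# f_scale is the residual size beyond which soft_l1 starts down-weighting points
fit_robust = least_squares(residuals, p0, args=(x, y), bounds=bounds,
                           loss='soft_l1', f_scale=0.1)
print("L2:    ", fit_l2.x)
print("robust:", fit_robust.x)
```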

5. Weighted least squares and heteroscedasticity

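If measurement errors have non-constant variance, use weights w_i = 1/σ_i². In curve_fit, pass the per-point standard deviations (not the weights) as sigma; curve_fit then minimizes Σ((y_i − f(x_i))/σ_i)². Set absolute_sigma=True only if sigma holds true standard deviations in the units of y. Otherwise, sigma is treated as relative weights, and pcov is rescaled by the residual variance.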

Example:

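```python
# sigma_y: array of per-point standard deviations, same length as y
popt, pcov = curve_fit(exp_model, x, y, p0=(10, 0.5, 0),
                       sigma=sigma_y, absolute_sigma=True)
```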

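scikit-learn’s LinearRegression supports weighted linear regression through the sample_weight argument of fit(). Because this argument takes weights directly, pass sample_weight=1/sigma_y**2.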
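One way to check the weighting conventions is to fit the same heteroscedastic straight-line data both ways: curve_fit with sigma (standard deviations) and LinearRegression with sample_weight = 1/sigma². Both minimize Σ w_i (y_i − a·x_i − b)², so the coefficients should agree. The noise model sigma_y = 0.2·x is an assumption for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 40)
sigma_y = 0.2 * x                               # noise grows with x (heteroscedastic)
y = 2.5 * x + 1.0 + rng.normal(scale=sigma_y)

def line(x, a, b):
    return a * x + b

# curve_fit takes standard deviations and minimizes sum(((y - f(x)) / sigma)**2)
popt, pcov = curve_fit(line, x, y, sigma=sigma_y, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))

# scikit-learn takes weights directly, so pass w_i = 1 / sigma_i**2
wls = LinearRegression().fit(x.reshape(-1, 1), y, sample_weight=1.0 / sigma_y**2)
print(popt, wls.coef_[0], wls.intercept_)
```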


6. Regularization and avoiding overfitting

Regularization adds penalty terms to reduce variance:

  • Ridge (L2) and Lasso (L1) for linear/polynomial models.
  • Use cross-validation to choose penalty strength (alpha).

scikit-learn example:

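```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Scale the features so the L2 penalty treats all powers of x comparably
model = make_pipeline(PolynomialFeatures(degree=5, include_bias=False),
                      StandardScaler(),
                      Ridge(alpha=1.0))
# x is sorted, so shuffle before splitting; otherwise each fold tests extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, x.reshape(-1, 1), y, scoring='r2', cv=cv)
```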
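To choose alpha rather than fixing it at 1.0, one option is a grid search with shuffled k-fold cross-validation. The grid, the RMSE scoring, and the dataset (the exponential example used in section 10) are illustrative choices. RidgeCV is another scikit-learn option for the same task.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 80)
y = 3.0 * np.exp(-0.5 * x) + 2.0 + rng.normal(scale=0.5, size=x.shape)

model = make_pipeline(PolynomialFeatures(degree=5, include_bias=False),
                      StandardScaler(),
                      Ridge())
# make_pipeline names each step after its lowercased class name, hence "ridge__alpha"
param_grid = {"ridge__alpha": np.logspace(-4, 2, 13)}
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # shuffle because x is sorted
search = GridSearchCV(model, param_grid, cv=cv, scoring="neg_root_mean_squared_error")
search.fit(x.reshape(-1, 1), y)

best_alpha = search.best_params_["ridge__alpha"]
print(best_alpha, -search.best_score_)   # best alpha and its mean CV RMSE
```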

7. Model selection and validation

  • Split data into train/validation/test or use k-fold cross-validation.
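  • Compare models using metrics such as RMSE, MAE, and R^2 computed on held-out data, or with AIC/BIC, which penalize extra parameters and, unlike F-tests, do not require the models to be nested.
  • Inspect residuals: they should resemble white noise (no patterns) if the model captures the structure in the data.
  • Use diagnostic plots: residuals vs. fitted values, and a Q-Q plot to check normality.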

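AIC for least squares (an approximation that assumes independent Gaussian errors and drops an additive constant): AIC = n * ln(RSS/n) + 2k, where n = number of points, RSS = residual sum of squares, and k = number of fitted parameters. BIC replaces 2k with k * ln(n), which penalizes extra parameters more heavily once n ≥ 8.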
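The formulas translate directly into code. This sketch scores polynomials of degree 1 to 6 on synthetic data whose true form is quadratic (an assumption for the demo), counting k as the number of polynomial coefficients. Counting the noise variance as one more parameter would shift every model by the same amount and would not change the ranking.

```python
import numpy as np

def aic_bic(y, y_pred, k):
    """Gaussian least-squares AIC and BIC, additive constants dropped."""
    n = len(y)
    rss = np.sum((y - y_pred) ** 2)
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

rng = np.random.default_rng(5)
x = np.linspace(-1, 1, 100)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.3, size=x.shape)   # true model: quadratic

scores = {}
for deg in range(1, 7):
    p = np.polynomial.Polynomial.fit(x, y, deg)
    scores[deg] = aic_bic(y, p(x), k=deg + 1)      # k = number of fitted coefficients

best_by_bic = min(scores, key=lambda d: scores[d][1])
for deg, (aic, bic) in scores.items():
    print(f"degree {deg}: AIC = {aic:7.2f}  BIC = {bic:7.2f}")
```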


8. Handling noisy, sparse, or censored data

  • For heavy noise or outliers: robust loss (Huber), RANSAC for linear fits.
  • For sparse data, prefer simpler models or incorporate domain priors.
  • For censored data, consider survival analysis methods or maximum-likelihood fitting that models censoring.
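For the censored case, the sketch below uses a Tobit-style likelihood for a straight line whose readings are clipped at a hypothetical upper detection limit. Uncensored points contribute the normal log-density, and censored points contribute log P(Y ≥ limit) via scipy.stats.norm.logsf. The limit, the noise level, and the use of Nelder-Mead are assumptions for illustration. A naive OLS fit to the clipped values is included for comparison.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 200)
y_latent = 2.5 * x + 1.0 + rng.normal(scale=2.0, size=x.shape)
upper = 20.0                                  # hypothetical instrument saturation limit
censored = y_latent >= upper
y_obs = np.where(censored, upper, y_latent)   # saturated readings are recorded at the limit

def neg_log_likelihood(params):
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)                 # optimize log(sigma) so sigma stays positive
    mu = a * x + b
    ll_obs = norm.logpdf(y_obs[~censored], loc=mu[~censored], scale=sigma)
    # A censored point only says y >= upper, so it contributes log P(Y >= upper)
    ll_cens = norm.logsf(upper, loc=mu[censored], scale=sigma)
    return -(ll_obs.sum() + ll_cens.sum())

# Naive OLS on the clipped data (biased toward a shallower slope) as the starting point
a0, b0 = np.polyfit(x, y_obs, 1)
s0 = np.log(np.std(y_obs - (a0 * x + b0)))
res = minimize(neg_log_likelihood, x0=[a0, b0, s0], method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6})
a_mle, b_mle, log_sigma_mle = res.x
print(f"naive OLS slope = {a0:.3f}, censored MLE slope = {a_mle:.3f}")
```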

RANSAC example for linear:

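```python
from sklearn.linear_model import LinearRegression, RANSACRegressor

# scikit-learn >= 1.1 names this argument `estimator` (older releases used `base_estimator`)
ransac = RANSACRegressor(estimator=LinearRegression()).fit(x.reshape(-1, 1), y)
inlier_mask = ransac.inlier_mask_  # True for points treated as inliers
```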

9. Practical workflow checklist

  • Visualize data and residuals.
  • Choose model family guided by physics/intuition.
  • Scale features and use numerically stable bases.
  • Provide good initial guesses for nonlinear fits.
  • Regularize if necessary and validate with cross-validation.
  • Report parameter uncertainties and prediction intervals where relevant.
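For the last checklist item, one simple approach (an assumption, not the only method) is to sample parameter vectors from a multivariate normal distribution with mean popt and covariance pcov, evaluate the model for each sample, and take percentiles. This gives a confidence band for the mean curve. Adding simulated observation noise turns it into a rough prediction band. Both rely on the same linearized approximation that pcov itself uses.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, A, k, C):
    return A * np.exp(-k * x) + C

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 80)
y = exp_model(x, 3.0, 0.5, 2.0) + rng.normal(scale=0.5, size=x.shape)

popt, pcov = curve_fit(exp_model, x, y, p0=(3, 0.5, 2))
perr = np.sqrt(np.diag(pcov))                  # 1-sigma parameter uncertainties

# 95% confidence band for the fitted curve via parameter sampling
x_new = np.linspace(0, 12, 100)
samples = rng.multivariate_normal(popt, pcov, size=2000)
curves = np.array([exp_model(x_new, *s) for s in samples])
lo, hi = np.percentile(curves, [2.5, 97.5], axis=0)

# Rough 95% prediction band: add observation noise estimated from the residuals
resid_sd = np.sqrt(np.sum((y - exp_model(x, *popt)) ** 2) / (len(x) - len(popt)))
noisy = curves + rng.normal(scale=resid_sd, size=curves.shape)
pred_lo, pred_hi = np.percentile(noisy, [2.5, 97.5], axis=0)
print(popt, perr)
```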

10. Worked example: From linear to nonlinear

A complete script that fits linear, cubic-polynomial, and exponential models to the same data and compares their RMSE:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 80)
y_true = 3.0 * np.exp(-0.5 * x) + 2.0
y = y_true + rng.normal(scale=0.5, size=x.shape)
X = x.reshape(-1, 1)

# Linear
lin = LinearRegression().fit(X, y)
y_lin = lin.predict(X)

# Cubic
poly3 = make_pipeline(PolynomialFeatures(3), LinearRegression()).fit(X, y)
y_poly3 = poly3.predict(X)

# Exponential fit
def exp_model(x, A, k, C):
    return A * np.exp(-k * x) + C

popt, _ = curve_fit(exp_model, x, y, p0=(3, 0.5, 2))
y_exp = exp_model(x, *popt)

# RMSE = sqrt(MSE); mean_squared_error's squared=False option was
# deprecated in scikit-learn 1.4 and removed in 1.6
def rmse(y_obs, y_fit):
    return np.sqrt(mean_squared_error(y_obs, y_fit))

print('RMSE linear:', rmse(y, y_lin))
print('RMSE poly3:', rmse(y, y_poly3))
print('RMSE exp:', rmse(y, y_exp))

plt.scatter(x, y, s=8, label='data')
plt.plot(x, y_true, '--', label='true')
plt.plot(x, y_lin, label='linear')
plt.plot(x, y_poly3, label='poly3')
plt.plot(x, y_exp, label='exp fit')
plt.legend()
plt.show()
```
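In-sample RMSE rewards flexibility, so it can make the cubic look competitive with the true exponential form. A hold-out check is more telling: the sketch below fits each model on x < 7 and scores it on the unseen tail. The 7.0 split point is an arbitrary illustrative choice. The linear fit should extrapolate especially poorly, because it keeps decreasing where the data level off.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def exp_model(x, A, k, C):
    return A * np.exp(-k * x) + C

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 80)
y = exp_model(x, 3.0, 0.5, 2.0) + rng.normal(scale=0.5, size=x.shape)

train = x < 7.0            # fit on the first 70% of the range...
test = ~train              # ...and score on the unseen tail (extrapolation)
Xtr, Xte = x[train].reshape(-1, 1), x[test].reshape(-1, 1)

lin = LinearRegression().fit(Xtr, y[train])
poly3 = make_pipeline(PolynomialFeatures(3), LinearRegression()).fit(Xtr, y[train])
popt, _ = curve_fit(exp_model, x[train], y[train], p0=(3, 0.5, 2))

test_rmse = {
    "linear": rmse(y[test], lin.predict(Xte)),
    "poly3": rmse(y[test], poly3.predict(Xte)),
    "exp": rmse(y[test], exp_model(x[test], *popt)),
}
print(test_rmse)
```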

11. Final notes and resources

  • Prefer simple, explainable models when possible.
  • Use domain knowledge for model form and parameter bounds.
  • Document assumptions and quantify uncertainty.

Further reading: “Numerical Recipes” chapters on curve fitting, SciPy optimize docs, scikit-learn model selection guides.
