Diagnostics tab
Do the initial parameter values define a curve near the data?
Nonlinear regression works iteratively, beginning with an initial value for each parameter. Check "don't fit the curve" to see the curve generated by your initial values. If the curve is far from the data, go back to the initial parameters tab and enter better initial values. Repeat until the curve is near the points. Then go back to the Diagnostics tab and check "Fit the curve".
While fitting a curve, Prism will stop after the maximum number of iterations set here. If you are running a script to automatically analyze many data tables, you might want to lower the maximum number of iterations so Prism won't waste time trying to fit impossible data.
How precise are the best-fit values of the parameters?
If your goal is to find the best-fit values of the parameters, you will also want to know how precise those estimates are. We suggest that you report confidence intervals, because inspecting the confidence intervals of the best-fit parameters is an essential part of evaluating any nonlinear fit. Standard errors are intermediate values used to compute the confidence intervals, but are not very useful by themselves. Include standard errors in the output if you want to compare Prism's results to those of another program that doesn't report confidence intervals. Choose to report confidence intervals either as a range or as separate blocks of lower and upper confidence limits (useful if you want to paste the results into another program).
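As a rough illustration of why standard errors are only intermediate values, the sketch below turns a best-fit value and its standard error into a symmetric confidence interval. It assumes the usual asymptotic formula (best-fit value plus or minus a t critical value times the standard error, with degrees of freedom equal to the number of points minus the number of parameters). The function name and the numbers are hypothetical, and this is not necessarily the exact calculation Prism performs.

```python
def confidence_interval(best_fit, std_error, t_crit):
    """Return (lower, upper) asymptotic confidence limits for one parameter.

    Assumes a symmetric interval: best_fit +/- t_crit * std_error.
    """
    margin = t_crit * std_error
    return best_fit - margin, best_fit + margin


# Hypothetical fit: 12 data points and 2 parameters give 10 degrees of freedom.
# 2.228 is the two-tailed 95% critical value of the t distribution for df = 10.
T_CRIT_95_DF10 = 2.228

lower, upper = confidence_interval(best_fit=5.0, std_error=0.5, t_crit=T_CRIT_95_DF10)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, df)` returns the critical value for other degrees of freedom, so it does not have to be looked up by hand.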
The 95% confidence bands enclose the area that you can be 95% sure contains the true curve. They give you a visual sense of how well your data define the best-fit curve. The 95% prediction bands enclose the area that you expect to contain 95% of future data points. Prediction bands account for both the uncertainty in the true position of the curve (enclosed by the confidence bands) and the scatter of data around the curve. Therefore, prediction bands are always wider than confidence bands. When you have lots of data points, the discrepancy is huge.
How to quantify goodness-of-fit
You will probably want to ask Prism to report R², simply because it is standard to do so, even though knowing R² doesn't really help you interpret the results. Reporting the sum-of-squares and sy.x will only be useful if you want to compare Prism's results to those of another program, or if you want to do additional calculations by hand.
Normality tests
Least-squares nonlinear regression assumes that the residuals follow a Gaussian distribution (robust nonlinear regression does not make this assumption). Prism can test this assumption by running a normality test on the residuals. Prism offers three normality tests. We recommend the D'Agostino-Pearson test.
Does the curve systematically deviate from the points?
Does the curve follow the trend of the data? Or does the curve systematically deviate from the trend of the data? Prism offers two tests that answer these questions.
If you have entered replicate Y values, choose the replicates test to find out if the points are 'too far' from the curve (compared to the scatter among replicates). If the P value is small, conclude that the curve does not come close enough to the data.
The runs test is available if you entered single Y values (no replicates) or chose to fit only the means rather than individual replicates (weighting tab). A 'run' is a series of consecutive points on the same side of the curve. If there are too few runs, it means the curve is not following the trend of the data.
If you choose a residual plot, Prism creates a new graph. The X axis is the same as the graph of the data, while the Y axis plots the distance of each point from the curve (the residuals). Points with positive residuals are above the curve; points with negative residuals are below the curve. Viewing a residual plot can help you assess whether the distribution of residuals is random above and below the curve.
Are the parameters intertwined or redundant?
What does it mean for parameters to be intertwined? After fitting a model, change the value of one parameter but leave the others alone. The curve moves away from the points. Now, try to bring the curve back so it is close to the points by changing the other parameter(s). If you can bring the curve closer to the points, the parameters are intertwined. If you can bring the curve back to its original position, then the parameters are redundant. In this case, Prism will alert you by labeling the fit 'ambiguous'.
We suggest that you report the dependency, and not bother with the covariance matrix. When you are getting started with curve fitting, it is OK to leave both options unchecked.
Could outliers impact the results?
Nonlinear regression is based on the assumption that the scatter of data around the ideal curve follows a Gaussian distribution. The presence of one or a few outliers (points much further from the curve than the rest) can overwhelm the least-squares calculations and lead to misleading results.
Check this option to count the outliers, but leave them in the calculations. Choose how aggressively to define outliers by adjusting the ROUT coefficient. If you chose the option in the Fit tab to exclude outliers from the calculations, then this option to simply count outliers (in the Diagnostics tab) is not available.
Would it help to use stricter convergence criteria?
Nonlinear regression is an iterative process. It starts with initial values of the parameters, and then repeatedly changes those values to improve the goodness-of-fit. Regression stops when changing the values of the parameters makes only a trivial change in the goodness-of-fit.
Prism lets you define the convergence criteria in three ways. The medium choice is the default, and will work fine in most cases. With this choice, nonlinear regression ends when five iterations in a row change the sum-of-squares by less than 0.0001%. If you are having trouble getting a reasonable fit, you might want to try the strictest definition of convergence: five iterations in a row change the sum-of-squares by less than 0.00000001%. It won't help very often, but is worth a try. The only reason not to always use the strictest choice is that it takes longer for the calculations to complete. That won't matter with small data sets, but will matter with large data sets or when you run scripts to analyze many data tables. If you are fitting huge data sets, you can speed up the fit by using the 'quick' definition of convergence: two iterations in a row change the sum-of-squares by less than 0.01%.
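The three convergence definitions can be expressed as one stopping rule with different settings. The sketch below is only an illustration of that rule, not Prism's implementation: the function name, the dictionary keys, and the sum-of-squares history are hypothetical, and it assumes the percent change is measured relative to the previous iteration's sum-of-squares.

```python
def has_converged(ss_history, tolerance_percent, n_consecutive):
    """Return True if each of the last n_consecutive iterations changed the
    sum-of-squares by less than tolerance_percent.

    ss_history lists the sum-of-squares after each iteration, oldest first.
    """
    # n_consecutive changes require n_consecutive + 1 recorded values.
    if len(ss_history) < n_consecutive + 1:
        return False
    recent = ss_history[-(n_consecutive + 1):]
    for previous, current in zip(recent, recent[1:]):
        if previous == 0:
            # A perfect fit cannot improve; any move away from zero is a change.
            if current != 0:
                return False
            continue
        change_percent = abs(current - previous) / previous * 100.0
        if change_percent >= tolerance_percent:
            return False
    return True


# (tolerance in percent, consecutive iterations), taken from the text above.
CRITERIA = {
    "quick": (0.01, 2),
    "medium": (0.0001, 5),
    "strict": (0.00000001, 5),
}

# Hypothetical sum-of-squares values from five iterations of a fit.
history = [120.0, 60.0, 50.001, 50.0005, 50.0004]
for name, (tolerance, count) in CRITERIA.items():
    print(name, has_converged(history, tolerance, count))
```

With this history only the 'quick' definition is satisfied: the last two changes are well under 0.01%, while the other two definitions need five consecutive small changes and only four changes have been recorded.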