Evidence Check 2: Homeopathy - Science and Technology Committee

Memorandum submitted by Professor Harald Walach and Professor George Lewith (HO 14)



  1.1  Homeopathic Pathogenetic Trials (HPTs), or, as they used to be called, homeopathic remedy provings, are the pillar of homeopathy. [1] Given their scientific importance, we submit this document to the Commons Science & Technology Committee in its Evidence Check of Homeopathy.

  1.2  Hahnemann founded homeopathy on the basis of his own experience by ingesting China bark; [2,3] these were the first HPTs proper. The Hahnemannian version of an HPT is the following idea: Take a purportedly medicinal substance whose therapeutic or pharmacological effect you do not know. Have some healthy volunteers ingest it in a dose that is likely not harmful. Note down the symptoms these volunteers experience. Use the symptoms to guide your application in ill people: whenever ill people present with a collection of symptoms that could be seen in healthy volunteers, use the same substance to treat the ill person. Thus, the HPT is in fact an operationalisation of the similia rule. In order to know what the similar symptoms are that should be looked out for, you need to know them in the first place. An HPT provides you with this knowledge.

  1.3  A large part of the Materia Medica Pura is actually a result of these early provings. Later on Hahnemann discovered a good way of diminishing the strong symptoms by diluting his medications, serendipitously hitting upon succussion and potentisation, the other important principle of homeopathy. In his final edition of the Organon, he made 30CH[35] the standard potency for HPTs. This seems to be a practice adhered to for quite some time. [4,5] In fact, a lot of the standard polychrest medicines in use by homeopaths today date back to Hahnemann's own HPTs. They have not gone unchallenged, [6-11] but pragmatically seem to be still useful.

  1.4  Although the first blinded and placebo controlled trials in the history of medicine were such early HPTs, [12-14] the circular epistemology of homeopathy placed less emphasis on the methodological rigour of HPTs than on the usefulness of the symptoms derived from them. It was only in more recent years, during the revival of homeopathy research in the seventies and eighties in Germany and elsewhere, that the question was asked whether symptoms derived from such HPTs or provings are actually different from those seen with placebo. A recent systematic review of all HPTs available from 1945 until 1995 (156 studies altogether) is not very flattering regarding the methodological sophistication of HPTs. [15,16]


2.1  Testing the Individualised Difference Hypothesis: Randomised Single Case Studies

  2.1.1  Walach et al had participants take Belladonna 12CH or 30CH and placebo in randomised order under double-blind conditions. The sequence of Belladonna and placebo periods, four weeks each, was randomised, and a remedy was taken only on the first day of each week. Participants noted their symptoms in diaries that collected a predefined set of symptoms, half of which were Belladonna symptoms, the other half symptoms not typical for Belladonna. This enabled straightforward randomisation tests, which define statistical significance at the level of the individual, to determine whether the number of Belladonna symptoms differed between Belladonna and placebo. Quite paradoxical results were reported. [17,18] Of the 25 experiments, one individual had significantly (p=0.01) more Belladonna symptoms with Belladonna, one had significantly more Belladonna symptoms with placebo, and in several cases there were interesting changes with Belladonna that were graphically obvious but not significant.
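  The logic of such an individual-level randomisation test can be sketched as follows. This is a minimal illustration, not the procedure of the original studies: the symptom counts, the number of periods and the assumption that every allocation of four remedy periods among eight was equally likely are all hypothetical.

```python
from itertools import combinations

def randomisation_test(counts, verum_periods):
    """One-sided exact randomisation test for a single participant.

    counts: number of remedy-typical symptoms in each period (hypothetical).
    verum_periods: indices of the periods in which the remedy was taken.
    Returns the share of all equally likely allocations whose mean
    difference (remedy minus placebo) is at least the observed one.
    """
    n = len(counts)
    k = len(verum_periods)

    def mean_diff(verum):
        verum = set(verum)
        v = [counts[i] for i in range(n) if i in verum]
        p = [counts[i] for i in range(n) if i not in verum]
        return sum(v) / len(v) - sum(p) / len(p)

    observed = mean_diff(verum_periods)
    allocations = list(combinations(range(n), k))
    # Small tolerance so that ties with the observed value are counted.
    extreme = sum(1 for a in allocations if mean_diff(a) >= observed - 1e-12)
    return extreme / len(allocations)

# Hypothetical diary counts for eight periods; remedy taken in periods 0, 2, 4, 6.
counts = [5, 1, 4, 0, 6, 1, 5, 2]
verum = [0, 2, 4, 6]
p_value = randomisation_test(counts, verum)
print(round(p_value, 4))
```

  Because the reference distribution is built from the participant's own data, the resulting p-value refers to that individual only, which is what allows one participant to show an effect in one direction and another participant the opposite.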

2.2  Replicating the Naïve Approach

  2.2.1  In the meantime Walach et al ran a larger replication study using a design similar to that of the first Belladonna study, improved by several features: [19] they had more participants (n=87), introduced a wash-out period of one week between the experimental phases of the crossover design, reduced the intake period for medication or placebo to two weeks each, and gave medication only during the first three days of the first week, after which participants observed themselves for the rest of the two weeks. They used the same structured diary and, based on the previous study, formulated hypotheses that guided the combination of symptom categories into clusters to be tested experimentally. Although there was a clear and significant difference between baseline and each of the experimental interventions in some of the variables, there was no significant difference between homeopathy and placebo in those predefined categories. Thus, the initial tentative results were not replicable, and there was no indication from this study that symptoms produced by placebo and those produced by Belladonna 30CH were in any way different from each other. [20]


  3.1  Intrigued by the phenomenology of the results, Walach thought that something different was actually going on in the background variability in both groups. The problem was not only that symptoms in general were indistinguishable between the two conditions, but that Belladonna-specific symptoms were seen to a large extent also with placebo. That was the scientific puzzle. Having ruled out methodological artifacts, such as carryover effects or response bias, we were quite convinced that this was a genuine effect. To probe this further Walach applied a very sensitive multivariate method, Grade-of-Membership (GoM) analysis, to the dataset of the 2001 study. [21]

  3.2  In essence GoM analysis is a multivariate model. [22,23] While most multivariate analysis models, such as the General Linear Model, use additive models, GoM uses a multiplicative algorithm solved by an iterative maximum-likelihood approximation. This allows for the multivariate usage of many variables even with few cases and it is very sensitive. Normally, we think of group membership as a categorical event: we either cast a vote for a candidate, or we don't; we either belong to a group, or we don't. GoM allows us to group people according to a grading, a kind of probability judgment of belonging to a group, expressed in percent. So a particular person might belong to an experimental group to some degree of probability, and also, to a lesser degree of probability, to another group. More importantly, the analysis also defines the relevance of the variables that are used to reach the decision.
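  The idea of graded membership can be sketched in a few lines. The sketch assumes the usual GoM formulation, in which the probability that a person gives a particular response is the membership-weighted sum of the response probabilities of the pure profiles; the profiles, variables and numbers are hypothetical, and the iterative maximum-likelihood fitting itself is not shown.

```python
import math

def response_probability(g, lam, j, l):
    """Probability of response l on variable j for one person.

    g[k]: grade of membership in profile k (non-negative, sums to 1).
    lam[k][j][l]: probability that a pure member of profile k
    gives response l on variable j.
    """
    return sum(g_k * lam_k[j][l] for g_k, lam_k in zip(g, lam))

def log_likelihood(responses, g, lam):
    """Log-likelihood of one person's observed responses.

    responses[j] is the observed response level on variable j.
    The likelihood is the product over variables, hence a sum of logs.
    """
    return sum(math.log(response_probability(g, lam, j, l))
               for j, l in enumerate(responses))

# Two hypothetical profiles and two binary variables (0 = absent, 1 = present).
lam = [
    [[0.2, 0.8], [0.3, 0.7]],  # profile 0
    [[0.9, 0.1], [0.6, 0.4]],  # profile 1
]
g = [0.75, 0.25]  # this person belongs 75% to profile 0, 25% to profile 1

p = response_probability(g, lam, 0, 1)   # 0.75 * 0.8 + 0.25 * 0.1
ll = log_likelihood([1, 1], g, lam)
print(p, ll)
```

  Fitting a GoM model means searching for the memberships g and the profile probabilities lam that maximise this likelihood summed over all persons.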

  Using GoM, Walach identified the extent to which each individual belonged to the Belladonna and to the placebo condition and which variables predicted this group membership, analysing the first and second half of the crossover design separately. The results are revealing (Table 1):

  3.3  Table 1—Results of a Grade of Membership Analysis of the Data from the Replication Belladonna HPT: [19,21] Variables used to Predict Membership of Participants to Groups in Phase 1 or Phase 2 of the Crossover Study, and Likelihood of Group Membership Predicted by a Variable

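                          Homeopathy             Placebo
Variable               Phase 1   Phase 2    Phase 1   Phase 2
forehead right            12.6       2.4        9.2      11.8
nose                      60.6       5.1        6.6      29.8
mouth                     43.7      11.7        6.2       8.7
skull back right             3.47                 008.15
whole throat              56.2      16.6        4.9       8.0
shoulder right               -      8.28          -       0.0
genitals                    57      28.8           147.5
small of back             22.8      12.2        0.0       0.0
whole body                57.8      28.9       13.2      13.3
feelings, mind             100      72.8       29.4      57.2
always                     100      56.6         40      40.9
afternoon                  100      41.2        6.3      31.7
pain                       100      55.8       27.8      54.4

Note: for the entries 3.47, 008.15 and 147.5 the division between Phase 1 and Phase 2 is not recoverable from the source.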

  3.4  This is the output of the analysis using only the most important variables to predict group membership. The first column indicates the variable used, the following two columns show the prediction of membership (expressed as percentage) in the first and the second phase of the trial, when homeopathy (Belladonna C30) is taken, the last two columns when placebo is taken. It is interesting to compare the percentages of membership prediction for columns Homeopathy phase 1 and Placebo phase 1, and then the same for phase 2. Ideally, if separation of the groups were perfect, the two homeopathy columns should be relatively similar to each other and very different from two placebo columns. They are not. So we see symptoms that define membership in the Belladonna condition (symptoms of the nose, for instance, 60% membership association with Belladonna in phase 1) that are virtually reduced to unimportance in phase 2 (only 5%), and the other way round. More importantly some of the symptoms are quite typical for Belladonna (symptoms of the throat, for instance, or symptoms starting in the afternoon). It seems that some symptoms typical for Belladonna have emerged during the HPT. So the problem really is that the typical symptom pattern is shifting to placebo, at least partially, during the second half of the trial. Although only exploratory it highlights the phenomenology of HPTs with a very sensitive quantitative pattern-recognition technique. Most recently, a careful German re-proving of Galphimia glauca has produced exactly this result. [24]
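  The comparison described above can be made explicit with a few rows of Table 1. The sketch below is illustrative only: it uses four rows whose values are unambiguous, and the function and key names are hypothetical.

```python
# Rows from Table 1, as
# (homeopathy phase 1, homeopathy phase 2, placebo phase 1, placebo phase 2),
# all expressed as percentage membership prediction.
table1 = {
    "nose": (60.6, 5.1, 6.6, 29.8),
    "whole throat": (56.2, 16.6, 4.9, 8.0),
    "afternoon": (100.0, 41.2, 6.3, 31.7),
    "pain": (100.0, 55.8, 27.8, 54.4),
}

def shifts(row):
    """Change from phase 1 to phase 2 under each condition."""
    h1, h2, p1, p2 = row
    return {
        "homeopathy_change": round(h2 - h1, 1),
        "placebo_change": round(p2 - p1, 1),
    }

for variable, row in table1.items():
    print(variable, shifts(row))
```

  With perfect separation the homeopathy change would be close to zero for every variable. In these rows the homeopathy value falls between phase 1 and phase 2 while the placebo value rises, which is the shift towards placebo described in the text.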

  3.5  The tentative conclusion from this re-analysis was that obviously homeopathic remedies do produce some specific symptoms but they also produce these specific symptoms under placebo. Is it due to the shortcomings of our method, or is the phenomenon real? Walach had started working on the assumption that the effects of homeopathy are due to what we have called generalised entanglement [25] and which Walach has used to render a rational reconstruction of a non-classical model of homeopathy. [26] By that we mean that the effects of homeopathy may be specific, but they are due to non-local correlations as a consequence of the systemic set-up of homeopathy and the treatment situation. This is a consequence of how the whole system is formally constructed. A corollary of this model is that it would predict some specific symptoms also in the control group and that, with repeated experimentation, the specificity is bound to vanish. [27]

  3.6  This sets up a conundrum: How are we to prove experimentally that homeopathic remedies are producing specific symptoms if, by the very experimental set-up, we are likely to destroy this effect?


4.1  A New HPT Model

  4.1.1  Walach et al developed a completely new approach out of the experiences of the previous trials [28] and decided to use a different HPT methodology. The methodological reasoning is the following:

    (a) A full phenomenological account of all experiences should enter the database. All participants are encouraged to report every occurrence that is unusual for them in a diary. To enhance the recall of such events, a supervision interview is conducted every day either by phone or personal interview by a supervisor.

    (b) As controls we introduce blinding and randomisation. Hence none of the participants knows what remedy is being used, nor whether they are randomised to receiving placebo or real substance.

    (c) In order to exclude any effects of suggestion and social desirability, the substance is chosen out of a predefined list by a third party at random, blinding also the director of the study and all staff associated with handling data. This ensures an unbiased experience and processing of symptoms as much as possible.

    (d) Medication is taken individually, until symptoms appear. If no symptoms appear after three days, the intake is stopped and the individual taken out of the study.

    (e) Once the symptom database has been created and closed, all symptoms are scrambled and dissociated from their temporal and individual ordering by placing them in the head-to-foot scheme familiar from homeopathic repertories, in symptom units that correspond to these rubrics.

    (f) The database is then given to a materia medica expert not otherwise associated with the study. This expert does not have access to the randomisation code but is given the name of the remedy tested. At this stage, this person and the pharmacist who chose the remedy are the only ones privy to this information.

    (g) The expert then uses a computerised repertory to decide, for every symptom, whether it is a symptom typical for the remedy according to the sources, or not. Thus, the remedy typical symptoms are counted as "1", the atypical symptoms as "0".

    (h) The database is transformed back into the group structure. For every participant we count the number of symptoms typical for the remedy and the number of atypical symptoms, and average these counts across the experimental and control groups. This gives a clear quantitative outcome score that can easily be tested.
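  Steps (g) and (h) amount to a simple counting procedure, which can be sketched as follows. The participant identifiers, group labels and ratings are hypothetical; the sketch only assumes that each symptom carries the expert's 1/0 rating and that the allocation is revealed after rating is complete.

```python
from collections import defaultdict
from statistics import mean

def score_groups(symptoms, allocation):
    """Mean number of typical and atypical symptoms per participant, by group.

    symptoms: list of (participant_id, flag), flag 1 = typical for the
              remedy, 0 = atypical, as rated by the blinded expert.
    allocation: participant_id -> group label, unblinded after rating.
    Returns: group -> (mean typical count, mean atypical count).
    """
    counts = {pid: [0, 0] for pid in allocation}  # [typical, atypical]
    for pid, flag in symptoms:
        counts[pid][0 if flag == 1 else 1] += 1

    by_group = defaultdict(list)
    for pid, group in allocation.items():
        by_group[group].append(counts[pid])

    return {
        group: (mean(c[0] for c in rows), mean(c[1] for c in rows))
        for group, rows in by_group.items()
    }

# Hypothetical example with four participants.
allocation = {"p1": "remedy", "p2": "remedy", "p3": "placebo", "p4": "placebo"}
symptoms = [("p1", 1), ("p1", 1), ("p1", 0), ("p2", 1),
            ("p3", 0), ("p3", 1), ("p4", 0)]
scores = score_groups(symptoms, allocation)
print(scores)
```

  Keeping the rating step separate from the allocation table is what preserves blinding: the expert's 1/0 decisions are fixed before anyone can see which group a symptom came from.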

4.2  Pilot Studies Using the New Methodology

  4.2.1  Walach et al then completed four studies following this model. The first two studies were pilot studies. [29,30] In one study Cantharis (chosen randomly from a list of 12 lesser remedies) was evaluated against placebo, in the other Calendula, Ferrum muriaticum (also chosen randomly from a list) and placebo in a three-armed design. In the first study there were more symptoms typical for Cantharis and fewer atypical symptoms in the Cantharis group than in the placebo group during the proving period, but there were also more symptoms typical for Cantharis in the placebo group, which was unexpected (Figure 1).

  4.2.2  Figure 1—Results of the Cantharis Proving: [30] Symptoms Typical for Cantharis and Atypical Symptoms During Baseline and During the Proving Period, for the Cantharis and the Placebo Group

  4.2.3  Although the effect was not significant in this pilot, it was quite sizeable (d=0.4). When the group was randomised to receive homeopathic Cantharis but was dosed with it later, there were also more symptoms typical for Cantharis; this effect size was very large (d=1.0). Ideally, one would have the same database evaluated by two or three raters, calculate inter-rater reliability, and use only symptoms that all raters agree on. Evaluation by a single expert nevertheless faithfully mirrors normal practice, where one homeopath translates symptoms into remedy pictures.
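  For readers unfamiliar with the effect size d, the following sketch shows how a standardised mean difference is commonly computed. It assumes that d denotes Cohen's d with a pooled standard deviation, which the memorandum does not state explicitly, and the symptom counts are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    # variance() is the sample variance (n - 1 in the denominator).
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical numbers of remedy-typical symptoms per participant.
remedy = [4, 6, 5, 7, 3]
placebo = [3, 5, 4, 6, 2]
d = cohens_d(remedy, placebo)
print(round(d, 2))
```

  Because d is expressed in units of standard deviation, a pilot study with few participants can show a sizeable d and still fail to reach statistical significance.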

  4.2.4  In the three-armed study [29] a similar and quite puzzling result emerged. Here the total number of symptoms experienced during the proving phase was significantly different between the experimental groups and the placebo control group, as was the number of Calendula symptoms in the Calendula group (Figure 2), with a large effect size of d=2.8.

  4.2.5  Figure 2—Three-Armed HPT of Calendula vs. Ferrum muriaticum vs. Placebo (n=7 participants in each group): Mean Number of Symptoms Typical for Calendula, Ferrum muriaticum or Atypical Symptoms in each Group. [29]

  4.2.6  What can also be seen is that significantly more Calendula symptoms were experienced by participants who had taken Ferr. mur. compared with those taking placebo (effect size d=1.75), and that a sizeable number of Ferr. mur. symptoms were also observed in the Calendula group. While this effect might also be due to the fact that Ferrum muriaticum is a little known substance and hence difficult to identify, the observation that Calendula symptoms were more frequent in the Ferrum mur group is difficult to reconcile with the assumption that these effects are artifacts.

4.3  Two Parallel Replication Studies

  4.3.1  Following on from these observations Walach et al conducted one study with two arms, placebo vs. homeopathy, and another with three arms, comparing placebo with two different remedies, with one of the remedies being common to both studies. The remedies were chosen from a predefined list of 20 remedies, in this case newly proven ones: Ozone and Iridium, with Ozone being the one common to both studies. When both studies were combined, a clear significant difference for symptoms typical for Ozone during the treatment period emerged (Figure 3).

  4.3.2  Figure 3—Combination of Two Studies Testing Ozone vs. Placebo. [31] Significantly More Symptoms Typical for Ozone During the Experimental Phase in the Group Taking Ozone than in the Group Taking Placebo


  5.1  Walach et al believe they have proven the case that this new methodology of re-proving homeopathic remedies can now tease out at least partially the specificity of homeopathic remedies vis-à-vis placebo in a rigorous experimental design, where other recent approaches have failed. [32-35] While HPTs can be conducted this way, one should avoid the direct replication of any study by using the same types of remedies. A way forward would be to test different remedies, perhaps in several studies with more than two arms that have one or two remedies in common. Then a decision could be made, after the fact and at random, which arms to discard and which to combine. By the combination of changing remedies each time a study is conducted and combining different studies, it might be possible to produce enough single studies with significant outcome testifying to the specificity of symptoms and thus avoiding the observed decline effect. [36] This methodology has only one purpose: to discover whether known homeopathic substances are able to produce symptoms in healthy volunteers that are different from those elicited by placebo. Its pre-supposition has been mentioned: it can only be used if the medication in question is known.

  5.2  To the novice and the outside observer it should also be clear that the proving methodology used by homeopaths for the purpose of discovering new remedies differs, as a rule, from the authors' in several respects: very often they do not have symmetrical control groups, ie only a few persons, though randomised and mostly double-blinded, receive placebo, [9,24,37] and very often symptoms appearing in the control group are also counted as remedy symptoms, if they fulfil the typical criteria for a symptom defined by Bayr and Stübler [38].

  5.3  There is surely a common core to the proving methodology: the careful observation of changes by experienced provers who ingest a potentised substance unknown to them. The symptoms are noted in a diary and verified with a supervisor on a daily basis. Little has changed in principle since this methodology was invented. [39] The only thing we have become suspicious about is how powerful placebos are, and that there are likely a lot of specific symptoms to be observed even under placebo. Everyone still adhering to a classical pharmacological model when investigating homeopathy will have difficulty explaining this conundrum. Homeopathic provings are difficult to investigate but as we begin to understand the process, our methodology improves as it becomes more informed.


1.  Walach H. The pillar of homoeopathy: Remedy provings in a scientific framework. British Homoeopathic Journal. 1997;86:219-224.

2.  Barthel P. Hahnemanns Vermächtnis, der Chinarindenversuch, 1997. Zeitschrift für Klassische Homöopathie. 1998;42:29.

3.  Bayr G. Hahnemanns Selbstversuch mit der Chinarinde im Jahre 1790. Heidelberg: Haug; 1989.

4.  Marenzeller Av. Anleitung zur Erforschung der Arzneimittelkräfte am Gesunden. Zeitschrift.Verein homöopathischer Ärzte Österreichs. 1857;1:383-396.

5.  Bellows H P. The test drug proving of the O.O. and L. Society. A reproving of Belladonna being an experimental study of the pathogenetic action of that drug upon the healthy human organism. Boston: O.O. and L. Society; 1906.

6.  Donner F. Über die Ankurbelung der homöopathischen Forschung. Deutsche Zeitschrift für Homöopathie. 1932;11:180-187.

7.  Martini P. Die Arzneimittelprüfung und der Beweis des Heilerfolges. Allgemeine Homöopathische Zeitung. 1939;187:154-167.

8.  Pirtkien R. Eine Arzneimittelprüfung mit Bryonia. Versuche zur wissenschaftlichen Begründung der Homöopathie I. Stuttgart: Hippokrates; 1962.

9.  Riley D S. Contemporary drug provings. Journal of the American Institute of Homoeopathy. 1994;87(3):161-165.

10.  Schoeler H. Zur Frage des wissenschaftlichen Ausbaus der homöopathischen Arzneimittelprüfungen. Allgemeine Homöopathische Zeitung. 1936;184:425-434.

11.  Dantas F. How can we get more reliable information from homoeopathic pathogenetic trials? A critique of provings. British Homoeopathic Journal. 1996;85:230-236.

12.  Kaptchuk T J. Early use of blind assessment in a homoeopathic scientific experiment. British Homoeopathic Journal. 1997;86:49-50.

13.  Stolberg M. Die Homöopathie auf dem Prüfstein. Der erste Doppelblindversuch der Medizingeschichte im Jahr 1835. Münchner Medizinische Wochenschrift. 1996;138:364-366.

14.  Kaptchuk T J. Intentional ignorance: A history of blind assessment and placebo controls in medicine. Bulletin of the History of Medicine. 1998;72:389-433.

15.  Dantas F, Fisher P. A systematic review of homoeopathic pathogenetic trials ("provings") published in the United Kingdom from 1945 to 1995. In: Ernst E, Hahn EG, eds. Homoeopathy: A Critical Appraisal. London: Butterworth-Heinemann; 1998:69-97.

16.  Dantas F, Fisher P, Walach H, et al. A systematic review of homeopathic pathogenetic trials published from 1945 to 1995. Homeopathy. 2007;96:4-16.

17.  Walach H, Hieber S, Ernst-Hieber E. Effects of Belladonna 12 CH and 30 CH in healthy volunteers. A multiple, single-case experiment in randomization design. In: Bastide M, ed. Signs and Images. Selected Papers from the 7th and 8th GIRI Meeting, held in Montpellier, France, Nov. 20-21, 1993, and Jerusalem, Israel, Dec 10-11, 1994. Dordrecht, Boston, London: Kluwer; 1997:215-226.

18.  Ernst-Hieber E, Hieber S. Wirkt eine homöopathische Hochpotenz anders als ein Placebo? Randomisierte doppelblinde multiple Einzelfallstudie. Stuttgart: Hippokrates; 1995.

19.  Walach H, Köster H, Hennig T, Haag G. The effects of homeopathic belladonna 30CH in healthy volunteers: a randomized, double-blind experiment. Journal of Psychosomatic Research. 2001;50:155-160.

20.  Walach H. Wissenschaftliche Untersuchungen zur Homöopathie. Die Münchener Kopfschmerzstudie. Arzneimittelprüfungen mit Belladonna. Essen: KVC Verlag; 2000.

21.  Walach H, Kohls N. Grade-of-Membership (GoM) analysis as a sensitive method for evaluating categorical data: introduction and some examples. In: Beauducel A, Biehl B, Bosniak M, Conrad W, Schönberger G, Wagener D, eds. Multivariate Research Strategies: Festschrift for Werner W. Wittmann. Aachen: Shaker; 2005:151-172.

22.  Manton K G, Woodbury M A, Tolley D H. Statistical Applications Using Fuzzy Sets. New York: Wiley; 1994.

23.  Woodbury M A, Manton K G. A new procedure for analysis of medical classification. Methods of Information in Medicine. 1982;21:210-220.

24.  Teut M, Dahler J, Schnegg C, Provings WSGfH. A homoeopathic proving of Galphimia glauca. Forschende Komplementärmedizin. 2008;15:211-217.

25.  Atmanspacher H, Römer H, Walach H. Weak quantum theory: Complementarity and entanglement in physics and beyond. Foundations of Physics. 2002;32:379-406.

26.  Walach H. Entanglement model of homeopathy as an example of generalized entanglement predicted by Weak Quantum Theory. Forschende Komplementärmedizin und Klassische Naturheilkunde. 2003;10:192-200.

27.  Lucadou Wv, Römer H, Walach H. Synchronistic Phenomena as Entanglement Correlations in Generalized Quantum Theory. Journal of Consciousness Studies. 2007;14:50-74.

28.  Sherr J. The Dynamics and Methodology of Homoeopathic Provings. West Malvern: Dynamis Books; 1994.

29.  Möllinger H, Schneider R, Löffel M, Walach H. A double-blind, randomized, homeopathic pathogenetic trial with healthy persons: Comparing two high potencies. Forschende Komplementärmedizin und Klassische Naturheilkunde. 2004;11:274-280.

30.  Walach H, Sherr J, Schneider R, Shabi R, Bond A, Rieberer G. Homeopathic proving symptoms: result of a local, non-local, or placebo process? A blinded, placebo-controlled pilot study. Homeopathy. 2004;93:179-185.

31.  Walach H, Möllinger H, Sherr J, Schneider R. Homeopathic pathogenetic trials produce more specific than non-specific symptoms: Results from two double-blind placebo controlled trials. Journal of Psychopharmacology. 2008;22:543-552.

32.  Vickers A, McCarney R, Fisher P, Van Haselen R. Can homeopaths detect homeopathic medicines? A pilot study for a randomised, double-blind, placebo controlled investigation of the proving hypothesis. British Homeopathic Journal. 2001;90:126-130.

33.  Vickers A J, Van Haselen R, Heger M. Can homeopathically prepared mercury cause symptoms in healthy volunteers? A randomized, double-blind placebo-controlled trial. Journal of Alternative and Complementary Medicine. 2001;7:141-148.

34.  Fisher P, Dantas F. Homeopathic pathogenetic trials of Acidum malicum and Acidum ascorbicum. British Homeopathic Journal. 2001;90:118-125.

35.  Goodyear K, Lewith G, Low J L. Randomised double-blind placebo controlled trial of homoeopathic proving for Belladonna C30. Journal of the Royal Society of Medicine. 1998;91:579-582.

36.  Lucadou Wv. Hans in Luck: The currency of evidence in parapsychology. Journal of Parapsychology. 2001;65:3-16.

37.  Möllinger H. Homöopathische Arzneimittelprüfungen; 1998.

38.  Bayr G, Stübler M. Haplopappus baylahuen. Eine Prüfung mit den Potenzen D2, D3, D6 und D12. Heidelberg: Haug; 1986, p92.

39.  Riley D S. History of Homoeopathic Drug Provings; Philadelphia: Homeopathic Pharmacopeia Convention of the US; 1995.


Professor George Lewith is a homeopathic practitioner and a researcher.

Professor Harald Walach is a researcher and has no conflict of interest.

November 2009

35   ie a potency that has been diluted 30 times in the ratio 1:100 (hence "C" for centum-hundred) in separate glass vials (hence "H" for "Hahnemann").


© Parliamentary copyright 2010
Prepared 22 February 2010